ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

PostgreSQL getting with the program

PostgreSQL is a sleeping giant that is waking up. And instead of wondering around sleepy, they seemed to be jumping forwards in what seems leaps to the other guys, but are just natural steps for them. Heh, I seem to be in a dramatic poetic mood today, but I just wanted to get across with how impressed I am with what is going on with the PostgreSQL community, ever since version 8.0. To me PostgreSQL 8.0 was so critical since with it one of the key obstacles to more wide spread adoption was removed: There was finally a native easy to install version of PostgreSQL for windows. Not that I know many people that deploy on windows, but I do know a ton that develop on windows (which included me back then).

Anyways, since then they have been adding more and more features at a pace that is mind boggling compared to what MySQL seems to be doing with much greater ressources. They are opening up as the same time too, which only seems to be accelerating their development pace or at the very least their potential for adoption due to greater transparency towards their end user base. I mentioned this already in a comment to Baron's blog.

Now one of the issues imho that plagued PostgreSQL was the attitude that databases are for DBA's who should rule data land. That developers should ask their DBA for permission, wisdom and hand tuned queries etc. That data integrity always comes first and that generally priority is flexibility and that the core product should stay as lean as possible. While all of this is probably a good idea in a perfect world, it did not fit the real world that well. Many of us do not have DBA's to talk to, or they are busy with other stuff of they have grown grumpy because they have to admin Oracle and DB2. Things can and should be easy if its possible to make them easy. A native easy to install PostgreSQL distro for windows is one example of this. Adding non standard stuff like LIMIT and ENUM is another example. Being able to look at the planned roadmap on a wiki is another example. Autovaccum was obviously another big step. Looking to add automatic generation of PostgreSQL's own rule syntax to get updateable views with no additional effort is another.

Now something that might even be bigger than all of the above (with the exception of the windows port) is the addition of native replication. While some people in the MySQL community have already begun questioning replication, a lot of us are feeling "not so fast"! PostgreSQL has long maintained that replication is not a one size fits all thing and adding all known approaches would be way too much bloat in the core distro. Now for 8.4 they have announced plans to add simple replication (cold stand-by only) with 8.5 bringing read-only slave capabilities out of the box.

Now you might be thinking, 8.3 was only released in February 2008, 8.5 will come out in 2012 at the earliest. You are thinking in MySQL terms. 8.0 came out in early 2005, 8.1 late in the same year. One year later in late 2006 8.2 followed. Now look over the release notes and notice that none of these release skimmed on feature additions. Yet MySQL in the same time frame managed to release 5.0 in late 2005, after having release 4.1 a year before. Since then we have been waiting on 5.1 (wtf is a "near-final release candidate"), while features like internal support for foreign keys will not appear in 6.0 not until 6.1. Thats scary.

Now something that I am not sure about, but I keep hearing is that PostgreSQL still needs more manual tuning before getting good (good as in not optimal but enough for most shops to get by well without having to go out and hire a DBA) performance. MySQL is more known (is this a euphemism for a "urban myth"?) to not require so much manual tuning (though of course if you do manual tuning, you will have to end up tuning all of the storage engines you use separatly). I wonder if open sourcing EnterpriseDB's DynaTune would solve this (or atleast help in debunking the myth with a shiny placebo) and if the growth that PostgreSQL might see as a result would offset EnterpriseDB's loss in terms of a differntiator of their proprietary offering ..?

UPDATE [27/06/2008]:
Use better URL's to summarize new features in MySQL and added links or 5.1, 6.0 and a 6.1 outlook.

Comments



Re: PostgreSQL getting with the program

"MySQL is more known (is this a euphemism for a "urban myth"?) to not require so much manual tuning (though of course if you do manual tuning, you will have to end up tuning all of the storage engines you use separatly)"

Any software deployed on any serious scale demands tuning. We can close our eyes and pretend it's not so, but that does not change the fact of the matter.

"I wonder if open sourcing EnterpriseDB's DynaTune would solve this (or atleast help in debunking the myth with a shiny placebo) and if the growth that PostgreSQL might see as a result would offset EnterpriseDB's loss in terms of a differntiator of their proprietary offering ..?"

Have you used this product? Was it worth it's money?

Roland Bouman

Re: PostgreSQL getting with the program

Requires tuning == employment opportunity.

I agree with you about the progress of PostgreSQL. It is very impressive. I am curious whether the existing trigger-based replication schemes have a lot of overhead. I would guess that they do and I will also guess that benchmarks are usually run without them enabled. Non-trigger based schemes should make the cost much less.

Re: PostgreSQL getting with the program

locking mechanism is better than oracle(row level locking), but using log file to maintain transaction record is slower and disruptive when vaccum is begin to start....not suitable in high OLTP....

Re: PostgreSQL getting with the program

Several years ago, our company needed to migrate off MSSQL7.

At the time (about four years ago) , we were using features of MSSQL that did not exist in MySQL but did exist in PostgreSQL. We've been happily using PostgreSQL ever since, moving from 7.4 to 8.1. Some great features of PostgreSQL are transaction support, enforced constraints, delete constraints, row and statement level triggers and user-defined functions. It's also been convenient that whenever we seem to hit a feature wall in the DB, there is a new Postgres version out that supports what we need ;)

Over the years, many of these features have been added to MySQL yet they still feel more refined and stable in PostgreSQL.

I'd be quite interested to hear how Firebird compares to PostgreSQL as it was a consideration when we chose databases but I've really not used it at all.

Cost of trigger based replication

Mark:

I can't speak for other systems, but the cost of Slony is definitely not zero: it's a measurable drag on performance (generally at least a constant 5%), and if you get into trouble, it can cause some very serious performance problems.

That said, part of the reason for this is the tremendous flexibility Slony offers. For instance, you can replicate only some tables. You can replicate different sets of tables to different target systems. You can replicate some tables from server A to server B, and different tables from server B to server A. You can use Slony to solve the dreaded PostgreSQL upgrade problem (add a replica with your new database version, then when it's time to upgrade, switch the replication direction. You get to fall back this way, too, so your upgrade is less dangerous).

Slony also handles chained replicas very well: if you replicate from A to B and B to C, and B fails, you can just tell C to talk to A, and it all works seamlessly. Well, as seamlessly as the rest of Slony.

Slony is very strict, so that it cannot lose data. (The downside of this is that DBA errors are really, really painful.)

It allows you to do nifty tricks on the target systems, such as storing a trigger on the target table so that you can reshape your data on the replica for data warehouse purposes.

So, if you need that kind of power, and want all these bells and whistles, Slony is probably worth the performance cost because of its flexibility. If all you want is some safety that comes from having a standby, use the built-in Point in Time Recovery features of Postgres. If you want simple replication to another node where you will perform some queries, then something like Londiste ot Command Prompt's [disclosure: my employer] Replicator will probably be better for you. Also, if you want ease of administration, look for something else: Slony's user-space tools are abysmal.

sid:

Oracle uses a different kind of MVCC, not row level locking. They use a logfile, too: that's what the rollback segment is. PostgreSQL's MVCC is very similar to innodb. Vacuum can be expensive, but the 8.3 release has made massive improvements in this area, so if you have only looked at previous releases on this topic, you probably need to look again. HOT vastly reduces the need for vacuum in several cases, and the autovacuum daemon now is smart enough to kill itself if your system is under load. But there are cases where vacuum is still a painful problem, I freely admit. Still, moving the "garbage collection" out of the main transaction path and onto another process is, in my opinion, an excellent strategy.

Before you can post a comment please solve the following captcha.
your name