ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

Surrogate keys and everything you never knew to ask

I am not sure why this thread ended up on the pgsql hackers list, but nobody took any measures from stopping it and instead a very interesting and detailed discussion of the pros and cons of surrogate keys in RDBMS schema design ensued. So this thread should be of interest to anyone using an RDBMS be it MySQL, PostGreSQL, SQLite or any other alternative.

This blog post should not be mistaken as an intivation to post to the hackers mailinglist as I think most points have been raised at this point. If you do feel you have something to add please make sure you have actually read through the thread and the therein linked blog posts. Just putting in this disclaimer so that I do not get my head chopped off for posting the thread link :-)

As a quick primer: WTF is a surrogate key?

A surrogate by definition is an artificial or synthetic replacement for a natural product. So in the database world a surrogate key is often an integer id that is added to a table that already has a unique column (like a user table where you already have a column with a unique user name). Now this is quite common in most schemas and its a practice few people put much thought into. Bad bad SQL schema design .. though of course there are some cases where its ok to do so .. read the thread if you want to find out the grey thin line on this topic :-)

In other news .. it seems like my "Rübli Torte" cake project worked out nicely. Unfortunately the cake needs to sit for atleast 2 days before it may be consumed. Patience young one. Patience.

Comments



Re: Surrogate keys and everything you never knew to ask

That was a very interesting read.

I must say I was a bit ambiguous about real keys versus surrogate keys. I was agreeing with Martijn van Oosterhout in the discussion that sometime 'you just don't have a real key'. Josh Berkus gave some excellent arguments however, and they convinced me. If you only have a surrogate key and no real key, you will have a problem.

Martijn's argument basically is 'a person is not unique in any obvious way, so you have to have a surrogate key'. I very much agree to Josh's point that if that is true, then no user will ever be able to distinguish between those two persons so they *have* to have something that identifies them. But this might as well be a kind of 'personal identification number', but that's actually just a surrogate key as well.

Still, for practicality, I would opt to use surrogate keys. If only to make my joins easier (and probably faster) and keep them readable. And to prevent having to pass multiple values around in urls etc. all the time to identify the record.

Re: Surrogate keys and everything you never knew to ask

I very much remember taking over and having to debug an application where surrogate keys were used all over the place for a travel related site. So instead of using the common 2 and 3 letter country and city codes it used integer id's in all relating tables. It meant that while debugging or looking through logs you always had to join up other tables in order to know what places were actually involved. It made me think twice about surrogate keys though I do tend to employ them fairly often just to get a more common structure in all my tables for clarity.

Re: Surrogate keys and everything you never knew to ask

Rüeblitorte... mmmmh :-)

Re: Surrogate keys and everything you never knew to ask

I very much like the quote on the blog mentioned:

"I am not saying that you should avoid autonumber surrogate keys like an SCO executive."