ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

Migrations: The answer in my head

Avid readers of my blog know that I have been pondering how to best approach schema evolution. I ported the schema management from Metabase to PEAR::MDB2_Schema. I even gave a talk that admittedly only did a good job of defining the problem and solutions that all suffered from severe limitations. Now for the first time I am starting to feel somewhat good about an approach to migrations. I wrote a post to the Doctrine developers mailinglist detailing the key pieces that I want to add to Doctrine's migrations. The idea is to use the migrations approach made popular by Ruby on Rails but improve this concept with several key additions.

First up, there is one method for both "up" and "down". The idea being that the provided API is capable to understand that a "createTable" in "up" mode is a "dropTable" in "down" mode. Obviously this requires a fairly complete API to manipulate the database. Since its impossible to support everything under the sun, people can of course write a little if-statement to handle the cases that cannot be magically resolved. Also I will probably also add SQL-string-based alternative methods. This will allow something like:


<?php
function upDown($upDown) {
  $this->createTable($upDown, 'foo', '(bar integer CHECK (did > 100)');
}
?>

This would be enough information in order to create the table in "up" mode and drop the table in "down" mode. Normally Doctrine migrations would expect an array defining the fields and another one defining table options. To enable the above I just need to check if the parameter is an array or a string. This should make it possible to cover a lot more migration scenarios with automatic "down" support. For all other cases people can of course write straight SQL, though in that case they have to deal with the "down" side of things themselves. By checking the database driver used, one can also make the migrations portable or tweak a specific RDBMS with some special DDL.

The next thing I think is critical is to support some kind of automatic way to generate initial DDL to create a database for a new installation. I use the sfDoctrinePlugin from symfony has a frontend to Doctrine to be able to manage my schema in the YAML format. Now the next obvious thing one would wish for is to get automatic migrations when modifying the schema file. I will solve this by adding a method that can compare to schema files and generate the basic migration code. This should minimize the amount of manual work one has to do.

Another thing I will add to the migrations is a paranoia mode which will dump any data before doing modifications. Hopefully most of that can again be automated. So before dropping a column, all data from that column would be backed up. Now when reversing this, it could automatically be restored. This is necessary because only very few RDBMS support transactional DDL. PostgreSQL is one of the few. Not even Oracle supports this let alone MySQL.

Next up is adding support for migration parameterization. The application we just took over that I mentioned in my previous blog post has another tricky spot. A lot of configuration of runtime code has to be done inside the database on a per page ID basis. As you might have guessed the page ID is generated via autoincrement by creating a page in the admin tool. Obviously the ID can and will be different depending on the server the page is created on. As a result the migration script will need to be parameterized. Therefore I will introduce a two pass execution model. In the first pass all migrations that need to be executed will check if all their required parameters are defined inside an external configuration file. Only once all migrations report that they are ready to go, will the actual migration process be started. This could also be used to work around the lack of schema support which requires prefixing tables etc.

I think with a solution as described above one can always generate an initial DDL SQL file from the schema definition. Migrations would mostly be generated. Even when hand writing one would only need to write a single "up" script and get the "down" for free. People that prefer to write out RDBMS optimized SQL would also benefit at least a little bit. The scripts can be parameterized for added flexibility. To top it all off one would get a poor mans rollback for DDL. What do you guys think? Seems like a great project over Christmas during which I will probably also resync MDB2 code that was forked inside Doctrine.

Comments



Re: Migrations: The answer in my head

As Captain Picard always said: go ahead :)

Sounds like a really great enhancement of the migrations support. I always dreamed about a way to automagically get migration instructions by comparing two schemas. Although I still don't use YAML schemas, this could be a good reason to start using it :)

BTW: I'm thinking about creating an installer plugin for symfony so it could create a 1-2-3 installer script for any symfony project. So this migration enhancement could also be used by this installer plugin. :)

Re: Migrations: The answer in my head

What's been on my mind lately is how to easily keep track of what the schema looked like at certain times -- schema change management.

The migrations framework is great, because it captures the actions that change the schema. However, it fails where the MDB2_Schema shines -- by version-controlling the actual schema, you know exactly when the schema changed, and what was changed about it.

Of course, you don't have the actions, and aren't capturing the actual commands used, which could be useful if you want a centralized deployment of the schema change (whether the deployment is to 100 machines at the same time, or, say, 10 QA machines one day and 90 production machines a week later).

So from a change control management standpoint, neither solution is particularly great. :-\

Re: Migrations: The answer in my head

I've always thought too much time was spent on web frameworks trying to abstract away the sql for schema migrations. In rails I typically recommend using only sql for your migrations, and doing both DDL and DML management within that migration. You may not get the down for free, but it typically isnt that difficult to generate it either. You can then tag database versions by manipulating a "schema metadata" table, that keeps a list of different schemas in it so you know where you stand. (An alternative method, using comments on the database itself to "tag" schema releases, is another option)

For managing full schema snapshots, you need a good schema dumping tool, that will do deterministic dumps (most standard dump tools don't live up to this), so you can do a combination of dump/svn and use standard diff tools to track the difference. If you have that, you should be able to take a version of the database, run the sql migrations against it, and then use the dump tool to create a copy of the full schema snapshot, if you can't then you have to isolate the differences and resolve them.

Honestly for many database systems that much work just isn't worth it, but I have worked in the type of system where you have disparate applications running that will only work with a specific version of the database schema, so you need to go through the effort. What would be nice is getting tools to ease all of that process for the DBA first, and worry about hiding the sql for application developers later, IMHO.

Re: Migrations: The answer in my head

Well straight SQL fails when you need some logic. The example of parameterization already illustrates this to some level, though I guess you can also define parameters in SQL, but validating that they are actually set will be a bit more tedious. But in some cases it might simply not fly. Also the idea is also to get downgrade scripts for free, which I think is not a must, but nice to have. Furthermore the generated SQL is already capable of handling FK creation and dumping in the proper order. Anyways I have started to be convinced that ORM's are the way to go and anything else is premature optimization. So if I configure the model I might as well get something out of it for free. Like a initial DDL and pregenerated basic migrations.

The key thing that I would like is to get some simple audit tool to register all DDL and manipulation queries so that I can easily figure out what I need in my migrations. Then again weeding out the right manipulation queries to include will not be easy unless that logging facility can somehow differentiate. I guess doing the logging on the client side (for example connecting your phpMyAdmin through the MySQL Proxy) to do the logging could be one way, since that way you could differentiate between DML that was as part of DDL changes via phpMyAdmin and the DML that was run as part of testing inside the application, eventhough both DML went to the same server. Another alternative would be using your ORM to add some comment inside all generated DML statements so that they could be weeded out from a general server log could also work.

Before you can post a comment please solve the following captcha.
your name