
Deploying app updates to a cluster

So William was asking on twitter how to best deploy symfony apps to a cluster of servers. There are actually some nice deployment tools inside the symfony cli that ease deployment to a single server, but that doesn't really cover the cluster case. Actually, I assume that if you have a cluster of servers, the best deployment strategy should probably be optimized for your specific use case. But let's make this question a bit more general: How do you deploy updates to your PHP apps running in a clustered setup? What architecture do you pick? How do you keep the site running with as few limitations as possible during the update? How do you distribute the new code? How do you clean and prime your caches? How do you handle DB changes? How do you ensure that the DB and code changes do not get in the way of each other?

Obviously the choice of RDBMS can play a big role here. MySQL likes replication setups, so I can envision making the site read only, disabling replication, updating the master schema, taking a slave out of the load balancer, updating the code, re-enabling replication to that slave, and repeating. With PostgreSQL, Oracle and RDBMS like that you tend to scale by just getting a bigger DB server (and having a warm failover server). I guess that simplifies things a bit (especially since PostgreSQL even supports transactional DDL, which should help with timing the deployment of the DB changes). Or maybe just make sure that DB changes never cause BC breaks.
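For illustration, here is a minimal sketch of what that transactional DDL buys you (the database, table and column names are made up): the whole schema change either applies completely or is rolled back, instead of leaving the database half-migrated.

  # hypothetical example: wrap the DDL in one transaction, so a failure anywhere
  # rolls back everything instead of leaving the schema half-changed
  psql myapp -c "BEGIN;
    ALTER TABLE orders ADD COLUMN shipped_at timestamp;
    CREATE INDEX orders_shipped_at_idx ON orders (shipped_at);
  COMMIT;"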

Anyway, I am looking to just collect approaches here in the comments. So please ideally post links to slides or detailed blog postings below. I am sure this could become a very useful link collection for the PHP community. Thanks :)

Comments



Re: Deploying app updates to a cluster

The user (e.g. developer or tester) makes sure all DB patches are run (we never have patches that break BC, so this often happens days before the code deploy). Our DBMS is MySQL, vertically scaled. The only problem is running DDL changes on large tables, but this can be solved by turning off replication in a master-master setup and running the DDL change on the passive server, after which you turn replication back on and switch active-passive.

To deploy code, the user (e.g. developer/tester) runs a shell script on the "primary" php/apache server.
- Script updates local build-folder with phing script and supporting tools from SVN.
- Script runs phing target "deploy". phing...
... asks which branch and revision to deploy
... creates new folder with date and revision in name
... runs SVN clean-up, switch, update on local repo.
... exports from local SVN to created folder
... saves xml metadata of SVN-repo for later use
... creates static CSS-files using csscaffold through php-cli (using custom application bootstrap)
... merges JS-files using php-cli and application config
... removes debug/log statements from js using sed
... minifies JS+CSS using yuicompressor
... arms filecache/memcache (using realpath in memcache key where there could be conflicts)
... rsyncs the dir to the other php/apache servers (a LOT faster than running the entire deploy script on every server)
... attaches static/log dirs using ln (on all servers)
... sets rights on dirs/files using chmod (on all servers)
... links the webroot to the dir (on all servers)
... cleans APC on all servers using php-cli (we use apc.stat=0)
... arms remaining caches by requesting important pages (not a biggie)

There is obviously a lot of code behind this, both in the application to support the js/css wrangling and cache arming, but also as phing targets. Perhaps the most useful custom phing target is one that runs phing targets on remote servers. All network communication is over SSH using password-less key login.
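To make the distribution and switch steps concrete, here is a minimal, hypothetical sketch of the tail end of such a deploy (hostnames, paths and the clear_apc.php helper are made up; the real setup drives this through phing targets over SSH):

  #!/bin/bash
  set -e
  RELEASE=/var/www/releases/20100510-r1234   # name = date + revision, created by the build steps above
  for host in www1 www2 www3; do
      # push the finished build to each server (a lot faster than rebuilding there)
      rsync -az --delete "$RELEASE/" "$host:$RELEASE/"
      # attach shared dirs, fix rights, flip the webroot symlink, clear APC (apc.stat=0)
      ssh "$host" "ln -sfn /var/www/shared/static $RELEASE/static && chmod -R a+rX $RELEASE && ln -sfn $RELEASE /var/www/current && php /var/www/tools/clear_apc.php"
  done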

Sometimes we need to clean the Varnish cache also, but this is rare enough to do manually.

Regards,
Patrik

Re: Deploying app updates to a cluster

PatrikS sums it up pretty well, so I will just write about what I feel is missing:
The deployment procedure is highly dependent on the site profile.
As the CAP theorem states, you can't achieve the following three guarantees at once:
- Consistency
- Availability
- Partition Tolerance

So if your system isn't partitioned, you can be consistent and available at the same time (but your response time can vary if you have to do online schema modification).

If you can risk inconsistency (key clashes or serving old/invalid data for a short amount of time), you can have partitioning (replication/sharding) and availability at the same time.

If you can sacrifice availability, you can achieve consistency and partition tolerance at the same time (the site goes to maintenance/read only, and you update your db).

Depending on the use case, the best solutions imho are:
- master-master replication: you can achieve HA and online schema migration PLUS the chance to revert the db migration if something went wrong (fail over to the original primary master).
- master-slave replication: partial HA (read-only on the slave) + online schema migration.

About the www nodes:

If you have a load balancer with a failover backup in front of your www nodes, you can easily manage "atomic" code updates:
- you remove some www nodes from the active load balancer's pool.
- you add them to the passive lb's pool.
- update the code on the passive nodes (rsync can work, but I prefer svn export, and I store the last N revisions on the www node, so in case of a revert I can just switch the vhost's document root symlink to the working revision; see the sketch after this list)
- fail over from the active lb to the passive lb
- if everything is ok, then update and move the www nodes from the now passive lb to the active one; if not, fail over back to the old revision.
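A minimal sketch of the export-plus-symlink idea on one www node (the revision number, repository URL and paths are made up):

  # hypothetical per-node deploy: keep the last N exported revisions around and
  # point the vhost's document root symlink at the one that should be live
  REV=1234
  DEST=/var/www/releases/r$REV
  svn export -r $REV http://svn.example.com/myapp/trunk "$DEST"
  ln -sfn "$DEST" /var/www/docroot   # to revert, point the symlink back at an older rNNNN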

The DB migration should go before the code deployment if possible (do backward compatible schema changes). If not, then you have to go into maintenance mode (if you don't have replication) or read only (master-slave), or if you use Multi Master, you can do it on the passive Master (ensure that the active one doesn't start replicating the schema changes yet), and when you fail over the load balancer you have to fail over to the passive Master (and if everything seems fine, you can start the replication on the now passive Master).

For the DB migration I like the ruby-on-rails way:
http://guides.rubyonrails.org/migrations.html
- Your db migration is code, not an sql statement.
- You write your up AND down methods.
- When you change revision, the deploy mechanism applies your up (or down, depending on whether you up- or downgrade) methods between the current and the target revision.
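A stripped-down sketch of that idea in shell (the migrations/ layout, file naming and the mysql invocation are assumptions; Rails and tools like DBDeploy also track the applied revision in a table instead of taking it as an argument):

  #!/bin/bash
  # hypothetical migration runner: applies NNN_up.sql / NNN_down.sql files
  # between the currently deployed revision and the target revision
  set -e
  CURRENT=$1   # e.g. 12
  TARGET=$2    # e.g. 15 (upgrade) or 9 (downgrade)
  if [ "$TARGET" -gt "$CURRENT" ]; then
      for n in $(seq $((CURRENT + 1)) "$TARGET"); do
          mysql myapp < "migrations/$(printf '%03d' "$n")_up.sql"
      done
  else
      for n in $(seq "$CURRENT" -1 $((TARGET + 1))); do
          mysql myapp < "migrations/$(printf '%03d' "$n")_down.sql"
      done
  fi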

For the js/css minification you can use a smart pre/post-commit hook, so you don't have to do magic in your deployment procedure, and you can check/test your code against the minified version (I got caught in the past by a bug where the minify script screwed up our js, and it was a PITA to debug).

You should watch out for your cache layer (memcache, filecache, varnish, etc.): you should either rebuild the whole thing or create some smart invalidator.

I'm sure I missed some points, but this is my 2 cents.

Tyrael

Re: Deploying app updates to a cluster

I use Fabric for deployments. It works well because I can address multiple types of environments (testing, staging, production, db clusters, etc.) and assign as many servers as I want to the clusters. Then it's just a matter of getting the files onto the server and changing permissions, etc. For that, I keep read-only git repositories on the servers and pull the latest tag. I've also been known to use rsync for some projects.

When it comes to keeping things running, it really depends on the update. Sometimes I'll set up a maintenance page through the script, or other times it's something that can be hot-swapped for the user. For large projects, I can keep 100% uptime by using a Blue-Green deployment architecture. This means that I can move public traffic to the Green servers while I update and verify the functionality of the Blue servers, then switch once everything is good. This doesn't work too well when deploying database changes.
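One way to do that switch at the load balancer, sketched here with nginx as an assumption (the upstream include files, paths and reload step are made up; any LB whose config can be reloaded works the same way):

  # hypothetical Blue-Green switch: upstream-active.conf is an include that points
  # at either the blue or the green pool
  ln -sfn /etc/nginx/upstream-green.conf /etc/nginx/upstream-active.conf
  nginx -t && nginx -s reload   # check the config, then reload without dropping traffic
  # the blue pool is now idle and can be updated and verified before switching back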

When it comes to caching, sometimes I let the users warm the cache through natural use of the site. If I can't afford that, then I'll create a script (with an associated config) that will hit the application behind the scenes to warm the cache. This can be done really easily with a Blue-Green deployment.
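A minimal sketch of such a warming script (the URL list file is an assumption):

  # hypothetical cache warmer: request the important pages once so the first real
  # visitor doesn't pay the cold-cache penalty
  while read -r url; do
      curl -s -o /dev/null "$url"
  done < warm-urls.txt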

For the database, it's tricky. When it comes to schema changes, I use DBDeploy to create up and down deltas. I haven't found any particularly effective way to deploy to production without having to throw up a maintenance page.

If it's just record changes (read: apps that store configuration in the database, like Magento or Drupal) then the solution isn't too bad. It takes some development time (or the Enterprise Edition of Magento) to create a staging system, which stores each change to the database as an event. When you execute the staging deployment, it will go in sequential order to update your database. This kind of system also becomes tremendously handy when your client needs to edit a bunch of content on the staging environment prior to deploying live.

For a completely different solution, I've worked on projects where we just deployed to a single master application server and then used a bash script to rsync the code to the mirrors. It works quite well for smaller deployments where you only have ~15 (or fewer) app servers. It's easy and cheap, with an acceptable lag time on mirror updates.

I hope that at least gets you some ideas!

- Rob

Re: Deploying app updates to a cluster

The app codebase lives in an NFS mount. The site maintenance switch, database update, and SVN update commands are scripted in a shell script so that once it starts it runs without a break. Updates usually only take a few seconds during low traffic periods, and few (if any) users will notice the maintenance page.

Re: Deploying app updates to a cluster

RSYNC+ANT

Re: Deploying app updates to a cluster

If we are speaking only about source code (not database updates), the solution I use is the following:
I use mercurial (hg) as the central repository on an integration server. Whenever a commit arrives at the integration server, I pull the changesets + update the local copy from the integration hg to the clustered production servers with a simple bash + ssh script. It's a pretty straightforward solution.
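Presumably something along these lines (hostnames and paths are made up):

  # hypothetical push-out: pull + update the working copy on every production node
  for host in web1 web2 web3; do
      ssh "$host" "cd /var/www/myapp && hg pull http://integration.example.com/myapp && hg update"
  done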

Re: Deploying app updates to a cluster

"App codebase lives in a NFS mount. "
In my experience NFS was a bottleneck, and it can cause weird errors (locking problems, or the nfs server disconnecting without the file system noticing, so every process hangs/gets zombified).
I would like to hear your experience.

@Sebs
@Gonzalo Ayuso
How are you guys guaranteeing that the upgrade is atomic (the modified files are updated at once)?

Tyrael

Re: Deploying app updates to a cluster

I never set this up, but I'd prefer using DRBD over NFS anytime, as a possible downtime of the file host won't affect the servers. One less thing to worry about.

Re: Deploying app updates to a cluster

Via twitter I was pointed at http://www.eschrade.com/page/deployment-rsync-4c2130a7

There does seem to be a common theme here, one that I also follow in the projects I do.

1) DB changes are the hardest, but ideally you can keep things BC, which takes the pain out of DB changes
2) deploy code in one step, flip the switch to the new code in a second step
3) deployment should clear caches (this includes end users' browser caches for static assets, aka give all static assets truly static names so that changed assets lead to changed urls; see the sketch after this list) and prime caches if you need to
4) plan for rollback and hotfix scenarios, but also for scheduled maintenance
5) script the process, but make sure you give sufficiently verbose output
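To illustrate point 3, one hypothetical way to get truly static asset names is to put a content hash into each filename at build time (paths and naming are made up; the application then has to emit the hashed names in its markup):

  # hypothetical build step: copy each asset to a name containing its content hash,
  # so a changed file automatically gets a new URL and stale browser caches can't bite
  for f in web/css/*.css web/js/*.js; do
      hash=$(md5sum "$f" | cut -c1-8)
      cp "$f" "${f%.*}.$hash.${f##*.}"
  done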
