
Circular master-master replication

One interesting approach to scalability and HA is to set up circular master-master replication directly on the frontend servers. Obviously there is a practical limit to how many servers you can have in such a ring, since the lag with which changes propagate increases linearly as you add servers. However, according to an article by Giuseppe, a 10-server ring is reasonable. If you also make your sessions sticky on a per-frontend-server basis, you have an elegant solution to the old replication lag problem, where a user does not see his own changes in subsequent requests because they were written to a remote master and have not yet propagated to the slave he reads from. Latency should also improve, since the user request no longer has to open a remote connection.
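To make the "lag grows linearly" point concrete, here is a minimal sketch that counts replication hops in a ring. It assumes every server replicates only to the next one and that each hop costs roughly the same; the function names are made up for illustration and do not come from MySQL.

```python
def hops_to_reach(origin: int, target: int, ring_size: int) -> int:
    """Replication hops for a change written on `origin` to reach `target`.

    Servers are numbered 0..ring_size-1 and each one replicates to the next.
    """
    return (target - origin) % ring_size


def worst_case_hops(ring_size: int) -> int:
    """The server directly 'behind' the writer is the last one to see the change."""
    return max(hops_to_reach(0, target, ring_size) for target in range(ring_size))


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, "servers: worst case", worst_case_hops(n), "hops")
```

With sticky sessions the user always reads from the server he wrote to, which is zero hops away, so he never waits on any of this.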

If you read through the above-mentioned article you will also see that setting up circular replication is not even all that hard with MySQL, thanks to the auto_increment_increment/auto_increment_offset settings. With these you can set this kind of thing up with essentially no changes to your application. You can keep adding new servers as you go until the replication lag gets beyond acceptable limits. I guess one disadvantage is that if you have servers with different write performance, you will end up increasing the lag, whereas with a single master you would only need to improve write performance on that one master to reduce replication lag. Overall this approach still seems appealing, especially since even with just two servers you get some failover, with essentially the same configuration you will be using as you add more servers.
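As a rough illustration of why these two settings avoid primary key collisions: MySQL generates auto-increment values of the form auto_increment_offset + N * auto_increment_increment. The sketch below just reproduces that arithmetic in Python for a three-server ring; the helper name and the server numbering are assumptions for the example.

```python
from itertools import count, islice


def first_ids(offset: int, increment: int, how_many: int) -> list[int]:
    """First `how_many` values of the series offset + N * increment (N = 0, 1, 2, ...)."""
    return list(islice(count(start=offset, step=increment), how_many))


if __name__ == "__main__":
    ring_size = 3
    # Every server shares the same increment and gets its own offset (1..ring_size).
    for offset in range(1, ring_size + 1):
        print("server", offset, first_ids(offset, ring_size, 4))
```

If you expect the ring to grow, it is probably worth picking an increment larger than the current number of servers, so that new servers can take an unused offset without touching the existing ones.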

What I am wondering though is whether there is any chance of connecting multiple rings so that you can replicate between them, optimally in both directions, for example if you want to have one ring per data center. I guess you can easily replicate in one direction, but is there a way to do 2-way master-master replication between 2 or more rings?

Comments



Re: Circular master-master replication

There is this small problem of UNIQUEs and UPDATEs when you have more than one writer.

srv2 executes UPDATE tbl SET name = "foo" WHERE id = 1 and srv7 sets the same row to "bar" at the same time. Both push their change into the replication stream. Once everything has propagated, srv2 will have "bar" and srv7 will have "foo".
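The divergence can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: a 10-server ring, every hop takes the same time, each server applies changes in the order they arrive (last one wins), and a change is dropped once it gets back to the server that wrote it. The function name is invented for the example.

```python
def final_values(ring_size: int, writes: dict[int, str]) -> dict[int, str]:
    """Value each server ends up with after simultaneous writes have gone around the ring.

    `writes` maps the 1-based number of the originating server to the value it wrote.
    """
    result = {}
    for server in range(1, ring_size + 1):
        # (hops needed to get here, value); the origin applies its own write at hop 0.
        arrivals = [((server - origin) % ring_size, value)
                    for origin, value in writes.items()]
        arrivals.sort(key=lambda arrival: arrival[0])
        result[server] = arrivals[-1][1]  # the last change to arrive wins
    return result


if __name__ == "__main__":
    for server, value in final_values(10, {2: "foo", 7: "bar"}).items():
        print("srv%d: %s" % (server, value))
```

Under these assumptions not only srv2 and srv7 disagree: srv2 through srv6 end up with "bar" and the rest of the ring ends up with "foo", and nothing in replication will ever reconcile them.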

UNIQUEs are more obvious: Create a user-table, username is UNIQUE, 2 parallel INSERTs into the servers, both are fine, replication goes around, *boom*.

2 way replication fails as long as a slave can only have 1 master.

Re: Circular master-master replication

Good point. So as a result when writing to a UNIQUE column one would need to fall back to using XA?

MySQL Master-Master Replication

Lag can be a big issue if you have data that must be displayed sequentially. Even if you bind front-ends to specific backends you can get data out of sync. I experimented with this a couple of times and found that poor SQL can really cause headaches.

For example, an extended table lock can jam up the entire system such that you get a backlog of information. The user posts some data only to find on the next log another 12 posts come in.

Instead of circular replication, I often work with the developers to break up the data and deploy multiple master-master pairs with read-only slaves hanging off each one. This approach is how LiveJournal scales out. They realized they never needed to do a JOIN between two different user accounts, so there is no reason to have one huge database. Just run a bunch of master-master pairs and use another database to keep track of where the accounts are located.
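The "another database to keep track of where the accounts are located" part can be sketched roughly like this. It is not LiveJournal's actual code; the class, the cluster names and the least-loaded placement rule are all assumptions, and a plain dict stands in for the directory database.

```python
class AccountDirectory:
    """Remembers which master-master pair holds each account."""

    def __init__(self, clusters: list[str]) -> None:
        self.clusters = list(clusters)
        self.location: dict[str, str] = {}  # account -> cluster name

    def assign(self, account: str) -> str:
        """Place a new account on the cluster that currently holds the fewest accounts."""
        if account not in self.location:
            load = {c: 0 for c in self.clusters}
            for cluster in self.location.values():
                load[cluster] += 1
            self.location[account] = min(self.clusters, key=lambda c: load[c])
        return self.location[account]

    def cluster_for(self, account: str) -> str:
        """Look up where an existing account lives before opening a connection."""
        return self.location[account]


if __name__ == "__main__":
    directory = AccountDirectory(["pair-1", "pair-2"])
    for name in ("alice", "bob", "carol"):
        print(name, "->", directory.assign(name))
```

Every request first asks the directory where the account lives and then talks only to that pair, which is why no query ever needs to JOIN across pairs.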

Even with master-master on GigE links and fast servers, we've seen issues with time-stamps and other updates. In some cases we were able to ignore those errors but in many cases the routines had to be re-written to get master-master to work well.

Re: Circular master-master replication

The lag issue is well known and it seems we all agree on it. What hasn't been mentioned is the performance impact. A master-master ring does allow for more writes, more connections, and some form of HA. However, you have essentially doubled the load on each master, since it is now both a master and a slave (in practice more like 1.5 times, because the replication thread is single-threaded and serialized, which also contributes to the lag). For the most part this only affects resources, i.e. disk I/O and memory allocation.