ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

So whats the deal with gap locking?

So the other day I stumbled over the slides of the InnoDB talk at this years MySQL users conf and I noticed "gap locking" somewhere in the middle. I have never heard of "gap locking" so I was quite intrigued by what that might be. From what I understand its InnoDB's solution to implementing REPEATABLE READ (though until MySQL 5.1 it seems this feature is also enabled for READ COMMITTED). I guess its a fairly unique approach and from my current understanding to be feasible it expects short running transactions, which luckily are quite common in web applications that most of us care about.

The problem InnoDB is trying to solve here are situations where overlapping transactions cause phantoms. Now the first surprise was when I read that gap locking is implemented by locking the relevant ranges on the index or more particular this statement: "Every SELECT, UPDATE, DELETE in InnoDB uses an index to find the rows to return or to operate on". I am not sure if I truly understand the full implications of this, since obviously not all columns can be expected to be included inside an index. I do know that InnoDB has an implicit primary key if not stated explicitly, but in order for the gap locking to work I guess it requires an index on the actual columns you are doing the gap lock on!?! Maybe this just means that gap locking only works for columns that do have an index, though I do not see this mentioned in the slides.

Now in some cases you may not appreciate the behavior of forcing INSERTs etc to wait until all the gap locks are released, especially since even a transaction that caused a gap lock may still have to wait, since its not an exclusive lock! There is an option to disable this called "innodb_locks_unsafe_for_binlog". As the name implies if you are doing replication you might run into problems if you disable this. Again as I stated earlier according to the presentation once MySQL 5.1 is out gap locks will be removed from READ COMMITTED since with 5.1 you can use row-based logging to work around phantom issues in replication setups.

I would appreciate it if someone could shed some light on the open question on if gap locking is limited to explicitly indexed columns. I would also be interested in hearing if other RDBMS also implement this approach.

Comments



Re: So whats the deal with gap locking?

1. Without an Index there is no gap. Gaps are based on a order of the values, without the index this order would have to be generated at run-time and would have be consistent between 2 queries.

2. Without an Index you have nothing to put the lock on. This is important for queries like: SELECT * FROM tbl WHERE id > 100 FOR UPDATE in cases where we want to safely insert id = 101 later. Next key|Gap Locking is locking the "next"-pointer index-record, not the actual values.

Re: So whats the deal with gap locking?

Gap locking works to prevent phantom rows (a phantom row being an inserted row that matches the WHERE condition of a previous SELECT) basically for 2 reasons:

Firstly, an INSERT must update all indexes. So a gap lock on any index prevents the entire row from being inserted.

Secondly, if no index is used in the SELECT, then a table scan must be done. In this case InnoDB will lock the entire table, which will prevent all INSERTs.