I spotted an interesting blog post over on Planet OSDB about prepared statement gotchas. It illustrates very well an issue that has plagued prepared statements since their inception: they can severely hurt performance, even though most people assume they improve it.
Just briefly: why do we care about prepared statements at all? For stateless web applications the benefits are mainly protection against SQL injection and better readability. In rare cases they can, in theory, also improve performance, if the same statement is executed multiple times in the same request. The disadvantage is that in most web applications a given query is rarely executed more than once per request, so separating parsing/planning from execution just adds a round trip to the database. Also, in very long-running database sessions the prepared query plan can become outdated, which can also spell disaster for performance. In other words: prepared statements are no silver bullet.
The problem described in the above blog post exists, of course, because the idea behind prepared statements is to separate the parsing and planning stage from the actual data that will be used in the WHERE clause of the query. Modern cost-based optimizers keep detailed statistics about data distribution, which are difficult to leverage if no data is available during the planning stage. The blog post, however, focuses on another aspect that is not yet relevant to MySQL, since MySQL does not support partial indexes just yet. RDBMS like PostgreSQL, however, do support partial indexes. Again, since the data is missing during the planning stage, the database is unable to determine whether an existing partial index matches.
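To make this concrete, here is a hypothetical PostgreSQL sketch (table, column, and index names are all made up): with literal values the planner can prove that the partial index predicate is satisfied, but with a prepared statement it cannot.

```sql
-- Partial index covering only a slice of the table:
CREATE INDEX idx_orders_open ON orders (customer_id) WHERE status = 'open';

-- With literals the planner can see that status = 'open'
-- satisfies the index predicate, so the index is usable:
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';

-- In a prepared statement the values are unknown at plan time,
-- so the planner cannot prove the predicate holds and may fall
-- back to a plan that ignores the partial index:
PREPARE open_orders (int, text) AS
    SELECT * FROM orders WHERE customer_id = $1 AND status = $2;
EXECUTE open_orders(42, 'open');
```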
So what can be done about this if we do want to stick with prepared statements? On the user side, one thing you can do is move the relevant filtering data into the prepared statement itself. This of course means that you lose the benefit of the prepared statement for those pieces of data, but it might be an acceptable trade-off. And if this data contains user-supplied input, you now need to ensure manually that it is safe.
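A minimal PDO sketch of this idea (table, column, and variable names are hypothetical): the value that decides which partial index applies is inlined into the SQL, while the rest of the user data stays bound as a placeholder. Note the manual whitelist check, since the inlined value no longer enjoys the protection of a placeholder.

```php
<?php
// $status decides whether the partial index applies, so it is
// interpolated into the SQL -- but only after a whitelist check:
$allowed = array('open', 'closed');
if (!in_array($status, $allowed, true)) {
    throw new InvalidArgumentException('invalid status value');
}
$stmt = $pdo->prepare(
    "SELECT * FROM orders WHERE status = '$status' AND customer_id = ?"
);
$stmt->execute(array($customerId));
```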
The other approach is simply switching to emulated prepared statements, which most database layers (like PDO) support. This also avoids the second round trip to the database for every execution. However, it means you put yourself at the mercy of the parser embedded in the database layer you are using; in the case of PDO there is little chance to get around that anyway. Worse yet, the PDO parser is known to be buggy by design.
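In PDO this is a one-line switch via the PDO::ATTR_EMULATE_PREPARES attribute (connection details here are placeholders):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'secret');
// Interpolate parameters client-side instead of using the
// server-side prepare/execute protocol -- one round trip per query:
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);
$stmt = $pdo->prepare('SELECT * FROM orders WHERE customer_id = ?');
$stmt->execute(array(42));
```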
On the database side, one approach, taken by Oracle since 9i I think, is to delay the planning stage until the first execution. The query plan is then optimized for the first set of bound values. I do not know how this works in detail; I presume that if the plan uses a partial index as a covering index and the next set of values is not contained in the partial index, this approach would fail. However, it gets the job done for people who just want to use prepared statements for one-shot execution, to get SQL injection protection and better readability.
Another approach, which I do not think any RDBMS has taken (probably because it's either hard or not feasible .. or both), is to keep multiple query plans around. In this scenario, during the initial planning phase the database would also store some simple parameters that define whether an existing query plan matches a given set of values. If no existing plan matches, a new plan is generated while the old one is kept around. This way a single prepared statement could eventually accumulate multiple query plans. This would also solve the issue of partial indexes being used as covering indexes.
Yet another approach, less interesting for the short-running sessions common in PHP applications, would be to trigger a replan whenever a query executes more slowly than expected, potentially keeping the old plan around for old data sets. This would also solve the issue of long-running sessions ending up with out-of-date plans.
Overall it seems that for stateless web applications the best approach is a database layer with good prepared statement emulation support. I hope that PDO will one day meet this criterion. Until then, the MDB2 parser seems to do a much better job, though MDB2 is of course implemented in PHP, which carries its own performance overhead, and it is not E_STRICT compliant due to its PHP4 compatibility.
I use MDB2 with mysqli and prepared statements to avoid SQL injection.
Your last paragraph talks about using MDB2 as a way to avoid the performance penalty, but I believe that by using mysqli in MDB2, "real prepared statements" are used, with the problems mentioned.
Is this true?
What's your suggestion to solve this problem?
What you need to do is enable the prepared statement emulation:
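If I remember the option name correctly, it is 'emulate_prepared' on the mysql(i) drivers (the DSN details here are placeholders):

```php
<?php
require_once 'MDB2.php';

$mdb2 =& MDB2::factory('mysqli://user:secret@localhost/shop');
// Let MDB2 emulate prepared statements in PHP instead of using
// the server-side prepare API of mysqli:
$mdb2->setOption('emulate_prepared', true);
```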
This can of course also be passed in the options array when creating the MDB2 object.
Also note: there is an additional penalty with MySQL until some version of 5.1, namely that the query cache cannot be used at all with prepared statements.
At least in Oracle, as I am sure you know (but maybe in other RDBMS too), using prepared statements has other side benefits, such as easing the load on server RAM: two queries that are syntactically identical (byte by byte) will be recognized as such by the parser, and the execution plan from the first prepared statement will be reused by the second query, even if the second one is executed on a separate PHP page (it is kept in the RDBMS cache). This will not happen if the selector parameters in the WHERE clause are hardcoded in the SQL...
Actually, I was not aware of this particular cache. But these kinds of caches always carry the danger of outdated plans if you have a lot of DML going on that radically changes the data distribution.
I don't think this kind of guideline makes good general advice. It depends a lot on the application and even more on which database you use. I worked on sites (70-80 million hits/day) where one of the database servers was MSSQL. We always thought about using prepared statements, but there was always some more important todo. Then an engineer from MS came to audit the database, and one of his most important pieces of advice was to use prepared statements, since MSSQL is really optimized for this type of usage. One of my questions was whether it is a problem that the query plan is generated once while table statistics change over time, so the plan might become outdated. He answered that MSSQL watches how the statistics change and, when the change seems significant, recreates the prepared statements. So with MSSQL, for example, there are a lot of advantages to using them; after this audit we ran some tests, and it was not rare to get a 20-30% performance improvement running the same query as a prepared statement. And I can imagine that Oracle has some smart things for prepared statements too.
And I am no fan of that RDBMS; we use MySQL at ustream.tv. But that was my experience. As in other areas of informatics, it is not easy to state general rules of thumb; the best approach is to test several solutions and choose the one most appropriate for you.
"(probably because its either hard or not feasible .. or both) is to keep multiple query plans around."
Oracle 11g does exactly that.