ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life

Flexibility and raw speed vs. reliable performance

I was talking to a friend at a party (I know it sounds pathetic, but we also talked about non-work-related stuff) about a Java banking application he is working on. The application needs to run with whatever RDBMS is already installed at the client. He noted that they have the most trouble with IBM DB2. He did speak favorably about Apache Derby aka IBM Cloudscape, but then again he actually likes Java, so go figure ;-)

One of the issues is that apparently DB2 is so configurable that it becomes very difficult to determine whether their queries will even run at the client's site. You can configure all sorts of buffers and thresholds that, if set too aggressively, will simply prevent their application from running at all. Now you could say that's the DBA's fault. But the fact of the matter is that they are trying to sell an application, and it does not help if that requires the DBA to change parameters that affect other applications. I guess the point is that there is such a thing as too much flexibility if you also want portability across different installations of the same RDBMS.

Now I am getting to the topic of a previous blog post, where Dan (DB2 lover by heart and profession) posted a comment about DB2's greatness. Another issue they have run into is that they are in fact not getting robust query plans from DB2. For example, after deleting or adding a few thousand records, performance of any query on that table can slow to a crawl for quite some time. Manually running runstats fixes the problem. To make matters worse, there is obviously no JDBC API call to do this.

The only solution they came up with, and which IBM acknowledged as a good idea, is to use a stored procedure to execute runstats. So while it's nice that you can automate runstats, it does not necessarily solve your problems, because there is still a time gap during which performance might be horrible until runstats finds the CPU cycles to update the stats.

Reading Ulf's comments on MaxDB, it seems they take a different approach and do all necessary rearranging immediately. While this might not give them the best possible performance, it does mean that they can offer a constant level of performance, a feature especially important for 24/7 shops. So you could say they are predictably slow, but that's worth a lot!

As such, I also recommended that my friend look more closely at the execution plans their queries generate, to make sure that a given query plan performs reasonably fast in all possible real-world scenarios, even if that means a slowdown in some situations.
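To give an idea of what such a check could look like from Java, here is a minimal sketch that populates DB2's explain tables and reads back the optimizer's cost estimate. It assumes the explain tables have already been created (DB2 ships the DDL for them in sqllib/misc/EXPLAIN.DDL); treat it as a sketch, not production code.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: run DB2's EXPLAIN PLAN FOR on a query, then read the
// optimizer's estimated total cost back out of EXPLAIN_STATEMENT.
public class ExplainSketch {

    // Wraps a query in DB2's EXPLAIN PLAN FOR statement.
    static String explainStatement(String query) {
        return "EXPLAIN PLAN FOR " + query;
    }

    // Populates the explain tables for the query and prints the
    // estimated cost (in timerons) of the optimized plan.
    static void printPlanCost(Connection con, String query) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute(explainStatement(query));
            try (ResultSet rs = st.executeQuery(
                    "SELECT TOTAL_COST FROM EXPLAIN_STATEMENT " +
                    "WHERE EXPLAIN_LEVEL = 'P' " +
                    "ORDER BY EXPLAIN_TIME DESC FETCH FIRST 1 ROW ONLY")) {
                if (rs.next()) {
                    System.out.println("estimated cost: " + rs.getDouble(1) + " timerons");
                }
            }
        }
    }
}
```

Running this before and after a bulk delete/insert makes the "plan drift" problem visible: if the estimated cost jumps, the statistics are stale.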

Comments



DB2's greatness :)

Hmm. First of all, as of DB2 V8.2.2 (aka V8.1 FixPak 9), it is possible to run RUNSTATS through the ADMIN_CMD stored procedure: http://publib.boulder.ibm.com/infocenter/db2luw/v8/topic/com.ibm.db2.udb.doc/admin/r0012547.htm So your friend could simply call that stored procedure through JDBC every time they issue DML -- but I think that would be quite insane.
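For what it's worth, a minimal JDBC sketch of that ADMIN_CMD call might look like this. The schema and table names are placeholders, and the RUNSTATS options shown are just one reasonable choice.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch: run RUNSTATS from JDBC via the SYSPROC.ADMIN_CMD stored
// procedure (available as of DB2 V8.2.2). Schema/table names and the
// option list are illustrative only.
public class RunstatsSketch {

    // Builds the RUNSTATS command string that ADMIN_CMD expects.
    static String runstatsCommand(String schema, String table) {
        return "RUNSTATS ON TABLE " + schema + "." + table
             + " WITH DISTRIBUTION AND INDEXES ALL";
    }

    // Passes the command to ADMIN_CMD as its single argument.
    static void refreshStats(Connection con, String schema, String table) throws SQLException {
        try (CallableStatement cs = con.prepareCall("CALL SYSPROC.ADMIN_CMD(?)")) {
            cs.setString(1, runstatsCommand(schema, table));
            cs.execute();
        }
    }
}
```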

I stand by my previous comment, of course: DB2 Version 8 offers autonomics -- autoconfiguration for things like buffer pools, agents, and all the options you mentioned, based on your system and the profile of your application. And the runstats feature can be part of the automatic database health maintenance activities. It's actually not at all hard to set these up.

Enabling automatic runstats collection:
http://publib.boulder.ibm.com/infocenter/db2luw/v8/topic/com.ibm.db2.udb.doc/admin/t0011763.htm

Configuration Advisor:
http://publib.boulder.ibm.com/infocenter/db2luw/v8/topic/com.ibm.db2.udb.doc/admin/c0007586.htm

However, if having the system automatically run runstats on a regular basis won't solve the problem, I suggest your friend look into creating a materialized query table (MQT). This is essentially a physical view: you create a table based on a SELECT statement, and every time one of the underlying tables is modified via DML, the change goes into a staging table. If you create it as a "refresh immediate" table, the changes are immediately visible in the MQT at the cost of some locking; if you create it as a "refresh deferred" table, the changes become visible in the MQT only after you issue the REFRESH TABLE statement. Oracle offers the same sort of feature, if I'm not mistaken.

An intro to MQTs:
http://publib.boulder.ibm.com/infocenter/db2help/topic/com.ibm.db2.ii.doc/ad/ciiperft.htm
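A rough sketch of the "refresh immediate" variant over JDBC follows; the ORDERS table and its columns are invented for illustration. Note that an aggregating refresh-immediate MQT must carry a COUNT(*) column alongside the SUM.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of the MQT approach described above. ORDERS, CUSTOMER_ID
// and AMOUNT are made-up names for illustration.
public class MqtSketch {

    // A "refresh immediate" MQT: DB2 keeps it in sync on every DML
    // against ORDERS, at the cost of some extra locking.
    static final String CREATE_MQT =
        "CREATE TABLE CUST_TOTALS AS (" +
        " SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL, COUNT(*) AS CNT" +
        " FROM ORDERS GROUP BY CUSTOMER_ID)" +
        " DATA INITIALLY DEFERRED REFRESH IMMEDIATE";

    static void createAndPopulate(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute(CREATE_MQT);
            // DATA INITIALLY DEFERRED means the MQT starts empty; one
            // REFRESH TABLE populates it and brings it into sync.
            st.execute("REFRESH TABLE CUST_TOTALS");
        }
    }
}
```

With "REFRESH DEFERRED" in place of "REFRESH IMMEDIATE", the REFRESH TABLE statement would instead have to be reissued whenever the MQT should pick up accumulated changes.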