ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

Hints, table stats and the all knowing DBA

There was a brief discussion on adding ways to hint selectivity in #postgresql this morning. Often the database has trouble figuring out a proper plan for a given query. One solution to this problem is to let the RDBMS analyze the data in the table. But even that falls flat in some situations, even if it has reasonably up to date table stats. Maybe the effort is just too high to make it worthwhile to figure things out at runtime.

Even generating those stats can take ages if you have a lot of data. In that case you can sometimes get by with analyzing sample data, sometimes this does not work though. Also some applications, I hear SAP is one of them, loves temporary tables. So these tables are just filled during some operation and then purged again. However in order to generate proper table stats you need to run the analyze on the table in exactly the state that is most relevant to performance. Otherwise you can totally obliterate your performance!

Adding explicit hints to the SQL is a popular "solution" to this problem, provided in some way or another for various things by most RDBMS. So MySQL lets you hint what index you want to see used, you can tell Oracle that join method to prefer etc. It is an ugly "solution" at best though. What happens when things change? Suddenly there are all those hints cluttered around your database.

I suggested a third alternative. What if your DBA could just hand tweak the tables stats? That could get rid of a number of use cases for table hints. It would also work around some of the issues described above with table stat generation. And after all your DBA knows best (tm)! I never heard of any RDBMS supporting this kind of feature. I was ready to give myself a pat in the back and closed my eyes imagining the world of change I just brought to the RDBMS world.

Well that bubble quickly burst when I was pointed at the pg_statistics catalog table in PostGreSQL. Here you have all the table stats to poke at and modify. Very cool!

Comments



Re: Hints, table stats and the all knowing DBA

SQL optimizer hints (synonmous with "evil") are usually required by databases that have less-than-robust cost-based optimizers -- and you're absolutely right, people make careers by sprinkling hints throughout Oracle SQL, then continue their careers by changing those hints as the schema changes.

Hand-tweaking stats in the database is a little less evil, and can work for databases that have good cost-based optimizers -- but what you really want is a statistics update process that doesn't use up much CPU while staying current. In the DB2 world (you _knew_ that was coming) you can set runstats (and all other maintenance functions, like online backups, table reorgs, etc) to automatically run in the background at a maximum CPU threshold so that the majority of your server horsepower is still available for your database.

For statistics, there is a complimentary option: randomly sampling your data can generate statistically reliable stats while requiring a fraction of the analysis time. This is absolutely necessary when you're looking at data warehouses in the tens of terabytes; if you're willing to live with a tiny margin of error, then it's all happiness for you.

Oh yeah, and DB2 has the best cost-based optimizer in the database world; IBM invented the concept decades ago (thanks Pat Selinger) and has refined and relied on it to drive their databases ever since. There's a discussion with Pat, and James Hamilton of SQL Server (formerly DB2 lead architect), about cost-based optimizers at http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=297.