ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life

Short mental note on fulltext indexes

I rarely use MySQL Fulltext indexes. Their performance is just not good enough, so often its better to just stick with "LIKE" or move to something else like Sphinx, Lucene etc. The only nice thing about them is the ability to compute a match "rank". Well anyways I had to write a new search plugin for a project that is based around MySQL Fulltext indexes and a match rank and all as well .. except that for some reasons some words just would not produce any results. As I was trying to find a pattern I finally noticed that in my test data some words were used in most rows and exactly those were not matching. Obviously it makes sense to exclude automatically any words that have a very high hit ratio. And indeed the documentation states that by default all words with a hit ratio of over 50% are excluded. Doh!


Re: Short mental note on fulltext indexes

Another annoyance is the minimum word length default of 5.

Re: Short mental note on fulltext indexes

If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold.

Re: Short mental note on fulltext indexes

The nice thing about the FULLTEXT is, that it's very configurable.

So for example, the minimum word length is one of those options. I am not sure right now if indeed you need to edit the source or if there is a my.ini configuration option, but you can change it to "3" or "2" or whatever - that I am sure of.

Aside from that Peter provided a good tip, and I want to second that.

Last but not least, how did you come to discover that "LIKE" was better than "MATCH() AGAINST()"? I can hardly believe that.

Before you can post a comment please solve the following captcha.

your name: