Author: Andrew Aksyonoff
Aimed at: Web developers interested in text search
Pros: Good introduction to text search
Cons: Limited technical coverage, lacks an index
Reviewed by: Ian Elliot
A slim book by the designer of Sphinx with the subtitle "From installation to relevance tuning" sounds useful to its potential users. Is it essential?
Sphinx is a free-to-use, open source search engine, and it wouldn't be surprising if you haven't heard of it. It is, however, used by some high-profile websites, including Craigslist (serving over 50 million queries per day), Slashdot, Scribd and Mozilla.
It is special in that it is designed to index database content rather than collections of documents, although you can make it index general documents if you are prepared to do some work. You might think that indexing a database is something a database should do for itself, but a very common use of a database is to store the full text of a website, and this needs more than just basic database indexing. Sphinx is all about implementing full text search on the documents stored in the database. Currently it works with MySQL, MSSQL and PostgreSQL, and it supports ODBC connectivity.
This is a book on using Sphinx written by its designer, and as such you would expect an inside view. The book opens with a really nice chapter on the theory of text search. It provides an overview of why text search is different, the sorts of ideas that are central to implementing it, and many of the linguistic ideas, such as stemming, that are part of it.
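To give a flavor of the ideas that opening chapter deals with, here is a minimal sketch of an inverted index with a crude suffix-stripping stemmer. It is not taken from the book and is not how Sphinx works internally; the function names (`crude_stem`, `build_index`, `search`) and the sample documents are made up for illustration, and a real stemmer is considerably more careful than this one.

```python
import re
from collections import defaultdict

def crude_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase the text, split it into alphanumeric words, and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample data.
docs = {
    1: "Indexing database content",
    2: "The engine indexes documents",
    3: "Relevance tuning for searches",
}
index = build_index(docs)
```

Because both the documents and the query pass through the same stemmer, a query for "indexed documents" can match a document that only contains "indexes documents", which is something a plain exact-match database index would not do.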
Once we get on to Chapter 2 and working with Sphinx, things aren't quite as well explained. The getting started instructions are a bit vague, but as long as you are prepared to put some effort into interpreting them you should be able to get the samples working. Although Sphinx does work under Windows, the examples are all Linux based. The problem seems to be that the author doesn't have a clear idea of what the complete beginner needs to know in order to use Sphinx.
The chapter also covers a lot of ground and would perhaps be better split into multiple, more focused chapters. For example, it covers building Sphinx from the source code, which could even be relegated to an appendix. A separate chapter on using the API would also be a good idea.
Chapter 3 is about basic indexing. Chapter 4 follows on with basic searching. At this point in the book you have seen most of the basic techniques of using Sphinx to search a database. Chapter 5 deals with management and fine tuning. The final chapter is on relevance ranking, including how to construct your own functions.
This is a reasonable introduction to Sphinx, and it does a fair job of showing you what it does and how it does it. It is also worth noting that the documentation on the Sphinx website is very good and goes a lot further than the book. It also provides examples of using Sphinx via the API in PHP right from the start.
What the book fails to do is provide any example of how text search and Sphinx might fit into an existing website. Of course, you might well be up to the task of working it all out for yourself, but some guidance on best practices would have been nice. Ironically for a book on text search, it lacks an index.
If you are thinking of using Sphinx then this book is a handy, but not essential, addition to the online documentation.