Alternatives to full text queries (part II)

Posted by Fernando Doglio on October 12, 2011

For part one, click here

What do they have in common and what makes them different?

It’s hard to come up with a comparison table for all four alternatives, mainly because I can’t claim to have personal experience with every one of them. The Internet has plenty of information on the subject, though, so I went ahead and did a bit of research.

Another point to consider is that although all four solutions provide very similar services in the long run, they do it a bit differently, and they fall into two categories:

  • Full text search servers: They provide a finished solution, ready for developers to install and interact with. You don’t have to integrate them into your application; you only have to talk to them. This group includes Solr and Sphinx.
  • Full text search APIs: They provide the functionality the developer needs, but at a lower level. You’ll need to integrate these APIs into your application, instead of just consuming their services through a standard interface (as you would with the servers). This group includes the Lucene project and the Xapian project.
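
To make the difference between the two categories a bit more concrete, here is a minimal Python sketch. It is only an illustration under a few assumptions: the host, port and core name passed to `solr_select_url` are hypothetical, and `TinyIndex` is a toy stand-in written for this example, not the actual API of Lucene or Xapian. In the server style the application only builds a request for a search server that is already running; in the API style the index lives inside the application itself.

```python
from collections import defaultdict
from urllib.parse import urlencode


def solr_select_url(base_url, core, text):
    """Server style: the application only builds a request for a running server.

    The base URL and core name are placeholders chosen for this example.
    """
    return f"{base_url}/solr/{core}/select?" + urlencode({"q": text, "wt": "json"})


class TinyIndex:
    """Library style: a toy inverted index living in the application's process.

    This is an illustrative stand-in, not the Lucene or Xapian API.
    """

    def __init__(self):
        # Maps each term to the ids of the documents that contain it.
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents containing every term of the query.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

With the server approach, the application would send the generated URL over HTTP and parse the response. With the API approach, it calls `add` and `search` directly and is responsible for building and maintaining the index itself.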

Taking all of this into account, we can now move on to a more in-depth discussion of our options:

Continue reading…

Alternatives to full text queries (part I)

Posted by Fernando Doglio on October 03, 2011

When it comes to data storage and data handling, we developers are quite used to working with database engines (MySQL, Oracle, SQLite, etc.) and database query languages.

Depending on the application being developed, one of these solutions can be more than enough to meet our needs, and that is what we usually end up using.

But there are times when the amount of information to be handled is so large (we’re talking about millions of rows) and the required response times are so short (we’re talking about a few milliseconds) that we need solutions designed specifically for searching large amounts of information, rather than general-purpose data handling ones.

Continue reading…