On May 13, 2009, at 11:55 AM, wojtekpia wrote:


I came across this article praising Sphinx:
http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article
specifically mentions Solr as an 'aging' technology,

Solr is the same age as Sphinx (2006), so if Solr is aging, then so is Sphinx. But hey, aren't we all aging? It sure beats not aging. ;-) That being said, we are always open to suggestions and improvements. Over the past year, Lucene has seen a massive indexing speedup that carries through to Solr (and it was fast before), and Solr 1.4 looks to be faster than 1.3 (and it was fast before, too). The Solr community is clearly interested in moving things forward and staying fresh, as is the Lucene community.

and states that performance on Sphinx is 2x-4x faster than Solr. Has anyone
compared Sphinx to Solr? Or used Sphinx in the past? I realize that you can't
just say one is faster than the other because it depends so much on
configuration, requirements, # docs, size of each doc, etc. I'm just looking
for general observations. I've found other articles comparing Solr with Sphinx
and most state that performance is similar between the two.

I can't speak to Sphinx, as I haven't used it.

As for performance tests, those are always apples and oranges. If one camp does them, then the other camp says "You don't know how to use our product" and vice versa. I think that applies here. So, when you see things like "Internal tests show", that is always a red flag in my mind. I've contacted others in the past who have done "comparisons", and after one round of emailing it was almost always clear that they didn't know the best practices for the product in question and thus were doing things sub-optimally.

One thing in the article that is worthwhile to consider is the fact that some (most?) people would likely benefit from not removing stopwords, as they can enhance phrase-based searching and thus improve relevance. Obviously, with Solr, it is easy to keep stopwords by simply removing the StopFilterFactory from the analysis process and then dealing with them appropriately at query time. However, it is likely the case that too many Solr users simply rely on the example schema when it comes to setup instead of actively investigating what the proper choices are for their situation.
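
To make the stopword point concrete, here is a rough toy sketch in Python. It is not Solr or Lucene code: the stopword list, the analyze() and phrase_match() helpers, and the sample documents are all made up for illustration, and the real StopFilter can keep position gaps that this toy ignores. It only shows how a phrase query can lose its meaning once stopwords are dropped.

```python
# Toy illustration only: not Solr or Lucene code. All names here are hypothetical.
STOPWORDS = {"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"}


def analyze(text, remove_stopwords):
    """Lowercase, split on whitespace, and optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def phrase_match(query, doc, remove_stopwords):
    """True if the analyzed query tokens appear contiguously in the analyzed doc."""
    q = analyze(query, remove_stopwords)
    d = analyze(doc, remove_stopwords)
    if not q:
        return False  # the whole query was made of stopwords and vanished
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    docs = ["The Who live at Leeds", "who wrote this song"]
    for remove in (True, False):
        hits = [doc for doc in docs if phrase_match("the who", doc, remove)]
        print("remove_stopwords=%s -> %s" % (remove, hits))
```

With stopwords removed, the phrase "the who" degrades to the single term "who" and matches both sample documents; with stopwords kept, it matches only the first. A query such as "to be or not to be" disappears entirely under this toy stopword list.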

Finally, an old baseball saying comes to mind: "Pitchers only bother to throw at .300 hitters". Solr is a pretty darn full-featured search platform with a large and active community and a commercial-friendly license, and it also performs quite well.

-Grant
