On May 13, 2009, at 11:55 AM, wojtekpia wrote:
> I came across this article praising Sphinx:
> http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article
> specifically mentions Solr as an 'aging' technology,
Solr is the same age as Sphinx (2006), so if Solr is aging, then so is
Sphinx. But hey, aren't we all aging? It sure beats not aging. ;-)
That being said, we are always open to suggestions and improvements.
Over the past year, Lucene has seen a massive indexing speedup that
carries through to Solr (and it was fast before), and Solr 1.4 looks to
be faster than 1.3 (and it was fast before, too). The Solr community
is clearly interested in moving things forward and staying fresh, as
is the Lucene community.
> and states that performance on Sphinx is 2x-4x faster than Solr. Has
> anyone compared Sphinx to Solr? Or used Sphinx in the past? I realize
> that you can't just say one is faster than the other because it
> depends so much on configuration, requirements, # docs, size of each
> doc, etc. I'm just looking for general observations. I've found other
> articles comparing Solr with Sphinx and most state that performance
> is similar between the two.
I can't speak to Sphinx, as I haven't used it.
As for performance tests, those are always apples and oranges. If one
camp does them, then the other camp says "You don't know how to use
our product" and vice versa. I think that applies here. So, when you
see things like "Internal tests show" that is always a red flag in my
mind. I've contacted others in the past who have done "comparisons"
and after one round of emailing it was almost always clear that they
didn't know what best practices are for any given product and thus
were doing things sub-optimally.
One thing in the article that is worthwhile to consider is the fact
that some (most?) people would likely benefit from not removing
stopwords, as they can enhance phrase-based searching and thus improve
relevance. Obviously, with Solr, it is easy to keep stopwords by
simply removing the StopFilterFactory from the analysis process and
then dealing with them appropriately at query time. However, it is
likely the case that too many Solr users simply rely on the example
schema when it comes to setup instead of actively investigating what
the proper choices are for their situation.
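
To make the stopword point concrete, here is a toy sketch in Python. It
is not Solr or Lucene code, and it ignores details such as position
increments; the stopword list and the analyze/phrase_match helpers are
made up for illustration. It only shows why dropping stopwords at index
and query time can make a phrase search less precise.

```python
# Toy illustration only: hypothetical helpers, not Solr/Lucene APIs.
STOPWORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to"}  # assumed list


def analyze(text, remove_stopwords):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def phrase_match(doc, query, remove_stopwords):
    """True if the analyzed query occurs as a contiguous run in the analyzed doc."""
    d = analyze(doc, remove_stopwords)
    q = analyze(query, remove_stopwords)
    if not q:
        return False  # every query term was a stopword; nothing left to match
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    docs = ["the who live at leeds", "who shot the sheriff"]
    for keep in (True, False):
        hits = [doc for doc in docs if phrase_match(doc, "the who", not keep)]
        print("stopwords kept" if keep else "stopwords removed", "->", hits)
```

With stopwords kept, the phrase "the who" matches only the first
document; with them removed, the query collapses to "who" and matches
both, and a query like "to be" has nothing left to search on.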
Finally, an old baseball saying comes to mind: "Pitchers only bother
to throw at .300 hitters". Solr is a pretty darn full-featured search
platform with a large and active community, a commercial-friendly
license, and it also performs quite well.
-Grant