Re: solr faceted search performance reason

Jonathan Rochkind Wed, 06 Apr 2011 11:07:00 -0700

On 4/6/2011 10:55 AM, Robin Palotai wrote:

Therefore, Lucene supposedly has some advanced technique for multi-field
queries other than just taking the intersection of matching documents based
on the inverted index.

I don't think so, necessarily. It's just that Lucene's algorithms for doing this are very fast, with some additional optimizations to make them even faster. There may be some edge cases where the optimizations take some shortcuts on top of this. For instance, if you ask for only the first ten facet values ordered by number of hits, in some cases Solr/Lucene won't even calculate the hit counts for facet values it already knows aren't going to be in the top 10. The faceting code in Solr 1.4+ is actually kind of tangled, in that several different calculation approaches can be taken depending on the nature of the result set and schema.
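A toy sketch of the two ideas above, assuming each postings list is a sorted Python list of integer doc ids: merge-intersecting postings lists to get the result set, then counting facet values with a top-k cutoff. The names `intersect` and `top_facets` and the sample data are hypothetical; this is not Lucene's or Solr's actual code, which (as noted) picks among several strategies. The cutoff relies on one observation: a facet value's total document frequency is an upper bound on its count within any result set, so once k counts are in hand, values that cannot beat the k-th count never need to be counted.

```python
import heapq
from typing import Dict, List, Tuple


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two sorted postings lists of doc ids in O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def top_facets(hits: List[int],
               facet_postings: Dict[str, List[int]],
               k: int) -> List[Tuple[str, int]]:
    """Return the k facet values with the most hits, as (value, count) pairs."""
    if k <= 0:
        return []
    heap: List[Tuple[int, str]] = []  # min-heap of (count, value); heap[0] is the k-th best
    # Visit values with the largest document frequency (upper bound) first.
    ordered = sorted(facet_postings.items(), key=lambda kv: len(kv[1]), reverse=True)
    for value, postings in ordered:
        if len(heap) == k and len(postings) <= heap[0][0]:
            # This value, and every value after it, cannot beat the k-th count,
            # so their intersections are never computed.
            break
        count = len(intersect(hits, postings))
        if len(heap) < k:
            heapq.heappush(heap, (count, value))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, value))
    return sorted(((v, c) for c, v in heap), key=lambda vc: (-vc[1], vc[0]))


# Usage with made-up data: query color:red AND size:large, faceted on brand.
color_red = [1, 3, 5, 7, 9]
size_large = [3, 4, 5, 9, 10]
brand = {"acme": [1, 3, 5], "zenith": [2, 4, 9], "omni": [7], "tiny": [10]}

hits = intersect(color_red, size_large)  # [3, 5, 9]
print(top_facets(hits, brand, k=2))      # [('acme', 2), ('zenith', 1)]
```

Visiting values in descending order of document frequency lets the cutoff fire as early as possible: here "omni" and "tiny" are never intersected. With an arbitrary order you could only skip hopeless values one at a time instead of stopping.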

But anyway, I think you're right that you could set up an RDBMS schema to _conceptually_ allow very similar operations to a Lucene index. It would be unlikely to perform as well, because the devil is in the details of the storage formats and algorithms, and Lucene has been optimized for these particular cases (at the expense of not covering a great many cases that an RDBMS can cover).
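To make the "conceptually similar" point concrete, here is a minimal sketch using Python's built-in `sqlite3`, assuming a single hypothetical `postings(field, value, doc_id)` table that plays the role of an inverted index: each filter is a `SELECT doc_id`, `INTERSECT` ANDs them together, and a `GROUP BY` over the facet field produces the facet counts. The table, the functions `build_index` and `facet_counts`, and the sample documents are illustrative only.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical sample documents: doc_id -> {field: value}.
DOCS: Dict[int, Dict[str, str]] = {
    1: {"color": "red", "brand": "acme"},
    2: {"brand": "zenith"},
    3: {"color": "red", "size": "large", "brand": "acme"},
    4: {"size": "large", "brand": "zenith"},
    5: {"color": "red", "size": "large", "brand": "acme"},
    7: {"color": "red", "brand": "omni"},
    9: {"color": "red", "size": "large", "brand": "zenith"},
    10: {"size": "large", "brand": "tiny"},
}


def build_index(docs: Dict[int, Dict[str, str]]) -> sqlite3.Connection:
    """Store one (field, value, doc_id) row per posting, like an inverted index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE postings (field TEXT NOT NULL, value TEXT NOT NULL, "
        "doc_id INTEGER NOT NULL, PRIMARY KEY (field, value, doc_id))"
    )
    rows = [(f, v, d) for d, fields in docs.items() for f, v in fields.items()]
    conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)
    return conn


def facet_counts(conn: sqlite3.Connection,
                 filters: List[Tuple[str, str]],
                 facet_field: str,
                 limit: int) -> List[Tuple[str, int]]:
    """AND the filters via INTERSECT, then count facet values over the hits."""
    if not filters:
        raise ValueError("need at least one (field, value) filter")
    # Only placeholder text is interpolated; all values go through ? parameters.
    hits_sql = " INTERSECT ".join(
        "SELECT doc_id FROM postings WHERE field = ? AND value = ?" for _ in filters
    )
    sql = f"""
        WITH hits AS ({hits_sql})
        SELECT p.value, COUNT(*) AS n
        FROM postings AS p JOIN hits AS h ON p.doc_id = h.doc_id
        WHERE p.field = ?
        GROUP BY p.value
        ORDER BY n DESC, p.value
        LIMIT ?
    """
    params = [x for pair in filters for x in pair] + [facet_field, limit]
    return conn.execute(sql, params).fetchall()


conn = build_index(DOCS)
print(facet_counts(conn, [("color", "red"), ("size", "large")], "brand", 2))
# [('acme', 2), ('zenith', 1)]
```

The query expresses the same operations as a Lucene facet request; any performance gap comes from how each system stores and traverses this data, not from what can be expressed.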

In fact, while I can't find it now on Google, I think someone HAS in the past written an extension to Lucene to have it store its indexes in an RDBMS using a schema much like you describe, instead of in the filesystem. I'm not sure why they would want to do this instead of just using the RDBMS directly. Either Lucene's access algorithms still provide a performance benefit even when using an RDBMS as the underlying 'filesystem', or Lucene provides convenient functions that you wouldn't want to have to re-implement yourself solely in terms of an RDBMS, or both. Ah, here's a brief reference to that approach in the Lucene FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_store_the_Lucene_index_in_a_relational_database.3F


Jonathan

So the question is, what is this technique/trick? More broadly: why can
Lucene/Solr theoretically achieve better faceted search performance than an
RDBMS could (if it can)?

*Note: My first guess would be that Lucene uses some space-partitioning
method on a vector space built with the document fields as dimensions, but
as I understand it, Lucene is not purely vector-space based.*
Thanks,
Robin

On Wed, Apr 6, 2011 at 3:15 PM, Erick Erickson <erickerick...@gmail.com> wrote:

Please re-post the question here so others can see
the discussion without going to another list.

Best
Erick

On Wed, Apr 6, 2011 at 4:09 AM, Robin Palotai <m.palotai.ro...@gmail.com> wrote:
Hello List,

Please see my question at

http://stackoverflow.com/questions/5552919/how-does-lucene-solr-achieve-high-performance-in-multi-field-faceted-search

I would be interested to know some details.

Thank you,
Robin
