On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
> Sorry, I missed the original mail on this thread....
> 
> I put together that hierarchical faceting wiki page a couple
> of years ago when helping a customer evaluate SOLR-64 vs.
> SOLR-792 vs. other approaches.  Since then, SOLR-792 morphed
> and is committed as pivot faceting.  SOLR-64 spawned a
> PathTokenizer which is part of Solr now too.
> 
> Recently Toke updated that page with some additional info.
> It's definitely not a "how to" page, and perhaps should get
> renamed/moved/revamped?  Toke?

Unfortunately or luckily, depending on one's point of view, I have been
hit by a combination of child #3 and buying a house. Lots of intentions,
but no promises for the next month or two.


I think we need both an overview and a detailed how-to of the different
angles on extended faceting in Solr, seen from a user-perspective.

I am not sure I fully understand the different methods myself, so maybe
we could start by discussing them here? Below is a quick outline of how
I see them; please expand & correct. I plan to back up the claims about
scale later with performance tests on a wiki page.


http://www.lucidimagination.com/solutions/webcasts/faceting @27-33 min:

- Requires the user to transform the paths to multiple special terms
- Step-by-step drill down: If a visual tree is needed, it requires one 
  call for each branch.
- Supports multiple paths/document
- Constraints on output work just as in standard faceting
- Scales very well when a single branch is requested

Example use case:
Click-to-expand tree structure of categories for books.
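
A minimal sketch of the step-by-step drill down described above, assuming
the paths were indexed as depth-prefixed terms (0/A, 1/A/B, 2/A/B/C). The
field name "category" and the helper name are hypothetical; facet.field
and facet.prefix are standard Solr faceting parameters. One such request
would be needed for each branch the user expands.

```python
from urllib.parse import urlencode


def drill_down_params(field, selected=None, sep="/"):
    """Build facet parameters listing the direct children of `selected`.

    `selected` is a plain path such as "A/B"; None means the top level.
    Assumes terms were indexed with a depth prefix, e.g. "1/A/B".
    """
    parts = [p for p in (selected or "").split(sep) if p]
    depth = len(parts)  # children of the selection live at this depth
    prefix = sep.join([str(depth)] + parts) + sep
    return {
        "q": "*:*",            # placeholder query
        "rows": 0,             # only the facet counts are of interest
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
    }


# Expanding the branch "A" asks only for terms starting with "1/A/".
params = drill_down_params("category", "A")
query_string = urlencode(params)
```

Only the terms below the selected branch are returned, which fits the
claim that this approach scales well when a single branch is requested.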


PathHierarchyTokenizer (trunk):
Changes /A/B/C to /A, /A/B and /A/B/C.

I don't know how this can be used directly for hierarchical faceting.
The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and
2/A/B/C so they seem incompatible to me. The discussion on SOLR-1057
indicates that it can be used with SOLR-64, but SOLR-64 does its own
tokenization!?  Little help here?
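
To make the difference between the two term formats concrete, here is a
plain-Python emulation of the output described above. It is not the
Lucene tokenizer itself; the function names are made up for illustration.

```python
def path_hierarchy_tokens(path, sep="/"):
    """Emulate the described output: /A/B/C -> /A, /A/B, /A/B/C."""
    parts = [p for p in path.split(sep) if p]
    return [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


def with_depth_prefix(tokens):
    """Prepend each token's depth, giving the webcast-style terms."""
    return [f"{depth}{token}" for depth, token in enumerate(tokens)]


tokens = path_hierarchy_tokens("/A/B/C")
# tokens == ["/A", "/A/B", "/A/B/C"]
prefixed = with_depth_prefix(tokens)
# prefixed == ["0/A", "1/A/B", "2/A/B/C"]
```

Seen this way, the two formats differ only in the leading depth number,
which is what lets a prefix match be limited to a single level.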


SOLR-64 (not up to date with trunk?):

- Uses a custom tokenizer to handle delimited paths (A/B/C).
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the 
  number of entries at a given level (huge result set when a wide 
  hierarchy is analyzed)
- Fine (speed & memory) for small taxonomies
- Does not scale well (speed) to large taxonomies

Example use case:
Tree structure of addresses for stores.


SOLR-792 aka pivot faceting (Solr 4.0):

- Uses multiple independent fields as input: Not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restricting to single-path/document(?)
- Constraints can be given on entry count, but sorting cannot be done 
  on recursive counting of entries (and it would be very CPU expensive
  to do so(?))
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies
- Scales poorly (speed) to large taxonomies and large result size

Example use case:
Tree structure with price, rating and stock
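
A rough sketch of what a pivot request for this use case might look
like. The field names price, rating and stock are taken from the example
and assumed to be separate indexed fields; the helper is hypothetical.
facet.pivot takes a comma-separated field list, and I assume here that
facet.limit and facet.pivot.mincount apply as entry-count constraints.

```python
def pivot_params(fields, limit=None, mincount=None):
    """Build request parameters for one pivot over independent fields."""
    params = {
        "q": "*:*",                       # placeholder query
        "rows": 0,                        # only the facet counts are wanted
        "facet": "true",
        "facet.pivot": ",".join(fields),  # nesting order: first field is the top level
    }
    if limit is not None:
        params["facet.limit"] = limit     # assumed to cap entries at each level
    if mincount is not None:
        params["facet.pivot.mincount"] = mincount
    return params


params = pivot_params(["price", "rating", "stock"], limit=10, mincount=1)
```

Each level of the resulting tree comes from a different field, which is
why this fits independent attributes better than a single taxonomy.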


SOLR-2412 (trunk, highly experimental):

- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting
  cannot be done on recursive counting of entries (the number is there 
  though, so it would be fairly easy to add such a sorter)
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies & result size

Example use case:
Tree structure of categories for books.
