On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
> Sorry, I missed the original mail on this thread....
>
> I put together that hierarchical faceting wiki page a couple
> of years ago when helping a customer evaluate SOLR-64 vs.
> SOLR-792 vs. other approaches. Since then, SOLR-792 morphed
> and is committed as pivot faceting. SOLR-64 spawned a
> PathTokenizer which is part of Solr now too.
>
> Recently Toke updated that page with some additional info.
> It's definitely not a "how to" page, and perhaps should get
> renamed/moved/revamped? Toke?
Unfortunately or luckily, depending on one's point of view, I've been hit by a child #3 + house-buying combo. Plenty of intentions, but no promises for the next month or two.

I think we need both an overview and a detailed how-to of the different angles on extended faceting in Solr, seen from a user perspective. I am not sure I fully understand the different methods myself, so maybe we could start by discussing them here? Below is a quick outline of how I see them; please expand & correct. I plan to back up the claims about scaling later with a wiki page with performance tests.

http://www.lucidimagination.com/solutions/webcasts/faceting (minutes 27-33):
- Requires the user to transform the paths into multiple special terms
- Step-by-step drill-down: if a visual tree is needed, it requires one call per branch
- Supports multiple paths per document
- Constraints on output work just as in standard faceting
- Scales very well when a single branch is requested
Example use case: Click-to-expand tree structure of categories for books.

PathHierarchyTokenizer (trunk):
Changes /A/B/C to /A, /A/B and /A/B/C. I don't know how this can be used directly for hierarchical faceting. The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and 2/A/B/C, so the two seem incompatible to me. The discussion on SOLR-1057 indicates that it can be used with SOLR-64, but SOLR-64 does its own tokenization!? A little help here?

SOLR-64 (not up to date with trunk?):
- Uses a custom tokenizer to handle delimited paths (A/B/C)
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the number of entries at a given level (huge result set when a wide hierarchy is analyzed)
- Fine (speed & memory) for small taxonomies
- Does not scale well (speed) to large taxonomies
Example use case: Tree structure of addresses for stores.
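To make the PathHierarchyTokenizer vs. webcast comparison above concrete, here is a small Python sketch that only emulates the two term formats in plain strings; it is not Solr code. It assumes simple paths with no trailing delimiter, and it assumes the webcast's step-by-step drill-down is done with Solr's standard facet.prefix parameter. The field name "category_path" and all function names are hypothetical.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate PathHierarchyTokenizer's output for a simple path:
    "/A/B/C" -> ["/A", "/A/B", "/A/B/C"] (no depth information)."""
    # Every prefix ending just before a delimiter (skipping a leading one).
    tokens = [path[:i] for i, ch in enumerate(path) if ch == delimiter and i > 0]
    tokens.append(path)
    return tokens


def depth_prefixed_terms(path, delimiter="/"):
    """The webcast's scheme: "/A/B/C" -> ["0/A", "1/A/B", "2/A/B/C"]."""
    parts = [p for p in path.split(delimiter) if p]
    return [f"{depth}/" + delimiter.join(parts[:depth + 1])
            for depth in range(len(parts))]


def to_depth_prefixed(token, delimiter="/"):
    """Derive a webcast-style term from one PathHierarchyTokenizer token by
    counting delimiters: "/A/B" -> "1/A/B"."""
    stripped = token.strip(delimiter)
    return f"{stripped.count(delimiter)}/{stripped}"


def drill_down_params(field, parent_path="", delimiter="/"):
    """Solr request parameters fetching the direct children of parent_path
    from a field indexed with depth-prefixed terms. One such request is
    needed per expanded branch. 'field' is a hypothetical field name."""
    parts = [p for p in parent_path.split(delimiter) if p]
    # Children of /A/B sit at depth 2 and start with "2/A/B/".
    prefix = f"{len(parts)}/" + "".join(p + delimiter for p in parts)
    return {
        "q": "*:*",
        "rows": "0",
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.mincount": "1",
    }


print(path_hierarchy_tokens("/A/B/C"))
print(depth_prefixed_terms("/A/B/C"))
print(drill_down_params("category_path", "/A/B")["facet.prefix"])
```

The sketch suggests the depth is recoverable from PathHierarchyTokenizer tokens by counting delimiters, but the tokenizer itself does not emit it. Without the depth prefix, facet.prefix="/A/B/" would match "/A/B/C" and also "/A/B/C/D", i.e. all descendants rather than only the direct children a click-to-expand tree wants.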
SOLR-792 aka pivot faceting (Solr 4.0):
- Uses multiple independent fields as input: not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restricting to a single path per document(?)
- Constraints can be given on entry count, but sorting cannot be done on recursive counts of entries (and it would be very CPU-expensive to do so(?))
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies
- Scales poorly (speed) to large taxonomies combined with large result sizes
Example use case: Tree structure with price, rating and stock.

SOLR-2412 (trunk, highly experimental):
- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting cannot be done on recursive counts of entries (the numbers are there though, so it would be fairly easy to add such a sorter)
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies & result sizes
Example use case: Tree structure of categories for books.
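The multi-path counting and the depth/entry-count constraints described for SOLR-2412 can be illustrated with a minimal in-memory Python sketch. This is not SOLR-2412's implementation and says nothing about how it achieves its speed & memory figures. It assumes "count" means documents whose path ends exactly at a node and "total" (the recursive count) means documents at that node or anywhere below it; all names are hypothetical.

```python
from collections import defaultdict


def count_paths(docs, delimiter="/"):
    """docs: one list of delimited paths per document, e.g. ["A/B/C", "A/B/D"].

    Returns (direct, recursive), keyed by path tuples:
      direct[node]    -> documents with a path ending exactly at node
      recursive[node] -> documents with a path at node or below it
    A document counts at most once per node, even when several of its
    paths share that node as a prefix (the multi-path case).
    """
    direct = defaultdict(int)
    recursive = defaultdict(int)
    for paths in docs:
        exact, touched = set(), set()
        for path in paths:
            parts = tuple(p for p in path.split(delimiter) if p)
            if not parts:
                continue
            exact.add(parts)
            # Every prefix of the path is the node itself or an ancestor.
            touched.update(parts[:depth] for depth in range(1, len(parts) + 1))
        for node in exact:
            direct[node] += 1
        for node in touched:
            recursive[node] += 1
    return direct, recursive


def build_tree(direct, recursive, max_depth=2, limit=10, sort_by="count",
               delimiter="/"):
    """Render the counts as nested dicts. max_depth bounds the levels,
    limit bounds the entries per level, and sort_by chooses "count"
    (direct) or "total" (recursive) ordering."""
    children = defaultdict(list)
    for node in recursive:
        children[node[:-1]].append(node)
    counts = direct if sort_by == "count" else recursive

    def expand(parent):
        if len(parent) >= max_depth:
            return []
        # Highest count first; ties broken alphabetically for stable output.
        ranked = sorted(children.get(parent, []),
                        key=lambda n: (-counts.get(n, 0), n))
        return [
            {
                "path": delimiter.join(node),
                "count": direct.get(node, 0),
                "total": recursive[node],
                "children": expand(node),
            }
            for node in ranked[:limit]
        ]

    return expand(())


docs = [["A/B/C", "A/B/D"], ["A/B"], ["A/E", "F"]]
direct, recursive = count_paths(docs)
print(build_tree(direct, recursive, max_depth=2, limit=2, sort_by="total"))
```

Here sort_by="total" plays the role of the recursive-count sorter mentioned above; whether SOLR-2412 would expose it this way is an open question.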