Hi, Bill (and others). I post this for what it's worth: it's a very specialized solution we wrote for a similar requirement, and it may help with yours (and similar ones).
Caveats abound [1]. We're running Solr 3.1.

We wanted to be able to return only the facet values that match the actual search, rather than all facet values from the entire result set. For example, if a user searches for author 'Twain', we present a list of facet values that match 'Twain' and exclude those where 'Twain' is not found. (We don't tell our users that these are 'facet' values; we just present an alpha-sorted list of author names, each with a count of associated documents.)

So we search our Author search field to identify matching documents and get all the facets (i.e. normal Solr processing up to this point), and then filter that facet set down to only the values that match our original search. We added our own extra facet parameter (facet.sirsidynix.filter.facets) to tell Solr when to do this special facet filtering.

We modified the SimpleFacets method getTermCounts, right before the final "return counts;", like this:

    // Custom SirsiDynix code.
    // FACET_SIRSIDYNIX_FILTER_FACETS is our own FacetParams constant for
    // "facet.sirsidynix.filter.facets"; stock Solr does not define it.
    if (params.getBool(FacetParams.FACET_SIRSIDYNIX_FILTER_FACETS, false)) {
        counts = filterCounts(field, counts);
    }
    return counts;

We also added the method filterCounts(), shown below. It wraps up the work of running the search against each facet value: it sets up a MemoryIndex instance based on our schema, inserts the facet value, and runs our original query against that MemoryIndex. Anything that matches has a score > 0, and those are the only values we keep:

    // Imports this method relies on (some may already be present in SimpleFacets):
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.schema.FieldType;
    import org.apache.solr.schema.IndexSchema;
    import org.apache.solr.schema.SchemaField;

    /**
     * Custom SirsiDynix code:
     * Filters counts down to only those entries that match the original
     * query. Does this by using Lucene's MemoryIndex, a very fast, in-memory,
     * single-document index that can have queries run against it.
     * For each string value in counts, we create a MemoryIndex and run the
     * original query against it. Anything with a score > 0 means a 'hit', so
     * the value matches the original query, and we'll retain it. Score 0 means
     * no hit (i.e. it was a facet value associated with a document that matched
     * the query, but the facet value itself didn't match the query).
     * @param field name of the field that the facet values came from.
     * @param counts Lucene's list of facet values.
     * @return filtered set, only those matching the original query.
     */
    private NamedList filterCounts(String field, NamedList counts) {
        if (!field.endsWith("_facet")) {
            return counts;
        }
        // Trim off "_facet".
        String fieldBase = field.substring(0, field.length() - "_facet".length());

        // Build the field names to search against.
        // Note that the original facet came from (e.g.) AUTHOR_facet, and the
        // original search would have been against INITIAL_AUTHOR_SRCH_boost as
        // well as SUBSEQUENT_AUTHOR_SRCH_boost (and the fuzzy variants). However,
        // we're only searching one string at a time, so we'll shove it into the
        // single-valued INITIAL_xxx fields. That is good enough for the Query to
        // evaluate correctly against the document.
        String fieldBoost = "INITIAL_" + fieldBase + "_SRCH_boost";
        String fieldFuzzy = "INITIAL_" + fieldBase + "_SRCH_fuzzy";

        NamedList newCounts = new NamedList();

        // Look up the index-time analyzers of both search fields, so each
        // facet value is tokenized the same way those fields are.
        IndexSchema schema = searcher.getSchema();
        SchemaField schemaField = schema.getField(fieldBoost);
        FieldType fieldType = schemaField.getType();
        Analyzer fieldAnalyzer = fieldType.getAnalyzer();
        SchemaField schemaFuzzyField = schema.getField(fieldFuzzy);
        FieldType fuzzyFieldType = schemaFuzzyField.getType();
        Analyzer fuzzyFieldAnalyzer = fuzzyFieldType.getAnalyzer();

        for (int i = 0; i < counts.size(); i++) {
            String testValue = counts.getName(i);
            // One throwaway single-document index per facet value.
            MemoryIndex index = new MemoryIndex();
            index.addField(fieldBoost, testValue, fieldAnalyzer);
            index.addField(fieldFuzzy, testValue, fuzzyFieldAnalyzer);
            // Score > 0 means the original query matched this value on its own.
            float score = index.search(rb.getQuery());
            if (score > 0.0f) {
                newCounts.add(testValue, counts.getVal(i));
            }
        }
        return newCounts;
    }

A bit of explanation on our schema is in order here.

1) We've suffixed all our facet fields with "_facet", hence that first if statement.
2) We have matching 'searchable' and 'facet' fields whose names differ basically only in the suffix. So we strip off '_facet' and append '_boost' and '_fuzzy', our two field types for searching: one for searching against (and possibly applying boosts), and one for fuzzy matching. (You'll see it's not exactly that, but you can hopefully adapt your version to match your schema.) The idea is that we can derive, from the facet field name, the field name(s) against which the original search was issued.

3) You'll want to read up on the MemoryIndex class to see more about how it works, rather than me reiterating that here.

[1] Caveats
1) We didn't do anything with date faceting or with any ranges.
2) We didn't do anything with facet prefix handling; it may or may not work if you need prefixes.
3) Anything else that facets do that we didn't handle, or at least didn't test :)

As I say, it's a very special case for us, and this is in no way intended to be a general solution or fit for 'prime time' submission as a Solr enhancement.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -----Original Message-----
> From: Bill Bell [mailto:billnb...@gmail.com]
> Sent: Wednesday, June 22, 2011 3:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: MultiValued facet behavior question
>
> You can type q=cardiology and match on cardiologist. If stemming did not
> work you can just add a synonym:
>
> cardiology,cardiologist
>
> But that is not the issue. The issue is around multiValue fields and
> facets. You would expect a user
> Who is searching on the multiValued field to match on some values in
> there. For example,
> they type "Cardiologist" and it matches on the value "Cardiologist". So it
> matches "in the multiValue field".
> So that part works. Then when I output the facet, I need a different
> behavior than the default. I need
I need > The facet to only output the value that matches (scored) - NOT ALL > VALUES > in the multiValued field. > > I think it makes sense? > > > On 6/22/11 1:42 AM, "Michael Kuhlmann" <s...@kuli.org> wrote: > > >Am 22.06.2011 05:37, schrieb Bill Bell: > >> It can get more complicated. Here is another example: > >> > >> q=cardiology&defType=dismax&qf=specialties > >> > >> > >> (Cardiology and cardiologist are stems)... > >> > >> But I don't really know which value in Cardiologist match perfectly. > >> > >> Again, I only want it to return: > >> > >> Cardiologist: 3 > > > >You would never get "Cardiologist: 3" as the facet result, because if > >"Cardiologist" would be in your index, it's impossible to find it when > >searching for "cardiology" (except when you manage to write some > strange > >tokenizer that translates "cardiology" to "Cardiologist" on query > time, > >including the upper case letter). > > > >Facets are always taken from the index, so they usually match exactly > or > >never when querying for it. > > > >-Kuli > >