The Shingle Filter breaks the words in a sentence into combinations of two or three words.
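As a rough sketch of what that looks like in a schema (the field type name text_shingle and the attribute values are assumptions, not taken from Adam's actual configuration):

```xml
<!-- Sketch only: "text_shingle" is a hypothetical field type name.
     maxShingleSize="3" and outputUnigrams="true" are assumed values. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Emits every run of 1 to 3 adjacent tokens as its own indexed term -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

With settings like these, a title such as "National Highway System" is indexed as National, National Highway, National Highway System, Highway, Highway System and System. Since facets are built from indexed terms, each of those shows up as a separate facet value with nearly the same count.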
For the facet field, you should use:

<field name="facet_field" type="string" indexed="true" stored="true" multiValued="true"/>

The type of the field should be *string* so that it is not tokenised at all.

On Wed, Oct 27, 2010 at 9:12 AM, Adam Estrada <estrada.a...@gmail.com> wrote:

> Thanks guys, the solr.ShingleFilterFactory did work to get me multiple
> terms per facet but now I am seeing some redundancy in the facets
> numbers. See below...
>
> Highway (62)
> Highway System (59)
> National (59)
> National Highway (59)
> National Highway System (59)
> System (59)
>
> See what's going on here? How can I make my multi token facets smarter
> so that the tokens aren't duplicated?
>
> Thanks in advance,
> Adam
>
> On Tue, Oct 26, 2010 at 10:32 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> > Facets are generated from indexed terms.
> >
> > Depending on your need/use-case:
> >
> > You can use a additional separate String field (which is not tokenized)
> > for facets, populate it via copyField. Search on tokenized field facet on
> > non-tokenized field.
> >
> > Or
> >
> > You can add solr.ShingleFilterFactory to your index analyzer to form
> > multiple word terms.
> >
> > --- On Wed, 10/27/10, Adam Estrada <estrada.a...@gmail.com> wrote:
> >
> >> From: Adam Estrada <estrada.a...@gmail.com>
> >> Subject: Multiple Word Facets
> >> To: solr-user@lucene.apache.org
> >> Date: Wednesday, October 27, 2010, 4:43 AM
> >> All,
> >> I am a new to Solr faceting and stuck on how to get multiple-word
> >> facets returned from a standard Solr query. See below for what is
> >> currently being returned.
> >>
> >> <lst name="facet_counts">
> >>   <lst name="facet_queries"/>
> >>   <lst name="facet_fields">
> >>     <lst name="title">
> >>       <int name="Federal">89</int>
> >>       <int name="EFLHD">87</int>
> >>       <int name="Eastern">87</int>
> >>       <int name="Lands">87</int>
> >>       <int name="Highways">84</int>
> >>       <int name="FHWA">60</int>
> >>       <int name="Transportation">32</int>
> >>       <int name="GIS">22</int>
> >>       <int name="Planning">19</int>
> >>       <int name="Asset">15</int>
> >>       <int name="Environment">15</int>
> >>       <int name="Management">14</int>
> >>       <int name="Realty">12</int>
> >>       <int name="Highway">11</int>
> >>       <int name="HEP">10</int>
> >>       <int name="Program">9</int>
> >>       <int name="HEPGIS">7</int>
> >>       <int name="Resources">7</int>
> >>       <int name="Roads">7</int>
> >>       <int name="EEI">6</int>
> >>       <int name="Environmental">6</int>
> >>       <int name="Right">6</int>
> >>       <int name="Way">6</int>
> >>       ...etc...
> >>
> >> There are many terms in there that are 2 or 3 word phrases. For
> >> example, Eastern Federal Lands Highway Division all gets broken down
> >> in to the individual words that make up the total group of words. I've
> >> seen quite a few websites that do what it is I am trying to do here so
> >> any suggestions at this point would be great. See my schema below
> >> (copied from the example schema).
> >>
> >> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>             ignoreCase="true" expand="false"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>             words="stopwords.txt" enablePositionIncrements="true"/>
> >>     <filter class="solr.WordDelimiterFilterFactory"
> >>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>   </analyzer>
> >>
> >> Similar for type="query". Please advise on how to group or cluster
> >> document terms so that they can be used as facets.
> >>
> >> Many thanks in advance,
> >> Adam Estrada
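
P.S. To connect the string field at the top of this reply with the copyField suggestion quoted above, the schema entries could look roughly like this. It is only a sketch: the definition of the title field is an assumption (only its name appears in the facet output), and facet_field is the placeholder name used earlier in this reply.

```xml
<!-- Sketch only: the attributes of "title" are assumed, not taken from the thread. -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- Untokenized copy used only for faceting -->
<field name="facet_field" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Copies the raw title value into facet_field at index time -->
<copyField source="title" dest="facet_field"/>
```

Searching then stays on title, while faceting is requested with facet=true&facet.field=facet_field. Each facet value is the complete, untokenized field value, so a title is counted once as a whole rather than once per word or shingle.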