The shingle filter combines adjacent words in a sentence into terms of 2 or 3
words.
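
As a rough illustration, an index analyzer with the shingle filter might look
like the sketch below. The field type name text_shingle and the attribute
values are assumptions chosen for the example, not settings taken from anyone's
schema in this thread. With a whitespace tokenizer, the input "National Highway
System" would produce the terms National, National Highway, National Highway
System, Highway, Highway System and System, which is why every overlapping
combination turns up as its own facet.

```xml
<!-- Hypothetical field type; only solr.ShingleFilterFactory is named in this thread -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Emits the single words plus every run of 2 or 3 adjacent words -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```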

For a faceting field you should use:
<field name="facet_field" type="string" indexed="true" stored="true"
multiValued="true"/>

The type of the field should be *string* so that it is not tokenized at all.
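
Put together with the copyField suggestion quoted below, the schema could look
roughly like this. It is only a sketch: I am assuming the searchable field is
called title (as in the facet output further down) and uses the text type from
the quoted schema, so adjust the names to your own setup.

```xml
<!-- Tokenized field used for searching (name and type assumed from the quoted messages) -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- Untokenized copy used only for faceting -->
<field name="facet_field" type="string" indexed="true" stored="true"
       multiValued="true"/>

<!-- Copies the raw title value into the facet field at index time -->
<copyField source="title" dest="facet_field"/>
```

You would then search on title and facet on the copy, for example with
q=title:highway&facet=true&facet.field=facet_field. Because a string field is
indexed as a single term, each complete value (such as "Eastern Federal Lands
Highway Division") comes back as one facet, exactly as it was entered.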

On Wed, Oct 27, 2010 at 9:12 AM, Adam Estrada <estrada.a...@gmail.com> wrote:

> Thanks guys, the solr.ShingleFilterFactory did work to get me multiple
> terms per facet but now I am seeing some redundancy in the facet
> counts. See below...
>
> Highway (62)
> Highway System (59)
> National (59)
> National Highway (59)
> National Highway System (59)
> System (59)
>
> See what's going on here? How can I make my multi token facets smarter
> so that the tokens aren't duplicated?
>
> Thanks in advance,
> Adam
>
> On Tue, Oct 26, 2010 at 10:32 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> > Facets are generated from indexed terms.
> >
> > Depending on your need/use-case:
> >
> > You can use an additional, separate string field (which is not tokenized)
> > for facets and populate it via copyField. Search on the tokenized field and
> > facet on the non-tokenized field.
> >
> > Or
> >
> > You can add solr.ShingleFilterFactory to your index analyzer to form
> > multiple-word terms.
> >
> > --- On Wed, 10/27/10, Adam Estrada <estrada.a...@gmail.com> wrote:
> >
> >> From: Adam Estrada <estrada.a...@gmail.com>
> >> Subject: Multiple Word Facets
> >> To: solr-user@lucene.apache.org
> >> Date: Wednesday, October 27, 2010, 4:43 AM
> >> All,
> >> I am new to Solr faceting and stuck on how to get multiple-word
> >> facets returned from a standard Solr query. See below for what is
> >> currently being returned.
> >>
> >> <lst name="facet_counts">
> >> <lst name="facet_queries"/>
> >> <lst name="facet_fields">
> >> <lst name="title">
> >> <int name="Federal">89</int>
> >> <int name="EFLHD">87</int>
> >> <int name="Eastern">87</int>
> >> <int name="Lands">87</int>
> >> <int name="Highways">84</int>
> >> <int name="FHWA">60</int>
> >> <int name="Transportation">32</int>
> >> <int name="GIS">22</int>
> >> <int name="Planning">19</int>
> >> <int name="Asset">15</int>
> >> <int name="Environment">15</int>
> >> <int name="Management">14</int>
> >> <int name="Realty">12</int>
> >> <int name="Highway">11</int>
> >> <int name="HEP">10</int>
> >> <int name="Program">9</int>
> >> <int name="HEPGIS">7</int>
> >> <int name="Resources">7</int>
> >> <int name="Roads">7</int>
> >> <int name="EEI">6</int>
> >> <int name="Environmental">6</int>
> >> <int name="Right">6</int>
> >> <int name="Way">6</int>
> >> ...etc...
> >>
> >> There are many terms in there that are 2 or 3 word phrases. For
> >> example, Eastern Federal Lands Highway Division all gets broken down
> >> into the individual words that make up the total group of words. I've
> >> seen quite a few websites that do what it is I am trying to do here so
> >> any suggestions at this point would be great. See my schema below
> >> (copied from the example schema).
> >>
> >>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>       <analyzer type="index">
> >>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>                 ignoreCase="true" expand="false"/>
> >>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>                 words="stopwords.txt" enablePositionIncrements="true"/>
> >>         <filter class="solr.WordDelimiterFilterFactory"
> >>                 generateWordParts="1" generateNumberParts="1"
> >>                 catenateWords="0" catenateNumbers="0" catenateAll="0"
> >>                 splitOnCaseChange="1"/>
> >>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>       </analyzer>
> >>
> >> Similar for type="query". Please advise on how to group or cluster
> >> document terms so that they can be used as facets.
> >>
> >> Many thanks in advance,
> >> Adam Estrada
> >>
> >
> >
> >
> >
>
