Re: Faceting on text fields

Yonik Seeley Thu, 04 Jun 2009 10:37:01 -0700

Are you using Solr 1.3?
You might want to try the latest 1.4 test build - faceting has changed a lot.


-Yonik
http://www.lucidimagination.com

On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge <yao...@gmail.com> wrote:
>
> I am indexing a database with over 1 million rows. Two of the fields contain
> unstructured text, but the size of each field is limited (256 characters).
>
> I came up with an idea to visualize the text fields as a text cloud by
> turning the two text fields into facets. The font weight and size of
> each facet value (word) are derived from the facet counts. I used a simpler field
> type so that there is no stemming of these facet values:
>    <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="false"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>                words="stopwords.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="0" generateNumberParts="0" catenateWords="1"
>                catenateNumbers="1" catenateAll="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> The facet query is considerably slower compared to other facets from
> structured database fields (with highly repeated values). What I found
> interesting is that even after I constrained the search results to just a few
> hundred hits using other facets, these text facets are still very slow.
>
> I understand that text fields are not good candidates for faceting, as they can
> contain a very large number of unique values. However, why is it still slow
> after my matching documents are reduced to hundreds? Is it because the whole
> filter is cached (regardless of the matching docs) and I don't have a large
> enough filter cache to fit the whole list?
>
> The following is my filterCache setting:
>     <filterCache class="solr.LRUCache" size="5120" initialSize="512"
>                  autowarmCount="128"/>
>
> Lastly, what I really want is to give users a chance to visualize and
> filter on the top relevant words in the free-text fields. Are there alternatives
> to the facet field approach? Term vectors? I can do client-side processing based
> on the top N (say 100) hits for this, but it is my last option.
> --
> View this message in context: 
> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
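
For reference, the client-side fallback mentioned at the end of the quoted message (counting words over the top N hits and deriving font sizes from the counts) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the inline stop-word set, the sample hits, and the pixel range are all hypothetical, and the tokenization is a plain whitespace split plus lowercasing rather than the full analyzer chain quoted above.

```python
from collections import Counter

# Hypothetical stop-word set; the quoted setup reads stopwords.txt instead.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def term_counts(texts, top_n=100):
    """Count each word once per document, similar in spirit to a facet count."""
    counts = Counter()
    for text in texts:
        # Whitespace split and lowercasing only: no stemming is applied.
        words = {w.lower() for w in text.split()}
        counts.update(words - STOPWORDS)
    return counts.most_common(top_n)


def font_sizes(pairs, min_px=10, max_px=32):
    """Linearly scale (word, count) pairs to font sizes in [min_px, max_px]."""
    if not pairs:
        return {}
    values = [count for _, count in pairs]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_px + (count - lo) * (max_px - min_px) / span)
        for word, count in pairs
    }


# Made-up field values standing in for the text of the top N hits.
hits = ["Solr facet counts", "facet the text fields", "the text cloud"]
cloud = font_sizes(term_counts(hits))
```

Here `cloud` maps each word to a font size, so the words that occur in the most hits get the largest size. Whether this is fast enough for 100 hits per request would need to be measured.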
