Re: Faceting on text fields

Yao Ge Thu, 04 Jun 2009 12:12:22 -0700

Yes. I am using 1.3. When is 1.4 due for release?


Yonik Seeley-2 wrote:
> 
> Are you using Solr 1.3?
> You might want to try the latest 1.4 test build - faceting has changed a
> lot.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge <yao...@gmail.com> wrote:
>>
>> I am indexing a database with over 1 million rows. Two of the fields contain
>> unstructured text, but the size of each field is limited (256 characters).
>>
>> I came up with the idea of visualizing the text fields as a text cloud by
>> turning the two text fields into facets. The font weight and size of each
>> facet value (word) are derived from the facet counts. I used a simpler field
>> type so that no stemming is applied to these facet values:
>>    <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
>>      <analyzer>
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>                ignoreCase="true" expand="false"/>
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>                words="stopwords.txt"/>
>>        <filter class="solr.WordDelimiterFilterFactory"
>>                generateWordParts="0" generateNumberParts="0" catenateWords="1"
>>                catenateNumbers="1" catenateAll="0"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>      </analyzer>
>>    </fieldType>
>>
>> The facet query is considerably slower compared to other facets from
>> structured database fields (with highly repeated values). What I found
>> interesting is that even after I constrained the search results to just a
>> few hundred hits using other facets, these text facets are still very slow.
>>
>> I understand that text fields are not good candidates for faceting, as they
>> can contain a very large number of unique values. However, why is it still
>> slow after my matching documents are reduced to hundreds? Is it because the
>> whole filter is cached (regardless of the matching docs) and my filter cache
>> is not large enough to fit the whole list?
>>
>> The following is my filterCache setting:
>>     <filterCache class="solr.LRUCache" size="5120" initialSize="512" autowarmCount="128"/>
>>
>> Lastly, what I really want is to give users a chance to visualize and
>> filter on the top relevant words in the free-text fields. Are there
>> alternatives to the facet field approach? Term vectors? I can do client-side
>> processing based on the top N (say 100) hits for this, but it is my last
>> option.
>> --
>> View this message in context:
>> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

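For reference, the client-side fallback mentioned in the quoted message (counting words over the top N hits and sizing the cloud from those counts) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names top_words and font_sizes, the pixel range, and the plain whitespace split are all hypothetical, and the split only approximates the "word" field type's analyzer (no synonyms or word-delimiter handling). The texts are assumed to be the stored field values of hits already fetched from Solr.

```python
from collections import Counter


def top_words(texts, stopwords=frozenset(), n=100):
    """Return the n most common words as (word, document_count) pairs."""
    counts = Counter()
    for text in texts:
        # Count each word once per document, so the number is comparable
        # to a facet count (documents containing the term).
        words = {w.lower() for w in text.split()}
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


def font_sizes(word_counts, min_px=10, max_px=36):
    """Linearly map counts to font sizes between min_px and max_px."""
    if not word_counts:
        return {}
    values = [count for _, count in word_counts]
    lo, hi = min(values), max(values)
    span = hi - lo
    sizes = {}
    for word, count in word_counts:
        if span == 0:
            # All words are equally frequent: give them the same size.
            sizes[word] = max_px
        else:
            sizes[word] = round(min_px + (count - lo) * (max_px - min_px) / span)
    return sizes
```

The stopwords passed in are assumed to be lowercase already. With only a few hundred short (256-character) values per request, this work is small, at the cost of counting only the fetched hits rather than the full result set.
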
-- 
View this message in context: 
http://www.nabble.com/Faceting-on-text-fields-tp23872891p23876051.html
Sent from the Solr - User mailing list archive at Nabble.com.
