Are you using Solr 1.3? You might want to try the latest 1.4 test build - faceting has changed a lot.
-Yonik
http://www.lucidimagination.com

On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge <yao...@gmail.com> wrote:
>
> I am indexing a database with over 1 million rows. Two of the fields contain
> unstructured text, but the size of each field is limited (256 characters).
>
> I came up with an idea to visualize the text fields as a text cloud by
> turning the two text fields into facets. The font weight and size of each
> facet value (word) are derived from the facet counts. I used a simpler field
> type so that there is no stemming of these facet values:
>
> <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="false"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="0" catenateWords="1"
>             catenateNumbers="1" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> The facet query is considerably slower compared to other facets from
> structured database fields (with highly repeated values). What I found
> interesting is that even after I constrained the search results to just a few
> hundred hits using other facets, these text facets are still very slow.
>
> I understand that text fields are not good candidates for faceting, as they
> can contain a very large number of unique values. However, why is it still
> slow after my matching documents are reduced to hundreds? Is it because the
> whole filter is cached (regardless of the matching docs) and I don't have a
> large enough filter cache to fit the whole list?
>
> The following is my filterCache setting:
>
> <filterCache class="solr.LRUCache" size="5120" initialSize="512"
>              autowarmCount="128"/>
>
> Lastly, what I really want is to give the user a chance to visualize and
> filter on the top relevant words in the free-text fields. Are there
> alternatives to the facet field approach? Term vectors? I can do client-side
> processing based on the top N (say 100) hits for this, but it is my last
> option.
> --
> View this message in context:
> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html
> Sent from the Solr - User mailing list archive at Nabble.com.
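
For reference, the client-side fallback mentioned in the quoted message (take the text of the top N hits, count words, and size each word by its count) could look roughly like the sketch below. This is only an illustration under stated assumptions: the function names, the inline stopword set, and the 10-36 px font range are made up for the example and do not come from Solr or from the original post. Tokenizing is a plain whitespace split plus lowercasing, loosely echoing the analyzer chain above.

```python
from collections import Counter

# Hypothetical stopword set for the example; the original setup keeps
# its stopwords in stopwords.txt on the Solr side.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def count_terms(texts, stopwords=STOPWORDS):
    """Count lowercase whitespace-separated terms, once per document.

    Counting each term once per document mirrors facet counts, which
    count matching documents rather than term occurrences.
    """
    counts = Counter()
    for text in texts:
        tokens = {tok.lower() for tok in text.split()}
        counts.update(tokens - stopwords)
    return counts


def font_sizes(counts, top_n=100, min_px=10, max_px=36):
    """Map the top_n terms to font sizes, scaled linearly by count."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi = top[0][1]   # largest count among the kept terms
    lo = top[-1][1]  # smallest count among the kept terms
    span = hi - lo
    sizes = {}
    for term, n in top:
        # If every kept term has the same count, give them all max_px.
        scale = (n - lo) / span if span else 1.0
        sizes[term] = round(min_px + scale * (max_px - min_px))
    return sizes
```

Here `texts` would be the stored values of the two free-text fields for the top hits of the current query, so the work is bounded by N short strings (256 characters each) rather than by the number of unique terms in the whole index.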