I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters).
I came up with the idea of visualizing the text fields as a text cloud by turning the two text fields into facets. The font weight and size of each facet value (word) are derived from the facet counts. I used a simpler field type so that no stemming is applied to these facet values:

    <fieldType name="word" class="solr.TextField" positionIncrementGap="100" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

The facet query is considerably slower than the facets on structured database fields (which have highly repeated values). What I found interesting is that even after I constrain the search results to just a few hundred hits using other facets, these text facets are still very slow.

I understand that text fields are not good candidates for faceting, as they can contain a very large number of unique values. However, why is it still slow after the set of matching documents has been reduced to a few hundred? Is it because the whole filter is cached (regardless of the matching docs) and my filter cache is not large enough to hold the whole list? The following is my filterCache setting:

    <filterCache class="solr.LRUCache" size="5120" initialSize="512" autowarmCount="128"/>

Lastly, what I really want is to give users a chance to visualize and filter on the top relevant words in the free-text fields. Are there alternatives to the facet field approach? Term vectors? I could do client-side processing based on the top N (say 100) hits, but that is my last option.
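For reference, the client-side fallback could be sketched roughly as below, in Python. This is only a sketch under assumptions: the text of the top N hits has already been fetched as plain strings, `word_counts` and `font_sizes` are hypothetical helper names, the pixel range is arbitrary, and the tokenization is plain whitespace splitting plus lowercasing rather than the full analyzer chain above.

```python
from collections import Counter


def word_counts(texts, stopwords=frozenset()):
    """Count, for each word, how many texts contain it (like a facet count)."""
    counts = Counter()
    for text in texts:
        # A set so that a word repeated inside one document counts once.
        words = {w.lower() for w in text.split()}
        counts.update(w for w in words if w not in stopwords)
    return counts


def font_sizes(counts, top_n=100, min_px=10, max_px=36):
    """Map the top_n most frequent words to font sizes by linear scaling."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi = top[0][1]   # highest count among the selected words
    lo = top[-1][1]  # lowest count among the selected words
    span = hi - lo
    sizes = {}
    for word, count in top:
        # If every word has the same count, use the largest size for all.
        scale = (count - lo) / span if span else 1.0
        sizes[word] = round(min_px + scale * (max_px - min_px))
    return sizes


if __name__ == "__main__":
    hits = ["red apple", "Red car", "red apple pie"]  # made-up sample texts
    print(font_sizes(word_counts(hits), top_n=3, min_px=10, max_px=30))
```

The obvious drawback is that the cloud then reflects only the fetched hits, not the whole result set, which is why I would prefer a server-side approach.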