Re: Faceting on text fields

Otis Gospodnetic Tue, 09 Jun 2009 22:38:30 -0700

Yao,

Solr can already cluster top N hits using Carrot2:
http://wiki.apache.org/solr/ClusteringComponent


I've also done ugly "manual counting" of terms in top N hits.  For example, 
look at the right side of this:
http://www.simpy.com/user/otis/tag/%22machine+learning%22

Something like http://www.sematext.com/product-key-phrase-extractor.html could 
also be used.
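
For illustration, the "manual counting" of terms in the top N hits mentioned above could be sketched roughly as below. This is only a minimal sketch, not the code behind the Simpy page: it assumes the text of the top N hits has already been fetched on the client side, and the names top_terms, hit_texts and STOPWORDS, along with the tiny stopword list, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "for", "on", "with"}


def top_terms(hit_texts, limit=10):
    """Return the terms that occur in the most hits, as (term, hit_count) pairs."""
    counts = Counter()
    for text in hit_texts:
        # Lowercase, then keep runs of letters and digits as tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # A set makes each hit contribute at most 1 to a term's count.
        counts.update({t for t in tokens if t not in STOPWORDS})
    # Highest count first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Made-up stand-ins for the text of the top N hits.
hits = [
    "Machine learning with Lucene",
    "Learning to rank in Solr",
    "Solr faceting on text fields",
]
print(top_terms(hits, limit=3))
```

Counting hits per term rather than raw occurrences keeps one long document from dominating the list. The cost grows with N and the size of the hits, which is why this only makes sense for a small N.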

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Yao Ge <yao...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 9, 2009 3:46:13 PM
> Subject: Re: Faceting on text fields
> 
> 
> Michael,
> 
> Thanks for the update! I definitely need to get a 1.4 build to see whether
> it makes a difference.
> 
> BTW, instead of using faceting for text
> mining/clustering/visualization purposes, maybe we could build a separate
> feature in Solr for this. Many of the commercial search engines I have
> experience with (Google Search Appliance, Vivisimo, etc.) provide dynamic
> term clustering based on the top N ranked documents (N is configurable).
> When the facet field is highly fragmented (say, a text field), the existing
> set-intersection-based approach might no longer be optimal. Aggregating
> term vectors over the top N docs might be more attractive. Another feature
> I would really appreciate is search-time n-gram term clustering. Maybe
> this would be better suited to the "spell checker", as it is just a
> different way to display alternative search terms.
> 
> -Yao
> 
> 
> Michael Ludwig-4 wrote:
> > 
> > Yao Ge wrote:
> > 
> >> The facet query is considerably slower compared to other facets from
> >> structured database fields (with highly repeated values). What I found
> >> interesting is that even after I constrained the search results to just
> >> a few hundred hits using other facets, these text facets are still very
> >> slow.
> >>
> >> I understand that text fields are not good candidates for faceting, as
> >> they can contain a very large number of unique values. However, why is
> >> it still slow after my matching documents are reduced to hundreds? Is it
> >> because the whole filter is cached (regardless of the matching docs) and
> >> I don't have enough filter cache size to fit the whole list?
> > 
> > Very interesting questions! I think an answer would both require and
> > further an understanding of how filters work, which might even lead to
> > a more general guideline on when and how to use filters and facets.
> > 
> > Even though faceting appears to have changed in 1.4 vs 1.3, it would
> > still be interesting to understand the 1.3 side of things.
> > 
> >> Lastly, what I really want is to give users a chance to visualize
> >> and filter on the top relevant words in the free-text fields. Are there
> >> alternatives to the facet field approach? Term vectors? I can do
> >> client-side processing based on the top N (say 100) hits for this, but
> >> it is my last option.
> > 
> > Also a very interesting data mining question! I'm sorry I don't have any
> > answers for you. Maybe someone else does.
> > 
> > Best,
> > 
> > Michael Ludwig
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Faceting-on-text-fields-tp23872891p23950084.html
> Sent from the Solr - User mailing list archive at Nabble.com.
