Hi,

Hm, I don't think facets (or pure search/Solr, for that matter) are the right
tool for this job.  I think you have to do what Ian said, which is to compute
a baseline frequency for each concept of interest (Barack Obama and Iran in
your example), and then compare the current frequency against that baseline.
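
To make the baseline idea concrete, here is a minimal sketch in Python.  It
assumes you have already pulled per-topic facet counts for a recent window and
for a longer baseline window.  The function name, the smoothing parameter, and
all of the numbers are made up for illustration; this is not how any
particular product implements it.

```python
def trending_scores(recent_counts, baseline_counts,
                    recent_total, baseline_total, smoothing=1.0):
    """Score each topic by the ratio of its recent share of articles
    to its baseline share.  A score near 1.0 means "business as usual";
    a score well above 1.0 means the topic is unusually hot right now.
    """
    scores = {}
    for topic, recent in recent_counts.items():
        baseline = baseline_counts.get(topic, 0)
        # Additive smoothing avoids division by zero for topics that
        # never appeared in the baseline window.
        recent_rate = (recent + smoothing) / (recent_total + smoothing)
        baseline_rate = (baseline + smoothing) / (baseline_total + smoothing)
        scores[topic] = recent_rate / baseline_rate
    return scores


# Hypothetical facet counts: last 24 hours vs. last 30 days.
recent_counts = {"Barack Obama": 200, "Iran": 150}
baseline_counts = {"Barack Obama": 6000, "Iran": 600}

scores = trending_scores(recent_counts, baseline_counts,
                         recent_total=1000, baseline_total=30000)

# Sort topics so the biggest jump over baseline comes first.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for topic, score in ranked:
    print(f"{topic}: {score:.2f}")
```

With these made-up counts, "Barack Obama" has the higher raw facet count but
stays close to its usual share, while "Iran" is far above its baseline and so
sorts first.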

Look at point #2 on http://www.sematext.com/product-key-phrase-extractor.html .
I think this is what you are after, and you will even see an example that
matches yours very closely.  My guess is that's how
http://www.google.com/trends/hottrends works, too.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Asif Rahman <a...@newscred.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 23, 2009 8:05:48 AM
> Subject: Re: Facets with an IDF concept
> 
> Hi Grant,
> 
> I'll give a real life example of the problem that we are trying to solve.
> 
> We index a large number of current news articles on a continuing basis.  We
> tag these articles with news topics (e.g. Barack Obama, Iran, etc.).  We
> then use these tags to facet our queries.  For example, we might issue a
> query for all articles in the last 24 hours.  The facets would then tell us
> which news topics have been written about the most in that period.  The
> problem is that "Barack Obama", for example, is always written about in high
> frequency, as opposed to "Iran" which is currently very hot in the news, but
> which has not always been the case.  In this case, we'd like to see "Iran"
> show up higher than "Barack Obama" in the facet results.
> 
> To me, this seems identical to the tf-idf scoring expression that is used in
> normal search.  The facet count is analogous to the tf and I can access the
> facet term idf's through the Similarity API.
> 
> Is my reasoning sound?  Can you provide any guidance as to the best way to
> implement this?
> 
> Thanks for your help,
> 
> Asif
> 
> 
> On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll wrote:
> 
> >
> > On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:
> >
> >> Hi again,
> >>
> >> I guess nobody has used facets in the way I described below before.  Do any
> >> of the experts have any ideas as to how to do this efficiently and
> >> correctly?  Any thoughts would be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Asif
> >>
> >> On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman wrote:
> >>
> >>> Hi all,
> >>>
> >>> We have an index of news articles that are tagged with news topics.
> >>> Currently, we use Solr facets to see which topics are popular for a given
> >>> query or time period.  I'd like to apply the concept of IDF to the facet
> >>> counts so as to penalize the topics that occur broadly through our index.
> >>> I've begun to write a custom facet component that applies the IDF to the
> >>> facet counts, but I also wanted to check if anyone has experience using
> >>> facets in this way.
> >>>
> >>
> >
> > I'm not sure I'm following.  Would you be faceting on one field, but using
> > the DF from some other field?  Faceting is already a count of all the
> > documents that contain the term on a given field for that search.  If I'm
> > understanding, you would still do the typical faceting, but then rerank by
> > the global DF values, right?
> >
> > Backing up, what is the problem you are seeing that you are trying to
> > solve?
> >
> > I think you could do this, but you'd have to hook it in yourself.  By
> > penalize, do you mean remove, or just have them in the sort?  Generally
> > speaking, looking up the DF value can be expensive, especially if you do a
> > lot of skipping around.  I don't know how pluggable the sort capabilities
> > are for faceting, but that might be the place to start if you are just
> > looking at the sorting options.
> >
> >
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> > Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
> 
> 
> -- 
> Asif Rahman
> Lead Engineer - NewsCred
> a...@newscred.com
> http://platform.newscred.com
