Hi,

The idea is to not index a doc if something similar (headline+bodytext)
already exists for the exact same medianame.

Do you mean I would need to index the doc first (maybe in a temp index)
and then use the MLT feature to find similar docs before adding to final
index?
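
For reference, the alternative from the original message (sending the new
doc's text to the MLT handler directly, with no temp index) might look
roughly like the sketch below. It only builds the request URL. The "/mlt"
handler path, the "id" field and the threshold values are assumptions, not
taken from a real config, and the medianame restriction is expressed as an
fq filter instead of q, which is a swapped-in choice.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, medianame, headline, bodytext):
    """Build a MoreLikeThisHandler request URL for a doc that is not indexed yet.

    Assumes the handler is registered at <base_url>/mlt (hypothetical path).
    """
    params = {
        # Text of the new doc, passed as a content stream instead of a doc id
        "stream.body": headline + " " + bodytext,
        # Fields used for the similarity comparison (names from the thread)
        "mlt.fl": "headline,body",
        # Only consider docs from the same media (assumed to work as a filter)
        "fq": 'medianame:"%s"' % medianame.replace('"', '\\"'),
        # Illustrative thresholds, to be tuned on real data
        "mlt.mintf": 1,
        "mlt.mindf": 1,
        "rows": 5,
        # "id" is an assumed unique key field
        "fl": "id,score",
        "wt": "json",
    }
    return base_url.rstrip("/") + "/mlt?" + urlencode(params)


url = build_mlt_url("http://localhost:8983/solr", "Daily News",
                    "Big news", "Full text")
print(url)
```

Whether a returned doc counts as "similar enough" would still have to be
decided from its score, and that cutoff is not something the handler
provides by itself.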

Thanks,
Frederico


-----Original Message-----
From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] 
Sent: Monday, April 4, 2011 10:22
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

Do you want to skip indexing when something similar exists, or only when
an exact copy exists? For exact duplicates, I would look into computing a
hash code of the document. For similar documents, though, I think the
comparison has to be based on a document that is already in the index.
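
A minimal sketch of the hash idea for exact duplicates, assuming the three
fields from the original message (medianame, headline, body). The function
names, the whitespace/case normalization and the in-memory set are
illustrative assumptions; in practice the signature would more likely be
stored in the index and looked up there.

```python
import hashlib


def _normalize(text):
    # Lowercase and collapse whitespace so trivial formatting
    # differences do not change the signature
    return " ".join(text.lower().split())


def doc_signature(medianame, headline, body):
    # Join the normalized fields with a separator that is unlikely to
    # appear in the text, then hash the result
    joined = "\x1f".join(_normalize(f) for f in (medianame, headline, body))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


seen_signatures = set()  # stand-in for a lookup against the index


def should_index(medianame, headline, body):
    # Return True only the first time a given signature is seen
    sig = doc_signature(medianame, headline, body)
    if sig in seen_signatures:
        return False
    seen_signatures.add(sig)
    return True


first = should_index("Daily News", "Big news", "Full text of the story")
second = should_index("daily news", "Big  news", "Full text of the story")
other_media = should_index("Other Paper", "Big news", "Full text of the story")
```

This only catches documents that are identical after normalization. A
single changed word gives a completely different hash, so it does not
address the "similar" case.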

On Apr 4, 2011, at 5:16, Frederico Azeiteiro
<frederico.azeite...@cision.com> wrote:

> Hi,
> 
> 
> 
> I would like to hear your opinion about the MLT feature and whether
> it's a good solution for what I need to implement.
> 
> 
> 
> My index has fields like: headline, body and medianame.
> 
> What I need to do is, before adding a new doc, verify if a similar doc
> exists for this media.
> 
> 
> 
> My idea is to use the MoreLikeThisHandler
> (http://wiki.apache.org/solr/MoreLikeThisHandler) in the following
> way:
> 
> 
> 
> For each new doc, perform an MLT search with q=medianame and
> stream.body=headline+bodytext.
> 
> If no similar docs are found, then I can safely add the doc.
> 
> 
> 
> Is this feasible using the MLT handler? Is it a good approach? Is
> there a better way to perform this comparison?
> 
> 
> 
> Thank you for your help.
> 
> 
> 
> Best regards,
> 
> ____________________________________________
> 
> Frederico Azeiteiro
> 
> 
> 
