Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Markus Jelsma
Here's my config for the updateProcessor. It now uses another signature method, but I've used TextProfileSignature as well and it works, sort of. <updateRequestProcessorChain name="dedupe"> <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> <bool

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Markus Jelsma
Well, it got me too! KMail didn't properly order this thread. Can't seem to find Hatcher's reply anywhere. ??!!? On Tuesday 08 June 2010 22:00:06 Andrew Clegg wrote: Andrew Clegg wrote: Re. your config, I don't see a minTokenLength in the wiki page for deduplication, is this a recent

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Andrew Clegg
Markus Jelsma wrote: Well, it got me too! KMail didn't properly order this thread. Can't seem to find Hatcher's reply anywhere. ??!!? Whole thread here: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html

Re: Filtering near-duplicates using TextProfileSignature

2010-06-09 Thread Neeb
Thanks guys. I will try this with some test documents, fingers crossed. And by the way, I got the minTokenLen parameter from one of the thread replies (from Erik). Cheerz, Ali

Re: Filtering near-duplicates using TextProfileSignature

2010-06-08 Thread Neeb

Re: Filtering near-duplicates using TextProfileSignature

2010-06-08 Thread Andrew Clegg

Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Andrew Clegg
in the wiki? Thanks! Andrew.

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Erik Hatcher
On Jan 12, 2010, at 7:56 AM, Andrew Clegg wrote: I'm interested in near-dupe removal as mentioned (briefly) here: http://wiki.apache.org/solr/Deduplication. However, the link for TextProfileSignature hasn't been filled in yet. Does anyone have an example of using TextProfileSignature that

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Andrew Clegg
* http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/update/processor/TextProfileSignature.java

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Erik Hatcher
On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote: Thanks Erik, but I'm still a little confused as to exactly where in the Solr config I set these parameters. You'd configure them within the processor element, something like this: <str name="minTokenLen">5</str> The example on the wiki
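
For readers skimming this index, here is a rough sketch of where such a parameter could sit in solrconfig.xml, following Erik's description that it goes within the processor element. The chain name and processor class come from Markus's snippet in this thread; the field names (signature, name, features, cat), the enabled/overwriteDupes settings, and the two trailing processors are assumptions made for illustration only, not anyone's actual config from the thread.

```xml
<!-- Sketch only: the field names and most values below are illustrative assumptions. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Field that stores the computed signature (assumed name) -->
    <str name="signatureField">signature</str>
    <!-- false keeps duplicates in the index; true overwrites them (assumed setting) -->
    <bool name="overwriteDupes">false</bool>
    <!-- Fields whose text feeds the signature (assumed names) -->
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <!-- The parameter discussed in this thread, set inside the processor element -->
    <str name="minTokenLen">5</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain would still need to be referenced from the update handler for it to take effect; how that is wired depends on the Solr version in use.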

Re: Filtering near-duplicates using TextProfileSignature

2010-01-12 Thread Andrew Clegg
-- it won't be til next week at the earliest though. Cheers, Andrew.