Here's my config for the updateProcessor. It now uses another signature method,
but I've used TextProfileSignature as well and it works - sort of.
<updateRequestProcessorChain name="dedupe">
<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
bool
Well, it got me too! KMail didn't properly order this thread. Can't seem to
find Hatcher's reply anywhere. ??!!?
On Tuesday 08 June 2010 22:00:06 Andrew Clegg wrote:
Andrew Clegg wrote:
Re. your config, I don't see a minTokenLength in the wiki page for
deduplication, is this a recent
Markus Jelsma wrote:
Well, it got me too! KMail didn't properly order this thread. Can't seem to
find Hatcher's reply anywhere. ??!!?
Whole thread here:
http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html
Thanks guys.
I will try this with some test documents, fingers crossed.
And by the way, I got the minTokenLen parameter from one of the thread
replies (from Erik).
Cheerz,
Ali
--
View this message in context:
http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp479039p880044.html
Sent from the Solr - User mailing list archive at Nabble.com.
in the wiki?
Thanks!
Andrew.
On Jan 12, 2010, at 7:56 AM, Andrew Clegg wrote:
I'm interested in near-dupe removal as mentioned (briefly) here:
http://wiki.apache.org/solr/Deduplication
However the link for TextProfileSignature hasn't been filled in yet.
Does anyone have an example of using TextProfileSignature that
*
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/update/processor/TextProfileSignature.java
On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote:
Thanks Erik, but I'm still a little confused as to exactly where in the Solr
config I set these parameters.
You'd configure them within the processor element, something like this:
<str name="minTokenLen">5</str>
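
For readers following along, here is a minimal sketch of where that parameter could sit in solrconfig.xml, following the processor chain layout shown on the Deduplication wiki page. The field names (`signature`, `title,body`) and the values given for `minTokenLen` and `quantRate` are illustrative assumptions, not settings confirmed by anyone in this thread.

```xml
<!-- Sketch only: field names and parameter values below are assumptions. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- When true, an incoming document replaces any indexed document
         that has the same signature value. -->
    <bool name="overwriteDupes">true</bool>
    <!-- Hypothetical field that stores the computed signature; it must
         exist in schema.xml and be indexed. -->
    <str name="signatureField">signature</str>
    <!-- Hypothetical source fields used to compute the signature. -->
    <str name="fields">title,body</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <!-- TextProfileSignature parameters go inside the same processor
         element, as suggested above. -->
    <str name="minTokenLen">5</str>
    <str name="quantRate">0.01</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain only takes effect once the update request handler is configured to use it, and how that is done depends on the Solr version, so check the wiki page for the release you are running.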
The example on the wiki
-- it won't be til next week at the earliest though.
Cheers,
Andrew.