Re: Using MLT feature

2011-04-08 Thread lboutros
It seems that tokens are sorted by frequency: ... Collections.sort(profile, new TokenComparator()); ... and private static class TokenComparator implements Comparator<Token> { public int compare(Token t1, Token t2) { return t2.cnt - t1.cnt; } } where cnt is the token count.

RE: Using MLT feature

2011-04-08 Thread Frederico Azeiteiro
order comes from the way they are inserted into the HashMap 'tokens' and not from the order in which the tokens appear in the original text. Frederico -Original Message- From: lboutros [mailto:boutr...@gmail.com] Sent: Friday, 8 April 2011 09:49 To: solr-user@lucene.apache.org Subject: Re: Using MLT

Re: Using MLT feature

2011-04-08 Thread lboutros
/SendEmail.jtp?type=nodenode=2794604i=1by-user=t Subject: Re: Using MLT feature It seems that tokens are sorted by frequency: ... Collections.sort(profile, new TokenComparator()); ... and private static class TokenComparator implements Comparator<Token> { public int compare(Token t1, Token

RE: Using MLT feature

2011-04-08 Thread Frederico Azeiteiro
10:11 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature Couldn't you extend the TextProfileSignature and modify the TokenComparator class to use lexical order when tokens have the same frequency? Ludovic. 2011/4/8 Frederico Azeiteiro [via Lucene] ml-node+2794604-1683988626-383...@n3
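
Ludovic's suggestion could be sketched roughly as below. This is a standalone illustration, not the Solr source: the Token class is a hypothetical stand-in (only the cnt field is mentioned in the thread; the val field, the TieBreakComparator name, and the TieBreakDemo wrapper are assumptions made for the example). The comparator orders by descending count and falls back to lexical order on ties, so the result no longer depends on the order in which tokens were inserted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {

    // Hypothetical stand-in for the token type discussed in the thread.
    static class Token {
        final String val; // token text (assumed field name)
        final int cnt;    // token count, as in the quoted comparator

        Token(String val, int cnt) {
            this.val = val;
            this.cnt = cnt;
        }
    }

    // Descending by count; equal counts are ordered lexically by token text.
    static class TieBreakComparator implements Comparator<Token> {
        @Override
        public int compare(Token t1, Token t2) {
            int byCount = Integer.compare(t2.cnt, t1.cnt);
            return byCount != 0 ? byCount : t1.val.compareTo(t2.val);
        }
    }

    // Sorts a copy of the profile and returns the token texts in order.
    static List<String> sortedValues(List<Token> profile) {
        List<Token> copy = new ArrayList<>(profile);
        Collections.sort(copy, new TieBreakComparator());
        List<String> out = new ArrayList<>();
        for (Token t : copy) {
            out.add(t.val);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> profile = new ArrayList<>();
        profile.add(new Token("solr", 2));
        profile.add(new Token("lucene", 3));
        profile.add(new Token("index", 2));
        profile.add(new Token("mlt", 3));
        System.out.println(sortedValues(profile));
    }
}
```

Using Integer.compare rather than subtracting the counts also avoids integer overflow for extreme values, although that is unlikely to matter for token counts.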

RE: Using MLT feature

2011-04-06 Thread Frederico Azeiteiro
...@openindex.io] Sent: Tuesday, 5 April 2011 15:20 To: solr-user@lucene.apache.org Cc: Frederico Azeiteiro Subject: Re: Using MLT feature If you check the code for TextProfileSignature [1] you'll notice the init method reading params. You can set those params as you did. Reading Javadoc

Re: Using MLT feature

2011-04-06 Thread Lance Norskog
Azeiteiro Subject: Re: Using MLT feature If you check the code for TextProfileSignature [1] you'll notice the init method reading params. You can set those params as you did. Reading Javadoc [2] might help as well. But what's not documented in the Javadoc is how QUANT is computed; it rounds

RE: Using MLT feature

2011-04-05 Thread Frederico Azeiteiro
using the TextProfileSignature with success? Thank you, Frederico -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, 4 April 2011 16:47 To: solr-user@lucene.apache.org Cc: Frederico Azeiteiro Subject: Re: Using MLT feature Hi again, I

Re: Using MLT feature

2011-04-05 Thread Markus Jelsma
[mailto:frederico.azeite...@cision.com] Sent: Monday, 4 April 2011 11:59 To: solr-user@lucene.apache.org Subject: RE: Using MLT feature Thank you Markus, it looks great. But the wiki is not very detailed on this. Do you mean if I: 1. Create

RE: Using MLT feature

2011-04-05 Thread Frederico Azeiteiro
<str name="minTokenLen">5</str> on the processor tag. Best regards, Frederico -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Tuesday, 5 April 2011 12:01 To: solr-user@lucene.apache.org Cc: Frederico Azeiteiro Subject: Re: Using MLT feature On Tuesday 05 April 2011

Re: Using MLT feature

2011-04-05 Thread Markus Jelsma
. Best regards, Frederico -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Tuesday, 5 April 2011 12:01 To: solr-user@lucene.apache.org Cc: Frederico Azeiteiro Subject: Re: Using MLT feature On Tuesday 05 April 2011 12:19:33 Frederico

Re: Using MLT feature

2011-04-04 Thread Chris Fauerbach
Do you want to not index if something similar? Or don't index if exact. I would look into a hash code of the document if you don't want to index exact. Similar though, I think has to be based off a document in the index. On Apr 4, 2011, at 5:16, Frederico Azeiteiro
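
The "hash code of the document" idea for exact duplicates might look something like the sketch below. It is only an illustration under stated assumptions: the ExactDuplicateFilter class and its method names are invented for this example, MD5 is chosen simply because it is available on every Java platform, and the in-memory set stands in for whatever lookup against the index would be used in practice. It catches byte-identical text only; near-duplicates still need a fuzzy signature such as the one discussed elsewhere in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class ExactDuplicateFilter {

    // Hashes of every document text accepted so far (in-memory stand-in).
    private final Set<String> seen = new HashSet<>();

    // Returns the MD5 digest of the text as a lowercase hex string.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns true if this exact text has not been seen before.
    boolean shouldIndex(String text) {
        return seen.add(md5Hex(text));
    }

    public static void main(String[] args) {
        ExactDuplicateFilter filter = new ExactDuplicateFilter();
        System.out.println(filter.shouldIndex("first article body"));
        System.out.println(filter.shouldIndex("first article body"));
        System.out.println(filter.shouldIndex("a different article body"));
    }
}
```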

RE: Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
- From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] Sent: Monday, 4 April 2011 10:22 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature Do you want to not index if something similar? Or don't index if exact. I would look into a hash code of the document if you

Re: Using MLT feature

2011-04-04 Thread Markus Jelsma
the MLT feature to find similar docs before adding to final index? Thanks, Frederico -Original Message- From: Chris Fauerbach [mailto:chris.fauerb...@gmail.com] Sent: Monday, 4 April 2011 10:22 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature Do you

RE: Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, 4 April 2011 10:48 To: solr-user@lucene.apache.org Subject: Re: Using MLT feature http://wiki.apache.org/solr/Deduplication On Monday 04 April 2011 11:34:52 Frederico Azeiteiro wrote: Hi, The idea is don't

RE: Using MLT feature

2011-04-04 Thread Frederico Azeiteiro
[mailto:frederico.azeite...@cision.com] Sent: Monday, 4 April 2011 11:59 To: solr-user@lucene.apache.org Subject: RE: Using MLT feature Thank you Markus, it looks great. But the wiki is not very detailed on this. Do you mean if I: 1. Create: <updateRequestProcessorChain name="dedupe"> <processor

Re: Using MLT feature

2011-04-04 Thread Markus Jelsma
: Monday, 4 April 2011 11:59 To: solr-user@lucene.apache.org Subject: RE: Using MLT feature Thank you Markus, it looks great. But the wiki is not very detailed on this. Do you mean if I: 1. Create: <updateRequestProcessorChain name="dedupe"> <processor class