contributing
> guidelines here:
> https://github.com/apache/lucene/blob/main/CONTRIBUTING.md.
>
> On Thu, Mar 31, 2022 at 11:46 PM Petko Minkov wrote:
> >
> > Hi,
> >
> > I was looking at Lucene's code for MoreLikeThis, specifically this l
Hi,
I was looking at Lucene's code for MoreLikeThis, specifically this line:
https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640
It looks like in ClassicSimilarity, TF is a square root
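For reference, here is a minimal plain-Java sketch of the two ClassicSimilarity factors this question touches on: the square-root TF and the logarithmic IDF. The formulas follow what recent Lucene releases document for ClassicSimilarity; the class and method names are made up for illustration, and this is not the MoreLikeThis code itself.

```java
public class ClassicTfIdfSketch {
    // Term-frequency factor: square root of the raw in-document frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: ln((docCount + 1) / (docFreq + 1)) + 1.
    static double idf(long docFreq, long docCount) {
        return Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        System.out.println(tf(4));        // 2.0
        System.out.println(idf(10, 10));  // 1.0: a term found in every document
        System.out.println(idf(1, 1000) > idf(100, 1000)); // true: rarer terms weigh more
    }
}
```

Because of the square root, quadrupling a term's raw frequency only doubles its TF factor.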
doesn't work the way you think. Don't try to
interpret it as an absolute value, it is a relative one.
On Fri, May 28, 2021 at 1:36 PM TK Solr wrote:
I'd like to have suggestions on changing the scoring algorithm
of MoreLikeThis.
When I feed the identical string as the content of a document
I'd like to have suggestions on changing the scoring algorithm
of MoreLikeThis.
When I feed the identical string as the content of a document in the index
to MoreLikeThis.like("field", new StringReader(docContent)),
I get a score less than 1.0 (0.944 in one of my test cases) that I exp
documents.
We have documents represented by both texts and float vectors.
We would like to be able to search similar documents to a given document
using a document vector (and not to convert document to query like
MORELIKETHIS).
There is a vector encoding to text technique, but it is not very
Hi,
find me the 10 most similar documents
I suppose you mean mlt.count supported by MoreLikeThisComponent.
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis
MLT is ordinary search in Lucene, so you get documents in order of
similarity (default scoring criteria) and can limit result
Hi,
I was wondering if Lucene supports applying a filter to an MLT search?
I believe that Solr can do it, but I'm not sure if Lucene can ..
A possible use case is find me the 10 most similar documents to X
created in the last month.
Thanks
- Chris
I am trying to do a filtered MoreLikeThis query. For example, say I want
to do a MoreLikeThis query only on books written in 1998. My
understanding is that in order to do this, I need to use the
MoreLikeThisHandler. How do you fold together the query part and the
more like this part
Hey,
I have a question about MoreLikeThis in Lucene, Java. I built up an index and
want to find similar documents. But I always get no results for my query,
mlt.like(1) is always empty. Can anyone find my mistake? Here is an example. (I
use Lucene 4.0)
public class HelloLucene {
public
There are lots of parameters you can adjust, but the defaults essentially
assume that you have a fairly large corpus and aren't interested in
low-frequency terms.
So, try MoreLikeThis#setMinDocFreq. The default is 5. You don't have any
terms in your example with a doc freq over 2.
Also, try
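A rough plain-Java model of why the defaults produce nothing on a tiny corpus. It assumes the two thresholds mentioned in this thread (minimum term frequency 2, minimum document frequency 5) are applied as simple "at least" cut-offs; the class, method, and sample data are hypothetical and only illustrate the filtering, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MltThresholdSketch {
    // Keeps a term only if it occurs often enough in the source document
    // (minTermFreq) and in enough documents of the index (minDocFreq).
    static List<String> candidateTerms(Map<String, Integer> termFreqs,
                                       Map<String, Integer> docFreqs,
                                       int minTermFreq, int minDocFreq) {
        List<String> kept = new ArrayList<>();
        // TreeMap gives a stable, alphabetical iteration order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(termFreqs).entrySet()) {
            int tf = e.getValue();
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (tf >= minTermFreq && df >= minDocFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical three-term document in an index where no term has doc freq over 2.
        Map<String, Integer> termFreqs = Map.of("lucene", 3, "index", 2, "query", 1);
        Map<String, Integer> docFreqs = Map.of("lucene", 2, "index", 2, "query", 1);
        System.out.println(candidateTerms(termFreqs, docFreqs, 2, 5)); // []
        System.out.println(candidateTerms(termFreqs, docFreqs, 1, 1)); // [index, lucene, query]
    }
}
```

With the real class, the equivalent adjustment would be calls such as setMinDocFreq(1) and setMinTermFreq(1) before calling like().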
=true /
If termVectors are not stored, MoreLikeThis will generate terms from stored
fields
Now since I am using lucene and not Solr, I will ask question from Lucene
point of view:
1. What is the difference between the below 2 index statements. As per my
understanding first one does not store
]
Sent: Wednesday, September 21, 2011 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: MoreLikeThis Interface changes
On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith ssm...@mainstreamdata.com wrote:
I'm updating my lucene code from 3.0 to 3.4. There's a change in the MLT
interface I'm
On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith ssm...@mainstreamdata.com wrote:
is is the input stream. Did I miss something in your response?
Yes, this is totally unrelated to fields[].
it has to do with which fieldname is passed to the analyzer to
analyze the reader into tokens (and there
OK. Thanks
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, September 26, 2011 12:15 PM
To: java-user@lucene.apache.org
Subject: Re: MoreLikeThis Interface changes
On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith ssm...@mainstreamdata.com wrote
Understand. Thanks for the information.
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, September 21, 2011 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: MoreLikeThis Interface changes
On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith ssm
) analyze
content differently according to different fields.
Previously, MoreLikeThis would use what was in the setFieldNames
parameter, iteratively like this:
for (String field : fieldNames) {
analyzer.tokenStream(field, reader); // same reader analyzed once per field
}
However, MoreLikeThis also had a bug where it would never close() the
reader
Hi All,
I am not sure if any one got chance to go over my question (below).
The question was to check if I can modify MoreLikeThis.like() result
using index time boosting.
I have found a work around as there is no easy way to influence MoreLikeThis
result using index time payload value
Hi,
In the Lucene 2.9.4 project, there is a requirement to boost some of the
keywords in the document using payload.
Now while searching, is there a way I can boost the MoreLikeThis result
using the index time payload values?
Or can I merge MoreLikeThis output and PayloadTermQuery output
-- StandardFilter -- LowerCaseFilter
-- StopFilter -- PorterStemFilter
And while searching using MoreLikeThis I am using analyzer similar to the
previous one but with addition of synonym filter
[Analyzer2] == StandardTokenizer -- StandardFilter -- LowerCaseFilter
-- StopFilter -- SynonymFilter
Hi Koji,
Thanks for your reply... It is working now by setting doc and term frequency.
Regards,
Madhu.
From: Koji Sekiguchi k...@r.email.ne.jp
To: java-user@lucene.apache.org
Sent: Fri, 18 March, 2011 5:49:15 PM
Subject: Re: Regarding MoreLikeThis similarity
Hi,
I am new to lucene ... I have a question while implementing similarity search
using MoreLikeThis query. I have written a small program but it is not giving
any results. In my index file I have both stored and unstored (analyzed)
fields.
Sample Code :
IndexReader ir = IndexReader.open
(11/03/19 6:16), madhuri_1...@yahoo.com wrote:
Hi,
I am new to lucene ... I have a question while implementing similarity search
using MoreLikeThis query. I have written a small program but it is not giving
any results. In my index file I have both stored and unstored (analyzed) fields
Hi All,
I am using the MoreLikeThis class in Lucene to find documents in the index
that are similar to a given one. It works fine when I run it directly from
Eclipse, but when I call it from my servlet I get this error:
"java.lang.NoClassDefFoundError: org/apache/lucene/search/similar/MoreLikeThis"
It sounds like the jar containing the MoreLikeThis class isn't in a place where
your servlet
can find it. It's in contrib, something like lucene-queries-<version>.jar
Best
Erick
On Tue, Dec 7, 2010 at 4:24 PM, starz10de farag_ah...@yahoo.com wrote:
Hi All,
I am using MoreLikeThis class in lucene
Dear Erick ,
thanks a lot, I placed the jar file in WEB-INF\lib and it works.
best
--
View this message in context:
http://lucene.472066.n3.nabble.com/java-lang-NoClassDefFoundError-org-apache-lucene-search-similar-MoreLikeThis-tp2036296p2037181.html
Sent from the Lucene - Java Users mailing
On Sep 9, 2009, at 4:39 PM, Bill Au wrote:
Has anyone done anything regarding the support of PayloadTermQuery in
MoreLikeThis?
Not yet! Sounds interesting
I took a quick look at the code and it seems to be simply a matter of
swapping TermQuery with PayloadTermQuery. I guess a generic
Has anyone done anything regarding the support of PayloadTermQuery in
MoreLikeThis?
I took a quick look at the code and it seems to be simply a matter of
swapping TermQuery with PayloadTermQuery. I guess a generic solution would
be to add a enable method to enable PayloadTermQuery, keeping
: 1. Looking at the hits, they have the same score. I'd expect them to be
: different, based on their relevance to the source document. Any ideas?
...
: This is my output. I can paste my source code in too if needed.
The output of arbitrary secret code isn't really very useful for the
and is correct but
MoreLikeThis returns no result for a given document id.
What am I missing?
mark harwood wrote:
MoreLikeThis needs to find the terms in your doc. It tries to do this by
using TermFreqVectors which are stored in the index if you choose to add
them at index-time. If you haven't
Hi Dave:
The MoreLikeThis object has two parameters that control its behaviour:
mlt.setMinTermFreq(minTermFreq.intValue());
mlt.setMinDocFreq(minDocFreq.intValue());
By default minTermFreq is 2, so if your document has no terms with a
frequency of at least 2, it will return a query
MoreLikeThis essentially shortlists a large list of terms (found in example
text or an existing doc) and uses them in a query.
To see what terms have been shortlisted try calling query.rewrite(reader) and
then call toString() or extractTerms.
If this reveals no terms try using a debugger which
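The shortlisting described here can be sketched in plain Java as "score each candidate term by tf * idf and keep the best few". This is a simplified model under stated assumptions: the idf formula is the ClassicSimilarity one from recent Lucene releases, and the names (shortlist, maxQueryTerms as a plain parameter) and sample numbers are illustrative, not the library's API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MltShortlistSketch {
    // Scores every term by tf * idf and returns the maxQueryTerms best ones.
    static List<String> shortlist(Map<String, Integer> termFreqs,
                                  Map<String, Integer> docFreqs,
                                  int numDocs, int maxQueryTerms) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            double idf = Math.log((numDocs + 1) / (double) (df + 1)) + 1.0;
            scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue() * idf));
        }
        // Highest score first; ties broken alphabetically for determinism.
        scored.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(maxQueryTerms, scored.size()); i++) {
            top.add(scored.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical document in a 1000-document index.
        Map<String, Integer> termFreqs = Map.of("the", 10, "lucene", 3, "similarity", 2);
        Map<String, Integer> docFreqs = Map.of("the", 1000, "lucene", 10, "similarity", 5);
        System.out.println(shortlist(termFreqs, docFreqs, 1000, 2)); // [lucene, similarity]
    }
}
```

Note how the very frequent word loses to rarer terms despite its higher raw count, which is why very common words tend to drop out of the shortlist.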
Thanks so much for hints, now it works correctly, the problem was with
mlt.setMinTermFreq.
Many thanks.
--
View this message in context:
http://www.nabble.com/Re%3A-MoreLikeThis-return-no-results-tp19230763p19256118.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com
Hi,
I'm trying to get MoreLikeThis working, but it returns no results. I
have Lucene working for normal queries and indexing, but MoreLikeThis just
returns nothing. This is what I'm trying:
IndexReader reader = IndexReader.open(INDEX_PATH);
IndexSearcher searcher = new IndexSearcher
MoreLikeThis needs to find the terms in your doc. It tries to do this by using
TermFreqVectors which are stored in the index if you choose to add them at
index-time. If you haven't done this then it will fall back to reanalysing the
content of the document using an analyser (despite what
As a test, I tried to compare a few documents on various topics (a few on
linux, and another on the U.S. constitution) to a source document on linux
using a query formed by MoreLikeThis.
1. Looking at the hits, they have the same score. I'd expect them to be
different, based on their relevance
Hi there...
im trying to get MoreLikeThis documents from my lucene index given a
sentence... just one line of text lets say... but i also want to get the
returned results only where a field has a specific value
so for example if i have my index and it contains a categoryId and
content
martinoleary wrote:
Hi there...
im trying to get MoreLikeThis documents from my lucene index given a
sentence... just one line of text lets say... but i also want to get the
returned results only where a field has a specific value
so for example if i have my index and it contains
Jonathan Ariel skrev:
Smart idea, but it won't help me. I have almost 50 categories and eventually
I would like to filter not just on category but maybe also on language,
etc.
Karl: what do you mean by measure the distance between the term vectors and
cluster them in real time?
I mean exactly
MoreLikeThis to receive a set of term
frequencies, instead of an IndexReader, and use that to do all the process.
Does anyone know whether a document holds the term frequencies for its fields?
On Wed, Apr 23, 2008 at 7:46 AM, Karl Wettin [EMAIL PROTECTED] wrote:
Jonathan Ariel skrev:
Smart idea
there.
In that case I could change MoreLikeThis to receive a set of term
frequencies, instead of an IndexReader, and use that to do all the process.
That would probably not be too speedy.
Does anyone know whether a document holds the term frequencies for its fields?
When adding a field to a document you can specify
This is a patch I made to be able to boost the terms with a specific factor
beside the relevancy returned by MoreLikeThis. This is helpful when having
more than one MoreLikeThis in the query, so words in field A (e.g. Title)
can be boosted more than words in the field B (i.e. Description).
Any
Is there any way to execute a MoreLikeThis over a subset of documents? I
need to retrieve a set of interesting keywords from a subset of documents
and not the entire index (imagine that my index has documents categorized as
A, B and C and I just want to work with those categorized as A). Right now
Instead of this:
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ... // orig source of doc you want to find similarities to
Query query = mlt.like( target);
Hits hits = is.search(query);
do this:
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ... // orig source of doc you
But that doesn't help me with my problem, because the interesting terms are
taken from the entire index and not a subset as I need.
On Tue, Apr 22, 2008 at 6:46 PM, Glen Newton [EMAIL PROTECTED] wrote:
Instead of this:
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ... // orig
Jonathan Ariel skrev:
Is there any way to execute a MoreLikeThis over a subset of documents? I
need to retrieve a set of interesting keywords from a subset of documents
and not the entire index (imagine that my index has documents categorized as
A, B and C and I just want to work with those
I could have up to 2 million documents and growing.
On Tue, Apr 22, 2008 at 7:29 PM, Karl Wettin [EMAIL PROTECTED] wrote:
Jonathan Ariel skrev:
Is there any way to execute a MoreLikeThis over a subset of documents? I
need to retrieve a set of interesting keywords from a subset
Sorry, I misunderstood the problem. My mistake.
While not optimal and rather expensive space-wise, you could have - in
addition to existing keyword field - a field for each category. If
the document being indexed is in category A, only add the text to the
catA field. Now do MoreLikeThis on catA. This assumes you know the
categories at index time, of course.
Redundant but workable.
-Glen
2008/4/22 Jonathan Ariel [EMAIL PROTECTED]:
Is there any way to execute a MoreLikeThis over a subset of documents? I
need to retrieve a set of interesting
Hi,
I've downloaded Lucene 2.3.0 binaries and in the contrib folder I can see
the Similarity package, but inside the Jar there are no classes!
Downloading the sources I ran into the same issue.
Am I doing something wrong? Where should I get the MoreLikeThis classes
from?
Thanks!
Jonathan
Hi, I'm trying to use MoreLikeThis but I can't find how to make a
MoreLikeThis query that will return related documents given a document and
some conditions, like country field in the related documents should be 1,
etc.
Is there any documentation on how to do this kind of queries?
Thanks
I've been stepping through the contrib MoreLikeThis class and was
wondering if people can give opinions on why you would or would not use
setBoost(true) for the MoreLikeThis object. It seems a bit odd (at least
to me) to boost the good terms in the query (based on the term's score),
since
: MoreLikeThis across multiple fields question...
On Sunday 21 October 2007 17:21, Chris Sizemore wrote:
i'm using MoreLikeThis. i'm trying to run the document comparison across
more than one field in my index, but i'm not at all sure that it's
actually happening -- when i examine the constructed query
hello--
i'm using MoreLikeThis. i'm trying to run the document comparison across more
than one field in my index, but i'm not at all sure that it's actually
happening -- when i examine the constructed query, only one field is mentioned!
here's my code:
FileReader reader = new FileReader
On Sunday 21 October 2007 17:21, Chris Sizemore wrote:
i'm using MoreLikeThis. i'm trying to run the document comparison across
more than one field in my index, but i'm not at all sure that it's
actually happening -- when i examine the constructed query, only one
field is mentioned! here's my
What is the appropriate way of achieving both stopwords and stemming of
stopwords when the MoreLikeThis class is used? My analyzer
(MoreLikeThis.setAnalyzer) uses the Snowball filter, and is initialized
with a stopwords set:
analyzer = new StandardAnalyzer(stopwords) {
public
I have some sample code for doing relevance feedback across multiple
documents at http://www.cnlp.org/apachecon2005
It could be modified to provide more of the MoreLikeThis
functionality (i.e. determining important terms via tf/idf) for now
it just takes the top X terms
-Grant
On Jul 25
), or maximizing tf.idf (as is done in MoreLikeThis).
Is there anything like this already implemented, or do I need to
iterate through all documents in the set manually, re-tokenize each
one (or maybe use TermVectors), and then calculate the weight for each
term?
http://project.carrot2.org
(as is done in MoreLikeThis).
Is there anything like this already implemented, or do I need to iterate
through all documents in the set manually, re-tokenize each one (or
maybe use TermVectors), and then calculate the weight for each term?
Thanks,
Jens
Right , I was making a silly mistake there. I have it working now.
Thanks for the reply.
yu wrote:
You can put lucene-queries-2.2.0.jar on your class path or your
Eclipse project build path. That's all you need.
Jay
Akanksha Baid wrote:
I am using Lucene 2.1.0 and want to use MoreLikeThis
I need this comparison to be case-insensitive
The choice of case-sensitivity (and preservation of punctuation, numbers etc
etc) is controlled by your choice of analyzer that you pass to MoreLikeThis. If
you want to ensure your list of stop words adheres to the same logic - use the
same
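One way to read this advice, sketched in plain Java: normalise the stop-word list with the same lower-casing the analyzer applies to terms, so the comparison is case-insensitive on both sides. The class and method names are hypothetical; this does not touch MoreLikeThis internals and simply assumes lower-casing is the only normalisation involved.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Lower-cases every stop word once, up front.
    static Set<String> normalize(List<String> stopWords) {
        Set<String> out = new HashSet<>();
        for (String word : stopWords) {
            out.add(word.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Applies the same lower-casing to the term before the lookup.
    static boolean isStopWord(Set<String> normalized, String term) {
        return normalized.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> stops = normalize(List.of("The", "AND", "of"));
        System.out.println(isStopWord(stops, "the"));    // true
        System.out.println(isStopWord(stops, "And"));    // true
        System.out.println(isStopWord(stops, "Lucene")); // false
    }
}
```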
-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Monday, July 09, 2007 5:01 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
I need this comparison to be case-insensitive
The choice of case-sensitivity
-insensitive fashion?
- Original Message
From: Jong Kim [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Monday, 9 July, 2007 3:00:05 PM
Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project
My application stores term vectors with the index
to a product requirement, no token is thrown away at the time of
indexing, that is, no stopwords filtering at indexing time.
However, when executing MoreLikeThis feature, we do use a stopwords list
(the fact that we indexed each and every word does not mean that they have
to be included in the execution
OK. I can see the logic that says it might be useful/convenient to filter
case-sensitive search terms using a case-insensitive list of stop words.
What seems slightly odd is that you want exactness in the choice of case yet
are using an imprecise matching technique (MoreLikeThis) - effectively
is used for MoreLikeThis function.
2.2 Admin search - this is more like raw index lookup than typical end-user
search, can include stop words in the search terms.
The point here is that, the case matters only for those words that should be
included. For the words we do not want included in the end
-
2.1 End User search - stop word filtering is done on the search terms, the
same stop word list is used for MoreLikeThis function.
2.2 Admin search - this is more like raw index lookup than typical
end-user
search, can include stop words in the search terms.
The point here is that, the case matters
in MoreLikeThis class in Lucene's
contrib/queries project
the case matters only for those words that should be included.
Jong, just want to check we're on the same page - you do know MoreLikeThis
has a kind of automatic Stop-Wording built in , yes?
MoreLikeThis looks at the document frequency
: I need this comparison to be case-insensitive, but I don't see any way of
: achieving it by extending this class. I would have created a subclass of
: MoreLikeThis and override the isNoiseWord() method. However, the problem is
: that, neither isNoiseWord() method nor the instance variables
I'm trying to build a custom MoreLikeThis implementation that will run
within solr and I've run into a few API hurdles...
1. Can MLT.java be modified to optionally take the Similarity
implementation in the constructor? Currently it is hardcoded to:
private Similarity similarity = new
On May 30, 2007, at 2:45 AM, Ryan McKinley wrote:
I'm trying to build a custom MoreLikeThis implementation that will
run within solr and I've run into a few API hurdles...
1. Can MLT.java be modified to optionally take the Similarity
implementation in the constructor? Currently
2. Do retrieveTerms(int docNum) and createQuery(PriorityQueue q) need
to be private? Can they be public? If not public, could they at
least be protected?
I would think protected would be fine, what is your case for it being
public?
From the solr RequestHandler, I want to return the
I want to return the interesting terms used for MLT
Could you do this using Query.extractTerms() on the rewritten version of the
MoreLikeThis query (a BooleanQuery)?
Mark
- Original Message
From: Ryan McKinley [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Wednesday, 30 May
mark harwood wrote:
I want to return the interesting terms used for MLT
Could you do this using Query.extractTerms() on the rewritten version of the
MoreLikeThis query (a BooleanQuery)?
thanks! that works and avoids the PriorityQueue traversal problems. I
can even get the boost
To
java-user@lucene.apache.org
cc
Subject
Re: MoreLikeThis?
Donna, this is what you need to do to get the jar, and after that you just
use MLT according to its API.
$ cd lucene-trunk
otis:~/dev/workspace/lucene-trunk otis$ cd contrib/queries/
otis:~/dev/workspace/lucene-trunk/contrib
Hello,
I'm sorry if this is a naive question, but I have implemented my own
MoreLikeThis functionality, and
in re-reading the FAQ saw that it looks like something like this is
already built, so I wanted to try it out and see
if it would simplify my code:
How do I find similar documents?
See
- Original Message
From: Donna L Gresh [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Tuesday, May 22, 2007 2:09:55 PM
Subject: MoreLikeThis?
Hello,
I'm sorry if this is a naive question, but I have implemented my own
MoreLikeThis functionality, and
in re-reading the FAQ saw
On Feb 1, 2007, at 7:13 PM, Brian Whitman wrote:
I'm looking for a way to search by a field's internal TF vector
representation.
MoreLikeThis does not seem to be what I want-- it constructs a text
query based on the top scoring TF-IDF terms. I want to query by TF
vector directly
I'm looking for a way to search by a field's internal TF vector
representation.
MoreLikeThis does not seem to be what I want-- it constructs a text
query based on the top scoring TF-IDF terms. I want to query by TF
vector directly, bypassing the tokens.
Lucene understandably has
(the approach you are taking towards your goal is sound by the way)
: Date: Sat, 23 Dec 2006 20:41:18 +0100
: From: Nick Snels [EMAIL PROTECTED]
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Restrict result returned by Morelikethis
:
: Hi,
:
: I have made a Morelikethis
Hi,
I have made a Morelikethis query to look up documents that match a certain
document id. This results in a search of the whole index. I would like the
Morelikethis query to search only part of the index. How can I do this?
I have already tried to create a BooleanQuery, like:
BooleanQuery
by Morelikethis
:
: Hi,
:
: I have made a Morelikethis query to look up documents that match a certain
: document id. This results in a search of the whole index. I would like the
: Morelikethis query to search only part of the index. How can I do this?
:
: I have already tried to create
Ever since I started using Lucene, I found all answers to all possible
questions in the archive.
But I need help about those ones.
1. I am using MoreLikeThis class, and cannot figure out why not all
terms are retrieved when using like() to generate queries.
I extract the terms from
Hello,
inspired by this thread, I also tried to implement a MoreLikeThis
search. But I have the same Problem of a null query.
I did set the Fieldname to a Field that is stored in the Index.
But like just returns null.
Here is my Code:
Hits hits = this.is.search(new
Does your index use StandardAnalyzer? Are your fields stored (Field.Store.YES)?
MoreLikeThis uses StandardAnalyzer by default to read the stored content from
the example doc which may produce tokens that do not match those of the indexed
content. Use setAnalyzer() to ensure they are in sync
for the short description of a document)
If I set the Fieldname to another Field (indexed with StandardAnalyzer)
which is Indexed (but not Stored) it works if I use the
like(StringReader ) Method but not with like(int docid).
This Code works:
MoreLikeThis mlt = new MoreLikeThis
Hi,
I'm using MoreLikeThis class to find similar documents... but I'm not
sure if it is correct to pass as argument a Pdf file to
*MoreLikeThis.like()* method.
Trying to be more clear:
1) In my Lucene index I add some PDF files (I use PDFBox to extract text
and add fields to index)
2) Now I want
: Thursday, 20 July, 2006 10:41:03 AM
Subject: PDF documents with MoreLikeThis class
Hi,
I'm using MoreLikeThis class to find similar documents... but I'm not
sure if it is correct to pass as argument a Pdf file to
*MoreLikeThis.like()* method.
Trying to be more clear:
1) In my Lucene index I add
Hi,
I used the MoreLikeThis class (in the search.similar package) of Lucene to
find similar documents, but the result is 0 documents, even when I index
the same document several times. I don't understand why the search doesn't
work... Here is the code I used
on
Cheers
Mark
- Original Message
From: Davide [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Wednesday, 19 July, 2006 9:40:31 AM
Subject: Problem finding similar documents with MoreLikeThis method.
Hi,
I used the method MoreLikeThis (in search.similar package) of Lucene to
find
names you want
to match on
Cheers
Mark
I've tried, but it still doesn't work. I've called the method
setFieldNames(new String[]{"Field1", "Field2", ...}), with Field1 and
Field2 being the fields I used when indexing the files, but *Query* is
still empty and MoreLikeThis doesn't work... I don't think
if (fr != null){
System.out.println("Parsing FileReader: " + fr);
query = mlt.like(fr);
Not clear from your code but fr isn't the same object as fileReader is it?
If so, that could be positioned at the end of the file and MoreLikeThis would
therefore read nothing.
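The effect described here is easy to reproduce with the standard library alone. The sketch below uses a StringReader in place of the FileReader from the thread (an assumption made to keep it self-contained): once a reader has been consumed, a second pass over the same object yields nothing, so any consumer handed that reader sees an empty input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ExhaustedReaderSketch {
    // Drains the reader from its current position to the end.
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("lucene in action");
        System.out.println("[" + readAll(reader) + "]"); // [lucene in action]
        System.out.println("[" + readAll(reader) + "]"); // []: already at end of input
    }
}
```

Opening a fresh reader for each consumer avoids the problem.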
- Original Message
Does your index have only the one document?
MoreLikeThis will only generate queries with terms that occur in at least
minDocFreq documents (default setting is 5).
This is to avoid the large overheads associated with searching for very common
words in your example text.
- Original Message
98 matches