Re: TF in MoreLikeThis

2022-06-01 Thread Petko Minkov
UTING.md. >> On Thu, Mar 31, 2022 at 11:46 PM Petko Minkov wrote: >> > Hi, >> > I was looking at Lucene's code for MoreLikeThis, specifically this line: >> https://github.com/apache/lucene/blob/69b040

Re: TF in MoreLikeThis

2022-04-01 Thread Petko Minkov
contributing guidelines here: > https://github.com/apache/lucene/blob/main/CONTRIBUTING.md. > On Thu, Mar 31, 2022 at 11:46 PM Petko Minkov wrote: > > Hi, > > I was looking at Lucene's code for MoreLikeThis, specifically this l

Re: TF in MoreLikeThis

2022-04-01 Thread Adrien Grand
kov wrote: > Hi, > I was looking at Lucene's code for MoreLikeThis, specifically this line: > https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640 > It looks like

TF in MoreLikeThis

2022-03-31 Thread Petko Minkov
Hi, I was looking at Lucene's code for MoreLikeThis, specifically this line: https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640 It looks like in ClassicSimilarity, TF is a square root
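
The question is how candidate terms are weighted. A minimal stdlib sketch of the difference, assuming ClassicSimilarity's formulas tf = sqrt(freq) and idf = 1 + ln((numDocs + 1) / (docFreq + 1)), and assuming (as the excerpt suggests) that the linked line multiplies the raw in-document frequency by idf. Class and method names are hypothetical; nothing here calls Lucene.

```java
import java.util.Locale;

// Sketch only: reimplements both tf variants by hand under the assumed
// ClassicSimilarity formulas; it does not use or reproduce Lucene code.
class TfComparison {
    // ClassicSimilarity-style idf (assumed formula)
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((numDocs + 1) / (double) (docFreq + 1));
    }

    // Raw in-document frequency times idf
    static double rawTfScore(int freq, long docFreq, long numDocs) {
        return freq * idf(docFreq, numDocs);
    }

    // Square-root frequency times idf, as ClassicSimilarity's tf() would weight it
    static double sqrtTfScore(int freq, long docFreq, long numDocs) {
        return Math.sqrt(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        // "common": 4 occurrences in the source doc, found in 99 indexed docs
        // "rare":   1 occurrence in the source doc, found in 1 indexed doc
        System.out.printf(Locale.ROOT, "common raw=%.3f sqrt=%.3f%n",
                rawTfScore(4, 99, numDocs), sqrtTfScore(4, 99, numDocs));
        System.out.printf(Locale.ROOT, "rare   raw=%.3f sqrt=%.3f%n",
                rawTfScore(1, 1, numDocs), sqrtTfScore(1, 1, numDocs));
    }
}
```

With these made-up numbers the raw-tf score ranks the repeated common term above the rare one, while the sqrt-tf score reverses that order; for a term that occurs once, both variants give the same score.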

Re: Tuning MoreLikeThis scoring algorithm

2021-06-01 Thread TK Solr
doesn't work the way you think. Don't try to interpret it as an absolute value, it is a relative one. On Fri, May 28, 2021 at 1:36 PM TK Solr wrote: I'd like to have suggestions on changing the scoring algorithm of MoreLikeThis. When I feed the identical string as the content of a document

Re: Tuning MoreLikeThis scoring algorithm

2021-05-28 Thread Robert Muir
wrote: > > I'd like to have suggestions on changing the scoring algorithm > of MoreLikeThis. > > When I feed the identical string as the content of a document in the index > to MoreLikeThis.like("field", new StringReader(docContent)), > I get a score less than 1.0 (0.944

Tuning MoreLikeThis scoring algorithm

2021-05-28 Thread TK Solr
I'd like to have suggestions on changing the scoring algorithm of MoreLikeThis. When I feed the identical string as the content of a document in the index to MoreLikeThis.like("field", new StringReader(docContent)), I get a score less than 1.0 (0.944 in one of my test cases) that I exp

Search similar documents using dense vectors (alternative to MORELIKETHIS)

2016-02-24 Thread Jan Rygl
documents. We have documents represented by both texts and float vectors. We would like to be able to search similar documents to a given document using a document vector (and not to convert document to query like MORELIKETHIS). There is a vector encoding to text technique, but it is not very
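
For comparison, the most direct form of "search by document vector" is a brute-force cosine-similarity ranking. The following is a sketch in plain Java over in-memory float[] vectors, not a Lucene API; it scans every document, so it illustrates only the scoring, not an index structure. (Lucene 9 later added native k-NN vector fields, which did not exist when this was posted.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Brute-force cosine-similarity ranking over in-memory vectors (sketch, not Lucene).
class DenseVectorSearch {
    /** Cosine similarity of two equal-length vectors; 0 if either has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Indices of the k documents most similar to the query, best first. */
    static List<Integer> topK(float[] query, List<float[]> docs, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            ids.add(i);
        }
        ids.sort((x, y) -> Double.compare(cosine(query, docs.get(y)), cosine(query, docs.get(x))));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        List<float[]> docs = Arrays.asList(
                new float[] {1f, 0f, 0f},
                new float[] {0f, 1f, 0f},
                new float[] {0.9f, 0.1f, 0f});
        float[] query = {1f, 0f, 0f};  // e.g. the vector of the source document
        System.out.println(topK(query, docs, 2));
    }
}
```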

Re: Filtering MoreLikeThis results

2015-01-09 Thread Tomoko Uchida
Hi, "find me the 10 most similar documents": I suppose you mean mlt.count, which is supported by MoreLikeThisComponent (https://cwiki.apache.org/confluence/display/solr/MoreLikeThis). MLT is an ordinary search in Lucene, so you get documents in order of similarity (the default scoring criteria) and can limit result
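
Because an MLT query is an ordinary Query, restricting it works as for any search; in later Lucene versions one would typically combine the MLT query (MUST) with the date restriction as a BooleanClause.Occur.FILTER clause. The following stdlib sketch only models the resulting semantics over an in-memory list; the Hit class and its created field are hypothetical, not Lucene types.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: "10 most similar documents to X created in the last month" over an
// in-memory list of scored hits. Hit and its fields are hypothetical.
class FilteredTopK {
    static final class Hit {
        final String id;
        final double score;       // similarity score from the MLT query
        final LocalDate created;  // hypothetical creation-date field
        Hit(String id, double score, LocalDate created) {
            this.id = id;
            this.score = score;
            this.created = created;
        }
    }

    /** Keeps hits created on or after {@code since}, best score first, at most k. */
    static List<String> topKCreatedSince(List<Hit> hits, LocalDate since, int k) {
        return hits.stream()
                .filter(h -> !h.created.isBefore(since))  // filtering does not change scores
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(k)
                .map(h -> h.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a", 0.90, LocalDate.of(2015, 1, 5)));
        hits.add(new Hit("b", 0.95, LocalDate.of(2014, 11, 1)));
        hits.add(new Hit("c", 0.50, LocalDate.of(2014, 12, 20)));
        LocalDate today = LocalDate.of(2015, 1, 8);
        System.out.println(topKCreatedSince(hits, today.minusMonths(1), 10));
    }
}
```

Note that the most similar hit ("b") is dropped by the date restriction rather than ranked lower, which is the behaviour a filter clause gives.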

Filtering MoreLikeThis results

2015-01-08 Thread chrisbamford
Hi, I was wondering whether Lucene supports applying a filter to an MLT search. I believe that Solr can do it, but I'm not sure whether Lucene can. A possible use case is "find me the 10 most similar documents to X created in the last month". Thanks - Chris

Lucene 4.0: Use of filter queries and filter caches in MoreLikeThis -- how do you set up the query for the MoreLikeThisHandler?

2013-02-20 Thread Misty Nodine
I am trying to do a filtered MoreLikeThis query. For example, say I want to do a MoreLikeThis query only on books written in 1998. My understanding is that in order to do this, I need to use the MoreLikeThisHandler. How do you fold together the query part and the more like this part

Lucene-MoreLikethis

2013-01-15 Thread Thomas Keller
Hey, I have a question about MoreLikeThis in Lucene, Java. I built up an index and want to find similar documents. But I always get no results for my query, mlt.like(1) is always empty. Can anyone find my mistake? Here is an example. (I use Lucene 4.0) public class HelloLucene { public

Re: Lucene-MoreLikethis

2013-01-15 Thread Jack Krupansky
There are lots of parameters you can adjust, but the defaults essentially assume that you have a fairly large corpus and aren't interested in low-frequency terms. So, try MoreLikeThis#setMinDocFreq. The default is 5. You don't have any terms in your example with a doc freq over 2. Also, try
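
The thresholds discussed in these threads (minTermFreq, default 2; minDocFreq, default 5) are the usual reason a small test index yields an empty MLT query. The following is a simplified, hand-written model of that term-selection step, not Lucene's code: the ClassicSimilarity-style idf formula, a maxQueryTerms default of 25, and maxDocFreq acting as an upper cut-off for very common terms are stated assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of MoreLikeThis-style term selection (hypothetical names, not Lucene code).
class MltTermSelection {
    static List<String> selectTerms(Map<String, Integer> termFreqs, Map<String, Integer> docFreqs,
                                    int numDocs, int minTermFreq, int minDocFreq,
                                    int maxDocFreq, int maxQueryTerms) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            String term = e.getKey();
            int tf = e.getValue();                       // frequency in the source document
            int df = docFreqs.getOrDefault(term, 0);     // indexed documents containing the term
            if (tf < minTermFreq) continue;              // not repeated enough in the source
            if (df == 0 || df < minDocFreq) continue;    // too rare in the index (or absent)
            if (df > maxDocFreq) continue;               // too common: an automatic "stop word"
            double idf = 1.0 + Math.log((numDocs + 1) / (double) (df + 1));
            scores.put(term, tf * idf);
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));  // best first
        return new ArrayList<>(terms.subList(0, Math.min(maxQueryTerms, terms.size())));
    }

    /** Uses the defaults quoted in the threads plus the assumed ones above. */
    static List<String> selectWithDefaults(Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
        return selectTerms(tf, df, numDocs, 2, 5, Integer.MAX_VALUE, 25);
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new HashMap<>();
        tf.put("lucene", 3);
        tf.put("search", 2);
        tf.put("index", 1);
        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 2);
        df.put("search", 2);
        df.put("index", 2);
        // Three-document test index: every term fails minDocFreq = 5, so nothing is selected.
        System.out.println(selectWithDefaults(tf, df, 3));
        // Lowering minDocFreq lets the repeated terms through; "index" still fails minTermFreq.
        System.out.println(selectTerms(tf, df, 3, 2, 1, Integer.MAX_VALUE, 25));
    }
}
```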

MoreLikeThis and TermVector relationship

2011-10-24 Thread Saurabh Gokhale
=true / If termVectors are not stored, MoreLikeThis will generate terms from stored fields. Now, since I am using Lucene and not Solr, I will ask my question from the Lucene point of view: 1. What is the difference between the two index statements below? As per my understanding, the first one does not store

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith
] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith ssm...@mainstreamdata.com wrote: I'm updating my lucene code from 3.0 to 3.4.  There's a change in the MLT interface I'm

Re: MoreLikeThis Interface changes

2011-09-26 Thread Robert Muir
On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith ssm...@mainstreamdata.com wrote: is is the input stream.  Did I miss something in your response? Yes, this is totally unrelated to fields[]. it has to do with which fieldname is passed to the analyzer to analyze the reader into tokens (and there

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith
OK. Thanks -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, September 26, 2011 12:15 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith ssm...@mainstreamdata.com wrote

RE: MoreLikeThis Interface changes

2011-09-22 Thread Scott Smith
Understand. Thanks for the information. -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith ssm

Re: MoreLikeThis Interface changes

2011-09-21 Thread Robert Muir
) analyze content differently according to different fields. Previously, MoreLikeThis would use what was in the setFieldNames parameter, iteratively like this: for (field : fieldNames) { analyzer.analyze(field, reader); } However, MoreLikeThis also had a bug where it would never close() the reader

found workaround: Query on using Payload with MoreLikeThis class

2011-05-11 Thread Saurabh Gokhale
Hi All, I am not sure if anyone got a chance to go over my question (below). The question was whether I can modify the MoreLikeThis.like() result using index-time boosting. I have found a workaround, as there is no easy way to influence the MoreLikeThis result using the index-time payload value

Query on using Payload with MoreLikeThis class

2011-05-10 Thread Saurabh Gokhale
Hi, In the Lucene 2.9.4 project, there is a requirement to boost some of the keywords in the document using payload. Now while searching, is there a way I can boost the MoreLikeThis result using the index time payload values? Or can I merge MoreLikeThis output and PayloadTermQuery output

Question on the use of Synonym Filter while searching using MoreLikeThis

2011-05-09 Thread Saurabh Gokhale
-- StandardFilter -- LowerCaseFilter -- StopFilter -- PorterStemFilter And while searching using MoreLikeThis I am using analyzer similar to the previous one but with addition of synonym filter [Analyzer2] == StandardTokenizer -- StandardFilter -- LowerCaseFilter -- StopFilter -- SynonymFilter

Question on the Synonym Filter use while searching with MoreLikeThis

2011-05-08 Thread Saurabh Gokhale
-- StandardFilter -- LowerCaseFilter -- StopFilter -- PorterStemFilter And while searching using MoreLikeThis I am using analyzer similar to the previous one but with addition of synonym filter [Analyzer2] == StandardTokenizer -- StandardFilter -- LowerCaseFilter -- StopFilter -- SynonymFilter

Re: Regarding MoreLikeThis similarity Search

2011-03-19 Thread madhuri madhuri
Hi Koji, Thanks for your reply... It is working now by setting doc and term frequency. Regards, Madhu. From: Koji Sekiguchi k...@r.email.ne.jp To: java-user@lucene.apache.org Sent: Fri, 18 March, 2011 5:49:15 PM Subject: Re: Regarding MoreLikeThis similarity

Regarding MoreLikeThis similarity Search

2011-03-18 Thread madhuri_1820
Hi, I am new to Lucene... I have a question while implementing similarity search using a MoreLikeThis query. I have written a small program but it is not giving any results. In my index file I have both stored and unstored (analyzed) fields. Sample Code : IndexReader ir = IndexReader.open

Re: Regarding MoreLikeThis similarity Search

2011-03-18 Thread Koji Sekiguchi
(11/03/19 6:16), madhuri_1...@yahoo.com wrote: Hi, I am new to Lucene... I have a question while implementing similarity search using a MoreLikeThis query. I have written a small program but it is not giving any results. In my index file I have both stored and unstored (analyzed) fields

java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread starz10de
Hi All, I am using the MoreLikeThis class in Lucene to find documents in the index that are similar to a given one. It works fine when I run it directly from Eclipse, but when I call it from my servlet I get this error: “java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

Re: java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread Erick Erickson
It sounds like the jar containing the MoreLikeThis class isn't in a place where your servlet can find it. It's in contrib, something like lucene-queries-<version>.jar Best Erick On Tue, Dec 7, 2010 at 4:24 PM, starz10de farag_ah...@yahoo.com wrote: Hi All, I am using MoreLikeThis class in lucene

Re: java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread starz10de
Dear Erick, thanks a lot, I placed the jar file in WEB-INF\lib and it works. best

Re: support for PayloadTermQuery in MoreLikeThis

2009-09-10 Thread Grant Ingersoll
On Sep 9, 2009, at 4:39 PM, Bill Au wrote: Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? Not yet! Sounds interesting I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic

support for PayloadTermQuery in MoreLikeThis

2009-09-09 Thread Bill Au
Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic solution would be to add a method to enable PayloadTermQuery, keeping

Re: Question: Lucene MoreLikeThis score values all the same:

2008-09-02 Thread Chris Hostetter
: 1. Looking at the hits, they have the same score. I'd expect them to be : different, based on their relevance to the source document. Any ideas? ... : This is my output. I can paste my source code in too if needed. The output of arbitrary secret code isn't really a very useful for the

Re: MoreLikeThis return no results

2008-09-01 Thread davood
and is correct, but MoreLikeThis returns no results for a given document id. What am I missing? mark harwood wrote: MoreLikeThis needs to find the terms in your doc. It tries to do this by using TermFreqVectors which are stored in the index if you choose to add them at index-time. If you haven't

Re: MoreLikeThis return no results

2008-09-01 Thread Marcelo Ochoa
Hi Dave: the MoreLikeThis object has two parameters which control its functionality: mlt.setMinTermFreq(minTermFreq.intValue()); mlt.setMinDocFreq(minDocFreq.intValue()); By default MinTermFreq is 2, so if no term occurs at least twice in your document, it will return a query

Re: MoreLikeThis return no results

2008-09-01 Thread mark harwood
MoreLikeThis essentially shortlists a large list of terms (found in example text or an existing doc) and uses them in a query. To see what terms have been shortlisted try calling query.rewrite(reader) and then call toString() or extractTerms. If this reveals no terms try using a debugger which

Re: MoreLikeThis return no results

2008-09-01 Thread davood
Thanks so much for hints, now it works correctly, the problem was with mlt.setMinTermFreq. Many thanks.

MoreLikeThis return no results

2008-08-30 Thread davood
Hi, I'm trying to get MoreLikeThis working but it just returns no results. I have lucene working for normal queries and indexing but MoreLikeThis Just returns nothing. This is what I'm trying IndexReader reader = IndexReader.open(INDEX_PATH); IndexSearcher searcher = new IndexSearcher

Re: MoreLikeThis return no results

2008-08-30 Thread mark harwood
MoreLikeThis needs to find the terms in your doc. It tries to do this by using TermFreqVectors, which are stored in the index if you choose to add them at index time. If you haven't done this, then it will fall back to reanalysing the content of the document using an analyser (despite what

Question: Lucene MoreLikeThis score values all the same:

2008-08-25 Thread vinay b
As a test, I tried to compare a few documents on various topics (a few on linux, and another on the U.S. constitution) to a source document on linux using a query formed by MoreLikeThis. 1. Looking at the hits, they have the same score. I'd expect them to be different, based on their relevance

MoreLikeThis from a field with a specific value

2008-07-15 Thread martinoleary
Hi there... I'm trying to get MoreLikeThis documents from my Lucene index given a sentence... just one line of text, let's say... but I also want the returned results to include only documents where a field has a specific value. So, for example, if I have my index and it contains a categoryId and content

Re: MoreLikeThis from a field with a specific value

2008-07-15 Thread Daniel Noll
martinoleary wrote: Hi there... I'm trying to get MoreLikeThis documents from my Lucene index given a sentence... just one line of text, let's say... but I also want the returned results to include only documents where a field has a specific value. So, for example, if I have my index and it contains

Re: MoreLikeThis over a subset of documents

2008-04-23 Thread Karl Wettin
Jonathan Ariel wrote: Smart idea, but it won't help me. I have almost 50 categories and eventually I would like to filter not just on category but maybe also on language, etc. Karl: what do you mean by "measure the distance between the term vectors and cluster them in real time"? I mean exactly

Re: MoreLikeThis over a subset of documents

2008-04-23 Thread Jonathan Ariel
MoreLikeThis to receive a set of term frequencies, instead of an IndexReader, and use that to do the whole process. Does anyone know whether a document stores the term frequencies for its fields? On Wed, Apr 23, 2008 at 7:46 AM, Karl Wettin [EMAIL PROTECTED] wrote: Jonathan Ariel wrote: Smart idea

Re: MoreLikeThis over a subset of documents

2008-04-23 Thread Karl Wettin
there. In that case I could change MoreLikeThis to receive a set of term frequencies, instead of an IndexReader, and use that to do the whole process. That would probably not be too speedy. Does anyone know whether a document stores the term frequencies for its fields? When adding a field to a document you can specify

MoreLikeThis patch to support boost factor

2008-04-23 Thread Jonathan Ariel
This is a patch I made to be able to boost the terms by a specific factor in addition to the relevance returned by MoreLikeThis. This is helpful when there is more than one MoreLikeThis in the query, so words in field A (e.g. Title) can be boosted more than words in field B (e.g. Description). Any

MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel
Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset of documents and not the entire index (imagine that my index has documents categorized as A, B and C and I just want to work with those categorized as A). Right now

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Glen Newton
Instead of this: MoreLikeThis mlt = new MoreLikeThis(ir); Reader target = ... // orig source of doc you want to find similarities to Query query = mlt.like( target); Hits hits = is.search(query); do this: MoreLikeThis mlt = new MoreLikeThis(ir); Reader target = ... // orig source of doc you

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel
But that doesn't help me with my problem, because the interesting terms are taken from the entire index and not a subset as I need. On Tue, Apr 22, 2008 at 6:46 PM, Glen Newton [EMAIL PROTECTED] wrote: Instead of this: MoreLikeThis mlt = new MoreLikeThis(ir); Reader target = ... // orig

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Karl Wettin
Jonathan Ariel wrote: Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset of documents and not the entire index (imagine that my index has documents categorized as A, B and C and I just want to work with those

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel
I could have up to 2 million documents and growing. On Tue, Apr 22, 2008 at 7:29 PM, Karl Wettin [EMAIL PROTECTED] wrote: Jonathan Ariel wrote: Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Glen Newton
Sorry, I misunderstood the problem. My mistake. While not optimal and rather expensive space-wise, you could have - in addition to existing keyword field - a field for each category. If the document being indexed is in category A, only add the text to the catA field. Now do MoreLikeThis on catA
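
Glen's workaround works because document frequencies are per field: if each document's text is also indexed under a category-specific field, MLT on that field sees statistics for that category only. A stdlib sketch of the bookkeeping, with hypothetical field names ("cat" + category) and whitespace tokenization standing in for a real analyzer:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Sketch of the per-category field idea: each document's text is indexed under
// "cat" + its category, so docFreq for field "catA" counts only category-A documents.
class PerCategoryField {
    static String fieldFor(String category) {
        return "cat" + category;  // hypothetical naming scheme
    }

    /** docs: each entry is {category, text}; returns field -> (term -> docFreq). */
    static Map<String, Map<String, Integer>> docFreqsByField(List<String[]> docs) {
        Map<String, Map<String, Integer>> byField = new HashMap<>();
        for (String[] doc : docs) {
            Set<String> unique = new HashSet<>();  // count each term once per document
            for (String t : doc[1].toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) unique.add(t);
            }
            Map<String, Integer> df = byField.computeIfAbsent(fieldFor(doc[0]), f -> new HashMap<>());
            for (String t : unique) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return byField;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"A", "lucene search"},
                new String[] {"A", "lucene index lucene"},
                new String[] {"B", "lucene scoring"});
        System.out.println(docFreqsByField(docs));
    }
}
```

The cost Glen mentions is visible here: every document's text is stored once more per category field, in exchange for category-scoped statistics without changing MoreLikeThis itself.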

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel
field. Now do MoreLikeThis on catA. This assumes you know the categories at index time, of course. Redundant but workable. -Glen 2008/4/22 Jonathan Ariel [EMAIL PROTECTED]: Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting

MoreLikeThis jar doesn't contain classes

2008-02-22 Thread Jonathan Ariel
Hi, I've downloaded Lucene 2.3.0 binaries and in the contrib folder I can see the Similarity package, but inside the Jar there are no classes! Downloading the sources I ran into the same issue. Am I doing something wrong? Where should I get the MoreLikeThis classes from? Thanks! Jonathan

MoreLikeThis queries

2008-02-22 Thread Jonathan Ariel
Hi, I'm trying to use MoreLikeThis but I can't find how to make a MoreLikeThis query that will return related documents given a document and some conditions, like country field in the related documents should be 1, etc. Is there any documentation on how to do this kind of queries? Thanks

MoreLikeThis and setBoost

2007-11-20 Thread Donna L Gresh
I've been stepping through the contrib MoreLikeThis class and was wondering if people can give opinions on why you would or would not use setBoost(true) for the MoreLikeThis object. It seems a bit odd (at least to me) to boost the good terms in the query (based on the term's score), since
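
As I recall the class's source (treat this as an assumption, not a spec), with setBoost(true) each selected term query gets the boost boostFactor * score / bestScore, where bestScore is the score of the top term; later versions expose the factor through setBoostFactor. The following stdlib sketch reproduces only that arithmetic; names are hypothetical.

```java
import java.util.Arrays;

// Sketch of relative term boosting: boost = boostFactor * termScore / bestTermScore.
class MltBoosts {
    /** scoresBestFirst must be sorted descending; returns one boost per term. */
    static double[] relativeBoosts(double[] scoresBestFirst, double boostFactor) {
        double[] boosts = new double[scoresBestFirst.length];
        if (scoresBestFirst.length == 0) return boosts;
        double best = scoresBestFirst[0];  // score of the top-ranked term
        for (int i = 0; i < boosts.length; i++) {
            boosts[i] = boostFactor * scoresBestFirst[i] / best;
        }
        return boosts;
    }

    public static void main(String[] args) {
        double[] tfIdf = {8.0, 4.0, 2.0};  // hypothetical scores of three selected terms
        System.out.println(Arrays.toString(relativeBoosts(tfIdf, 1.0)));
        // A larger factor for one field's query weights it against another field's query.
        System.out.println(Arrays.toString(relativeBoosts(tfIdf, 2.0)));
    }
}
```

The top term always receives exactly boostFactor and the others are scaled by their tf-idf relative to it; without boosting, every term clause carries the default boost of 1.0.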

RE: MoreLikeThis across multiple fields question...

2007-10-22 Thread Chris Sizemore
: MoreLikeThis across multiple fields question... On Sunday 21 October 2007 17:21, Chris Sizemore wrote: i'm using MoreLikeThis. i'm trying to run the document comparison across more than one field in my index, but i'm not at all sure that it's actually happening -- when i examine the constructed query

MoreLikeThis across multiple fields question...

2007-10-21 Thread Chris Sizemore
hello-- i'm using MoreLikeThis. i'm trying to run the document comparison across more than one field in my index, but i'm not at all sure that it's actually happening -- when i examine the constructed query, only one field is mentioned! here's my code: FileReader reader = new FileReader

Re: MoreLikeThis across multiple fields question...

2007-10-21 Thread Daniel Naber
On Sunday 21 October 2007 17:21, Chris Sizemore wrote: i'm using MoreLikeThis. i'm trying to run the document comparison across more than one field in my index, but i'm not at all sure that it's actually happening -- when i examine the constructed query, only one field is mentioned! here's my

MoreLikeThis and stopword stemming

2007-10-10 Thread Donna L Gresh
What is the appropriate way of achieving both stopwords and stemming of stopwords when the MoreLikeThis class is used? My analyzer (MoreLikeThis.setAnalyzer) uses the Snowball filter, and is initialized with a stopwords set: analyzer = new StandardAnalyzer(stopwords) { public

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Grant Ingersoll
I have some sample code for doing relevance feedback across multiple documents at http://www.cnlp.org/apachecon2005 It could be modified to provide more of the MoreLikeThis functionality (i.e. determining important terms via tf/idf); for now it just takes the top X terms. -Grant On Jul 25

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Mathieu Lecarme
), or maximizing tf.idf (as is done in MoreLikeThis). Is there anything like this already implemented, or do I need to iterate through all documents in the set manually, re-tokenize each one (or maybe use TermVectors), and then calculate the weight for each term? http://project.carrot2.org

MoreLikeThis for multiple documents

2007-07-25 Thread Jens Grivolla
(as is done in MoreLikeThis). Is there anything like this already implemented, or do I need to iterate through all documents in the set manually, re-tokenize each one (or maybe use TermVectors), and then calculate the weight for each term? Thanks, Jens
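
A minimal sketch of the aggregation Jens describes, assuming each document is already available as a list of analyzed tokens (for example read from term vectors): sum the term counts over the set, weight each term by a ClassicSimilarity-style idf (an assumed formula), and keep the top n. Names are hypothetical, and document frequencies are supplied by the caller rather than read from an index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: top terms for a *set* of documents, ranked by summed count * idf.
class MultiDocTerms {
    static List<String> topTerms(List<List<String>> docs, Map<String, Integer> docFreqs,
                                 int numDocs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : docs) {
            for (String t : tokens) {
                counts.merge(t, 1, Integer::sum);  // total occurrences across the set
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (df == 0) continue;  // term unknown to the index
            double idf = 1.0 + Math.log((numDocs + 1) / (double) (df + 1));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically so the output is deterministic
        terms.sort(Comparator.comparingDouble((String t) -> scores.get(t)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "lucene", "search"),
                Arrays.asList("lucene", "the"),
                Arrays.asList("the", "the"));
        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 10);
        df.put("search", 2);
        df.put("the", 100);  // appears everywhere, so its idf is minimal
        System.out.println(topTerms(docs, df, 100, 2));
    }
}
```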

Re: MoreLikeThis

2007-07-18 Thread Akanksha Baid
Right, I was making a silly mistake there. I have it working now. Thanks for the reply. yu wrote: You can put lucene-queries-2.2.0.jar on your class path or your Eclipse project build path. That's all you need. Jay Akanksha Baid wrote: I am using Lucene 2.1.0 and want to use MoreLikeThis

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
I need this comparison to be case-insensitive The choice of case-sensitivity (and preservation of punctuation, numbers etc etc) is controlled by your choice of analyzer that you pass to MoreLikeThis. If you want to ensure your list of stop words adheres to the same logic - use the same

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
- From: mark harwood [mailto:[EMAIL PROTECTED] Sent: Monday, July 09, 2007 5:01 AM To: java-user@lucene.apache.org Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project I need this comparison to be case-insensitive The choice of case-sensitivity

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
-insensitive fashion? - Original Message From: Jong Kim [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Monday, 9 July, 2007 3:00:05 PM Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project My application stores term vectors with the index

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
to a product requirement, no token is thrown away at the time of indexing, that is, no stopwords filtering at indexing time. However, when executing MoreLikeThis feature, we do use a stopwords list (the fact that we indexed each and every word does not mean that they have to be included in the execution

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
OK. I can see the logic that says it might be useful/convenient to filter case-sensitive search terms using a case-insensitive list of stop words. What seems slightly odd is that you want exactness in the choice of case yet are using an imprecise matching technique (MoreLikeThis) - effectively

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
is used for MoreLikeThis function. 2.2 Admin search - this is more like raw index lookup than typical end-user search, can include stop words in the search terms. The point here is that, the case matters only for those words that should be included. For the words we do not want included in the end

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Erick Erickson
- 2.1 End User search - stop word filtering is done on the search terms, the same stop word list is used for MoreLikeThis function. 2.2 Admin search - this is more like raw index lookup than typical end-user search, can include stop words in the search terms. The point here is that, the case matters

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
in MoreLikeThis class in Lucene's contrib/queries project the case matters only for those words that should be included. Jong, just want to check we're on the same page - you do know MoreLikeThis has a kind of automatic Stop-Wording built in, yes? MoreLikeThis looks at the document frequency

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Chris Hostetter
: I need this comparison to be case-insensitive, but I don't see any way of : achieving it by extending this class. I would have created a subclass of : MoreLikeThis and override the isNoiseWord() method. However, the problem is : that, neither isNoiseWord() method nor the instance variables
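
Since the excerpt says neither isNoiseWord() nor the fields it reads are accessible to a subclass, here is a stand-alone sketch of an equivalent case-insensitive check: lowercase the stop list once with Locale.ROOT and lowercase each candidate term before lookup. The min/max length checks mirror MoreLikeThis's minWordLen/maxWordLen settings as I understand them; the class is hypothetical, and Mark's alternative (run the stop words through the same analyzer) avoids the need for it.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Stand-alone, case-insensitive variant of a noise-word check (hypothetical class).
class CaseInsensitiveNoise {
    private final Set<String> lowerStopWords = new HashSet<>();
    private final int minWordLen;  // 0 disables the check
    private final int maxWordLen;  // 0 disables the check

    CaseInsensitiveNoise(Collection<String> stopWords, int minWordLen, int maxWordLen) {
        for (String w : stopWords) {
            lowerStopWords.add(w.toLowerCase(Locale.ROOT));  // normalise the list once
        }
        this.minWordLen = minWordLen;
        this.maxWordLen = maxWordLen;
    }

    boolean isNoiseWord(String term) {
        int len = term.length();
        if (minWordLen > 0 && len < minWordLen) return true;
        if (maxWordLen > 0 && len > maxWordLen) return true;
        return lowerStopWords.contains(term.toLowerCase(Locale.ROOT));  // case-insensitive lookup
    }

    public static void main(String[] args) {
        CaseInsensitiveNoise noise = new CaseInsensitiveNoise(Arrays.asList("The", "and"), 0, 0);
        for (String term : new String[] {"THE", "And", "Lucene"}) {
            System.out.println(term + " noise? " + noise.isNoiseWord(term));
        }
    }
}
```

Only the stop-word lookup ignores case; the terms themselves keep their original case, which matches the requirement that case still matters for the words that are kept.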

MoreLikeThis API changes?

2007-05-30 Thread Ryan McKinley
I'm trying to build a custom MoreLikeThis implementation that will run within solr and I've run into a few API hurdles... 1. Can MLT.java be modified to optionally take the Similarity implementation in the constructor? Currently it is hardcoded to: private Similarity similarity = new

Re: MoreLikeThis API changes?

2007-05-30 Thread Grant Ingersoll
On May 30, 2007, at 2:45 AM, Ryan McKinley wrote: I'm trying to build a custom MoreLikeThis implementation that will run within solr and I've run into a few API hurdles... 1. Can MLT.java be modified to optionally take the Similarity implementation in the constructor? Currently

Re: MoreLikeThis API changes?

2007-05-30 Thread Ryan McKinley
2. Do retrieveTerms(int docNum) and createQuery(PriorityQueue q) need to be private? Can they be public? If not public, could they at least be protected? I would think protected would be fine, what is your case for it being public? From the solr RequestHandler, I want to return the

Re: MoreLikeThis API changes?

2007-05-30 Thread mark harwood
I want to return the interesting terms used for MLT Could you do this using Query.extractTerms() on the rewritten version of the MoreLikeThis query (a BooleanQuery)? Mark - Original Message From: Ryan McKinley [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Wednesday, 30 May

Re: MoreLikeThis API changes?

2007-05-30 Thread Ryan McKinley
mark harwood wrote: I want to return the interesting terms used for MLT Could you do this using Query.extractTerms() on the rewritten version of the MoreLikeThis query (a BooleanQuery)? thanks! that works and avoids the PriorityQueue traversal problems. I can even get the boost

Re: MoreLikeThis?

2007-05-23 Thread Donna L Gresh
To java-user@lucene.apache.org cc Subject Re: MoreLikeThis? Donna, this is what you need to do to get the jar, and after that you just use MLT according to its API. $ cd lucene-trunk otis:~/dev/workspace/lucene-trunk otis$ cd contrib/queries/ otis:~/dev/workspace/lucene-trunk/contrib

MoreLikeThis?

2007-05-22 Thread Donna L Gresh
Hello, I'm sorry if this is a naive question, but I have implemented my own MoreLikeThis functionality, and in re-reading the FAQ saw that it looks like something like this is already built, so I wanted to try it out and see if it would simplify my code: How do I find similar documents? See

Re: MoreLikeThis?

2007-05-22 Thread Otis Gospodnetic
- Original Message From: Donna L Gresh [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Tuesday, May 22, 2007 2:09:55 PM Subject: MoreLikeThis? Hello, I'm sorry if this is a naive question, but I have implemented my own MoreLikeThis functionality, and in re-reading the FAQ saw

Re: searching by field's TF vector (not MoreLikeThis)

2007-02-03 Thread Brian Whitman
On Feb 1, 2007, at 7:13 PM, Brian Whitman wrote: I'm looking for a way to search by a field's internal TF vector representation. MoreLikeThis does not seem to be what I want-- it constructs a text query based on the top scoring TF-IDF terms. I want to query by TF vector directly

searching by field's TF vector (not MoreLikeThis)

2007-02-01 Thread Brian Whitman
I'm looking for a way to search by a field's internal TF vector representation. MoreLikeThis does not seem to be what I want-- it constructs a text query based on the top scoring TF-IDF terms. I want to query by TF vector directly, bypassing the tokens. Lucene understandably has

Re: Restrict result returned by Morelikethis

2006-12-24 Thread Nick Snels
(the approach you are taking towards your goal is sound by the way) : Date: Sat, 23 Dec 2006 20:41:18 +0100 : From: Nick Snels [EMAIL PROTECTED] : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Restrict result returned by Morelikethis : : Hi, : : I have made a Morelikethis

Restrict result returned by Morelikethis

2006-12-23 Thread Nick Snels
Hi, I have made a Morelikethis query to look up documents that match a certain document id. This results in a search of the whole index. I would like the Morelikethis query to search only part of the index. How can I do this? I have already tried to create a BooleanQuery, like: BooleanQuery

Re: Restrict result returned by Morelikethis

2006-12-23 Thread Chris Hostetter
by Morelikethis : : Hi, : : I have made a Morelikethis query to look up documents that match a certain : document id. This results in a search of the whole index. I would like the : Morelikethis query to search only part of the index. How can I do this? : : I have already tried to create

MoreLikeThis does not retrieve all terms when using like()

2006-09-28 Thread Hadas Cohen
Ever since I started using Lucene, I have found the answers to all my questions in the archive, but I need help with these. 1. I am using the MoreLikeThis class, and cannot figure out why not all terms are retrieved when using like() to generate queries. I extract the terms from

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread Martin Braun
Hello, inspired by this thread, I also tried to implement a MoreLikeThis search, but I have the same problem of a null query. I did set the field name to a field that is stored in the index, but like() just returns null. Here is my code: Hits hits = this.is.search(new

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread mark harwood
Does your index use StandardAnalyzer? Are your fields stored (Field.Store.YES)? MoreLikeThis uses StandardAnalyzer by default to read the stored content from the example doc which may produce tokens that do not match those of the indexed content. Use setAnalyzer() to ensure they are in sync
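
Mark's point is that terms built from the example text only match if they are analyzed the same way as the indexed content. The following toy sketch uses two hand-written tokenizers (not Lucene analyzers) to show how a mismatch silently drops matches; in real code the remedy is the one given in the message, calling setAnalyzer() with the analyzer used at index time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy tokenizers (not Lucene classes): a term matches only if both sides emit the same token.
class AnalyzerMismatch {
    // "Index-time" chain: split on anything that is not a letter or digit, then lowercase
    static List<String> lowercasing(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Mismatched "query-time" chain: split on whitespace only, keeping case and punctuation
    static List<String> whitespaceOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Counts query tokens that exist as indexed terms. */
    static long matching(List<String> queryTokens, Set<String> indexTerms) {
        return queryTokens.stream().filter(indexTerms::contains).count();
    }

    public static void main(String[] args) {
        String text = "Lucene, MoreLikeThis and Similarity.";
        Set<String> indexTerms = new HashSet<>(lowercasing(text));
        System.out.println(matching(lowercasing(text), indexTerms));     // same analysis on both sides
        System.out.println(matching(whitespaceOnly(text), indexTerms));  // mismatched analysis
    }
}
```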

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread Martin Braun
for the short description of a document) If I set the field name to another field (indexed with StandardAnalyzer) which is indexed (but not stored), it works if I use the like(StringReader) method but not with like(int docid). This code works: MoreLikeThis mlt = new MoreLikeThis

PDF documents with MoreLikeThis class

2006-07-20 Thread Davide
Hi, I'm using the MoreLikeThis class to find similar documents... but I'm not sure whether it is correct to pass a PDF file as the argument to the *MoreLikeThis.like()* method. To be clearer: 1) In my Lucene index I add some PDF files (I use PDFBox to extract text and add fields to the index) 2) Now I want

Re: PDF documents with MoreLikeThis class

2006-07-20 Thread mark harwood
: Thursday, 20 July, 2006 10:41:03 AM Subject: PDF documents with MoreLikeThis class Hi, I'm using MoreLikeThis class to find similar documents... but I'm not sure if it is correct to pass as argument a Pdf file to *MoreLikeThis.like()* method. Trying to be more clear: 1) In my Lucene index I add

Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread Davide
Hi, I used Lucene's MoreLikeThis class (in the search.similar package) to find similar documents, but I get 0 results, even when I index the same document several times. I don't understand why the search doesn't work... Here is the code I used

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread mark harwood
on Cheers Mark - Original Message From: Davide [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Wednesday, 19 July, 2006 9:40:31 AM Subject: Problem finding similar documents with MoreLikeThis method. Hi, I used the method MoreLikeThis (in search.similar package) of Lucene to find

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread Davide
names you want to match on Cheers Mark I've tried, but it still doesn't work. I've called setFieldNames(new String[]{Field1, Field2, ...}), with Field1, Field2 being the fields I used when indexing the files, but *Query* is still empty and MoreLikeThis doesn't work... I don't think

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread mark harwood
if (fr != null) { System.out.println("Parsing FileReader: " + fr); query = mlt.like(fr); It's not clear from your code, but fr isn't the same object as fileReader, is it? If so, it could be positioned at the end of the file and MoreLikeThis would therefore read nothing. - Original Message
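
Mark's diagnosis rests on the fact that a java.io.Reader can be consumed only once. A minimal stdlib sketch (the helper name drain is hypothetical): reading the same Reader a second time yields nothing, so code that has already read fileReader would hand MoreLikeThis an exhausted stream. For a FileReader the fix is to open a new reader per use, or read the text once and wrap it in a fresh StringReader each time. The same applies to a single reader shared across the per-field loop quoted in the 2011 interface-change thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch: a Reader is a one-shot stream; a second consumer sees an empty input.
class ReaderReuse {
    /** Reads everything left in the reader into a String. */
    static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Reader shared = new StringReader("lucene more like this");
        String first = drain(shared);   // e.g. an earlier read of the same file
        String second = drain(shared);  // what a later consumer such as like() would see
        System.out.println("first='" + first + "' second='" + second + "'");

        // Fix: keep the text and create a new Reader for every consumer.
        for (String field : new String[] {"title", "body"}) {
            System.out.println(field + " -> '" + drain(new StringReader(first)) + "'");
        }
    }
}
```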

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-19 Thread mark harwood
Does your index have only the one document? MoreLikeThis will only generate queries with terms that occur in at least minDocFreq documents (default setting is 5). This is to avoid the large overheads associated with searching for very common words in your example text. - Original Message