subject:"MoreLikeThis"

Re: TF in MoreLikeThis

2022-06-01 Thread Petko Minkov

in/CONTRIBUTING.md. >> >> On Thu, Mar 31, 2022 at 11:46 PM Petko Minkov wrote: >> > >> > Hi, >> > >> > I was looking at Lucene's code for MoreLikeThis, specifically this line: >> > >> https://github.com/apache/lucene/blob/69b0

Re: TF in MoreLikeThis

2022-04-01 Thread Petko Minkov

can find contributing > guidelines here: > https://github.com/apache/lucene/blob/main/CONTRIBUTING.md. > > On Thu, Mar 31, 2022 at 11:46 PM Petko Minkov wrote: > > > > Hi, > > > > I was looking at Lucene's code for MoreLikeThis, spec

Re: TF in MoreLikeThis

2022-04-01 Thread Adrien Grand

kov wrote: > > Hi, > > I was looking at Lucene's code for MoreLikeThis, specifically this line: > https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640 > > It looks like

TF in MoreLikeThis

2022-03-31 Thread Petko Minkov

Hi, I was looking at Lucene's code for MoreLikeThis, specifically this line: https://github.com/apache/lucene/blob/69b040fc6292ac47d7f7fc8bc3b7fd601794e54b/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L640 It looks like in ClassicSimilarity, TF is a square root, b
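Editor's sketch for the question above: the snippet is cut off, so the thread's conclusion is not reproduced here. The block below only illustrates, in plain Java with no Lucene dependency, how a raw term-frequency weight differs from a square-root-damped one. The class and method names are hypothetical, the idf formula is the classic TF-IDF form as best recalled, and the assumption that the linked MoreLikeThis line multiplies a raw term frequency by idf is the editor's reading, not something the snippet confirms.

```java
// Sketch only: stdlib Java, hypothetical names, not Lucene's actual classes.
final class TfScoringSketch {

    // Square-root damping of the raw term count, as the post says ClassicSimilarity does.
    static double dampedTf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed classic idf form: log((docCount + 1) / (docFreq + 1)) + 1.
    static double idf(long docFreq, long docCount) {
        return Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0;
    }

    // Term weight using the raw term frequency (assumed MoreLikeThis-style selection score).
    static double rawScore(int freq, long docFreq, long docCount) {
        return freq * idf(docFreq, docCount);
    }

    // Term weight using the damped term frequency.
    static double dampedScore(int freq, long docFreq, long docCount) {
        return dampedTf(freq) * idf(docFreq, docCount);
    }

    public static void main(String[] args) {
        // A term occurring 9 times in the source text, present in 10 of 1000 documents.
        System.out.println("raw:    " + rawScore(9, 10, 1000));
        System.out.println("damped: " + dampedScore(9, 10, 1000));
    }
}
```

With raw counts, a term repeated nine times weighs nine times as much as a term seen once; with the square root it weighs only three times as much, which changes which terms rise to the top of the shortlist.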

Re: Tuning MoreLikeThis scoring algorithm

2021-06-01 Thread TK Solr

just doesn't work the way you think. Don't try to interpret it as an absolute value, it is a relative one. On Fri, May 28, 2021 at 1:36 PM TK Solr wrote: I'd like to have suggestions on changing the scoring algorithm of MoreLikeThis. When I feed the identical string as the content

Re: Tuning MoreLikeThis scoring algorithm

2021-05-28 Thread Robert Muir

PM TK Solr wrote: > > I'd like to have suggestions on changing the scoring algorithm > of MoreLikeThis. > > When I feed the identical string as the content of a document in the index > to MoreLikeThis.like("field", new StringReader(docContent)), > I get a score less t

Tuning MoreLikeThis scoring algorithm

2021-05-28 Thread TK Solr

I'd like to have suggestions on changing the scoring algorithm of MoreLikeThis. When I feed the identical string as the content of a document in the index to MoreLikeThis.like("field", new StringReader(docContent)), I get a score less than 1.0 (0.944 in one of my test cases) that

Search similar documents using dense vectors (alternative to MORELIKETHIS)

2016-02-24 Thread Jan Rygl

documents. We have documents represented by both texts and float vectors. We would like to be able to search similar documents to a given document using a document vector (and not to convert document to query like MORELIKETHIS). There is a vector encoding to text technique, but it is not very

Re: Filtering MoreLikeThis results

2015-01-09 Thread Tomoko Uchida

Hi, > find me the 10 most similar documents I suppose you mean "mlt.count" supported by MoreLikeThisComponent. https://cwiki.apache.org/confluence/display/solr/MoreLikeThis MLT is ordinary search in Lucene, so you get documents in order of similarity (default scoring criteria)

Filtering MoreLikeThis results

2015-01-08 Thread chrisbamford

Hi, I was wondering if Lucene supports applying a filter to an MLT search? I believe that Solr can do it, but I'm not sure if Lucene can .. A possible use case is "find me the 10 most similar documents to X created in the last month". Thanks - Chris --

Lucene 4.0: Use of filter queries and filter caches in MoreLikeThis -- how do you set up the query for the MoreLikeThisHandler?

2013-02-20 Thread Misty Nodine

I am trying to do a filtered MoreLikeThis query. For example, say I want to do a MoreLikeThis query only on books written in 1998. My understanding is that in order to do this, I need to use the MoreLikeThisHandler. How do you fold together the query part and the more like this part such that

Re: Lucene-MoreLikethis

2013-01-15 Thread Jack Krupansky

There are lots of parameters you can adjust, but the defaults essentially assume that you have a fairly large corpus and aren't interested in low-frequency terms. So, try MoreLikeThis#setMinDocFreq. The default is 5. You don't have any terms in your example with a doc freq over 2.
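Editor's sketch of the point in the reply above: with a minimum document frequency of 5, a tiny index in which no term appears in more than 2 documents leaves nothing to build a query from. This is a plain-Java model of the threshold, not Lucene code; the class name, the map of document frequencies, and the inclusive comparison are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: models a minimum-document-frequency cut-off with hypothetical names.
final class MinDocFreqSketch {

    // Keeps the terms that occur in at least minDocFreq documents (inclusive, by assumption).
    static List<String> keep(Map<String, Integer> docFreqs, int minDocFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() >= minDocFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A tiny corpus: no term appears in more than 2 documents.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("lucene", 2);
        docFreqs.put("action", 1);
        docFreqs.put("search", 2);

        System.out.println("threshold 5: " + keep(docFreqs, 5)); // nothing survives
        System.out.println("threshold 1: " + keep(docFreqs, 1)); // every term survives
    }
}
```

On a small test index, lowering the threshold (the reply suggests MoreLikeThis#setMinDocFreq) is what lets any term through at all.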

Lucene-MoreLikethis

2013-01-15 Thread Thomas Keller

Hey, I have a question about "MoreLikeThis" in Lucene, Java. I built up an index and want to find similar documents. But I always get no results for my query, mlt.like(1) is always empty. Can anyone find my mistake? Here is an example. (I use Lucene 4.0) public class HelloLucene {

MoreLikeThis and TermVector relationship

2011-10-24 Thread Saurabh Gokhale

, MoreLikeThis will generate terms from stored fields Now since I am using lucene and not Solr, I will ask question from Lucene point of view: 1. What is the difference between the below 2 index statements. As per my understanding first one does not store separate TermVector and second does. new

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith

OK. Thanks -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, September 26, 2011 12:15 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith wrote: > "is" is the input stream

Re: MoreLikeThis Interface changes

2011-09-26 Thread Robert Muir

On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith wrote: > "is" is the input stream. Did I miss something in your response? > Yes, this is totally unrelated to fields[]. it has to do with which fieldname is passed to the analyzer to analyze the reader into tokens (and there can be only one for thi

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith

riginal Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith wrote: > I'm updating my lucene code from 3.0 to 3.4. There

RE: MoreLikeThis Interface changes

2011-09-22 Thread Scott Smith

Understand. Thanks for the information. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith wrote: >

Re: MoreLikeThis Interface changes

2011-09-21 Thread Robert Muir

he content) This is because some Analyzers (e.g. PerFieldAnalyzerWrapper) analyze content differently according to different fields. Previously, MoreLikeThis would use what was in the setFieldNames parameter, iteratively like this: for (field : fieldNames) { analyzer.analyze(field, reader); } Howeve

MoreLikeThis Interface changes

2011-09-21 Thread Scott Smith

I'm updating my lucene code from 3.0 to 3.4. There's a change in the MLT interface I'm confused about. I used the MLT.like(InputStream) method. It now appears I should change to the MLT.like(InputStreamReader, fieldname) method. Easy enough to create an InputStreamReader from an InputStream.

found workaround: Query on using Payload with MoreLikeThis class

2011-05-11 Thread Saurabh Gokhale

Hi All, I am not sure if any one got chance to go over my question (below). The question was to check if I can modify MoreLikeThis.like() result using index time boosting. I have found a work around as there is no easy way to influence MoreLikeThis result using index time payload value. The

Query on using Payload with MoreLikeThis class

2011-05-10 Thread Saurabh Gokhale

Hi, In the Lucene 2.9.4 project, there is a requirement to boost some of the keywords in the document using payload. Now while searching, is there a way I can boost the MoreLikeThis result using the index time payload values? Or can I merge MoreLikeThis output and PayloadTermQuery output

Question on the use of Synonym Filter while searching using MoreLikeThis

2011-05-09 Thread Saurabh Gokhale

--> StandardFilter --> LowerCaseFilter --> StopFilter --> PorterStemFilter And while searching using MoreLikeThis I am using analyzer similar to the previous one but with addition of synonym filter [Analyzer2] == StandardTokenizer --> StandardFilter --> LowerCaseFilter --> StopFil

Question on the Synonym Filter use while searching with MoreLikeThis

2011-05-08 Thread Saurabh Gokhale

--> StandardFilter --> LowerCaseFilter --> StopFilter --> PorterStemFilter And while searching using MoreLikeThis I am using analyzer similar to the previous one but with addition of synonym filter [Analyzer2] == StandardTokenizer --> StandardFilter --> LowerCaseFilter --> StopFil

Re: Regarding MoreLikeThis similarity Search

2011-03-19 Thread madhuri madhuri

Hi Koji, Thanks for your reply... It is working now by setting doc and term frequency. Regards, Madhu. From: Koji Sekiguchi To: java-user@lucene.apache.org Sent: Fri, 18 March, 2011 5:49:15 PM Subject: Re: Regarding MoreLikeThis similarity Search (11/03/19 6

Re: Regarding MoreLikeThis similarity Search

2011-03-18 Thread Koji Sekiguchi

(11/03/19 6:16), madhuri_1...@yahoo.com wrote: Hi, I am new to lucene ... I have a question while implementing similarity search using MoreLikeThis query. I have written a small program but it is not giving any results. In my index file I have both stored and unstored (analyzed) fields

Regarding MoreLikeThis similarity Search

2011-03-18 Thread madhuri_1820

Hi, I am new to lucene ... I have a question while implementing similarity search using MoreLikeThis query. I have written a small program but it is not giving any results. In my index file I have both stored and unstored (analyzed) fields. Sample Code : IndexReader ir = IndexReader.open

Re: java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread starz10de

Dear Erick , thanks a lot, I placed the jar file in WEB-INF\lib and it works. best -- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-NoClassDefFoundError-org-apache-lucene-search-similar-MoreLikeThis-tp2036296p2037181.html Sent from the Lucene - Java Users mailing

Re: java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread Erick Erickson

It sounds like the jar containing the MoreLikeThis class isn't in a place where your servlet can find it. It's in contrib, something like lucene-queries.jar Best Erick On Tue, Dec 7, 2010 at 4:24 PM, starz10de wrote: > > > > Hi All, > > I am using MoreLikeThis class in luce

java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread starz10de

Hi All, I am using MoreLikeThis class in lucene to find more similar documents in the index to the given one. It works fine when I run it directly from Eclipse but when I call it from my servlet I have this error: “java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

Re: support for PayloadTermQuery in MoreLikeThis

2009-09-10 Thread Grant Ingersoll

On Sep 9, 2009, at 4:39 PM, Bill Au wrote: Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? Not yet! Sounds interesting I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic

support for PayloadTermQuery in MoreLikeThis

2009-09-09 Thread Bill Au

Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic solution would be to add a enable method to enable PayloadTermQuery, keeping

Re: Question: Lucene MoreLikeThis score values all the same:

2008-09-02 Thread Chris Hostetter

: 1. Looking at the hits, they have the same score. I'd expect them to be : different, based on their relevance to the source document. Any ideas? ... : This is my output. I can paste my source code in too if needed. The output of arbitrary "secret" code isn't really very useful for the

Re: MoreLikeThis return no results

2008-09-01 Thread davood

Thanks so much for hints, now it works correctly, the problem was with mlt.setMinTermFreq. Many thanks. -- View this message in context: http://www.nabble.com/Re%3A-MoreLikeThis-return-no-results-tp19230763p19256118.html Sent from the Lucene - Java Users mailing list archive at Nabble.com

Re: MoreLikeThis return no results

2008-09-01 Thread mark harwood

MoreLikeThis essentially shortlists a large list of terms (found in example text or an existing doc) and uses them in a query. To see what terms have been shortlisted try calling query.rewrite(reader) and then call toString() or extractTerms. If this reveals no terms try using a debugger which
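Editor's sketch of the shortlisting idea described above: score every candidate term, keep only the best few, and build the query from those. The block is plain Java with a bounded min-heap; the class name, the score map, and the cut-off parameter are hypothetical and stand in for whatever MoreLikeThis does internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch only: top-N term selection with a bounded min-heap, hypothetical names.
final class TermShortlistSketch {

    // Returns up to maxTerms terms, highest score first.
    static List<String> topTerms(Map<String, Double> scores, int maxTerms) {
        // Min-heap on score: the weakest shortlisted term sits at the head.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > maxTerms) {
                heap.poll(); // evict the current weakest term
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending by score
        }
        Collections.reverse(result); // best term first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("the", 0.1);
        scores.put("lucene", 4.2);
        scores.put("index", 2.7);
        System.out.println(topTerms(scores, 2));
    }
}
```

If the shortlist comes back empty, the thresholds applied before this step have already discarded every term, which is the situation the reply suggests inspecting.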

Re: MoreLikeThis return no results

2008-09-01 Thread Marcelo Ochoa

Hi Dave: The MoreLikeThis object has two parameters which control its functionality: mlt.setMinTermFreq(minTermFreq.intValue()); mlt.setMinDocFreq(minDocFreq.intValue()); By default MinTermFreq is 2, so if your document has no terms with freq greater than 2 it will return a query
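Editor's sketch of the minimum-term-frequency effect mentioned above: in a short document where every word occurs once, a threshold of 2 removes every candidate term. The code is plain Java; the whitespace tokenizer is a stand-in for a real analyzer, the names are hypothetical, and the inclusive comparison (keep a term whose count is at least the threshold) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: models a minimum-term-frequency cut-off with hypothetical names.
final class MinTermFreqSketch {

    // Counts lower-cased, whitespace-separated tokens (a stand-in for a real analyzer).
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Keeps the terms that occur at least minTermFreq times in the text.
    static List<String> keep(String text, int minTermFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs(text).entrySet()) {
            if (e.getValue() >= minTermFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String doc = "lucene in action lucene search";
        System.out.println("threshold 2: " + keep(doc, 2));
        System.out.println("threshold 1: " + keep(doc, 1));
    }
}
```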

Re: MoreLikeThis return no results

2008-09-01 Thread davood

tor exists and is correct but MoreLikeThis returns no result for a given document id. What am I missing? mark harwood wrote: > > MoreLikeThis needs to find the terms in your doc. It tries to do this by > using TermFreqVectors which are stored in the index if you choose to add > them

Re: MoreLikeThis return no results

2008-08-30 Thread mark harwood

MoreLikeThis needs to find the terms in your doc. It tries to do this by using TermFreqVectors which are stored in the index if you choose to add them at index-time. If you haven't done this then it will fall back to reanalysing the content of the document using an analyser (despite wha

Re: Re: MoreLikeThis return no results

2008-08-29 Thread tom

AUTOMATIC REPLY Tom Roberts is out of the office till 2nd September 2008. LUX reopens on 1st September 2008 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

MoreLikeThis return no results

2008-08-29 Thread davood

Hi, I'm trying to get MoreLikeThis working but it just returns no results. I have lucene working for normal queries and indexing but MoreLikeThis Just returns nothing. This is what I'm trying IndexReader reader = IndexReader.open(INDEX_PATH); IndexSearcher searcher = new Ind

Question: Lucene MoreLikeThis score values all the same:

2008-08-25 Thread vinay b

As a test, I tried to compare a few documents on various topics (a few on linux, and another on the U.S. constitution) to a source document on linux using a query formed by MoreLikeThis. 1. Looking at the hits, they have the same score. I'd expect them to be different, based on their relevan

Re: MoreLikeThis from a field with a specific value

2008-07-15 Thread Daniel Noll

martinoleary wrote: Hi there... im trying to get MoreLikeThis documents from my lucene index given a sentence... just one line of text lets say... but i also want to get the returned results only where a field has a specific value so for example if i have my index and it contains a

MoreLikeThis from a field with a specific value

2008-07-15 Thread martinoleary

Hi there... im trying to get MoreLikeThis documents from my lucene index given a sentence... just one line of text lets say... but i also want to get the returned results only where a field has a specific value so for example if i have my index and it contains a categoryId and content

MoreLikeThis patch to support boost factor

2008-04-23 Thread Jonathan Ariel

This is a patch I made to be able to boost the terms with a specific factor besides the relevancy returned by MoreLikeThis. This is helpful when having more than 1 MoreLikeThis in the query, so words in the field A (i.e. Title) can be boosted more than words in the field B (i.e. Description). Any

Re: MoreLikeThis over a subset of documents

2008-04-23 Thread Karl Wettin

that case I could change MoreLikeThis to receive a set of term frequencies, instead of an IndexReader, and use that to do all the process. That would probably not be too speedy. Anyone knows if a document contains for his fields the term frequencies? When adding a field to a document you can sp

Re: MoreLikeThis over a subset of documents

2008-04-23 Thread Jonathan Ariel

hange MoreLikeThis to receive a set of term frequencies, instead of an IndexReader, and use that to do all the process. Anyone knows if a document contains for his fields the term frequencies? On Wed, Apr 23, 2008 at 7:46 AM, Karl Wettin <[EMAIL PROTECTED]> wrote: > Jonathan Ariel skrev: >

Re: MoreLikeThis over a subset of documents

2008-04-23 Thread Karl Wettin

Jonathan Ariel skrev: Smart idea, but it won't help me. I have almost 50 categories and eventually I would like to "filter" not just on category but maybe also on language, etc. Karl: what do you mean by measure the distance between the term vectors and cluster them in real time? I mean exactly

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

in category A, only add the text to the > catA field. Now do MoreLikeThis on catA. This assumes you know the > categories at index time, of course. > Redundant but workable. > > -Glen > > 2008/4/22 Jonathan Ariel <[EMAIL PROTECTED]>: > > Is there any way to execute a M

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Glen Newton

Sorry, I misunderstood the problem. My mistake. While not optimal and rather expensive space-wise, you could have - in addition to existing keyword field - a field for each category. If the document being indexed is in category A, only add the text to the catA field. Now do MoreLikeThis on catA

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

I could have up to 2 million documents and growing. On Tue, Apr 22, 2008 at 7:29 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > Jonathan Ariel skrev: > > Is there any way to execute a MoreLikeThis over a subset of documents? I > > need to retrieve a set of interesting keyw

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Karl Wettin

Jonathan Ariel skrev: Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset of documents and not the entire index (imagine that my index has documents categorized as A, B and C and I just want to work with those

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

But that doesn't help me with my problem, because the interesting terms are taken from the entire index and not a subset as I need. On Tue, Apr 22, 2008 at 6:46 PM, Glen Newton <[EMAIL PROTECTED]> wrote: > Instead of this: > > MoreLikeThis mlt = new MoreLikeThis(ir); > Read

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Glen Newton

Instead of this: MoreLikeThis mlt = new MoreLikeThis(ir); Reader target = ... // orig source of doc you want to find similarities to Query query = mlt.like( target); Hits hits = is.search(query); do this: MoreLikeThis mlt = new MoreLikeThis(ir); Reader target = ... // orig source of doc you

MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset of documents and not the entire index (imagine that my index has documents categorized as A, B and C and I just want to work with those categorized as A). Right now

MoreLikeThis queries

2008-02-22 Thread Jonathan Ariel

Hi, I'm trying to use MoreLikeThis but I can't find how to make a MoreLikeThis query that will return related documents given a document and some conditions, like country field in the related documents should be 1, etc. Is there any documentation on how to do this kind of querie

Re: MoreLikeThis jar doesn't contain classes

2008-02-22 Thread mark harwood

Looks like an issue with the build process. MoreLikeThis moved to the contrib\queries area some time ago. Thanks for the report - we'll need to fix this. - Original Message From: Jonathan Ariel <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, 22 February,

MoreLikeThis jar doesn't contain classes

2008-02-22 Thread Jonathan Ariel

Hi, I've downloaded Lucene 2.3.0 binaries and in the contrib folder I can see the Similarity package, but inside the Jar there are no classes! Downloading the sources I ran into the same issue. Am I doing something wrong? Where should I get the MoreLikeThis classes from? Thanks! Jonathan

MoreLikeThis and setBoost

2007-11-20 Thread Donna L Gresh

I've been stepping through the contrib MoreLikeThis class and was wondering if people can give opinions on why you would or would not use setBoost(true) for the MoreLikeThis object. It seems a bit odd (at least to me) to boost the "good" terms in the query (based on the term

RE: MoreLikeThis across multiple fields question...

2007-10-22 Thread Chris Sizemore

org Subject: Re: MoreLikeThis across multiple fields question... On Sunday 21 October 2007 17:21, Chris Sizemore wrote: > i'm using MoreLikeThis. i'm trying to run the document comparison across > more than one field in my index, but i'm not at all sure that it's > actually happe

Re: MoreLikeThis across multiple fields question...

2007-10-21 Thread Daniel Naber

On Sunday 21 October 2007 17:21, Chris Sizemore wrote: > i'm using MoreLikeThis. i'm trying to run the document comparison across > more than one field in my index, but i'm not at all sure that it's > actually happening -- when i examine the constructed query,

MoreLikeThis across multiple fields question...

2007-10-21 Thread Chris Sizemore

hello-- i'm using MoreLikeThis. i'm trying to run the document comparison across more than one field in my index, but i'm not at all sure that it's actually happening -- when i examine the constructed query, only one field is mentioned! here's my code: FileReade

MoreLikeThis and stopword stemming

2007-10-10 Thread Donna L Gresh

What is the appropriate way of achieving both stopwords and stemming of stopwords when the MoreLikeThis class is used? My analyzer (MoreLikeThis.setAnalyzer) uses the Snowball filter, and is initialized with a stopwords set: analyzer = new StandardAnalyzer(stopwords) { public

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Grant Ingersoll

I have some sample code for doing relevance feedback across multiple documents at http://www.cnlp.org/apachecon2005 It could be modified to provide more of the MoreLikeThis functionality (i.e. determining important terms via tf/idf) for now it just takes the top X terms -Grant On Jul 25

Re: MoreLikeThis for multiple documents

2007-07-26 Thread Mathieu Lecarme

s used for blind relevance > feedback), or maximizing tf.idf (as is done in MoreLikeThis). > > Is there anything like this already implemented, or do I need to > iterate through all documents in the set "manually", re-tokenize each > one (or maybe use TermVectors), a

MoreLikeThis for multiple documents

2007-07-25 Thread Jens Grivolla

as is done in MoreLikeThis). Is there anything like this already implemented, or do I need to iterate through all documents in the set "manually", re-tokenize each one (or maybe use TermVectors), and then calculate the weight for each term?

Re: MoreLikeThis

2007-07-18 Thread Akanksha Baid

Right , I was making a silly mistake there. I have it working now. Thanks for the reply. yu wrote: You can put lucene-queries-2.2.0.jar on your class path or your Eclipse project build path. That's all you need. Jay Akanksha Baid wrote: I am using Lucene 2.1.0 and want to use MoreLik

Re: MoreLikeThis

2007-07-18 Thread yu

You can put lucene-queries-2.2.0.jar on your class path or your Eclipse project build path. That's all you need. Jay Akanksha Baid wrote: I am using Lucene 2.1.0 and want to use MoreLikeThis for querying documents. I understand that the jar file for the same is in contrib. I hav

MoreLikeThis

2007-07-18 Thread Akanksha Baid

I am using Lucene 2.1.0 and want to use MoreLikeThis for querying documents. I understand that the jar file for the same is in contrib. I have the contrib folder extracted, but am not sure what to do from this point on. What jar file am I looking for and where should put it. I am using

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread markharw00d

>>So I'm afraid I can't use the technique you recommend. ah right - so the TermVector you use from the index will return mixed and lower case versions of the same text. One point to note - this would mean that of the 25 or so top terms selected by MoreLikeThis for que

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim

comparison in MoreLikeThis class in Lucene's contrib/queries project >>the case matters only for those words that should be included. Jong, just want to check we're on the same page - you do know MoreLikeThis has a kind of automatic Stop-Wording built in , yes? MoreLikeThis looks at

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread markharw00d

>>the case matters only for those words that should be included. Jong, just want to check we're on the same page - you do know MoreLikeThis has a kind of automatic Stop-Wording built in , yes? MoreLikeThis looks at the document frequency of all terms in the "this" text

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Erick Erickson

information and keep all tokens. 2. Search functionality is provided at two levels - 2.1 End User search - stop word filtering is done on the search terms, the same stop word list is used for MoreLikeThis function. 2.2 Admin search - this is more like raw index lookup than typical end-user search, c

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim

used for MoreLikeThis function. 2.2 Admin search - this is more like raw index lookup than typical end-user search, can include stop words in the search terms. The point here is that, the case matters only for those words that should be included. For the words we do not want included in the end

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood

OK. I can see the logic that says it might be useful/convenient to filter case-sensitive search terms using a case-insensitive list of stop words. What seems slightly odd is that you want exactness in the choice of case yet are using an imprecise matching technique (MoreLikeThis) - effectively

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim

e to a product requirement, no token is thrown away at the time of indexing, that is, no stopwords filtering at indexing time. However, when executing MoreLikeThis feature, we do use a stopwords list (the fact that we indexed each and every word does not mean that they have to be included in the execut

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood

case-insensitive fashion? - Original Message From: Jong Kim <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, 9 July, 2007 3:00:05 PM Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project My application stores term vecto

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim

- From: mark harwood [mailto:[EMAIL PROTECTED] Sent: Monday, July 09, 2007 5:01 AM To: java-user@lucene.apache.org Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project >>I need this comparison to be case-insensitive The choice of case-sensi

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood

>>I need this comparison to be case-insensitive The choice of case-sensitivity (and preservation of punctuation, numbers etc etc) is controlled by your choice of analyzer that you pass to MoreLikeThis. If you want to ensure your list of stop words adheres to the same logic - use the
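Editor's sketch of one way to read the advice above: if stop words and candidate terms pass through the same normalisation, the comparison becomes case-insensitive without touching the matching code itself. The reply is truncated, so this is an interpretation. The code is plain Java; lower-casing stands in for whatever the chosen analyzer does, and the class and method names (including isNoiseWord, borrowed from the thread only as a label) are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch only: shared normalisation for stop words and terms, hypothetical names.
final class StopWordSketch {

    // Stand-in for analyzer normalisation: locale-independent lower-casing.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Normalises the stop-word list once, up front.
    static Set<String> normalizeAll(Collection<String> words) {
        Set<String> out = new HashSet<>();
        for (String w : words) {
            out.add(normalize(w));
        }
        return out;
    }

    // A term is noise if its normalised form is in the normalised stop set.
    static boolean isNoiseWord(String term, Set<String> normalizedStopWords) {
        return normalizedStopWords.contains(normalize(term));
    }

    public static void main(String[] args) {
        Set<String> stop = normalizeAll(Arrays.asList("The", "AND", "of"));
        System.out.println(isNoiseWord("the", stop));
        System.out.println(isNoiseWord("And", stop));
        System.out.println(isNoiseWord("Lucene", stop));
    }
}
```

The design choice is to normalise at the boundary rather than subclass the matcher, which sidesteps the private-member problem raised elsewhere in the thread.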

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Chris Hostetter

: I need this comparison to be case-insensitive, but I don't see any way of : achieving it by extending this class. I would have created a subclass of : MoreLikeThis and override the isNoiseWord() method. However, the problem is : that, neither isNoiseWord() method nor the instance vari

Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Jong Kim

Hi, The MoreLikeThis class in Lucene's contrib/queries project performs noise word filtering based on the case-sensitive comparison of the terms against the user-supplied stopwords set. I need this comparison to be case-insensitive, but I don't see any way of achieving it by exte

Re: MoreLikeThis API changes?

2007-05-30 Thread Ryan McKinley

mark harwood wrote: I want to return the "interesting" terms used for MLT Could you do this using Query.extractTerms() on the rewritten version of the MoreLikeThis query (a BooleanQuery)? thanks! that works and avoids the PriorityQueue traversal problems. I can even get

Re: MoreLikeThis API changes?

2007-05-30 Thread mark harwood

>> I want to return the "interesting" terms used for MLT Could you do this using Query.extractTerms() on the rewritten version of the MoreLikeThis query (a BooleanQuery)? Mark - Original Message From: Ryan McKinley <[EMAIL PROTECTED]> To: java-user@lucene.apac

Re: MoreLikeThis API changes?

2007-05-30 Thread Ryan McKinley

2. Do retrieveTerms(int docNum) and createQuery(PriorityQueue q) need to be private? Can they be public? If not public, could they at least be protected? I would think protected would be fine, what is your case for it being public? From the solr RequestHandler, I want to return the "

Re: MoreLikeThis API changes?

2007-05-30 Thread Grant Ingersoll

On May 30, 2007, at 2:45 AM, Ryan McKinley wrote: I'm trying to build a custom MoreLikeThis implementation that will run within solr and I've run into a few API hurdles... 1. Can MLT.java be modified to optionally take the Similarity implementation in the constructor? Curre

MoreLikeThis API changes?

2007-05-29 Thread Ryan McKinley

I'm trying to build a custom MoreLikeThis implementation that will run within solr and I've run into a few API hurdles... 1. Can MLT.java be modified to optionally take the Similarity implementation in the constructor? Currently it is hardcoded to: private Similarity simila

Re: MoreLikeThis?

2007-05-23 Thread Donna L Gresh

a-user@lucene.apache.org To java-user@lucene.apache.org cc Subject Re: MoreLikeThis? Donna, this is what you need to do to get the jar, and after that you just use MLT according to its API. $ cd lucene-trunk otis:~/dev/workspace/lucene-trunk otis$ cd contrib/queries/ otis:~/dev/workspace/lucene-trunk/c

Re: MoreLikeThis?

2007-05-22 Thread Otis Gospodnetic

- Original Message From: Donna L Gresh <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, May 22, 2007 2:09:55 PM Subject: MoreLikeThis? Hello, I'm sorry if this is a naive question, but I have implemented my own "MoreLikeThis" functionality, and in re-rea

MoreLikeThis?

2007-05-22 Thread Donna L Gresh

Hello, I'm sorry if this is a naive question, but I have implemented my own "MoreLikeThis" functionality, and in re-reading the FAQ saw that it looks like something like this is already built, so I wanted to try it out and see if it would simplify my code: How do I find similar

Re: searching by field's TF vector (not MoreLikeThis)

2007-02-03 Thread Brian Whitman

On Feb 1, 2007, at 7:13 PM, Brian Whitman wrote: I'm looking for a way to search by a field's internal TF vector representation. MoreLikeThis does not seem to be what I want-- it constructs a text query based on the top scoring TF-IDF terms. I want to query by TF vecto

searching by field's TF vector (not MoreLikeThis)

2007-02-01 Thread Brian Whitman

I'm looking for a way to search by a field's internal TF vector representation. MoreLikeThis does not seem to be what I want-- it constructs a text query based on the top scoring TF-IDF terms. I want to query by TF vector directly, bypassing the tokens. Lucene understa

Re: Restrict result returned by Morelikethis

2006-12-24 Thread Nick Snels

Query.html (the approach you are taking towards your goal is sound by the way) : Date: Sat, 23 Dec 2006 20:41:18 +0100 : From: Nick Snels <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Restrict result returned by Morelikethis : :

Re: Restrict result returned by Morelikethis

2006-12-23 Thread Chris Hostetter

Restrict result returned by Morelikethis : : Hi, : : I have made a Morelikethis query to look up documents that match a certain : document id. This results in a search of the whole index. I would like the : Morelikethis query to search only part of the index. How can I do this? : : I have already t

Restrict result returned by Morelikethis

2006-12-23 Thread Nick Snels

Hi, I have made a Morelikethis query to look up documents that match a certain document id. This results in a search of the whole index. I would like the Morelikethis query to search only part of the index. How can I do this? I have already tried to create a BooleanQuery, like: BooleanQuery

MoreLikeThis does not retrieve all terms when using like()

2006-09-28 Thread Hadas Cohen

Ever since I started using Lucene, I found all answers to all possible questions in the archive. But I need help about those ones. 1. I am using MoreLikeThis class, and cannot figure out why not all terms are retrieved when using like() to generate queries. I extract the terms from a

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread Martin Braun

he field for the short description of a document) If I set the Fieldname to another Field (indexed with StandardAnalyzer) which is Indexed (but not Stored) it works if I use the like(StringReader ) Method but not with like(int docid). This Code works: MoreLikeThis mlt =

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread mark harwood

Does your index use StandardAnalyzer? Are your fields stored (Field.Store.YES)? MoreLikeThis uses StandardAnalyzer by default to read the stored content from the example doc which may produce tokens that do not match those of the indexed content. Use setAnalyzer() to ensure they are in sync

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread Martin Braun

Hello, inspired by this thread, I also tried to implement a MoreLikeThis search. But I have the same Problem of a null query. I did set the Fieldname to a Field that is stored in the Index. But "like" just returns null. Here is my Code: Hits hits = this.is.

Re: PDF documents with "MoreLikeThis" class

2006-07-20 Thread mark harwood

> To: java-user@lucene.apache.org Sent: Thursday, 20 July, 2006 10:41:03 AM Subject: PDF documents with "MoreLikeThis" class Hi, I'm using MoreLikeThis class to find similar documents... but I'm not sure if it is correct to pass as argument a Pdf file to *MoreLikeThis.like()* method. Try


