Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Jong Kim
Hi, The MoreLikeThis class in Lucene's contrib/queries project performs noise word filtering based on the case-sensitive comparison of the terms against the user-supplied stopwords set. I need this comparison to be case-insensitive, but I don't see any way of achieving it by extending this cla
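One possible workaround that needs no subclassing, sketched under an assumption: if the class tests noise words by calling contains() on the user-supplied Set (the thread suggests this but does not show it), then supplying a Set whose ordering ignores case makes the lookup case-insensitive. The class CaseInsensitiveStopWords and its isNoiseWord() method below are hypothetical stand-ins for illustration, not Lucene code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene.
class CaseInsensitiveStopWords {

    // Builds a set whose contains() ignores case, because the TreeSet
    // compares elements with String.CASE_INSENSITIVE_ORDER.
    static Set<String> build(String... words) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(words));
        return set;
    }

    // Stand-in for a membership-based noise word check (an assumption
    // about how the stop-word set is consulted).
    static boolean isNoiseWord(Set<String> stopWords, String term) {
        return stopWords != null && stopWords.contains(term);
    }

    public static void main(String[] args) {
        Set<String> stopWords = build("the", "and", "of");
        System.out.println(isNoiseWord(stopWords, "The"));    // true
        System.out.println(isNoiseWord(stopWords, "AND"));    // true
        System.out.println(isNoiseWord(stopWords, "Lucene")); // false
    }
}
```

Such a set could then be handed to whatever setter accepts the stop-word set. Note that String.CASE_INSENSITIVE_ORDER is not locale-aware, so it may not suit every language.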

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Chris Hostetter
: I need this comparison to be case-insensitive, but I don't see any way of
: achieving it by extending this class. I would have created a subclass of
: MoreLikeThis and override the isNoiseWord() method. However, the problem is
: that, neither isNoiseWord() method nor the instance variables refer

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
July, 2007 10:12:08 PM
Subject: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
Hi, The MoreLikeThis class in Lucene's contrib/queries project performs noise word filtering based on the case-sensitive comparison of the terms against the user-supplied st

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
- From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Monday, July 09, 2007 5:01 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
>>I need this comparison to be case-insensitive
The choice of case-sensi

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
case-insensitive fashion?
- Original Message
From: Jong Kim <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 9 July, 2007 3:00:05 PM
Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
My application stores term vecto

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
supply stop words in a case-insensitive fashion?
- Original Message
From: Jong Kim <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, 9 July, 2007 3:00:05 PM
Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
My applicat

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
er@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
>>My application stores term vectors with the index
And those stored term vectors contain terms produced by your choice of analyzer, no? Or are you saying that you have deli

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
the useful class even more useful.
/Jong
-Original Message-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Monday, July 09, 2007 11:54 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
OK. I can see the

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Erick Erickson
: Monday, July 09, 2007 11:54 AM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
OK. I can see the logic that says it might be useful/convenient to filter case-sensitive search terms using a case-insensitive list of stop

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread markharw00d
>>the case matters only for those words that should be included.
Jong, just want to check we're on the same page - you do know MoreLikeThis has a kind of automatic stop-wording built in, yes? MoreLikeThis looks at the document frequency of all terms in the "this" text you provide and only sele
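To illustrate the idea of frequency-based "automatic stop-wording", here is a minimal sketch that ranks the terms of a source text by a tf-idf style score and keeps only the top few. The class DocFreqTermSelector, its parameters, and the exact scoring formula are assumptions made for this example; it is not the scoring code used by MoreLikeThis.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and formula are illustrative assumptions.
class DocFreqTermSelector {

    // termFreq: occurrences of each term in the source text.
    // docFreq:  number of indexed documents containing each term.
    // Returns up to maxTerms terms, highest score first.
    static List<String> topTerms(Map<String, Integer> termFreq,
                                 Map<String, Integer> docFreq,
                                 int numDocs, int maxTerms) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            // Terms found in many documents get a low idf, so very
            // common words tend to fall out of the selection.
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue() * idf));
        }
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTerms, scored.size()); i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }
}
```

With 1000 documents, a term such as "the" that appears in 999 of them scores little more than its raw frequency, while a rarer term is boosted. A sufficiently frequent common word can still outrank rare ones under this formula, which is one reason an explicit stop-word list may remain useful.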

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
comparison in MoreLikeThis class in Lucene's contrib/queries project
>>the case matters only for those words that should be included.
Jong, just want to check we're on the same page - you do know MoreLikeThis has a kind of automatic stop-wording built in, yes? MoreLikeThis looks at

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread markharw00d
>>So I'm afraid I can't use the technique you recommend.
Ah right - so the TermVector you use from the index will return mixed and lower case versions of the same text. One point to note - this would mean that of the 25 or so top terms selected by MoreLikeThis for querying there is a reasonable