Looking for a stemmer that can return all inflected forms

2006-10-14 Thread Jong Kim
Hi, I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'cares', 'care', 'cared', and 'caring'. I looked at the Porter stemmer,

RE: Looking for a stemmer that can return all inflected forms

2006-10-15 Thread Jong Kim
imagine the stem is "car". Suddenly the word "cars" shares the same "car" stem and you have a false positive. Jong: I _think_ what you need is a "reverse lemmatizer". Otis - Original Message From: Bill Taylor <[EMAIL PROTECTED]> To: java

ranking/scoring algorithm in details

2007-02-28 Thread Jong Kim
Hi, Does anyone know of a written document that describes in some detail how Lucene's ranking/scoring algorithm works? I'm safely assuming that a single consistent algorithm is being used to compute the score of each matching document (with or without explicit boost factors in the query) and r

Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-08 Thread Jong Kim
Hi, The MoreLikeThis class in Lucene's contrib/queries project performs noise word filtering based on the case-sensitive comparison of the terms against the user-supplied stopwords set. I need this comparison to be case-insensitive, but I don't see any way of achieving it by extending this cla

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
e. I don't imagine there should be a need to change the MoreLikeThis source. Cheers Mark - Original Message ---- From: Jong Kim <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Sunday, 8 July, 2007 10:12:08 PM Subject: Stop-words comparison in MoreLikeThis class in Lu

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
supply stop words in a case-insensitive fashion? - Original Message ---- From: Jong Kim <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, 9 July, 2007 3:00:05 PM Subject: RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project My applicat

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
this applies to your app you could run MoreLikeThis on the lower-cased version of the field in the index. Cheers Mark - Original Message From: Jong Kim <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, 9 July, 2007 3:55:03 PM Subject: RE: Stop-words compariso

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
Mark, I understand your point. However, we do not maintain a separate field for the lower-case version of the words. Instead we index them twice at the same position within the same field, which allows us to provide case-exact match for search queries containing upper case characters, but case-i

Re: Re-indexing a particular field only without re-indexing the entire enclosing document in the index

2012-04-23 Thread Jong Kim
involves multiple terms and/or multiple fields, right? /Jong On Mon, Apr 23, 2012 at 11:58 AM, Earl Hood wrote: > On Mon, Apr 23, 2012 at 10:31 AM, Jong Kim wrote: > > > Is there any good way to solve this design problem? Obviously, an > > alternative design would be to split

Lucene's internal doc ID space

2012-05-11 Thread Jong Kim
When I update a document in Lucene (i.e., re-indexing), I have to delete the existing document, and create a new one. My understanding is that this assigns a new doc ID for the newly created document. If that is the case, is it true that the system can rather quickly run out of doc ID space (which

Lucene index on NFS

2012-10-01 Thread Jong Kim
Hi, According to Lucene In Action (Second Edition), section 2.11.2 "Accessing an index over a remote file system", there are issues related to accessing a Lucene index across remote file systems, including NFS. I'm particularly interested in NFS compatibility, and wondering if t

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
012 at 3:21 AM, Vitaly Funstein > wrote: > >> How tolerant is your project of decreased search and indexing > performance? > >> You could probably write a simple test that compares search and write > >> speeds of local and NFS-mounted indexes and make the decision

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
ght? > > Paul > > > Le 2 oct. 2012 à 14:01, Jong Kim a écrit : > > > Thank you all for reply. > > > > So it sounds like it is a known fact that the performance would suffer > > rather significantly when the index files are accessed over NFS. But how > >

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
rather than corruption). I've seen > fairly large infrastructures being based on NFS and corruption is something > I've never heard about. > > > > Note: no concurrent access to a lucene index, right? > > > > Paul > > > > > > Le 2 oct.

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
random access to files and this has no reason to be > > unreliable unless bad things such as network drops happen (in which case > you'd > > get direct failures or timeouts rather than corruption). I've seen > fairly large > > infrastructures being based on NFS and

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
a replication. You end up repeating indexing once per > replica. You also may have to move the indices around as you > add/remove/restart nodes. We are moving to this architecture with a new > product, so I am just now starting to understand the trade-offs. > > Hope that helps. >

Re: Lucene index on NFS

2012-10-02 Thread Jong Kim
> > My 2 cents, > Tommaso > > 2012/10/2 Jong Kim > > > The setup is I have a home-grown server process that has exclusive access > > to the index files. All reads and writes are done through this server. No > > other process is reading the same index files whether