Re: Keyphrase Extraction

2007-04-29 Thread Otis Gospodnetic
Av, look at Lucene's JIRA and search for Mark Harwood. I believe he once contributed something that does this in JIRA. If you are interested in a commercial solution, I can recommend LingPipe. Otis

Re: batch indexing

2007-04-29 Thread Erick Erickson
Really take a look at the thread I mentioned, as well as search the user list archives. There's more information than you knew existed. My main thought is that I don't see any evidence that there's an actual problem. That is, what behavior of the simple FS-based way of creating an index aren't y

RE: batch indexing

2007-04-29 Thread Chandan Tamrakar
Thanks Erick, so FSDirectory seems a better option than RAMDirectory? Also, I think the OS can cache files, in which case FSDirectory may not be bad. Your thoughts?

Re: Search for docs containing only a certain word in a specified field?

2007-04-29 Thread Kun Hong
karl wettin wrote: On 28 Apr 2007 at 07:52, Kun Hong wrote: karl wettin wrote: On 27 Apr 2007 at 14:11, Erik Hatcher wrote: On Apr 27, 2007, at 6:39 AM, karl wettin wrote: On 27 Apr 2007 at 12:36, Erik Hatcher wrote: Unless someone has some other tricks I'm not aware of, that is. I guess it

Keyphrase Extraction

2007-04-29 Thread [EMAIL PROTECTED]
Hi, I tried using the MoreLikeThis contrib feature to extract "interesting terms" from a document. This works very well, but only for SINGLE words. I am looking for a way to extract "keyPHRASES" from a document. Is there an easy way to achieve this using a Lucene index? Thanks in advance! Av
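One naive way to get from single terms to phrase candidates is to count adjacent word pairs. The sketch below is only an illustration of that idea under simple assumptions (lowercased ASCII tokens, no stop-word handling); it does not use Lucene or MoreLikeThis, and `BigramCounter` and `countBigrams` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts adjacent word pairs in a text.
class BigramCounter {
    static Map<String, Integer> countBigrams(String text) {
        // Split on anything that is not a lowercase ASCII letter or digit.
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Map<String, Integer> counts = new HashMap<>();
        String prev = null;
        for (String w : words) {
            if (w.isEmpty()) continue;              // leading separator yields an empty token
            if (prev != null) {
                counts.merge(prev + " " + w, 1, Integer::sum);
            }
            prev = w;
        }
        return counts;
    }
}
```

Pairs that occur more than once are phrase candidates. Note that this version pairs words across sentence boundaries and ignores how common each word is, so a real solution would need filtering or weighting on top.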

Re : term frequency calculation in Lucene

2007-04-29 Thread saikrishna venkata pendyala
Hi, where does Lucene compute the term frequency vector (file name, function name)? The actual task is to replace all the term frequencies with some constant integer; how can this be done? Any kind of help is appreciated. Thanks in advance.
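If the goal is to neutralise term frequency in scoring, the effect can be pictured as swapping the frequency-dependent factor for a constant one. The sketch below is Lucene-free and purely illustrative: `TfSketch`, `constantTf`, and `score` are hypothetical names, and the toy score (tf factor times an idf-like weight) is an assumption, not Lucene's actual formula.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical, Lucene-free illustration; none of these names are Lucene APIs.
class TfSketch {
    // Square-root damping of the raw frequency.
    static final DoubleUnaryOperator SQRT_TF = freq -> Math.sqrt(freq);

    // Constant factor: a matching term contributes the same amount however often it occurs.
    static DoubleUnaryOperator constantTf(double value) {
        return freq -> freq > 0 ? value : 0.0;
    }

    // Toy score: tf factor times an idf-like weight supplied by the caller.
    static double score(DoubleUnaryOperator tf, int freq, double idf) {
        return tf.applyAsDouble(freq) * idf;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator flat = constantTf(1.0);
        System.out.println(score(SQRT_TF, 9, 2.0)); // depends on the frequency
        System.out.println(score(flat, 9, 2.0));    // same value as the next line
        System.out.println(score(flat, 1, 2.0));
    }
}
```

In Lucene itself, the usual place to make this kind of change is believed to be a custom Similarity whose tf method returns a constant, rather than rewriting the stored frequencies; check the Similarity documentation for the version in use.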

Re: batch indexing

2007-04-29 Thread Erick Erickson
As I understand it, FSDirectory *is* RAMDirectory, at least until it flushes. There have been several discussions of this; search the mail archive for things like MergeFactor, MaxBufferedDocs and the like. You'll find quite a bit of information about how these parameters interact. Particularly, s

batch indexing

2007-04-29 Thread Chandan Tamrakar
I am trying to index a huge number of documents in batches. The batch size is a parameter of the application, say X docs; that is, it will hold X docs in RAM before I flush to the file system using the IndexWriter.addIndexes(Directory[]) method. My question is: do I need to set mergeFactor?
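The batching described above can be pictured as a buffer that collects X documents and hands each full batch to a sink. This is only a minimal sketch of that pattern with no Lucene types: `BatchBuffer` and its `sink` callback are hypothetical, standing in for whatever writes a batch to the file system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the batching described above; no Lucene types are used.
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;            // receives each batch, e.g. to write it to disk
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffer one document in memory; flush automatically once the batch is full.
    void add(T doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) flush();
    }

    // Hand over whatever is buffered (possibly a partial batch) and start a new one.
    void flush() {
        if (pending.isEmpty()) return;
        sink.accept(new ArrayList<>(pending));       // pass a copy so the buffer can be reused
        pending.clear();
    }
}
```

Calling `flush()` once more after the last document makes sure a final partial batch is not lost. Whether this manual batching gains anything over letting the index writer buffer documents itself is exactly what the replies in this thread question.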