Av, look at Lucene's JIRA and search for Mark Harwood. I believe he once
contributed something that does this in JIRA. If you are interested in a
commercial solution, I can recommend LingPipe.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lucene Consulting - http
Really take a look at the thread I mentioned, as well as search
the user list archives. There's more information than you knew
existed.
My main thought is that I don't see any evidence that there's an
actual problem. That is, what behavior of the simple FS based
way of creating an index aren't y
Thanks Erick, so FSDirectory seems a better option than RAMDirectory? Also, I
think the OS can cache files, in which case FSDirectory may not be bad. Your
thoughts?
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 29, 2007 7:07 PM
To: java-user@lucene.apa
karl wettin wrote:
28 apr 2007 kl. 07.52 skrev Kun Hong:
karl wettin wrote:
27 apr 2007 kl. 14.11 skrev Erik Hatcher:
On Apr 27, 2007, at 6:39 AM, karl wettin wrote:
27 apr 2007 kl. 12.36 skrev Erik Hatcher:
Unless someone has some other tricks I'm not aware of, that is.
I guess it
Hi,
I tried using the MoreLikeThis contrib feature to extract "interesting terms" from
a document. This works very well - but only for SINGLE words.
I am looking for a way to extract "keyPHRASES" from a document. Is there an easy
way to achieve this using a Lucene index?
Thanks in advance!
Av
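
One simple way to picture the request above is to count multi-word sequences instead of single words. The following is only a minimal sketch in plain Java, not Lucene code and not what MoreLikeThis does: the class name KeyPhraseSketch, the topPhrases method, and the tiny stop-word list are all made up for illustration. It counts n-word sequences that contain no stop word and returns the most frequent ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: ranks word n-grams by raw frequency.
public class KeyPhraseSketch {
    // Tiny illustrative stop-word list (an assumption; use a real one in practice).
    private static final Set<String> STOP = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "to", "and", "in", "is", "for", "this"));

    static List<String> topPhrases(String text, int n, int topK) {
        // Naive tokenization: lowercase, split on anything that is not a letter or digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            StringBuilder phrase = new StringBuilder();
            boolean keep = true;
            for (int j = i; j < i + n; j++) {
                String t = tokens[j];
                // Skip any n-gram containing an empty token or a stop word.
                if (t.isEmpty() || STOP.contains(t)) { keep = false; break; }
                if (phrase.length() > 0) phrase.append(' ');
                phrase.append(t);
            }
            if (keep) counts.merge(phrase.toString(), 1, Integer::sum);
        }
        // Sort by descending count, then alphabetically so ties are deterministic.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int c = Integer.compare(y.getValue(), x.getValue());
            return c != 0 ? c : x.getKey().compareTo(y.getKey());
        });
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }
}
```

Note that this sketch ignores sentence boundaries and uses raw counts within one document; weighting phrases against the rest of an index, as MoreLikeThis does for single terms, would need document frequencies as well.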
___
Hi,
Where does Lucene compute the term frequency vector? (file name, function
name)
Actually, the task is to replace all term frequencies with some
constant integer. How can this be done?
Any kind of help is appreciated.
Thanks in advance.
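
If the goal is for term frequency not to influence ranking (rather than to change what is stored in the index), one place to look is the tf(float freq) method of Lucene's Similarity class, which DefaultSimilarity implements as a square root and which a subclass can override. The sketch below does not use Lucene classes; it is a stand-alone illustration of the idea with made-up names (ConstantTfSketch, score, constantTf) and a deliberately simplified tf * idf formula.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a scoring hook; not Lucene's actual scoring formula.
public class ConstantTfSketch {
    // Square-root damping of the raw frequency, used here as the "default" tf.
    static final DoubleUnaryOperator SQRT_TF = Math::sqrt;

    // A tf function that ignores how often the term occurs, as long as it occurs.
    static DoubleUnaryOperator constantTf(int k) {
        return freq -> freq > 0 ? k : 0;
    }

    // Simplified score: tf(freq) * idf. The tf function is pluggable.
    static double score(int freq, double idf, DoubleUnaryOperator tf) {
        return tf.applyAsDouble(freq) * idf;
    }
}
```

With constantTf(1), a term occurring nine times and a term occurring once contribute the same amount to the score, whereas SQRT_TF would weight the first three times as heavily.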
As I understand it, FSDirectory *is* RAMDirectory, at least until
it flushes. There have been several discussions of this,
search the mail archive for things like MergeFactor, MaxBufferedDocs
and the like. You'll find quite a bit of information about how these
parameters interact.
Particularly, s
I am trying to index a huge number of documents in batches. The batch size is
a parameter of the application, say X docs, meaning it will hold X
docs in RAM before I flush to the file system using the
IndexWriter.addIndexes(Directory[]) method.
My question is:
Do I need to set mergeFactor?