Re: Performance of never optimizing

2008-11-02 Thread Chris Lu
Hi, Justus, I have run into problems very similar to JIRA's: a high rate of modification on a large data volume. It's a pretty common use case for Lucene. The way I dealt with the high rate of modification is to create a secondary in-memory index and only persist documents older than a per

Re: Performance of never optimizing

2008-11-02 Thread Justus Pendleton
On 03/11/2008, at 4:27 PM, Otis Gospodnetic wrote: Why are you optimizing? Trying to make the search faster? I would try to avoid optimizing during high usage periods. I assume that the original, long-ago, decision to optimize was made to improve searching performance. One thing that you

Re: Performance of never optimizing

2008-11-02 Thread Otis Gospodnetic
Hello, Very quick comments. - Original Message > From: Justus Pendleton <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Sunday, November 2, 2008 10:42:52 PM > Subject: Performance of never optimizing > > Howdy, > > I have a couple of questions regarding some Lucene ben

Performance of never optimizing

2008-11-02 Thread Justus Pendleton
Howdy, I have a couple of questions regarding some Lucene benchmarking and what the results mean[3]. (Skip to the numbered list at the end if you don't want to read the lengthy exegesis :) I'm a developer for JIRA[1]. We are currently trying to get a better understanding of Lucene, and ou

Searching over multiple fields using XML document

2008-11-02 Thread syedfa
Dear fellow Java/Lucene developers: I am trying to search an XML document over multiple fields. I created the index using the SAX method. I am trying to search Shakespeare's "Hamlet" over the and tags for words that the user is looking for. I am thinking of using the MultiFieldQueryParser ho

Re: Benchmarking my indexer

2008-11-02 Thread Rafael Cunha de Almeida
On Sun, 2 Nov 2008 07:11:20 -0500 Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Nov 1, 2008, at 1:39 AM, Rafael Cunha de Almeida wrote: > > > Hello, > > > > I did an indexer that parses some files and indexes them using > > lucene. I > > want to benchmark the whole thing, so I'd like to co

Re: Exact Phrase Query

2008-11-02 Thread semelak ss
Hello Erick, If it weren't for your help and kind response, I would be struggling now with the initial problem I had. The solution to that problem turned out to be the one you mentioned in your response (indexwriters/indexreaders both being opened at the same time). The problem I mentioned in

Re: Exact Phrase Query

2008-11-02 Thread Erick Erickson
Sorry, but I've really run out of patience here. You have consistently stated only part of the problem, never posting enough information to allow me to answer helpfully. You haven't even taken the time to proofread your posts, which has wasted my (limited, volunteer) time. In the future, please co

Re: Exact Phrase Query

2008-11-02 Thread semelak ss
Also, is there a way to pass a null or no tokenizer when writing the field "words" to the index? I have no need to tokenize the words, and the exact query will always be known. To better explain the problem: we are performing word comparison across a large number of text documents. Each wo

addDocument vs addIndexes

2008-11-02 Thread Hadi Forghani
Hi friends, is merging N documents into an existing index better than adding N documents to an existing index? In other words, does IndexWriter.addIndexesNoOptimize incur less I/O than IndexWriter.addDocument? Thanks

Re: Exact Phrase Query

2008-11-02 Thread semelak ss
I was in a hurry when copying and pasting the code. What I've been using is only writer. RamWriter was never used as it never really worked (thanks to you, I now understand the reason). The above is not really related to the problem I was facing. I modified my code so that an indexreader/indexw

Re: Benchmarking my indexer

2008-11-02 Thread Grant Ingersoll
On Nov 1, 2008, at 1:39 AM, Rafael Cunha de Almeida wrote: Hello, I did an indexer that parses some files and indexes them using lucene. I want to benchmark the whole thing, so I'd like to count the tokens being indexed so I can calculate the average number of indexed tokens per second. Is