Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Michael McCandless
Phew! Thank you for raising this... it was a sneaky one. Mike On Tue, Aug 11, 2009 at 4:13 PM, Jibo John wrote: > Mike, > > Yes, it works perfect ! > > I did observe a dip in the indexing throughput (1855 recs/sec vs. 2200 > recs/sec previously), but, more importantly, no data is lost this time.

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Jibo John
Mike, Yes, it works perfectly! I did observe a dip in the indexing throughput (1855 recs/sec vs. 2200 recs/sec previously), but, more importantly, no data is lost this time. Thanks for helping me nail this down. -Jibo On Aug 11, 2009, at 11:12 AM, Michael McCandless wrote: OK I found th

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Michael McCandless
OK I found the problem! It was losing docs from the queue, when shutting down the thread pool, because we were calling super's addDocument(doc) not addDocument(doc, analyzer). IndexWriter was simply forwarding that call to ThreadedIndexWriter's addDocument(doc, analyzer) which in turn would do no
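
The dispatch problem described above can be illustrated without Lucene. The following is a minimal sketch, not the book's code: BaseWriter, QueueingWriter and the "defaultAnalyzer" string are hypothetical stand-ins, the queue is drained on a single thread, and the choice to silently drop documents that arrive after shutdown has begun is an assumption made only so that the loss is visible. What it shows is the general Java behaviour: a call to super's one-argument overload runs the base method, which then makes a virtual call to the two-argument overload and so lands back in the subclass override.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class OverloadDispatchPitfall {

    /** Hypothetical base writer: the one-arg overload forwards to the two-arg one. */
    static class BaseWriter {
        final List<String> stored = new ArrayList<>();

        public void addDocument(String doc) {
            // Virtual call: resolves to a subclass override if one exists.
            addDocument(doc, "defaultAnalyzer");
        }

        public void addDocument(String doc, String analyzer) {
            stored.add(doc);
        }
    }

    /** Hypothetical subclass that queues documents and writes them on close(). */
    static class QueueingWriter extends BaseWriter {
        private final Queue<String> pending = new ArrayDeque<>();
        private final boolean callTwoArgSuper;
        private boolean shuttingDown = false;

        QueueingWriter(boolean callTwoArgSuper) {
            this.callTwoArgSuper = callTwoArgSuper;
        }

        @Override
        public void addDocument(String doc, String analyzer) {
            if (shuttingDown) {
                return; // assumption for this sketch: late arrivals are dropped
            }
            pending.add(doc);
        }

        /** Drains the queue, the way a worker would while shutting down. */
        public void close() {
            shuttingDown = true;
            String doc;
            while ((doc = pending.poll()) != null) {
                if (callTwoArgSuper) {
                    // Goes straight to BaseWriter's two-arg method.
                    super.addDocument(doc, "defaultAnalyzer");
                } else {
                    // BaseWriter's one-arg method forwards back into our override.
                    super.addDocument(doc);
                }
            }
        }
    }

    /** Adds the given number of documents, closes, and returns how many were stored. */
    static int run(boolean callTwoArgSuper, int docs) {
        QueueingWriter writer = new QueueingWriter(callTwoArgSuper);
        for (int i = 0; i < docs; i++) {
            writer.addDocument("doc" + i);
        }
        writer.close();
        return writer.stored.size();
    }

    public static void main(String[] args) {
        System.out.println("one-arg super stored: " + run(false, 100)); // 0
        System.out.println("two-arg super stored: " + run(true, 100));  // 100
    }
}
```

In this sketch every queued document is lost in the one-argument case, which is more drastic than the loss reported in this thread; the point is only the call path, not the exact count.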

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Michael McCandless
I'm baffled why you're losing docs w/ ThreadedIndexWriter. One question: your Lucene core JAR seems to be newer than the last MEAP update. Did you update it manually? Also, your indexes were optimized, but your algs don't have an optimize step -- did you separately run an optimize? Could you zi

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-03 Thread Jibo John
Mike, Verified that I have the latest source code. Here are the alg files and the checkindexer output. - indexwriter alg analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer doc.

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-02 Thread Michael McCandless
Woops sorry for the confusion! Mike On Sat, Aug 1, 2009 at 1:03 PM, Phil Whelan wrote: > Hi Mike, > > It's Jibo, not me, having the problem. But thanks for the link. I was > interested to look at the code. Will be buying the book soon. > > Phil > > On Sat, Aug 1, 2009 at 2:08 AM, Michael McCandle

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-01 Thread Phil Whelan
Hi Mike, It's Jibo, not me, having the problem. But thanks for the link. I was interested to look at the code. Will be buying the book soon. Phil On Sat, Aug 1, 2009 at 2:08 AM, Michael McCandless wrote: > > (Please note that ThreadedIndexWriter is source code available with > the upcoming revi

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-01 Thread Michael McCandless
(Please note that ThreadedIndexWriter is source code available with the upcoming revision to Lucene in Action.) Phil, is it possible you are using an older version of the book's source code? In particular, can you check whether your version of ThreadedIndexWriter.java has this: public void clo

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
Hi Phil, It's 5 threads for IndexWriter. For ThreadedIndexWriter, I used: writer.num.threads=16 writer.max.thread.queue.size=80 Thanks, -Jibo On Jul 31, 2009, at 5:01 PM, Phil Whelan wrote: Hi Jibo, Your mergeFactor is different, and the resulting numFiles (segment files) is different. May

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread ohaya
Hi, I don't know the answer to your questions, but I'm guessing that the answer to #3 is probably because of the answers to #1 and #2. Did you try to look at the indexes using Luke? That shows the top 50 terms when it starts, so it might be obvious what the differences are, and that might give

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Phil Whelan
Hi Jibo, Your mergeFactor is different, and the resulting numFiles (segment files) is different. Maybe each thread is responsible for a segment file. Just curious - do you have 3 threads? Phil

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
Mike, Here you go: IndexWriter: $ java -classpath /Users/jibo/Desktop/iwork/lucene/java/trunk/build/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex /Users/jibo/Desktop/iwork/lucene/java/trunk/contrib/benchmark/work/index NOTE: testing will be more thorough if y

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
Tried with a larger set of documents (2,000,000) this time.
ThreadedIndexWriter
---
Size - 1.4 G
optimized - yes (as suggested by Phil)
Number of documents - 1,999,924 (No idea where the 76 documents vanished...)
Number of terms - 3,638,801
IndexWriter

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Michael McCandless
Hmmm... can you run CheckIndex on both indexes and post the results? java org.apache.lucene.index.CheckIndex /path/to/index Mike On Fri, Jul 31, 2009 at 2:38 PM, Jibo John wrote: > Number of docs are the same in the index for both the cases (200,000). > I haven't altered the benchmark/ code, b

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread ohaya
Hi, Sorry to jump in, but I've been following this thread with interest :)... Am I misunderstanding your original observation, that ThreadedIndexWriter produced a smaller index? Did the ThreadedIndexWriter also finish faster (I'm assuming that it should)? If the index is smaller, and everyt

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Phil Whelan
Hi Jibo, Have you tried optimizing indexes? I do not know anything about the implementation of ThreadedIndexWriter, but if they both optimize down to the same size, it could just mean that ThreadedIndexWriter is not as optimized. Thanks, Phil On Fri, Jul 31, 2009 at 11:38 AM, Jibo John wrote: >

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Jibo John
The number of docs is the same in the index for both cases (200,000). I haven't altered the benchmark/ code, but I used a profiler to verify that the Benchmark main thread is closed only after all other threads are closed. Thanks, -Jibo On Jul 31, 2009, at 2:34 AM, Michael McCandless wrote:

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread Michael McCandless
Hmm... this doesn't sound right. That example (ThreadedIndexWriter) is meant to be a drop-in replacement, wherever you use an IndexWriter, that keeps an under-the-hood thread pool (using java.util.concurrent.*) to add/update documents with multiple threads. It should not result in a smaller index
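
For readers without the book's source, the general idea of a writer that hands documents to a java.util.concurrent thread pool might look like the sketch below. It is not the ThreadedIndexWriter from the book: CountingWriter and PooledWriter are hypothetical names, the sketch wraps a stand-in writer instead of extending IndexWriter, and the bounded queue with a caller-runs policy is just one plausible way to apply back-pressure. The thread count and queue size in main mirror the writer.num.threads and writer.max.thread.queue.size settings quoted elsewhere in this thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledWriterSketch {

    /** Hypothetical stand-in for the underlying writer; it only counts documents. */
    static class CountingWriter {
        private final AtomicInteger count = new AtomicInteger();

        void addDocument(String doc) {
            count.incrementAndGet();
        }

        int numDocs() {
            return count.get();
        }
    }

    /** Hypothetical wrapper that adds documents from a fixed pool of threads. */
    static class PooledWriter {
        private final CountingWriter delegate = new CountingWriter();
        private final ExecutorService pool;

        PooledWriter(int numThreads, int maxQueueSize) {
            // Bounded queue; when it is full, the calling thread runs the job itself.
            pool = new ThreadPoolExecutor(
                    numThreads, numThreads, 0L, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<Runnable>(maxQueueSize),
                    new ThreadPoolExecutor.CallerRunsPolicy());
        }

        void addDocument(final String doc) {
            pool.execute(() -> delegate.addDocument(doc));
        }

        /** Stops accepting jobs, waits for queued ones to finish, returns the doc count. */
        int close() {
            pool.shutdown();
            try {
                if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                    throw new IllegalStateException("pool did not drain in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining pool", e);
            }
            return delegate.numDocs();
        }
    }

    /** Adds numDocs documents through the pool and returns how many were written. */
    static int indexAll(int numThreads, int maxQueueSize, int numDocs) {
        PooledWriter writer = new PooledWriter(numThreads, maxQueueSize);
        for (int i = 0; i < numDocs; i++) {
            writer.addDocument("doc" + i);
        }
        return writer.close();
    }

    public static void main(String[] args) {
        System.out.println(indexAll(16, 80, 10000)); // prints 10000
    }
}
```

A wrapper of this kind should end up with exactly as many documents as were submitted, so comparing the document count after close() against the number of addDocument calls is a quick sanity check.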

ThreadedIndexWriter vs. IndexWriter

2009-07-30 Thread Jibo John
While trying out a few tuning options using contrib/benchmark as described in the LIA (2nd edition) book, I had an interesting observation. If I use a ThreadedIndexWriter (picked the example from lia2e, page 356) instead of IndexWriter, the index size got reduced by 40% compared to using IndexWr