Re: Making lucene indexing multi threaded

2014-10-28 Thread Erick Erickson
bq: When I loop the result set, I reuse the same Document instance. I really, really, _really_ hope you're calling new for the Document in the loop. Otherwise that single document will eventually contain all the data from your entire corpus! I'd expect some other errors to pop out if you are
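A minimal sketch of the difference Erick describes, in plain Java with no Lucene dependency. `SketchDoc` is a hypothetical stand-in for Lucene's `Document`, whose `add()` likewise only appends a field. The comments mark where `indexWriter.addDocument(doc)` would go. Each method returns how many fields the document holds at the moment it would be indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Lucene's Document: add() appends and never clears.
class SketchDoc {
    final List<String> fields = new ArrayList<>();

    void add(String field) {
        fields.add(field);
    }
}

class DocumentPerRow {
    // Buggy pattern: one instance reused for every row, so fields accumulate.
    static List<Integer> fieldCountsReused(List<String> rows) {
        List<Integer> counts = new ArrayList<>();
        SketchDoc doc = new SketchDoc();
        for (String row : rows) {
            doc.add(row);
            // indexWriter.addDocument(doc) would go here; doc still holds earlier rows
            counts.add(doc.fields.size());
        }
        return counts;
    }

    // Fixed pattern: a new document for every row of the result set.
    static List<Integer> fieldCountsFresh(List<String> rows) {
        List<Integer> counts = new ArrayList<>();
        for (String row : rows) {
            SketchDoc doc = new SketchDoc();
            doc.add(row);
            // indexWriter.addDocument(doc) would go here; doc holds only this row
            counts.add(doc.fields.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        System.out.println(fieldCountsReused(rows)); // [1, 2, 3]
        System.out.println(fieldCountsFresh(rows));  // [1, 1, 1]
    }
}
```

In the reused version, the document handed to the writer grows with every row. This is the effect Erick warns about.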

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 minutes for 70 MB of docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found a solution? Thank you, Jason

RE: Making lucene indexing multi threaded

2014-10-27 Thread Fuad Efendi
Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 minutes for 70 MB of docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found a solution?

RE: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Fuad, Thanks for your suggestions and quick response. I am adding docs from a single indexing thread. I will try multithreaded indexing to see whether it resolves my issue. The issue only appeared after I upgraded Lucene from 2.4.1 (with Java 1.6) to 4.8.1 (with Java 1.7).

Re: Making lucene indexing multi threaded

2014-10-27 Thread G.Long
As in Nischal's case, did you check that you aren't calling the commit() method after each indexed document? :) Regards, Gary Long On 27/10/2014 16:47, Jason Wu wrote: Hi Fuad, Thanks for your suggestions and quick response. I am using a single-threaded indexing way to add docs. I will try the

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Gary, Thanks for your response. I only call commit() once all my docs are added. Here is my Lucene indexing and re-indexing procedure: 1. If index data exists in the index directory, remove all of it. 2. Create an IndexWriter with a 256 MB RAM buffer size. 3. Process
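Step 1 of Jason's procedure can be sketched with `java.nio.file` alone. `clearIndexDir` is a hypothetical helper, not part of Lucene. It deletes everything under the index directory but keeps the directory itself. In Lucene 4.x a similar result can be reached without manual deletion by opening the writer with `IndexWriterConfig.OpenMode.CREATE`, which overwrites any existing index. Step 2 corresponds to `IndexWriterConfig.setRAMBufferSizeMB(256)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class IndexDirCleaner {
    // Hypothetical helper: deletes everything inside indexDir but keeps indexDir itself.
    // Returns the number of files and subdirectories removed.
    static int clearIndexDir(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return 0; // first run: nothing to clear
        }
        int removed = 0;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            // Reverse order visits children before parents, so directories are empty when deleted.
            Path[] paths = walk.sorted(Comparator.reverseOrder()).toArray(Path[]::new);
            for (Path p : paths) {
                if (!p.equals(indexDir)) {
                    Files.delete(p);
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        Files.createFile(dir.resolve("segments_1"));
        Files.createDirectories(dir.resolve("sub"));
        Files.createFile(dir.resolve("sub").resolve("_0.cfs"));
        System.out.println(clearIndexDir(dir)); // 3
    }
}
```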

Re: Making lucene indexing multi threaded

2013-09-03 Thread Danil ŢORIN
Don't commit after adding each and every document. On Tue, Sep 3, 2013 at 7:20 AM, nischal reddy nischal.srini...@gmail.com wrote: Hi, A further update on my progress: I have made indexing multithreaded in my application, using a thread pool executor with a pool size of 4, but had a
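Gary and Danil are both recommending one `commit()` at the end of the batch rather than one after every `addDocument()`. In Lucene a commit flushes pending documents and syncs the index files to stable storage, so doing it per document is costly. The sketch below uses a hypothetical `CountingWriter` in place of a real `IndexWriter` so it runs without Lucene. Only the shape of the loop carries over.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IndexWriter that only counts calls.
class CountingWriter {
    int added;
    int commits;

    void addDocument(String doc) { added++; }

    void commit() { commits++; }
}

class CommitOnce {
    // Add every document, then commit a single time after the loop.
    static void indexAll(List<String> docs, CountingWriter writer) {
        for (String doc : docs) {
            writer.addDocument(doc);
            // no writer.commit() here: a commit per document is the pattern Danil warns against
        }
        writer.commit();
    }

    public static void main(String[] args) {
        CountingWriter w = new CountingWriter();
        indexAll(Arrays.asList("a", "b", "c"), w);
        System.out.println(w.added + " added, " + w.commits + " commit"); // 3 added, 1 commit
    }
}
```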

Re: Making lucene indexing multi threaded

2013-09-02 Thread Erick Erickson
Stop. Back up. Test. <G> The very _first_ thing I'd do is just comment out the bit that actually indexes the content. I'm guessing you have some loop like: while (more files) { read the file; transform the data; create a Lucene document; index the document } Just comment out the index
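Erick's experiment can be made repeatable by putting the indexing call behind a flag and timing both runs. `timeRun`, `buildDoc`, and `indexDoc` are hypothetical names. In real code `indexDoc` would wrap `indexWriter.addDocument(doc)`. If the run without indexing is already slow, the time is going into reading and transforming the files rather than into Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class IndexingBaseline {
    // Runs the read/transform/create loop, optionally with the indexing step.
    // Returns elapsed wall-clock time in milliseconds.
    static long timeRun(List<String> files, Function<String, String> buildDoc,
                        Consumer<String> indexDoc, boolean indexingEnabled) {
        long start = System.nanoTime();
        for (String file : files) {
            String doc = buildDoc.apply(file); // read the file, transform it, create the document
            if (indexingEnabled) {
                indexDoc.accept(doc);          // stands in for indexWriter.addDocument(doc)
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt");
        List<String> indexed = new ArrayList<>();
        long without = timeRun(files, f -> "doc:" + f, indexed::add, false);
        long with = timeRun(files, f -> "doc:" + f, indexed::add, true);
        System.out.println("without indexing: " + without + " ms, with indexing: " + with + " ms");
    }
}
```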

Re: Making lucene indexing multi threaded

2013-09-02 Thread Adrien Grand
Hi, Lucene's IndexWriter can safely accept updates coming from several threads; just make sure to share the same IndexWriter instance across all threads. No external locking is necessary. 30 minutes sounds like a lot for 3 files unless they are large. You can have a look at
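A minimal sketch of the pattern Adrien describes: one writer instance shared by a fixed thread pool, with no locking around `addDocument`. `SharedSink` is a hypothetical thread-safe stand-in for a single `IndexWriter` so the example runs without Lucene. Submitting one task per file is an assumption. The pool size of 4 matches the one Nischal reports using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical thread-safe stand-in for one shared IndexWriter.
class SharedSink {
    final ConcurrentLinkedQueue<String> docs = new ConcurrentLinkedQueue<>();

    void addDocument(String doc) { docs.add(doc); }
}

class ParallelIndexing {
    // All worker threads call addDocument on the same writer; no external locking.
    static int indexInParallel(List<String> files, SharedSink writer, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String file : files) {
                futures.add(pool.submit(() -> writer.addDocument("doc:" + file)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task and surface any worker exception
            }
        } finally {
            pool.shutdown();
        }
        // With a real IndexWriter, commit once here, after all tasks have finished.
        return writer.docs.size();
    }

    public static void main(String[] args) throws Exception {
        SharedSink writer = new SharedSink();
        int n = indexInParallel(Arrays.asList("a", "b", "c", "d", "e"), writer, 4);
        System.out.println(n + " documents indexed"); // 5 documents indexed
    }
}
```

Whether extra threads help depends on where the time is going, which is what Erick's comment-out test measures.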

Re: Making lucene indexing multi threaded

2013-09-02 Thread nischal reddy
Hi Erick, I have commented out the indexing part (indexWriter.addDocument()) in my application and it takes around 90 seconds, but when I uncomment the indexing part it takes a lot longer. My machine specs: Windows 7, Intel i7 processor, 4 GB RAM, and a regular hard disk (no SSD).

Re: Making lucene indexing multi threaded

2013-09-02 Thread nischal reddy
Hi, A further update on my progress: I have made indexing multithreaded in my application, using a thread pool executor with a pool size of 4, but the performance gain was negligible. It still takes around 20 minutes to index around 30k files. Some more