bq: When I loop the result set, I reuse the same Document instance.
I really, really, _really_ hope you're calling new for the Document inside
the loop. Otherwise that single document will eventually contain all
the data from your entire corpus! I'd also expect some other errors to
pop up if you are reusing it.
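For readers following along, the fix being hinted at looks roughly like the following. This is an illustrative sketch only (the `writer` and `rows` names are placeholders, and it assumes Lucene's post-4.x field API):

```java
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

class IndexLoop {
    // Create a fresh Document per record. Reusing a single instance
    // accumulates fields from every record indexed so far, so each
    // "document" in the index grows to contain the whole corpus.
    static void indexAll(IndexWriter writer, List<String> rows) throws Exception {
        for (String row : rows) {
            Document doc = new Document();  // new instance each iteration
            doc.add(new TextField("body", row, Field.Store.NO));
            writer.addDocument(doc);
        }
    }
}
```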
Hi Nischal,
I had a similar indexing issue: my Lucene indexing took 22 minutes for 70 MB
of docs. When I debugged the problem, I found that
indexWriter.addDocument(doc) was taking a really long time.
Have you already found a solution for it?
Thank you,
Jason
Subject: Re: Making lucene indexing multi threaded
Hi Fuad,
Thanks for your suggestions and quick response. I am using single-threaded
indexing to add docs. I will try multi-threaded indexing to see
if my issue is resolved.
This issue only appeared after I upgraded the Lucene version from 2.4.1 (with Java
1.6) to 4.8.1 (with Java 1.7).
Like Nischal, did you check that you don't call the commit() method
after each indexed document? :)
Regards,
Gary Long
On 27/10/2014 16:47, Jason Wu wrote:
Hi Fuad,
Thanks for your suggestions and quick response. I am using single-threaded
indexing to add docs. I will try the
Hi Gary,
Thanks for your response. I only call commit() when all my docs have been added.
Here is the procedure for my Lucene indexing and re-indexing:
1. If index data exists inside the index directory, remove all the index
data.
2. Create an IndexWriter with a 256 MB RAM buffer size.
3. Process
Don't commit after adding each and every document.
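Putting the steps above and the commit-once advice together, the setup might look like the following. This is a hedged sketch using the modern Lucene API; in the 4.8.x line discussed in this thread, IndexWriterConfig also took a Version argument and FSDirectory.open took a File, so adjust accordingly:

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

class SingleCommitIndexer {
    static void buildIndex() throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setRAMBufferSizeMB(256.0);  // step 2: large RAM buffer
        try (FSDirectory dir = FSDirectory.open(Paths.get("index"));
             IndexWriter writer = new IndexWriter(dir, cfg)) {
            // step 3: add every document here via writer.addDocument(doc);
            // do NOT commit inside the loop.
            writer.commit();  // commit once, after all adds
        }
    }
}
```

Committing per document forces a flush and fsync every time, which is exactly the pathology being warned against.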
On Tue, Sep 3, 2013 at 7:20 AM, nischal reddy <nischal.srini...@gmail.com> wrote:
Hi,
Some more update on my progress,
I have multithreaded indexing in my application; I used a thread pool
executor with a pool size of 4 but had a
Stop. Back up. Test.
The very _first_ thing I'd do is just comment out the bit that
actually indexes the content. I'm guessing you have some
loop like:
while (more files) {
read the file
transform the data
create a Lucene document
index the document
}
Just comment out the indexing step.
Hi,
Lucene's IndexWriter can safely accept updates coming from several
threads, just make sure to share the same IndexWriter instance across
all threads; no external locking is necessary.
30 minutes sounds like a lot for 3 files unless they are very large.
You can have a look at
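A minimal sketch of the sharing pattern described above — one IndexWriter instance handed to a fixed pool of worker threads. All names here are illustrative, not from the thread:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

class ParallelIndexer {
    // IndexWriter is thread-safe: every worker calls addDocument() on
    // the same shared instance, with no external locking.
    static void indexAll(IndexWriter writer, List<String> paths) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String path : paths) {
            pool.submit(() -> {
                try {
                    Document doc = new Document();  // fresh Document per task
                    doc.add(new TextField("path", path, Field.Store.YES));
                    writer.addDocument(doc);        // safe to call concurrently
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Note that parallelizing only helps if the bottleneck is CPU-bound analysis; if the time is going into per-document commits or slow disk I/O, more threads won't fix it.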
Hi Eric,
I have commented out the indexing part (indexWriter.addDocument()) in
my application and it takes around 90 seconds, but when I uncomment the
indexing part it takes a lot longer.
My machine specs are
Windows 7, Intel i7 processor, 4 GB RAM, and it doesn't have an SSD hard disk.
Hi,
Some more updates on my progress:
I have multithreaded indexing in my application. I used a thread pool
executor with a pool size of 4 but saw only a very slight, negligible
increase in performance; it still takes around 20 minutes to
index around 30k files.