Hi Michael and others,
I did get to the bottom of my problem: there was a bug in the code that was
eating up memory, which I figured out after a lot of effort.
Thanks to all of you for your suggestions.
Regards
Ajay
Michael McCandless-2 wrote:
I agree, memory profiler or heap dump or small test case is the next step...
Erick,
I did get to the bottom of my problem: there was a bug in the code that was
eating up memory, which I figured out after a lot of effort.
Thanks to all of you for your suggestions.
But I still feel it takes a lot of time to index documents. It is taking around
an hour or more to index 330 MB.
Try the ideas here?
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Mike
On Mon, Mar 15, 2010 at 1:51 AM, ajay_gupta ajay...@gmail.com wrote:
Erick,
I did get to the bottom of my problem: there was a bug in the code that was
eating up memory, which I figured out after a lot of effort.
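A sketch of a couple of the suggestions from that wiki page, assuming the Lucene
2.9/3.0-era API current at the time of this thread (the path, field name, buffer
size and csvRecords variable below are illustrative placeholders, not Ajay's code):

    Directory dir = FSDirectory.open(new File("/path/to/index"));   // placeholder path
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

    // open one IndexWriter and reuse it for the whole run; opening and
    // closing a writer per batch is expensive
    IndexWriter writer = new IndexWriter(dir, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

    // buffer more documents in RAM before flushing a segment (the default is 16 MB)
    writer.setRAMBufferSizeMB(64);

    // reuse Document and Field instances across records instead of allocating new ones
    Document doc = new Document();
    Field body = new Field("body", "", Field.Store.NO, Field.Index.ANALYZED);
    doc.add(body);
    for (String record : csvRecords) {   // csvRecords stands in for the CSV rows
        body.setValue(record);
        writer.addDocument(doc);
    }
    writer.close();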
Have you run it through a memory profiler yet? Seems the obvious next step.
If that doesn't help, cut it down to the simplest possible
self-contained program that demonstrates the problem and post it here.
--
Ian.
On Thu, Mar 4, 2010 at 6:04 AM, ajay_gupta ajay...@gmail.com wrote:
Erick,
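A minimal self-contained test along the lines Ian suggests might look like the
sketch below (assuming Lucene 2.9/3.0-era APIs; the index path, field name and
document count are placeholders). If the heap use printed here stays flat while
the real application climbs, the leak is in the application code rather than Lucene.

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class OomRepro {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/tmp/oom-test")),
                    new StandardAnalyzer(Version.LUCENE_30),
                    true, IndexWriter.MaxFieldLength.UNLIMITED);
            for (int i = 0; i < 100000; i++) {
                Document doc = new Document();
                doc.add(new Field("body", "row " + i + " some test content",
                                  Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
                if (i % 10000 == 0) {
                    long used = Runtime.getRuntime().totalMemory()
                              - Runtime.getRuntime().freeMemory();
                    // steady growth here would point at the test itself, not the app
                    System.out.println(i + " docs, ~" + (used / (1024 * 1024)) + " MB used");
                }
            }
            writer.close();
        }
    }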
I agree, memory profiler or heap dump or small test case is the next
step... the code looks fine.
This is always a single thread adding docs?
Are you really certain that the iterator only iterates over 2500 docs?
What analyzer are you using?
Mike
On Thu, Mar 4, 2010 at 4:50 AM, Ian Lea
Ian,
The point where the OOM exception occurs varies; it is not fixed. It can happen
anywhere once memory use exceeds a certain point.
I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
When I said it fails after 70K docs I meant approximately 70K documents, but if I
reduce the memory it will OOM before 70K, so it's not
Lucene doesn't load everything into memory and can carry on running
consecutive searches or loading documents for ever without hitting OOM
exceptions. So if it isn't failing on a specific document the most
likely cause is that your program is hanging on to something it
shouldn't. Previous docs?
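A classic example of the kind of accidental retention Ian describes (purely
illustrative; nothing below is taken from Ajay's code):

    // any collection that accumulates per-document state for the whole run will
    // eventually exhaust the heap, no matter how small each document is
    List<Document> processed = new ArrayList<Document>();
    for (Document doc : docs) {
        writer.addDocument(doc);
        processed.add(doc);   // retained forever, so the GC can never reclaim it
    }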
Interpolating from your data (and, by the way, some code
examples would help a lot), if you're reopening the index
reader to pick up recent additions but not closing it if a
different one is returned from reopen, you'll consume
resources. From the JavaDocs...
IndexReader newReader = r.reopen();
if (newReader != r) r.close();   // close the old reader when reopen returns a new one
r = newReader;
The worst case RAM usage for Lucene is a single doc with many unique
terms. Lucene allocates ~60 bytes per unique term (plus space to hold
that term's characters = 2 bytes per char). And, Lucene cannot flush
within one document -- it must flush after the doc has been fully
indexed.
This past
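As a rough worked example of Mike's figures (the numbers are illustrative): a single
document with 1,000,000 unique terms averaging 10 characters each would need about
1,000,000 x (60 + 2 x 10) bytes, i.e. roughly 80 MB of heap, all of which must be
held until that one document has finished indexing.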
Mike,
Actually my documents are very small. We have CSV files where each
record represents a document, and the records are not very large, so I don't think
document size is the issue.
For each record I am tokenizing it, and for each token I am keeping its 3
neighbouring tokens in a Hashtable. After X
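A rough sketch of the kind of per-token bookkeeping Ajay describes (entirely
hypothetical; his actual window definition, field names and flush condition are
not shown in the thread):

    // maps each token to the neighbouring tokens seen around it
    Map<String, List<String>> contextMap = new HashMap<String, List<String>>();

    void recordNeighbours(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            List<String> neighbours = contextMap.get(tokens[i]);
            if (neighbours == null) {
                neighbours = new ArrayList<String>();
                contextMap.put(tokens[i], neighbours);
            }
            // keep up to 3 tokens on either side of the current token
            for (int j = Math.max(0, i - 3); j <= Math.min(tokens.length - 1, i + 3); j++) {
                if (j != i) neighbours.add(tokens[j]);
            }
        }
    }

    // Ajay reports clearing the map after each ~2,500-document chunk;
    // if that clear is ever skipped, the map grows without bound.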
The first place I'd look is how big your strings
get; w_context and context_str come to mind. My
first suspicion is that you're building ever-longer
strings, and around 70K documents your strings
are large enough to produce OOMs.
FWIW
Erick
On Wed, Mar 3, 2010 at 1:09 PM, ajay_gupta
Erick,
w_context and context_str are local to this method and are used only for
2,500 documents at a time, not the entire 70K. I am clearing the hashmap after each
2,500-document chunk, and I also printed the memory consumed by the hashmap, which is
roughly constant for each chunk. For each invocation of
I'm not following this entirely, but these docs may be huge by the
time you add context for every word in them. You say that you
"search the existing indices then I get the content and append".
So is it possible that after 70K documents your additions become
so huge that you're blowing up? Have
Ajay,
I've posted a few times on OOM issues. Here is one thread.
http://mail-archives.apache.org/mod_mbox//lucene-java-user/200909.mbox/%3c5b20def02611534db08854076ce825d803626...@sc1exc2.corp.emainc.com%3e
I'll try and get some more links to you from some other threads I
started for OOM
Ajay,
Here is another thread I started on the same issue.
http://stackoverflow.com/questions/1362460/why-does-lucene-cause-oom-when-indexing-large-files
Paul
-Original Message-
From: java-user-return-45254-paul.b.murdoch=saic@lucene.apache.org
Hi Erick,
I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still hits the
OOM error.
I thought that since the indexing is file based, memory shouldn't be an issue, but you
might be right that searching is using a lot of memory. Is
there a way to load documents in chunks, or some other way
Where exactly are you hitting the OOM exception? Have you got a stack
trace? How much memory are you allocating to the JVM? Have you run a
profiler to find out what is using the memory?
If it runs OK for 70K docs then fails, 2 possibilities come to mind:
either the 70K + 1 doc is particularly
It's not searching that I'm wondering about. The memory size, as
far as I understand, really only has document resolution. That is, you
can't index a part of a document, flush to disk, then index the rest of
the document. The entire document is parsed into memory, and only
then flushed to disk if