Very nice - it does pass now!
I wish there was a better way of incorporating the patch than just
shadowing the original StandardDirectoryReader with a patched one, but
unfortunately this class is final, and FilterDirectoryReader doesn't seem to
help here, making a cleaner approach seemingly out of reach.
Thanks, I'll look at the issue soon.
Right, segment merging won't spontaneously create deletes. Deletes
are only made if you explicitly delete OR (tricky) there is a
non-aborting exception (e.g. an analysis problem) hit while indexing a
document; in that case IW indexes a portion of the document
Normally, reopens only go forwards in time, so if you could ensure
that when you reopen one reader to another, the 2nd one is always
newer, then I think you should never hit this issue
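Not from the thread, but the "reopens only go forwards in time" invariant described above could be enforced with a small guard like this (a sketch against Lucene 4.x APIs; the method name "advance" is mine):

```java
import org.apache.lucene.index.DirectoryReader;

// Sketch only: never swap to a reader that is older than the one we hold.
// DirectoryReader.getVersion() increases with each commit/NRT generation,
// so a smaller version means the candidate reader is further back in time.
static DirectoryReader advance(DirectoryReader current, DirectoryReader candidate) {
  return candidate.getVersion() >= current.getVersion() ? candidate : current;
}
```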
Mike, I'm not sure if I fully understand your suggestion. In a nutshell,
the use case here is as follows: I
One other observation - if instead of a reader opened at a later commit
point (T1), I pass in an NRT reader *without* doing the second commit on
the index prior, then there is no exception. This probably also hinges on
the assumption that no buffered docs have been flushed after T0, thus
creating
That's because there are 3 constructors in SegmentReader:
1. one used for opening new (checks hasDeletions, only reads liveDocs if so)
2. one used for non-NRT reopen -- problem one for you
3. one used for NRT reopen (takes a LiveDocs as a param, so no bug)
so personally I think you should be able
Seems to me the bug occurs regardless of whether the passed in newer reader
is NRT or non-NRT. This is because the user operates at the level of
DirectoryReader, not SegmentReader and modifying the test code to do the
following reproduces the bug:
writer.commit();
DirectoryReader latest =
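The modified test described above is cut off in the archive; a hedged reconstruction of the reproduction path (variable names are guesses, not the actual test code) would look roughly like:

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// Sketch (Lucene 4.x): commit, open a newer reader, then try to reopen it
// back down to an older commit point -- the direction that triggers the bug.
// "writer" and "olderCommit" are assumed to exist in the surrounding test.
writer.commit();
DirectoryReader latest = DirectoryReader.open(writer, true);      // newer, non-NRT path also works
DirectoryReader backInTime = DirectoryReader.openIfChanged(latest, olderCommit);
```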
Yes, there is also a safety check, but IMO it should be removed.
See the patch on the issue, the test passes now.
On Wed, Sep 10, 2014 at 9:31 PM, Vitaly Funstein vfunst...@gmail.com wrote:
Seems to me the bug occurs regardless of whether the passed in newer reader
is NRT or non-NRT. This is
I think I see the bug here, but maybe I'm wrong. Here's my theory:
Suppose no segments at a particular commit point contain any deletes. Now,
we also hold open an NRT reader into the index, which may end up with some
deletes, after the commit occurred. Then, according to the following
conditional
Hmm, which Lucene version are you using? We recently beefed up the
checking in this code, so you ought to be hitting an exception in
newer versions.
But that being said, I think the bug is real: if you try to reopen
from a newer NRT reader down to an older (commit point) reader then
you can hit
I'm on 4.6.1. I'll file an issue for sure, but is there a workaround you could
think of in the meantime? As you probably remember, the reason for doing this
in the first place was to prevent the catastrophic heap exhaustion when
SegmentReader instances are opened from scratch for every new
Okay, created LUCENE-5931 for this. As it turns out, my original test
actually does do deletes on the index so please disregard my question about
segment merging.
On Tue, Sep 9, 2014 at 3:00 PM, vfunst...@gmail.com wrote:
I'm on 4.6.1. I'll file an issue for sure, but is there a workaround you
UPDATE:
After making the changes we discussed to enable sharing of SegmentReaders
between the NRT reader and a commit point reader, specifically calling
through to DirectoryReader.openIfChanged(DirectoryReader, IndexCommit), I
am seeing this exception, sporadically:
Caused by:
On Thu, Aug 28, 2014 at 5:38 PM, Vitaly Funstein vfunst...@gmail.com wrote:
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless
luc...@mikemccandless.com wrote:
The segments_N file can be different, that's fine: after that, we then
re-use SegmentReaders when they are in common between the
Hmm screen shot didn't make it ... can you post link?
If you are using NRT reader then when a new one is opened, it won't
open new SegmentReaders for all segments, just for newly
flushed/merged segments since the last reader was opened. So for your
N commit points that you have readers open for,
Here's the link:
https://drive.google.com/file/d/0B5eRTXMELFjjbUhSUW9pd2lVN00/edit?usp=sharing
I'm indexing let's say 11 unique fields per document. Also, the NRT reader
is opened continually, and regular searches use that one. But a special
kind of feature allows searching a particular point in
...@thetaphi.de
-Original Message-
From: Vitaly Funstein [mailto:vfunst...@gmail.com]
Sent: Thursday, August 28, 2014 7:56 PM
To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Here's the link:
https://drive.google.com/file/d
Can you drill down some more to see what's using those ~46 MB? Is it
the FSTs in the terms index?
But, we need to decouple the single segment is opened with multiple
SegmentReaders from e.g. single SegmentReader is using too much RAM
to hold terms index. E.g. from this screen shot it looks
Thanks, Mike - I think the issue is actually the latter, i.e. SegmentReader
on its own can certainly use enough heap to cause problems, which of course
would be made that much worse by failure to pool readers for unchanged
segments.
But where are you seeing the behavior that would result in reuse
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 10:56:17
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Here's the link
Ugh, you're right: this still won't re-use from IW's reader pool. Can
you open an issue? Somehow we should make this easier.
In the meantime, I guess you can use openIfChanged from your back in
time reader to open another back in time reader. This way you have
two pools... IW's pool for the
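The interim workaround suggested here could be sketched as follows (assuming Lucene 4.x; variable names like "commitT0" are illustrative, not from the thread): reopen one back-in-time reader from another, so SegmentReaders in common between the two commit points are shared by the old reader rather than opened from scratch.

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// Sketch: the first snapshot reader becomes the "pool" for later snapshots.
DirectoryReader olderSnapshot = DirectoryReader.open(commitT0);
// Later, instead of DirectoryReader.open(commitT1) from scratch, reopen the
// existing snapshot so unchanged segments reuse its SegmentReaders:
DirectoryReader newerSnapshot =
    DirectoryReader.openIfChanged(olderSnapshot, commitT1);
```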
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
You can actually use IndexReader.openIfChanged(latestNRTReader,
IndexCommit): this should pull/share SegmentReaders from the pool
inside IW, when available. But it will fail to share e.g
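A minimal sketch of the call suggested above (in 4.x the static method lives on DirectoryReader; "latestNRTReader" and "commitPoint" are placeholder names, not code from the thread):

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;

// Sketch: reopen the latest NRT reader "down" to a specific commit, letting
// IndexWriter's reader pool share unchanged SegmentReaders when available.
DirectoryReader backInTime =
    DirectoryReader.openIfChanged(latestNRTReader, commitPoint);
if (backInTime == null) {
  backInTime = latestNRTReader;  // null means nothing changed; reuse as-is
}
```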
Thanks for the suggestions! I'll file an enhancement request.
But I am still a little skeptical about the approach of pooling segment
readers from prior DirectoryReader instances, opened at earlier commit
points. It looks like the up to date check for non-NRT directory reader
just compares the
On Thu, Aug 28, 2014 at 4:18 PM, Vitaly Funstein vfunst...@gmail.com wrote:
Thanks for the suggestions! I'll file an enhancement request.
But I am still a little skeptical about the approach of pooling segment
readers from prior DirectoryReader instances, opened at earlier commit
points. It
On Thu, Aug 28, 2014 at 1:25 PM, Michael McCandless
luc...@mikemccandless.com wrote:
The segments_N file can be different, that's fine: after that, we then
re-use SegmentReaders when they are in common between the two commit
points. Each segments_N file refers to many segments...
Yes, you
On Thu, Aug 28, 2014 at 2:38 PM, Vitaly Funstein vfunst...@gmail.com
wrote:
Looks like this is used inside Lucene41PostingsFormat, which simply passes
in those defaults - so you are effectively saying the minimum (and
therefore, maximum) block size can be raised to reuse the size of the terms
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
You can actually use IndexReader.openIfChanged(latestNRTReader,
IndexCommit): this should pull/share SegmentReaders from the pool
inside IW, when available. But it will fail to share e.g
@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
You can actually use IndexReader.openIfChanged(latestNRTReader,
IndexCommit): this should pull/share SegmentReaders from the pool
inside IW, when available. But it will fail
: BlockTreeTermsReader consumes crazy amount of memory
Ugh, you're right: this still won't re-use from IW's reader pool. Can
you open an issue? Somehow we should make this easier.
In the meantime, I guess you can use openIfChanged from your back in
time reader to open another back in time reader
max = 2*(min-1),
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 14:38:37
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
On Thu
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 13:18:08
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Thanks
Yes!
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 14:39:50
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
On Thu, Aug 28
Sent from my BlackBerry® smartphone
-Original Message-
From: Vitaly Funstein vfunst...@gmail.com
Date: Thu, 28 Aug 2014 13:18:08
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Thanks
To: Lucene Usersjava-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount of memory
Ugh, you're right: this still won't re-use from IW's reader pool. Can
you open an issue? Somehow we should make this easier.
In the meantime, I
Sent from my BlackBerry® smartphone
-Original Message-
From: Michael McCandless luc...@mikemccandless.com
Date: Thu, 28 Aug 2014 15:49:30
To: Lucene Usersjava-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Subject: Re: BlockTreeTermsReader consumes crazy amount
: BlockTreeTermsReader consumes crazy amount of memory
if (writer != null) {
  return doOpenFromWriter(commit);
} else {
  return doOpenNoWriter(commit);
}
Sent from my BlackBerry® smartphone
-Original Message-
From: Michael McCandless luc
: BlockTreeTermsReader consumes crazy amount of memory
doOpenIfChanged(final IndexCommit commit)
throws IOException {
ensureOpen();
Sent from my BlackBerry® smartphone
-Original Message-
From: craiglan...@gmail.com
Date: Fri, 29 Aug 2014 00:40:23
To: java-user@lucene.apache.org
Reply
This is surprising: unless you have an excessive number of unique
fields, BlockTreeTermsReader shouldn't be such a big RAM consumer.
But you only have 12 unique fields?
Can you post screen shots of the heap usage?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Aug 26, 2014 at 3:53 PM,
Mike,
Here's the screenshot; not sure if it will go through as an attachment
though - if not, I'll post it as a link. Please ignore the altered package
names, since Lucene is shaded in as part of our build process.
Some more context about the use case. Yes, the terms are pretty much
unique; the
This is a follow up to the earlier thread I started to understand memory
usage patterns of SegmentReader instances, but I decided to create a
separate post since this issue is much more serious than the heap overhead
created by use of stored field compression.
Here is the use case, once again.