inal reader, it is transient and norms are really smallish compared to
other memory hogs, so we can live with it.
I'm nixing this sharing. Successfully loaded norms are simply inherited from
papa-reader.
> Refactor Directory/Multi/SegmentReader creation/reopening/clo
papa-SegmentReader forever (not so deadly when SRs
null themselves up on close, but it's worse with final fields and even now I
can construct a case that leaks noticeable memory)
> Refactor Directory/Multi/SegmentReader creation/reopening/clo
[
https://issues.apache.org/jira/browse/LUCENE-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853280#action_12853280
]
Earwin Burrfoot commented on LUCENE-2355:
-
- when SegmentReader / CoreReaders
aders.
I think I'm going to introduce interface RefCounted, abstract class
WhateverRefCounted, which guards against increment on closed instance and has
nice mass-decRef methods in IOUtils.
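A minimal sketch of the idea described above, under stated assumptions: only the name RefCounted comes from the comment; the method set, the base-class name AbstractRefCounted, and the helper RefCountedUtil.decRefAll are hypothetical and are not taken from any patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Name taken from the comment above; the method set is an assumption.
interface RefCounted {
    void incRef();
    void decRef();
    int getRefCount();
}

// Hypothetical base class: one initial reference, no incRef once closed.
abstract class AbstractRefCounted implements RefCounted {
    private final AtomicInteger refCount = new AtomicInteger(1);

    /** Called exactly once, when the last reference is dropped. */
    protected abstract void doClose();

    @Override
    public final void incRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("incRef on closed instance");
            }
            // CAS so that a racing final decRef cannot be followed by a successful incRef.
            if (refCount.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    @Override
    public final void decRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("decRef on closed instance");
            }
            if (refCount.compareAndSet(current, current - 1)) {
                if (current == 1) {
                    doClose();
                }
                return;
            }
        }
    }

    @Override
    public final int getRefCount() {
        return refCount.get();
    }
}

// Hypothetical helper standing in for the "mass-decRef" idea; not an existing IOUtils method.
final class RefCountedUtil {
    private RefCountedUtil() {}

    /** Releases every non-null argument, then rethrows the first failure, if any. */
    static void decRefAll(RefCounted... refs) {
        RuntimeException first = null;
        for (RefCounted ref : refs) {
            if (ref == null) {
                continue;
            }
            try {
                ref.decRef();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

A subclass only implements doClose(); any incRef() after the count has reached zero fails fast instead of quietly reviving a closed instance.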
> Refactor Directory/Multi/SegmentReader creation/reopenin
to a user-passed array still has the
same performance even without the cache.
> Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing
> -
>
> Key: LUCENE-2355
>
rack reopen stuff. No issue for plugin model yet,
but I'll probably create it, can't edit this one, I'm no committer.
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: htt
Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing
-
Key: LUCENE-2355
URL: https://issues.apache.org/jira/browse/LUCENE-2355
Project: Lucene - Java
from SegmentReader and requires you
to reopen it.
that's definitely much cleaner and would solve the issue in my current patch
(sadly i'm on 3.0 and want to keep my patch there at a minimum until i can port
to all the goodness on 3.1).
bq. Also, they extend not only SegmentReader,
inimal APIs.
* All subcomponents subclass either DataWriter or DataReader.
* The Architecture class (under KinoSearch::Plan) determines which plugins
get loaded.
[http://www.rectangular.com/svn/kinosearch/trunk/core/]
> Make it possible to
. My patch removes loadTermsIndex method from
SegmentReader and requires you to reopen it. At that moment you can stuff it
with a set of plugins without leaving 'final' paradise.
There are still things to consider. Some SR guts could be converted to plugins
themselves, so you can override
NRT indexing, if the SegmentReader is opened with no TermInfosReader (for
merging), then the plugins will be initialized with a SegmentReader that has no
ability to walk the TermsEnum.
I guess SegmentPlugin initialization should wait until after the terms index is
loaded or have another method
l API i would like to see
from the plugin model
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Project: Lucene
3.1 (new feature)?
3.1 only of course (just posted a 3.0 patch now as that's what i'm using and i
need the functionality now)
bq. Tim, do you think the plugin model ("extension by composition") would be
workable for your use case? Ie, instead of a factory enabling subclasse
sible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Project: Lucene - Java
> Issue Type: Wish
> Components:
ch w/ your current state, even if it's "rough"?
Ahem. Right now it's more or less finished for
Multi/Directory/MutableDirectory/WriterBackedDirectory-readers.
But the SegmentReader is in shambles (i.e. does not compile yet).
Should I post as is?
> Make it possibl
are a sign of bad design.
I.e. in your case extending IW is crazy.
You should have an interface capturing IW methods and two implementations - one
writing to the index and another delegating to its subwriters. You don't do
DirectoryReader extends SegmentReader, do you? They both extend
luc
it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Project: Lucene - Java
> Issue Type: Wish
> Compon
's hold off on this issue while we iterate at least with
Earwin's first refactoring effort...?
Earwin, can you post a patch w/ your current state, even if it's "rough"?
There seems to be a lot of interest/opinions on how to "fix" things here :)
Both I
http://www.informit.com/articles/article.aspx?p=20521]
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Project: Lucene - Java
) methods should be private or at least final and
never public!
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Pr
times
ctors limit you. Perhaps it would make sense though in what you're trying to do
...
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/b
d in DirReader to be used for actual
searching. The factory/init() approach ignores this, and each user of this API
will be on his own to separate lightweight readers from full-fledged ones.
> Make it possible to subclass SegmentReader
> --
>
>
mmit this only on 3.1 (new feature)?
Earwin: will this change really conflict w/ your ongoing refactoring (to have
DirReader subclass MultiReader)? It seems somewhat orthogonal?
> Make it possible to subclass SegmentReader
> --
>
>
ubclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Project: Lucene - Java
> Issue Type: Wish
> Components: Index
>
tter for setting this
If this is not expected to change during the lifetime of IW, I think it should
be added to IWC when you upgrade the patch to 3.1.
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
>
ctory ability
(not tested yet, but i'll be doing that shortly as i integrate this
functionality)
It adds a SegmentReaderFactory.
The IndexWriter now has a getter and setter for setting this
SegmentReader has a new protected method init() which is called after the
segment reader has been i
port a patch when i absorb 3.1 (just use the "finalized" apis)
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
>
n
settle on a deadline? Don't like to see the effort evaporate, also merging is
gonna be hell with flex branch alone, don't want to double it :)
> Make it possible to subclass SegmentReader
> --
>
> Key: LUCE
ill start working on a patch tomorrow
will take a few days as i'll start with a 3.0 patch (which i use), then will
create a 3.1 patch once i've got that all fleshed out
> Make it possible to subclass SegmentReader
> --
>
>
ible to subclass SegmentReader
> --
>
> Key: LUCENE-2345
> URL: https://issues.apache.org/jira/browse/LUCENE-2345
> Project: Lucene - Java
> Issue Type: Wish
> Components: Index
>
Make it possible to subclass SegmentReader
--
Key: LUCENE-2345
URL: https://issues.apache.org/jira/browse/LUCENE-2345
Project: Lucene - Java
Issue Type: Wish
Components: Index
2426.
> Deadlock with FSIndexInput and SegmentReader
> -
>
> Key: LUCENE-2263
> URL: https://issues.apache.org/jira/browse/LUCENE-2263
> Project: Lucene - Java
> Issue Type: Bug
>
October 2008
bash-3.00$ uname -a
SunOS op06udb1 5.10 Generic_13-03 sun4v sparc SUNW,Sun-Blade-T6340
> Deadlock with FSIndexInput and SegmentReader
> -
>
> Key: LUCENE-2263
> URL: https://issues.apache.
Deadlock with FSIndexInput and SegmentReader
-
Key: LUCENE-2263
URL: https://issues.apache.org/jira/browse/LUCENE-2263
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 2.2
atch is we'll only pool if the doc count is
above a threshold, 100,000 seems like a good number. Also pooling will be
optional.
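The policy described above (pool only above a document-count threshold, with pooling optional) could look roughly like the following sketch. The class ByteArrayPool and all of its members are hypothetical and are not taken from the LUCENE-1574 patch; ConcurrentLinkedQueue is used because the thread mentions it as an option on Java 1.5.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical pool of byte arrays, for illustration only.
final class ByteArrayPool {
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    private final boolean enabled;   // pooling is optional
    private final int minDocCount;   // e.g. 100,000, the number suggested in the comment

    ByteArrayPool(boolean enabled, int minDocCount) {
        this.enabled = enabled;
        this.minDocCount = minDocCount;
    }

    private boolean shouldPool(int docCount) {
        return enabled && docCount > minDocCount;
    }

    /** Returns a pooled array of exactly the requested length if one is available, else a new one. */
    byte[] acquire(int docCount, int length) {
        if (shouldPool(docCount)) {
            byte[] candidate;
            while ((candidate = free.poll()) != null) {
                if (candidate.length == length) {
                    return candidate; // contents are stale; the caller must overwrite them
                }
                // Wrong size: drop it and let the GC reclaim it.
            }
        }
        return new byte[length];
    }

    /** Hands an array back; arrays from small segments are never pooled. */
    void release(int docCount, byte[] array) {
        if (shouldPool(docCount)) {
            free.offer(array);
        }
    }

    int pooledCount() {
        return free.size();
    }
}
```

Segments below the threshold always allocate fresh arrays, so the pool only retains memory for segments where reallocation is expensive.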
> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
>
Java 1.5, ConcurrentLinkedQueue can be used.
> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
> URL: https://issues.apache.org/jira/browse/LUCENE-15
the method in SimpleStringInterner for lockless pooling?
> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
> URL: https://issues.apache.org/jira
Just realized it, thanks for making SegmentReader public!!!
-John
true, Zoie is using a bloom filter over an intHash set from
fastutil for exactly the perf reason Jason pointed out.
> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
>
+1
We're already able to warm a reader in the newly merged segment,
in the BG and not delaying any near real-time reopens in the
meantime... but if we can merge FieldCache entries directly in RAM
this ought to be a good speedup over re-un-inverting the merged
segment.
Mike
On Thu, Jul 30, 20
Perhaps in separate patch that's limited to field cache merging
we can simply modify our existing field cache code (i.e. not
rewrite field caching in general) (in conjunction with
IW.getReader and segment merging) to automatically (or with a
settings callback in IW for which fields should be auto
I know this has been somewhat stuck in LUCENE-831 which seems to
have blown up quite a bit over time and is untouched of late?
Perhaps in separate patch that's limited to field cache merging
we can simply modify our existing field cache code (i.e. not
rewrite field caching in general) (in conjunct
seem necessary.
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Luc
hat's not very costly, and 2) we can't gain that
concurrency back anyway because we synchronize on IW when opening the
reader.
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
>
). This way we know when the SRMV is in use and different
threads don't clobber each other creating and closing SRs using
readerPool.
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
>
could sync on as well. Otherwise we've got
synchronization in many places, IW, IW.readerPool, SR, SR.core.
It would seem to make things brittle? Perhaps listing out the
various reasons we're synchronizing, to see if we can
consolidate some of them will help?
> IndexWriter.readerPo
ate new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Lucene - Java
>
(this is where hazard
comes in), and next you sync on the SRMapValue. Another thread can
sneak in and close the SRMapValue.reader during the time that no sync
is held.
> IndexWriter.readerPool create new segmentReader outside of sync block
>
g it to
see if we can get an error.
I'm still a little confused as to why we're going to see the bug
if readerPool.get is syncing on the SRMapValue. I guess there's
a slight possibility of the error, and perhaps a more randomized
test would produce it.
> IndexWriter.reade
ynchronize on core for the cloning methods?
I don't think that's needed? The core is simply carried over to the
newly cloned reader.
> IndexWriter.readerPool create new segmentReader outside of sync block
>
o in the patch, perhaps in
TestIndexWriterReader? Great work on this, it's easier to
understand SegmentReader now that all the shared objects are in
one object (CoreReaders). It should make debugging go more
smoothly.
Is there a reason we're not synchronizing on SR.core in
openDocStore
by a merge that
doesn't need to merge the doc stores; later, an NRT reader is opened
that separately opens the doc stores of the same [pooled]
SegmentReader, but then it's the merge that closes the read-only clone
of the reader.
In this case the separately opened (by the NRT reader) doc
, eg this failure is w/ only doc
stores so it seems likely the merging logic that opens doc stores just
before kicking off the merge may be to blame.
> IndexWriter.readerPool create new segmentReader outside of sync block
> ---
can recommend techniques or tools for
debugging this type of multithreading issue? (i.e. how do you go
about figuring this type of issue out?)
> IndexWriter.readerPool create new segmentReader outside of syn
this test case.
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
>
irectory.close(MockRAMDirectory.java:278)
[junit] at
org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48)
[junit] at
org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
{code}
> IndexWriter.readerPool create new segmentReader outs
close(MockRAMDirectory.java:278)
[junit] at
org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48)
[junit] at
org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
{code}
> IndexWriter.readerPool create new segmentReader outside of sy
(rather than both) and
now we get a "MockRAMDirectory: cannot close: there are still open files"
exception.
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
>
derPool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Lucene - Java
>
ound in readerPool.get, tests
would fail and/or hang. I'm not sure yet where we'd add the
sync(this) block.
I'll work on reproducing the above mentioned issue, thanks for
the advice.
> IndexWriter.readerPool create new segmentReader
rPool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Lucene - Java
>
exception in TestStressIndexing2 (or
another test class) when the mv.reader.incRef occurs and the
reader is already closed?
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
>
n leaves the sync block
* Thread #1 calls release, which decRefs the reader & closes it
* Thread #2 resumes, sees it has a non-null mv.reader and incRefs
it, which is illegal (reader was already closed).
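The interleaving listed above is only possible when the lookup and the incRef are not covered by the same lock as release. The sketch below is a simplified illustration, not the patch under discussion: PooledReader and ReaderPool are hypothetical stand-ins for SegmentReader and IW.readerPool. It deliberately creates the reader while holding the pool lock, which gives up the concurrency the issue is after; it only shows why the check, the incRef, and the close must be mutually exclusive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SegmentReader: starts with one reference.
final class PooledReader {
    private int refCount = 1;
    private boolean closed;

    synchronized void incRef() {
        if (closed) {
            throw new IllegalStateException("incRef on closed reader");
        }
        refCount++;
    }

    /** Drops one reference; returns true if this call closed the reader. */
    synchronized boolean decRef() {
        if (closed) {
            throw new IllegalStateException("decRef on closed reader");
        }
        if (--refCount == 0) {
            closed = true;
        }
        return closed;
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in for the reader pool, keyed by segment name.
final class ReaderPool {
    private final Map<String, PooledReader> readers = new HashMap<>();

    synchronized PooledReader get(String segment) {
        PooledReader reader = readers.get(segment);
        if (reader == null) {
            reader = new PooledReader(); // first reference belongs to this caller
            readers.put(segment, reader);
        } else {
            reader.incRef(); // safe: release() cannot run while the pool lock is held
        }
        return reader;
    }

    synchronized void release(String segment, PooledReader reader) {
        if (reader.decRef()) {
            // Closed: remove it under the same lock so get() never hands it out again.
            readers.remove(segment, reader);
        }
    }
}
```

With this arrangement Thread #2 either sees the live reader and increments it before Thread #1 can close it, or finds no entry and opens a fresh one.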
> IndexWriter.readerPool create new segmentReader outsid
We need a test case that fails with the
current patch?
> IndexWriter.readerPool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apac
ool create new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Lucene - Java
> Is
alue strongly typed? Eg name it SegmentReaderValue, and it
has a single member "SegmentReader reader".
getIfExists has duplicate checks for null (mv != null is checked twice and
mv.value != null too).
I think there is a thread hazard here, in particular a risk that one thread
decrefs a
new segmentReader outside of sync block
> -
>
> Key: LUCENE-1726
> URL: https://issues.apache.org/jira/browse/LUCENE-1726
> Project: Lucene - Java
> Issue Type: Improvement
rPool when a new
segmentReader is being warmed/instantiated. This is important
when new segmentReaders on large new segments are being accessed
for the first time. Otherwise today IW.getReader may wait while
the new SR is being created.
* IW.readerPool map values are now of type MapValue
* We synchr
IndexWriter.readerPool create new segmentReader outside of sync block
-
Key: LUCENE-1726
URL: https://issues.apache.org/jira/browse/LUCENE-1726
Project: Lucene - Java
Issue
eed to get deletes from the
> SegmentReader
>
>
> Key: LUCENE-1700
> URL: https://issues.apache.org/jira/browse/LUCENE-1700
> Project: Lucene - Java
built.
I plan to commit in a day or two.
> LogMergePolicy.findMergesToExpungeDeletes need to get deletes from the
> SegmentReader
>
>
> Key: LUCENE-1700
>
eed to get deletes from the
> SegmentReader
>
>
> Key: LUCENE-1700
> URL: https://issues.apache.org/jira/browse/LUCENE-1700
> Project: Lucene - Java
can solve the package protected
SegmentInfo issue here by creating a new class with the
necessary attributes?
Here's what LUCENE-1313 does:
{code} SegmentReader sr = writer.readerPool.getIfExists(info);
if (info.hasDeletions() || (sr != null && sr.hasDeletions())) {
{code}
Because Segme
LogMergePolicy.findMergesToExpungeDeletes need to get deletes from the
SegmentReader
Key: LUCENE-1700
URL: https://issues.apache.org/jira/browse/LUCENE-1700
Project
ols SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
> URL: https://issues.apache.org/jira/browse/LUCENE-1574
> Project: Lucene - Java
> Issue Type: Improvement
Michael McCandless wrote:
On Thu, May 21, 2009 at 10:53 AM, Earwin Burrfoot wrote:
I agree we should probably remove it, unless there are users relying
on this. Maintaining side-by-side sources is difficult with time.
As I said in the initial message, this feature introduces no run
On Thu, May 21, 2009 at 10:53 AM, Earwin Burrfoot wrote:
>> I agree we should probably remove it, unless there are users relying
>> on this. Maintaining side-by-side sources is difficult with time.
>
> As I said in the initial message, this feature introduces no runtime
> behaviour changes, so y
2009/5/21 Michael McCandless :
> It looks like this was done in order to implement
> SegmentTermDocs.read(int[], int[]) natively, when using a gcj
> environment, since that gave performance improvements?
Yup, you're right. But something tells me, since Lucene 1.9 many
things changed and this is no
ies and Class.newInstance() is used
> to create SegmentReader.
>
> I've tracked down this code's origins to:
> r150531 | cutting | 2004-09-22 22:32:27 +0400 (Wed, 22 Sep 2004) | 2 lines
> Add GCJ native code for SegmentTermDocs.read(int[],int[]) to
> accellerate TermScorer
Right now a set of system properties and Class.newInstance() is used
to create SegmentReader.
I've tracked down this code's origins to:
r150531 | cutting | 2004-09-22 22:32:27 +0400 (Wed, 22 Sep 2004) | 2 lines
Add GCJ native code for SegmentTermDocs.read(int[],int[]) to
accellerate
the GC. In some
cases this is faster than a single large array because of the
way Java (or the OS?) transfers memory around through the CPU
cache.
> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
>
be it's LUCENE-1526).
> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---
>
> Key: LUCENE-1574
> URL: https://issues.apache.org/jira/browse/LUCENE-1574
>
PooledSegmentReader, pools SegmentReader underlying byte arrays
---
Key: LUCENE-1574
URL: https://issues.apache.org/jira/browse/LUCENE-1574
Project: Lucene - Java
Issue Type
bq. It seems that Lucene should have ways to incorporate new bitset
implementations in the future using interfaces and things.
DocIdSet?
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
>
for a more
structured and timely discussion.
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
>
noise?
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
> Project: Lucene - Java
>
the -client option in the JVM on Mac OS X. Using
-server the numbers look almost the same for OpenBitSet and BitVector with
BitVector being slightly faster.
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
>
BitVector and OpenBitSet. FastGet is called on OpenBitSet.
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/browse/LUCENE-1485
>
iteable to disk. We're working on an isSparse
method for OpenBitSet.
> Use OpenBitSet instead of BitVector in SegmentReader
>
>
> Key: LUCENE-1485
> URL: https://issues.apache.org/jira/brows
Use OpenBitSet instead of BitVector in SegmentReader
Key: LUCENE-1485
URL: https://issues.apache.org/jira/browse/LUCENE-1485
Project: Lucene - Java
Issue Type: Improvement
signed to live
for the life of the SegmentReader/IndexReader and thread.
On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
Well, possibly I'm mistaken, but it seems that this affects non-
static fields
too. Please see
http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html
ence, it will be GC'd - eventually, and then
>>> it will be cleared as new ThreadLocals are created.
>>>
>>> With a static reference, the thread can reference the ThreadLocal at
>>> any time, and thus the WeakReference will not be cleared.
>>>
cts stored in ThreadLocals are designed to live
for the life of the SegmentReader/IndexReader and thread.
On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
Well, possibly I'm mistaken, but it seems that this affects non-
static fields
too. Please see
http://www.nabble.com/ThreadLocal-in-
m, but I don't think that is the case with
> Lucene, as the objects stored in ThreadLocals are designed to live
> for the life of the SegmentReader/IndexReader and thread.
>
>
> On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
>
>>
>> Well, possibly I'
nce will not be cleared.
>
> If the object is VERY large, and new ThreadLocals are not created it
> could cause a problem, but I don't think that is the case with
> Lucene, as the objects stored in ThreadLocals are designed to live
> for the life of the SegmentReade
e, and new ThreadLocals are not created it
could cause a problem, but I don't think that is the case with
Lucene, as the objects stored in ThreadLocals are designed to live
for the life of the SegmentReader/IndexReader and thread.
On Jul 12, 2008, at 2:12 AM, Roman Puchkovskiy wrote:
Well, possibly I'm mistaken, but it seems that this affects non-static fields
too. Please see
http://www.nabble.com/ThreadLocal-in-SegmentReader-to18306230.html where the
use case is described in the details.
In short: it seems that the scope of ThreadLocals does not matter. What
really ma
theserverside.com/news/thread.tss?thread_id=41473
Once again, I'm not pointing to Lucene SegmentReader as a "bad"
implementation, and maybe the current "problems" of ThreadLocals
are not a problem for SegmentReader but it seems safer to use
ThreadLocals to pass context informat
"problems" using ThreadLocals.
>>>
>>> http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
>>> http://www.theserverside.com/news/thread.tss?thread_id=41473
>>>
>>> Once again, I'm not pointing to Lucene SegmentReader as a "bad
s the thread is alive.
On Jul 9, 2008, at 4:46 PM, Adrian Tarau wrote:
Just a few examples of "problems" using ThreadLocals.
http://opensource.atlassian.com/projects/hibernate/browse/HHH-2481
http://www.theserverside.com/news/thread.tss?thread_id=41473
Once again, I'm not point