Hi,
I think removing binaries directly without going through the GC logic is
dangerous, because we can't be sure whether there are other references. The
one exception is when each file is guaranteed to be unique. For that,
we could for example append a UUID to each file. The Lucene file
sy
Hi,
On 10 March 2015 at 09:52, Chetan Mehrotra
wrote:
> > Is Oak already single instance when it comes to the identification and
> storage of binaries ?
>
> Yes. Oak uses content addressable storage for binaries
>
> > Are the existing TextExtractors also single instance ?
>
> No. If same binary
That's one approach we can think about. Thinking further, Lucene's
design of immutable files makes things simpler (ignoring the reindex
case). In normal usage Lucene never reuses a file name and never
modifies an existing file. So we would not have to worry about
reading older revisions. We on
Could the Lucene indexer explicitly track these files (e.g. as a property in
the index definition)? And also take care of removing them? (the latter part is
assuming that the same index file is not identical across various definitions)
> On 10 Mar 2015, at 12:18, Chetan Mehrotra wrote:
>
> On
Thank you! This example helped me iron out the errors in my index configuration!
It would be good to have a bit more example code online for these things.
On 6 March 2015 at 04:16, Chetan Mehrotra wrote:
> Hi Torgeir,
>
> Sorry for the delay here, as I got stuck with other issues. I tried your
> ap
On Tue, Mar 10, 2015 at 4:12 PM, Michael Dürig wrote:
> The problem is that you don't even have a list of all previous revisions of
> the root node state. Revisions are created on the fly and kept as needed.
Hmm, yup. Then we would need to think of some other approach to know
all the blobId referr
On 10.3.15 11:32 , Chetan Mehrotra wrote:
On Tue, Mar 10, 2015 at 3:33 PM, Michael Dürig wrote:
SegmentMK doesn't even have the concept of a previous revision of a
NodeState.
Yes, that is to be thought about. I want to read all previous revisions
for path /oak:index/lucene/:data. For segment
Hi,
the vote passes as follows:
+1 Michael Dürig
+1 Amit Jain
+1 Alex Parvulescu
+1 Davide Giannella
+1 Julian Reschke
+1 Thomas Mueller
I'll push the release out.
Thomas, your vote was a bit unclear. Your first statement was
a +1 vote. Later you voiced concerns and suggested to not
release the
On Tue, Mar 10, 2015 at 3:33 PM, Michael Dürig wrote:
> SegmentMK doesn't even have the concept of a previous revision of a
> NodeState.
Yes, that is to be thought about. I want to read all previous revisions
for path /oak:index/lucene/:data. For segment I believe I would need
to start at root refe
On 10.3.15 10:49 , Chetan Mehrotra wrote:
For Segment I am
not sure how to easily read previous revisions of a given NodeState
SegmentMK doesn't even have the concept of a previous revision of a
NodeState.
Michael
> Is Oak already single instance when it comes to the identification and
> storage of binaries ?
Yes. Oak uses content addressable storage for binaries
> Are the existing TextExtractors also single instance ?
No. If the same binary is referenced in multiple places then text extraction
would be perfor
On Tue, Mar 10, 2015 at 1:50 PM, Michael Marth wrote:
> But I wonder: how do you envision that this new index cleanup would locate
> indexes in the content-addressed DS
That's a bit tricky. I have a rough idea of how to approach it, but it
needs more thought. The approach I am thinking of i
Hi Chetan,
I like the idea.
But I wonder: how do you envision that this new index cleanup would locate
indexes in the content-addressed DS?
Michael
> On 10 Mar 2015, at 07:46, Chetan Mehrotra wrote:
>
> Hi Team,
>
> With storing of Lucene index files within DataStore our usage pattern
> of D
Hi,
Is Oak already single instance when it comes to the identification and
storage of binaries?
Are the existing TextExtractors also single instance?
By single instance I mean one copy of the binary and its token stream in the
repository, regardless of how many times it's referenced.
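To make "single instance" concrete, here is a minimal sketch of a content-addressed store in which a binary's id is the hash of its bytes, so identical content maps to one stored copy however often it is added. The class name `ContentAddressedStore` and its methods are hypothetical, chosen only for illustration; this is not Oak's BlobStore or DataStore API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not an Oak class. */
class ContentAddressedStore {
    // Maps a content hash to the single stored copy of those bytes.
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Stores the bytes under their SHA-256 digest and returns the digest as the id. */
    String put(byte[] content) {
        String id = sha256Hex(content);
        // Identical content yields the same id, so a second put adds nothing.
        blobs.putIfAbsent(id, content.clone());
        return id;
    }

    /** Returns a copy of the stored bytes, or null if the id is unknown. */
    byte[] get(String id) {
        byte[] stored = blobs.get(id);
        return stored == null ? null : stored.clone();
    }

    /** Number of distinct binaries currently stored. */
    int size() {
        return blobs.size();
    }

    private static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the same idea would extend to extracted text: keying the token stream by the same id would give one extraction result per distinct binary.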
Best Regards
I
LuceneIndexEditor currently extracts the binary contents via Tika in
the same thread that processes the commit. Such an approach
does not make good use of multiprocessor systems, particularly when
the index is being built as part of a migration process.
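As a rough sketch of the concern, extraction could be handed to a thread pool so that several binaries are processed in parallel instead of serially on the commit thread. Everything here is an assumption for illustration: `ParallelTextExtraction`, `extractAll`, and the `extractor` function are hypothetical names standing in for a Tika call, and this is not how LuceneIndexEditor is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration only; not an Oak class. */
class ParallelTextExtraction {

    /**
     * Runs the given extractor (a stand-in for a Tika call) over all binaries
     * on a fixed-size pool and returns the texts in input order.
     */
    static List<String> extractAll(List<byte[]> binaries,
                                   Function<byte[], String> extractor,
                                   int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every extraction first so they can run concurrently.
            List<Future<String>> pending = new ArrayList<>();
            for (byte[] binary : binaries) {
                Callable<String> task = () -> extractor.apply(binary);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> texts = new ArrayList<>();
            for (Future<String> future : pending) {
                texts.add(future.get());
            }
            return texts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while extracting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("extraction failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller still waits for all results in this sketch; it only shows the extraction work being spread across cores.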
Looking at JR2 I see LazyTextExtractor