Had an offline chat with Thomas about that, and for now a creationTime-based
approach can be used to let the index logic distinguish between
a reindex and a fresh index.
Thomas's proposal above was more about avoiding the large-transaction problem,
where the new index would be built side by side. With Lucene this is not a
It might be simpler if we just record the index creation time in the
index definition node itself (or some predefined meta node under
definition node). This can be done in IndexUpdate itself where it
would set the time when it triggers a reindex or the first index.
Later Lucene would make use of
Hi,
It might be simpler if we just record the index creation time in the
index definition node itself (or some predefined meta node under
definition node). This can be done in IndexUpdate itself where it
would set the time when it triggers a reindex or the first index.
Sorry I don't understand.
On Tue, Oct 21, 2014 at 1:18 PM, Thomas Mueller muel...@adobe.com wrote:
What we need is a distinction between the old and the new index *data*.
Yes and that can be done by storing the index creation time.
In the approach you suggested where two different nodes are used and
later the nodes are
Hi,
The node doesn't need to be moved, even after multiple reindex operations.
Please note index creation is no different from reindex. In both cases, a
new index data node is created. So, if an index definition is created:
/oak:index/lucene
Then the index is being built:
Thanks for the details Thomas!
But the above model differs from the current model, which makes use of MVCC. The
reindex operation triggers removal of the :data node in a branch, and
IndexReader always looks for the :data node to open the directory on
trunk. So while a reindex is in progress, existing readers make use of
Hi,
Yes, that's my point. I wouldn't use MVCC for reindexing the Lucene index.
Reindexing is very costly, and I wouldn't do it in one huge, possibly
hours-long transaction.
* You need to have access to the old and (for readers) the new data (to
re-create the index)
* Eventually, you want to
Hi,
If we set reindex to true in any index definition then Oak would
remove the existing index content before performing the reindex. This
would work fine if the index content is stored within NodeStore
itself.
It is important to also specify that this appears as a single commit thanks
to
Hi,
As for external Lucene indexes, what about this:
* in the :data node, store an index creation time, in milliseconds since
1970
* use that as a path prefix for the actual index files
So if the index is configured as follows:
/oak:index/lucene { path: /quickstart/repo/lucenIndex }
Then
Hi,
Then would use that UUID as the prefix ...
Sorry, that should be "Then would use that _time_ as the prefix ..." - I
thought about using a UUID first, but then changed to milliseconds since
1970, as that's easier (you immediately see which one is the latest
directory). But UUID would work as
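
To illustrate the time-prefix idea discussed above, here is a minimal sketch and not Oak's actual implementation. It assumes each index generation lives in a subdirectory of the configured path, named after its creation time in milliseconds since 1970. The helper names index_dir and latest_index_dir are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def index_dir(base_path: str, creation_time_ms: int) -> PurePosixPath:
    """Directory for one index generation: <base_path>/<creation time in ms>."""
    return PurePosixPath(base_path) / str(creation_time_ms)


def latest_index_dir(base_path: str, child_names: List[str]) -> Optional[PurePosixPath]:
    """Pick the newest generation among the child directory names.

    Only purely numeric ASCII names are treated as generations (an assumption
    of this sketch). They are compared as integers, not as strings.
    """
    times = [int(name) for name in child_names if name.isascii() and name.isdigit()]
    if not times:
        return None
    return index_dir(base_path, max(times))


# Example: two generations exist; readers would open the newer one.
print(latest_index_dir("/quickstart/repo/lucenIndex",
                       ["1413877680000", "1413964080000"]))
```

With a timestamp prefix the newest directory is simply the one with the largest number, which is the "you immediately see which one is the latest" property mentioned above. A UUID prefix would need the current value to be looked up from the :data node instead.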
Hi,
If we set reindex to true in any index definition then Oak would
remove the existing index content before performing the reindex. This
would work fine if the index content is stored within NodeStore
itself.
However, if the indexes are stored externally, e.g. Solr or a Lucene index
with