Re: Reindex and external indexes - Possibility of stale index data

2014-10-27 Thread Chetan Mehrotra
Had an offline chat with Thomas about this. For now, a creationTime-based approach can be used to let the index logic distinguish between a reindex and a fresh index. Thomas's proposal above was more to avoid the large transaction problem, where the new index would be built side by side. With Lucene this is not a

Re: Reindex and external indexes - Possibility of stale index data

2014-10-21 Thread Chetan Mehrotra
It might be simpler if we just record the index creation time in the index definition node itself (or some predefined meta node under definition node). This can be done in IndexUpdate itself where it would set the time when it triggers a reindex or the first index. Later Lucene would make use of
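
A rough sketch of how a recorded creation time could let index logic spot stale external data. It is an illustration in Python, with a plain dict standing in for the index definition node. The names mark_index_created and is_external_data_stale are hypothetical, and the rule that the external index remembers the creation time it was built for is an assumption, not something stated in the thread.

```python
import time


def mark_index_created(definition, now_ms=None):
    """Record the creation time (ms since 1970) on the definition.

    Meant to be called whenever a first index or a reindex is triggered.
    The dict is a stand-in for the index definition node (assumption).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    definition["creationTime"] = now_ms
    return now_ms


def is_external_data_stale(definition, external_creation_time):
    """Return True if the external index was built for another creation time.

    Assumes the external index stores the creationTime it was built for.
    """
    return external_creation_time != definition.get("creationTime")
```

With this shape, external data written before a reindex carries the older creation time and can be recognised as stale without having been deleted first.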

Re: Reindex and external indexes - Possibility of stale index data

2014-10-21 Thread Thomas Mueller
Hi,
> It might be simpler if we just record the index creation time in the index definition node itself (or some predefined meta node under definition node). This can be done in IndexUpdate itself where it would set the time when it triggers a reindex or the first index.
Sorry, I don't understand.

Re: Reindex and external indexes - Possibility of stale index data

2014-10-21 Thread Chetan Mehrotra
On Tue, Oct 21, 2014 at 1:18 PM, Thomas Mueller muel...@adobe.com wrote:
> What we need is a distinction between the old and the new index *data*.
Yes, and that can be done by storing the index creation time. In the approach you suggested where two different nodes are used and later the nodes are

Re: Reindex and external indexes - Possibility of stale index data

2014-10-21 Thread Thomas Mueller
Hi, The node doesn't need to be moved, even after multiple reindex operations. Please note index creation is no different from reindex. In both cases, a new index data node is created. So, if an index definition is created: /oak:index/lucene Then the index is being built:

Re: Reindex and external indexes - Possibility of stale index data

2014-10-21 Thread Chetan Mehrotra
Thanks for the details, Thomas! But the above model differs from the current model, which makes use of MVCC. The reindex operation triggers removal of the :data node in a branch, and IndexReader always looks for the :data node to open the directory on trunk. So while a reindex is in progress, existing readers make use of

Re: Reindex and external indexes - Possibility of stale index data

2014-10-21 Thread Thomas Mueller
Hi, Yes, that's my point. I wouldn't use MVCC for reindexing the Lucene index. Reindexing is very costly, and I wouldn't do it in one huge, possibly hours-long transaction.
* You need to have access to the old and (for readers) the new data (to re-create the index)
* Eventually, you want to

Re: Reindex and external indexes - Possibility of stale index data

2014-10-13 Thread Alex Parvulescu
Hi,
> If we set reindex to true in any index definition then Oak would remove the existing index content before performing the reindex. This would work fine if the index content is stored within NodeStore itself.
It is important to also specify that this appears as a single commit thanks to

Re: Reindex and external indexes - Possibility of stale index data

2014-10-13 Thread Thomas Mueller
Hi, As for external Lucene indexes, what about this:
* in the :data node, store an index creation time, in milliseconds since 1970
* use that as a path prefix for the actual index files
So if the index is configured as follows:
/oak:index/lucene { path: /quickstart/repo/lucenIndex }
Then

Re: Reindex and external indexes - Possibility of stale index data

2014-10-13 Thread Thomas Mueller
Hi,
> Then would use that UUID as the prefix ...
Sorry, that should be "Then would use that _time_ as the prefix ...". I thought about using a UUID first, but then changed to milliseconds since 1970, as that's easier (you immediately see which one is the latest directory). But UUID would work as
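
A minimal sketch of the timestamp-prefix idea from the two messages above, written in Python for brevity rather than Oak's Java. The helper names index_dir_for and latest_index_dir are hypothetical and not part of Oak, and treating the prefix as a subdirectory of the configured path is an assumption. The only things taken from the thread are the use of milliseconds since 1970 as the prefix and the observation that the latest directory is then easy to pick out.

```python
import os


def index_dir_for(base_path, creation_time_ms):
    """Return the directory for index files built at creation_time_ms.

    Assumes the time prefix is a subdirectory of the configured path.
    """
    return os.path.join(base_path, str(creation_time_ms))


def latest_index_dir(base_path):
    """Return the newest timestamp-named subdirectory, or None if none exist."""
    stamps = [
        int(name)
        for name in os.listdir(base_path)
        if name.isdigit() and os.path.isdir(os.path.join(base_path, name))
    ]
    if not stamps:
        return None
    # Compare as integers: plain string order would rank "999" above "1000".
    return index_dir_for(base_path, max(stamps))
```

Older directories can stay in place until nothing reads from them any more, since a lookup of this kind only ever selects the newest one.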

Reindex and external indexes - Possibility of stale index data

2014-10-12 Thread Chetan Mehrotra
Hi, If we set reindex to true in any index definition then Oak would remove the existing index content before performing the reindex. This would work fine if the index content is stored within NodeStore itself. However, if the index is stored externally, e.g. Solr or Lucene index with