On 7/13/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>>> ... ParallelReader, where some fields are in one sub-index ...
>> the processor would ask the updateHandler for the existing
>> document - the updateHandler deals with
>> getting it to/from the right place.
>
> The big reason you would use ParallelReader is to avoid touching
> the less-modified/bigger fields in one index when changing some of
> the other fields in the other index.

I've pondered this a few times: it could be a huge win for
highlighting apps, which can be stored-field-heavy.

However, I wonder if there is something that I am missing: PR
requires perfect synchronization of Lucene doc ids, no?  If you update
fields for a doc in one index, don't you also need to (re-)store that
doc's fields in all the other indices to keep the doc ids in sync?

Well, it would be tricky... one PR use case would be to entirely
re-index one field (in its own separate index), thus maintaining
synchronization with the main index. As Doug said:
"ParallelReader was not really designed to support incremental updates of
fields, but rather to accelerate batch updates. For incremental
updates you're probably better served by updating a single index."
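
To make the synchronization issue concrete, here is a toy sketch in
Python. It is not Lucene code: each "index" is just a list, the list
position stands in for the doc id, and an update is modeled as
delete-plus-append with the deleted slot already merged away. All
function names are made up for illustration.

```python
# Toy model only: list position plays the role of the Lucene doc id.
# None of these functions are Lucene API; they are hypothetical helpers.

def parallel_view(main, side):
    """Join two indexes position by position, as a parallel reader would."""
    assert len(main) == len(side), "parallel indexes need equal doc counts"
    return [{**m, **s} for m, s in zip(main, side)]

def incremental_update(index, doc_id, new_fields):
    """Model an update as delete + append (deleted slot already merged away)."""
    remaining = index[:doc_id] + index[doc_id + 1:]
    return remaining + [new_fields]

def batch_rebuild(main, compute_field):
    """Rebuild the whole side index in main-index order."""
    return [compute_field(doc) for doc in main]

main = [{"id": "a", "body": "big text a"},
        {"id": "b", "body": "big text b"},
        {"id": "c", "body": "big text c"}]
side = [{"tag": "a-v1"}, {"tag": "b-v1"}, {"tag": "c-v1"}]

# Updating doc 0 in the side index alone shifts every later doc id,
# so the positional join now pairs doc "a" with the tag for doc "b".
broken = parallel_view(main, incremental_update(side, 0, {"tag": "a-v2"}))

# Re-indexing the whole field in main-index order keeps the ids aligned.
rebuilt = parallel_view(
    main, batch_rebuild(main, lambda d: {"tag": d["id"] + "-v2"}))
```

In this model the batch rebuild is the only operation that leaves the
two lists aligned, which is the case Doug's comment describes.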

A batch-only model like that is probably not too useful for a
general-purpose platform like Solr.

Another way to support a more incremental model is perhaps to split up
the smaller volatile index into many segments so that updating a
single doc involves rewriting just that segment.
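
A rough sketch of the segment idea, again as a toy Python model rather
than anything Lucene provides. It assumes fixed-size segments (real
segments are not sized this way) and that a rewritten segment keeps its
docs in the same positions; `SEGMENT_SIZE` and the helper names are
hypothetical.

```python
# Toy model only: the volatile index is a list of small fixed-size
# segments. SEGMENT_SIZE and all helper names are illustrative.

SEGMENT_SIZE = 2

def split_into_segments(docs, size=SEGMENT_SIZE):
    """Chop a flat doc list into consecutive segments of `size` docs."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def update_doc(segments, doc_id, new_fields, size=SEGMENT_SIZE):
    """Rewrite only the segment holding doc_id.

    Returns the new segment list and the number of docs rewritten.
    """
    seg_no, offset = divmod(doc_id, size)
    rewritten = list(segments[seg_no])   # copy just this one segment
    rewritten[offset] = new_fields       # same position, so same doc id
    new_segments = segments[:seg_no] + [rewritten] + segments[seg_no + 1:]
    return new_segments, len(rewritten)

def flatten(segments):
    """Recover the global doc-id order across all segments."""
    return [doc for seg in segments for doc in seg]

side = [{"tag": "v1-%d" % i} for i in range(6)]
segments = split_into_segments(side)
new_segments, docs_rewritten = update_doc(segments, 3, {"tag": "v2-3"})
```

Here updating doc 3 rewrites one two-doc segment and leaves the other
segments untouched, while every doc keeps its original id.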

There might also be possibilities in different types of IndexReader
implementations:  one could map docids to maintain synchronization.
This brings up a slightly different problem: Lucene scorers expect
to iterate in docid order.
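
The ordering problem can be illustrated with another toy Python
sketch. The mapping table and function names are hypothetical, not part
of any IndexReader: the side index holds its docs in the order b, c, a
(as if doc "a" had been re-added last), and a dictionary maps side doc
ids back to main doc ids.

```python
# Toy model only: side_to_main is a hypothetical docid mapping table.

def mapped_postings(side_postings, side_to_main):
    """Translate side-index doc ids to main-index doc ids, in side order."""
    return [side_to_main[d] for d in side_postings]

def is_docid_ordered(doc_ids):
    """True if doc ids are strictly increasing, as a scorer would expect."""
    return all(a < b for a, b in zip(doc_ids, doc_ids[1:]))

# Side index order is b, c, a; main index order is a, b, c.
side_to_main = {0: 1, 1: 2, 2: 0}

# A term matching all three docs, enumerated in side-index order.
side_postings = [0, 1, 2]

mapped = mapped_postings(side_postings, side_to_main)
resorted = sorted(mapped)
```

The mapped ids come out as 1, 2, 0, so anything consuming them in
order would need a re-sort (or some other reordering step) first, and
sorting a full postings list gives up the streaming iteration that
makes in-order scoring cheap.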

-Yonik
