At 6:43 AM -0500 1/11/07, Erik Hatcher wrote:
>If all fields are stored, the implementation could simply pull them all into 
>memory on the Solr side and add the document as if it had been sent entirely 
>by the client.  But what happens for un-stored fields?

I'll observe that Luke has a "Reconstruct and Edit" function which displays the 
indexed values for each field of the selected Document when stored values 
aren't available... it iterates the entire inverted index and intersects each 
term's positions with the target Document ID via TermPositions.skipTo(id).

While that would be too slow to do on a per-update basis, it might be feasible 
for an update function if it cached a list of partially defined Documents and 
only at the end (at close, or whenever the list grew past a defined maximum) 
did a bulk intersection to find the indexed values that are not overridden by 
new values, using a single traversal of the index in Term order and then 
updated-DocID order.  Once done, the reconstructed Documents could be added 
and the prior versions deleted.
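
To make the flush step concrete, here's a hedged sketch of the merge, assuming 
a reconstruct() pass like the one above produces a field -> (position -> term) 
map per cached Document (class and method names here are hypothetical):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import java.util.*;

    public class PartialUpdateMerger {
      /**
       * Client-supplied fields win; any field absent from the partial
       * update is refilled from its reconstructed terms, joined in
       * position order.  The join loses the exact position offsets,
       * which is precisely the roadblock discussed next.
       */
      public static Document merge(Document partial,
          Map<String,SortedMap<Integer,String>> reconstructed) {
        Document merged = new Document();
        Set<String> overridden = new HashSet<String>();
        for (Iterator i = partial.getFields().iterator(); i.hasNext();) {
          Field f = (Field) i.next();
          overridden.add(f.name());
          merged.add(f);
        }
        for (Map.Entry<String,SortedMap<Integer,String>> e :
            reconstructed.entrySet()) {
          if (overridden.contains(e.getKey())) continue;
          StringBuffer sb = new StringBuffer();
          for (String term : e.getValue().values()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(term);
          }
          merged.add(new Field(e.getKey(), sb.toString(),
              Field.Store.NO, Field.Index.TOKENIZED));
        }
        return merged;
      }
    }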

The roadblocks come up when re-adding the indexed values to the index: while 
the updater can create a new untokenized, unstored Field for each indexed 
value so that it is literally re-added, there is then no way to externally 
specify the position offset to match the original.  DocumentWriter and the 
classes it relies on are package-private and final, so there is no way to 
interpose there.  But an effective hack might be to mark the reconstructed 
Fields as tokenized and specify for those fields a special Analyzer which acts 
like KeywordAnalyzer but looks up the position offset in a table created by 
the update mechanism and returns it with the token.  A little convoluted, but 
probably doable if someone had the time and inclination.
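
A sketch of what that Analyzer hack might look like against the 2.x 
TokenStream API; the position table is assumed to be the one built during 
reconstruction, and all class names here are hypothetical:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import java.io.Reader;
    import java.util.*;

    public class PositionReplayAnalyzer extends Analyzer {
      // field name -> (original position -> term text), built by the
      // update mechanism during reconstruction
      private final Map<String,SortedMap<Integer,String>> table;

      public PositionReplayAnalyzer(
          Map<String,SortedMap<Integer,String>> table) {
        this.table = table;
      }

      public TokenStream tokenStream(String fieldName, Reader reader) {
        SortedMap<Integer,String> byPos = table.get(fieldName);
        final Iterator<Map.Entry<Integer,String>> it = (byPos == null)
            ? Collections.<Integer,String>emptyMap().entrySet().iterator()
            : byPos.entrySet().iterator();
        return new TokenStream() {
          private int lastPos = -1;  // position = sum of increments - 1
          public Token next() {
            if (!it.hasNext()) return null;
            Map.Entry<Integer,String> e = it.next();
            Token t = new Token(e.getValue(), 0, e.getValue().length());
            // replay the original absolute position as a delta from
            // the previous token's position
            t.setPositionIncrement(e.getKey().intValue() - lastPos);
            lastPos = e.getKey().intValue();
            return t;
          }
        };
      }
    }

Note this replays the original positions (encoded as increments) but not the 
original character offsets, which aren't recoverable from the postings alone.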

- J.J.
