Re: loading many documents by ID

Ryan McKinley Fri, 02 Feb 2007 10:48:06 -0800

>
> How about: Iterable<SolrDocument>

Maybe... but that might not be the easiest for request handlers to
use... they would then need to spin up a different thread and use a
pull model (provide a new doc on demand) rather than push (call
addDocument()).


With Iterable, you don't need to start a thread to implement a
'streaming' parser.  You can use an anonymous inner class that waits
until next() is called before reading the next row/line/document, etc.
In effect this lets the RequestHandler set up all the common
configurations and then lets the UpdateHandler ask for a document one
at a time.
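
To make the pull model concrete, here is a rough sketch of a 'streaming' Iterable built from an anonymous inner class. It is only an illustration: StreamingDocs, documents() and addAll() are made-up names, a plain String (one line of input) stands in for a SolrDocument, and addAll() merely plays the role of whatever the UpdateHandler would do when it asks for documents.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a String (one input line) stands in for a SolrDocument.
class StreamingDocs {

    // Returns an Iterable that reads the next line only when the consumer asks for it.
    static Iterable<String> documents(final BufferedReader reader) {
        return new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                return new Iterator<String>() {
                    private String next;   // look-ahead slot, filled lazily
                    private boolean done;  // true once the reader is exhausted

                    @Override
                    public boolean hasNext() {
                        if (next == null && !done) {
                            try {
                                next = reader.readLine();
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                            if (next == null) {
                                done = true;
                            }
                        }
                        return next != null;
                    }

                    @Override
                    public String next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        String doc = next;
                        next = null;
                        return doc;
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    // Stand-in for the consuming side: pulls documents one at a time, no extra thread.
    static List<String> addAll(Iterable<String> docs) {
        List<String> indexed = new ArrayList<>();
        for (String doc : docs) {
            indexed.add(doc);
        }
        return indexed;
    }
}
```

Nothing is read from the reader until the consumer calls hasNext() or next(), so the parsing code never has to know who is consuming it. Note that this Iterable can only be walked once, since it wraps a single reader.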

What I like about this is that the code that loops through each row of
my SQL updater does not need to know *anything* about the
UpdateHandler.  I would rather not call updater.addDoc( cmd ) within
the while( rs.next() )  loop.  This makes it much cleaner and easier
to test.

If writing a 'streaming' Iterable is more trouble than someone wants
to go through, they can easily return a Collection<SolrDocument> or an
array with a single element.

When I'm coding, the design tends to morph a lot.


mine too!

I think we need to figure out what type of update semantics we want
w.r.t. adding multiple documents, and all the other misc autocommit
params.


Right now, what I am working with is an 'update' command that you can
pass along modes for each field.  If no modes are specified (or they
are all OVERWRITE) it behaves exactly as we have now (SQL REPLACE).
If any field uses something other than OVERWRITE, it behaves like an
SQL INSERT ... ON DUPLICATE KEY UPDATE.
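
Roughly, the semantics could look like the sketch below. This is not the actual patch: FieldModeUpdate and update() are invented names, a document is modelled as a Map of field name to values, and APPEND is a hypothetical example of a non-OVERWRITE mode (only OVERWRITE is named above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-field update modes; not the real Solr code.
class FieldModeUpdate {

    // OVERWRITE comes from the discussion; APPEND is an assumed example mode.
    enum Mode { OVERWRITE, APPEND }

    static Map<String, List<String>> update(Map<String, List<String>> existing,
                                            Map<String, List<String>> incoming,
                                            Map<String, Mode> modes) {
        // No modes, or all OVERWRITE: replace the whole document (like SQL REPLACE).
        boolean replaceAll = true;
        for (Mode m : modes.values()) {
            if (m != Mode.OVERWRITE) {
                replaceAll = false;
            }
        }
        if (existing == null || replaceAll) {
            return new HashMap<>(incoming);
        }

        // Otherwise merge into the stored document field by field
        // (like INSERT ... ON DUPLICATE KEY UPDATE).
        Map<String, List<String>> merged = new HashMap<>(existing);
        for (Map.Entry<String, List<String>> e : incoming.entrySet()) {
            Mode mode = modes.getOrDefault(e.getKey(), Mode.OVERWRITE);
            if (mode == Mode.APPEND && merged.containsKey(e.getKey())) {
                List<String> values = new ArrayList<>(merged.get(e.getKey()));
                values.addAll(e.getValue());
                merged.put(e.getKey(), values);
            } else {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }
}
```

The difference shows up in fields that the incoming document does not mention: with the all-OVERWRITE path they disappear, while on the merge path they are kept from the stored document.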
