> > How about: Iterable<SolrDocument> Maybe... but that might not be the easiest for request handlers to use... they would then need to spin up a different thread and use a pull model (provide a new doc on demand) rather than push (call addDocument()).
With Iterable, you don't need to start a thread to implement a 'streaming' parser. You can use an anonymous inner class that waits until next() is called before reading the next row/line/document, etc. In effect, this lets the RequestHandler set up all the common configuration and then lets the UpdateHandler ask for documents one at a time. What I like about this is that the code that loops through each row of my SQL updater does not need to know *anything* about the UpdateHandler. I would rather not call updater.addDoc( cmd ) within the while( rs.next() ) loop. This makes it much cleaner and easier to test. If writing a 'streaming' Iterable is more trouble than someone wants to go through, they can easily return a Collection<SolrDocument> or an array with a single element.
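
To make the pull model concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: `Doc` stands in for SolrDocument, `PullingUpdateHandler` stands in for the UpdateHandler, and the "id,name" line format is just an assumption so that the example runs on its own. The point is only that each line is parsed when next() is called, with no extra thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for SolrDocument so the sketch compiles on its own.
class Doc {
    final Map<String, Object> fields = new LinkedHashMap<>();
}

// Hypothetical stand-in for the UpdateHandler: it pulls documents one at a time.
class PullingUpdateHandler {
    final List<Doc> added = new ArrayList<>();

    void addDocuments(Iterable<Doc> docs) {
        for (Doc d : docs) {
            added.add(d); // a real handler would index the document here
        }
    }
}

class LineDocuments {
    // Wraps a source of "id,name" lines (an assumed format).
    // A line is read and parsed only when next() is called.
    static Iterable<Doc> stream(final Iterator<String> lines) {
        return new Iterable<Doc>() {
            @Override
            public Iterator<Doc> iterator() {
                return new Iterator<Doc>() {
                    @Override
                    public boolean hasNext() {
                        return lines.hasNext();
                    }

                    @Override
                    public Doc next() {
                        if (!lines.hasNext()) {
                            throw new NoSuchElementException();
                        }
                        String[] parts = lines.next().split(",", 2);
                        Doc doc = new Doc();
                        doc.fields.put("id", parts[0]);
                        doc.fields.put("name", parts.length > 1 ? parts[1] : "");
                        return doc;
                    }
                };
            }
        };
    }
}
```

The parsing code never references the handler, so it can be tested by iterating over it directly. One caveat of this sketch: the Iterable is single-pass, since it wraps one underlying source.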
When I'm coding, the design tends to morph a lot.
mine too!
I think we need to figure out what type of update semantics we want w.r.t. adding multiple documents, and all the other misc autocommit params.
Right now, what I am working with is an 'update' command to which you can pass a mode for each field. If no modes are specified (or they are all OVERWRITE), it behaves exactly as it does now (like SQL REPLACE). If any field uses something other than OVERWRITE, it behaves like an SQL INSERT ... ON DUPLICATE KEY UPDATE.
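
Roughly, the semantics I have in mind look like the sketch below. This is not the actual patch: `ModalIndex` is a hypothetical in-memory stand-in for the index, and APPEND is an invented example of a non-OVERWRITE mode (only OVERWRITE is named above). It just shows the two branches: replace the whole document when every mode is OVERWRITE, otherwise start from the stored document and apply each field's mode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// OVERWRITE is the mode named in this thread; APPEND is a hypothetical extra mode.
enum FieldMode { OVERWRITE, APPEND }

// Hypothetical in-memory stand-in for the index, keyed by unique id.
class ModalIndex {
    final Map<String, Map<String, List<Object>>> docs = new HashMap<>();

    void update(String id, Map<String, Object> fields, Map<String, FieldMode> modes) {
        // No modes given, or every mode is OVERWRITE: plain replace.
        boolean allOverwrite = true;
        for (FieldMode m : modes.values()) {
            if (m != FieldMode.OVERWRITE) {
                allOverwrite = false;
            }
        }

        Map<String, List<Object>> existing = docs.get(id);
        Map<String, List<Object>> result = new LinkedHashMap<>();
        if (!allOverwrite && existing != null) {
            // Like INSERT ... ON DUPLICATE KEY UPDATE: start from the stored document.
            for (Map.Entry<String, List<Object>> e : existing.entrySet()) {
                result.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        // Otherwise like REPLACE: the stored document is discarded.

        for (Map.Entry<String, Object> e : fields.entrySet()) {
            FieldMode mode = modes.getOrDefault(e.getKey(), FieldMode.OVERWRITE);
            List<Object> values = result.get(e.getKey());
            if (mode == FieldMode.OVERWRITE || values == null) {
                values = new ArrayList<>();
                result.put(e.getKey(), values);
            }
            values.add(e.getValue());
        }
        docs.put(id, result);
    }
}
```

With this sketch, updating only a "tag" field in APPEND mode keeps the other stored fields, while the same update with no modes drops them, just as the current add does.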
