On 2/1/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
> Not sure... depends on how update handlers will use it...
By update handler, do you mean UpdateRequestHandler(s) or UpdateHandler?
Both.
> One thing we might not want to get rid of though is streaming
> (constructing and adding a document, then discarding it). People are
> starting to add a lot of documents in a single XML request, and this
> will be much larger for CSV/SQL.
>
So you are uncomfortable with the Collection because you would have to
load all the documents before indexing them. If there were many of
them, that could be a problem...
If UpdateHandler is going to take care of stuff like autocommit and
modifying documents, it seems best to have that apply to all the
documents you are going to modify as a unit. For example, say I have
a SQL updater that will modify 100,000 documents, incrementing field
'count_*' and replacing 'fl_*'. If the DocumentCommand only applies
to a single document, it would have to match each field as it went
along rather than once when it starts.
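
To make the "match once" point concrete, here is a rough sketch of one way a batch-level command could resolve the 'count_*' and 'fl_*' patterns a single time and reuse the result for every document. Everything in it is hypothetical: the class BatchUpdateSketch, the Action enum, and the use of a plain Map<String, Object> as a stand-in for a document are assumptions for illustration, not Solr APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names exist in Solr.
public class BatchUpdateSketch {
    enum Action { INCREMENT, REPLACE, NONE }

    // Turn a simple glob such as "count_*" into a compiled regex.
    static Pattern compileGlob(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Applies the update to every document and returns how many were seen.
    // Patterns are compiled once per batch, and the decision for each
    // distinct field name is cached, so a name is matched only once.
    static int applyToAll(Iterable<Map<String, Object>> docs,
                          String incrementGlob,
                          String replaceGlob,
                          Object replacement) {
        Pattern inc = compileGlob(incrementGlob);
        Pattern rep = compileGlob(replaceGlob);
        Map<String, Action> decided = new HashMap<>();
        int seen = 0;
        for (Map<String, Object> doc : docs) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                Action action = decided.computeIfAbsent(field.getKey(), name ->
                        inc.matcher(name).matches() ? Action.INCREMENT
                        : rep.matcher(name).matches() ? Action.REPLACE
                        : Action.NONE);
                if (action == Action.INCREMENT && field.getValue() instanceof Integer) {
                    field.setValue((Integer) field.getValue() + 1);
                } else if (action == Action.REPLACE) {
                    field.setValue(replacement);
                }
            }
            seen++;
        }
        return seen;
    }
}
```

Because applyToAll takes an Iterable, the caller does not have to hold all 100,000 documents in memory at once; only the per-field-name decisions are kept for the life of the batch.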
How about: Iterable<SolrDocument>
Maybe... but that might not be the easiest for request handlers to
use... they would then need to spin up a different thread and use a
pull model (provide a new doc on demand) rather than push (call
addDocument()).
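
A rough sketch of what that bridging would involve, in case it helps the discussion: a request handler written in push style (it calls a sink for each document) is run on its own thread, and a bounded queue hands documents to a pull-style Iterator. The class PushToPullSketch and the method asIterator are made-up names for illustration, the "handler" is just a java.util.function.Consumer, and error handling is minimal.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names exist in Solr.
public class PushToPullSketch {
    // Sentinel marking the end of the stream.
    private static final Object END = new Object();

    // Runs a push-style producer on a separate thread and exposes what it
    // pushes as a pull-style Iterator. Null documents are not supported.
    static <T> Iterator<T> asIterator(Consumer<Consumer<T>> producer, int capacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        Thread worker = new Thread(() -> {
            try {
                // The producer "pushes" by calling the sink for each document.
                producer.accept(doc -> put(queue, doc));
            } finally {
                put(queue, END);
            }
        });
        worker.setDaemon(true);
        worker.start();

        return new Iterator<T>() {
            private Object pending; // next element, or END once exhausted

            @Override
            public boolean hasNext() {
                if (pending == null) pending = take(queue);
                return pending != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T result = (T) pending;
                pending = null;
                return result;
            }
        };
    }

    private static void put(BlockingQueue<Object> queue, Object item) {
        try {
            queue.put(item); // blocks while the consumer is behind
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static Object take(BlockingQueue<Object> queue) {
        try {
            return queue.take(); // blocks until the producer pushes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The bounded queue keeps the streaming property (only a few documents are in memory at a time), but the extra thread and the blocking hand-off are exactly the overhead that the push model avoids.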
I'm really just thinking a little out loud... just first impressions
- don't read too much into it.
When I'm coding, the design tends to morph a lot.
I think we need to figure out what type of update semantics we want
w.r.t. adding multiple documents, and all the other misc autocommit
params.
-Yonik