Re: loading many documents by ID

Ryan McKinley Fri, 02 Feb 2007 03:16:02 -0800


1) regardless of the verb (updatable/modifiable) i'm not sure that it
makes sense to annotate in the schema the fields that should be copied on
update, and not label the fields that must be "set" on update (ie: the
fields that cannot be copied)


I agree.  I started down that path, and it gets pretty ugly.  I
stopped.  I have opted for a syntax that 'updates' all stored fields,
but lets you say explicitly what to do for each field.  If there is a
stored field you want to skip, you can specify that in the command
rather than in the schema.

another simple approach would be to make "updatability" a property of the
schema, that can contain a few different values...
 "strict" - indexed and stored are no longer valid field(type)
            attributes -- all fields are indexed and stored. all fields
            are copied on "update" unless the update command includes
            instructions to replace, append or increment the field value
  "loose" - indexed/stored still exist, any attempt to "update" an
            existing document is legal, all stored fields are copied
            on update unless the update command includes instructions
            to replace, append or increment the field value.
   "none" - any attempt to update will fail.


This is an interesting idea, but (if I'm understanding your suggestion
correctly) it seems like too big a change from the existing schema.

The more I think about the 'error' behavior, the more I am convinced we
just need solid, easily explainable logic for what happens and why.  I
think throwing an error if there are no stored fields is reasonable,
and only updating stored fields is simple enough logic that I don't
think we need to overcomplicate it.

another approach i don't really have fully fleshed out in my head would be
to introduce a concept of "fieldsets" ... an update that
sets/appends/increments a field in a fieldset which does not provide a


I may be working on this, but not sure if it is what you are saying.  I have:

import java.util.Map;

public class IndexDocumentCommand
{
 public enum FieldMODE {
   APPEND,    // add the fields to existing fields
   OVERWRITE, // overwrite existing fields
   INCREMENT, // increment existing field.  Must be a number!
   DISTINCT,  // same as APPEND, but make sure there are distinct values
   IGNORE     // ignore the previous value -- don't copy it
 };

 public Iterable<SolrDocument> docs;
 public Map<String,FieldMODE> fieldMode; // What to do for each field.
 public int commitMaxTime = -1;
}

If fieldMode is null or they are all OVERWRITE, the addDoc command
behaves as it always has.  Otherwise, it first extracts the existing
stored values (unless the fieldMode is IGNORE) then applies the new
document's values on top of the old ones.
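
To make the per-field behavior concrete, here is a minimal sketch of how
one field's old and new values might be combined under each mode.  It is
not the code from the patch: the class FieldModeMergeSketch and its
merge() method are hypothetical, values are modeled as plain lists, and
INCREMENT is assumed to mean "add the new integer to the single stored
integer".  The step that reads stored values from the index is not
modeled, so OVERWRITE and IGNORE both simply return the new values here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper for illustration only; not part of the patch.
final class FieldModeMergeSketch {
    enum FieldMode { APPEND, OVERWRITE, INCREMENT, DISTINCT, IGNORE }

    // Combine the stored values of one field with the incoming values.
    static List<Object> merge(FieldMode mode, List<?> stored, List<?> incoming) {
        switch (mode) {
            case APPEND: {
                // Old values first, then the new ones; duplicates are kept.
                List<Object> out = new ArrayList<Object>(stored);
                out.addAll(incoming);
                return out;
            }
            case DISTINCT: {
                // Same as APPEND, but repeated values are dropped.
                LinkedHashSet<Object> seen = new LinkedHashSet<Object>(stored);
                seen.addAll(incoming);
                return new ArrayList<Object>(seen);
            }
            case INCREMENT: {
                // Assumption: exactly one integer value on each side.
                if (stored.size() != 1 || incoming.size() != 1
                        || !(stored.get(0) instanceof Number)
                        || !(incoming.get(0) instanceof Number)) {
                    throw new IllegalArgumentException("INCREMENT needs one number per side");
                }
                long sum = ((Number) stored.get(0)).longValue()
                         + ((Number) incoming.get(0)).longValue();
                List<Object> out = new ArrayList<Object>();
                out.add(sum);
                return out;
            }
            case OVERWRITE:
            case IGNORE:
            default:
                // Only the new values survive in this simplified model.
                return new ArrayList<Object>(incoming);
        }
    }
}
```

For example, merging stored values [a, b] with new values [b, c] gives
[a, b, b, c] under APPEND and [a, b, c] under DISTINCT.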

Currently I am only handling wildcard substitution for "*" - the
default mode.  I have not tried to tackle dynamic fields yet...  it
seems a bit more complicated!
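
As a rough illustration of the "*" default, the lookup for a field's
mode could work like the sketch below.  The class FieldModeLookupSketch
and its resolve() method are hypothetical names, and falling back to
OVERWRITE when neither the field nor "*" is present is an assumption
based on the "behaves as it always has" case above.  Dynamic-field
patterns are deliberately not handled.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the patch.
final class FieldModeLookupSketch {
    enum FieldMode { APPEND, OVERWRITE, INCREMENT, DISTINCT, IGNORE }

    // Pick the mode for one field: exact name first, then the "*" default.
    static FieldMode resolve(Map<String, FieldMode> fieldMode, String fieldName) {
        if (fieldMode == null) {
            return FieldMode.OVERWRITE; // assumed: no map means a plain add
        }
        FieldMode explicit = fieldMode.get(fieldName);
        if (explicit != null) {
            return explicit;
        }
        FieldMode fallback = fieldMode.get("*");
        return fallback != null ? fallback : FieldMode.OVERWRITE;
    }
}
```

With a map of {"*": APPEND, "price": INCREMENT}, the field "price"
resolves to INCREMENT and any other field resolves to APPEND.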
