That's another great example of a mode that Bulk Field Update (my mythical
feature) needs - switch a list of fields from stored to docvalues.

And maybe even the opposite since there are scenarios in which docValues is
worse than stored and you would only find that out after indexing...
billions of documents.

Being able to switch indexed mode of a field (or list of fields) is also a
mode needed for bulk update (reindex).


-- Jack Krupansky

On Fri, Mar 18, 2016 at 4:12 AM, Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Hi Mohsin,
> There's some work in progress for in-place updates to docValued fields,
> https://issues.apache.org/jira/browse/SOLR-5944. Can you try the latest
> patch there (or ping me if you need a git branch)?
> It would be nice to know how fast the updates go for your usecase with that
> patch. Please note that for that patch, both the version field and the
> updated field needs to have stored=false, indexed=false, docValues=true.
> Regards,
> Ishan
>
>
> On Thu, Mar 17, 2016 at 10:55 PM, Jack Krupansky <jack.krupan...@gmail.com
> >
> wrote:
>
> > It would be nice to have a wiki/doc for "Bulk Field Update" that listed
> all
> > of these techniques and tricks.
> >
> > And, of course, it would be so much better to have an explicit Lucene
> > feature for this. It could work in the background like merge and process
> > one segment at a time as efficiently as possible.
> >
> > Have several modes:
> >
> > 1. Set a field of all documents to explicit value.
> > 2. Set a field of query documents to an explicit value.
> > 3. Increment by n.
> > 4. Add new field to all document, or maybe by query.
> > 5. Delete existing field for all documents.
> > 6. Delete field value for all documents or a specified query.
> >
> >
> > -- Jack Krupansky
> >
> > On Thu, Mar 17, 2016 at 12:31 PM, Ken Krugler <
> kkrugler_li...@transpac.com
> > >
> > wrote:
> >
> > > As others noted, currently updating a field means deleting and
> inserting
> > > the entire document.
> > >
> > > Depending on how you use the field, you might be able to create another
> > > core/container with that one field (plus the key field), and use join
> > > support.
> > >
> > > Note that https://issues.apache.org/jira/browse/LUCENE-6352 is an
> > > improvement, which looks like it's in the 5.x code line, though I don't
> > see
> > > a fix version.
> > >
> > > -- Ken
> > >
> > > > From: Mohsin Beg Beg
> > > > Sent: March 16, 2016 3:52:47pm PDT
> > > > To: solr-user@lucene.apache.org
> > > > Subject: how to update billions of docs
> > > >
> > > > Hi,
> > > >
> > > > I have a requirement to replace a value of a field in 100B's of docs
> in
> > > 100's of cores.
> > > > The field is multiValued=false docValues=true type=StrField
> stored=true
> > > indexed=true.
> > > >
> > > > Atomic Updates performance is on the order of 5K docs per sec per
> core
> > > in solr 5.3 (other fields are quite big).
> > > >
> > > > Any suggestions ?
> > > >
> > > > -Mohsin
> > >
> > >
> > > --------------------------
> > > Ken Krugler
> > > +1 530-210-6378
> > > http://www.scaleunlimited.com
> > > custom big data solutions & training
> > > Hadoop, Cascading, Cassandra & Solr
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Reply via email to