Walter,

When you run the query, what sort order are you using for the results?

tim

On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> I’ll back up a bit, since it is sort of an X/Y problem.
>
> I have an index with four shards and 17 million documents. I want to dump
> all the docs in JSON, label each one with a classifier, then load them back
> in with the labels. This is a one-time (or rare) bootstrap of the
> classified data. This will unblock testing and relevance work while we get
> the classifier hooked into the indexing pipeline.
>
> Because I’m dumping all the fields, we can’t rely on docValues.
>
> It is OK if it takes a few hours.
>
> Right now, it is running about 1.7 Mdoc/hour, so roughly 10 hours. That is
> 16 threads searching id:0* through id:f*, fetching 1000 rows each time,
> using cursorMark and distributed search. Median response time is 10 s. CPU
> usage is about 1%.
>
> It is all pretty grubby and it seems like there could be a better way.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Feb 10, 2020, at 3:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > Any field that’s unique per doc would do, but yeah, that’s usually an ID.
> >
> > Hmmm, I don’t see why separate queries for 0-f are necessary if you’re firing
> > at individual replicas. Each replica should have multiple UUIDs that start with 0-f.
> >
> > Unless I misunderstand and you’re just firing off, say, 16 threads at the entire
> > collection rather than individual shards which would work too. But for individual
> > shards I think you need to look for all possible IDs...
> >
> > Erick
> >
> >> On Feb 10, 2020, at 5:37 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>
> >>
> >>> On Feb 10, 2020, at 2:24 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >>>
> >>> Not sure if range queries work on a UUID field, ...
> >>
> >> A search for id:0* took 260 ms, so it looks like they work just fine. I’ll try separate queries for 0-f.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >
>
>
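For reference, the export Walter describes above (one query per leading hex digit, id:0* through id:f*, 1000 rows per request, paged with cursorMark, 16 threads) could be sketched roughly as below in Python. This is only a sketch under assumptions: the URL and collection name are hypothetical, the uniqueKey field is assumed to be id, and the function names (http_fetch, dump_prefix, dump_all) are made up for illustration. It is not the script actually used in this thread. Solr's cursorMark requires a sort that includes the uniqueKey field, so the sketch sorts by id asc.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical collection URL; substitute your own host and collection.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def http_fetch(params):
    """Send one select request to Solr and return the parsed JSON response."""
    url = SOLR_SELECT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def dump_prefix(prefix, fetch=http_fetch, rows=1000):
    """Yield every document whose id starts with prefix, paging with cursorMark."""
    cursor = "*"  # Solr's starting cursor value
    while True:
        page = fetch({
            "q": f"id:{prefix}*",
            "sort": "id asc",  # cursorMark needs a sort that includes the uniqueKey
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor


def dump_all(fetch=http_fetch, rows=1000, workers=16):
    """Run one cursor per hex prefix in parallel and return all documents."""
    prefixes = "0123456789abcdef"
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda p: list(dump_prefix(p, fetch, rows)), prefixes)
        return [doc for chunk in chunks for doc in chunk]
```

The fetch argument is there so the paging loop can be tried against a stub before pointing it at a real cluster. Collecting everything in a list is for brevity; for 17 million documents you would want to write each page to disk as it arrives.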
