sort="id asc"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Feb 10, 2020, at 9:50 PM, Tim Casey <tca...@gmail.com> wrote:
>
> Walter,
>
> When you do the query, what is the sort of the results?
>
> tim
>
> On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood <wun...@wunderwood.org> wrote:
>
>> I’ll back up a bit, since it is sort of an X/Y problem.
>>
>> I have an index with four shards and 17 million documents. I want to dump
>> all the docs in JSON, label each one with a classifier, then load them back
>> in with the labels. This is a one-time (or rare) bootstrap of the
>> classified data. This will unblock testing and relevance work while we get
>> the classifier hooked into the indexing pipeline.
>>
>> Because I’m dumping all the fields, we can’t rely on docValues.
>>
>> It is OK if it takes a few hours.
>>
>> Right now, it is running about 1.7 Mdoc/hour, so roughly 10 hours. That is
>> 16 threads searching id:0* through id:f*, fetching 1000 rows each time,
>> using cursorMark and distributed search. Median response time is 10 s. CPU
>> usage is about 1%.
>>
>> It is all pretty grubby and it seems like there could be a better way.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Feb 10, 2020, at 3:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> Any field that’s unique per doc would do, but yeah, that’s usually an ID.
>>>
>>> Hmmm, I don’t see why separate queries for 0-f are necessary if you’re firing
>>> at individual replicas. Each replica should have multiple UUIDs that start with 0-f.
>>>
>>> Unless I misunderstand and you’re just firing off, say, 16 threads at the entire
>>> collection rather than individual shards which would work too. But for individual
>>> shards I think you need to look for all possible IDs...
>>>
>>> Erick
>>>
>>>> On Feb 10, 2020, at 5:37 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>
>>>>> On Feb 10, 2020, at 2:24 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>
>>>>> Not sure if range queries work on a UUID field, ...
>>>>
>>>> A search for id:0* took 260 ms, so it looks like they work just fine. I’ll try separate queries for 0-f.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
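
For reference, the dump described in this thread (one id:<prefix>* query per hex digit, sort=id asc, 1000 rows per request, paging with cursorMark) could be sketched roughly as below. This is only an illustration, not the script actually used: the function names (solr_fetch, dump_prefix, count_all) and the base URL are hypothetical, and it assumes the uniqueKey field is "id" and that IDs begin with a lowercase hex digit. The request parameters (q, sort, rows, cursorMark) and the nextCursorMark field in the response are standard Solr cursor paging.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator

HEX_PREFIXES = "0123456789abcdef"  # one id:<prefix>* query per worker


def solr_fetch(base_url: str) -> Callable[[Dict[str, str]], dict]:
    """Build a fetch function for a collection URL (hypothetical helper),
    e.g. solr_fetch("http://localhost:8983/solr/mycollection")."""
    def fetch(params: Dict[str, str]) -> dict:
        url = base_url + "/select?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch


def dump_prefix(fetch: Callable[[Dict[str, str]], dict],
                prefix: str, rows: int = 1000) -> Iterator[dict]:
    """Yield every doc whose id starts with `prefix`, paging by cursorMark."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        page = fetch({
            "q": f"id:{prefix}*",
            "sort": "id asc",  # a cursor needs a sort that includes the uniqueKey
            "rows": str(rows),
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # cursor did not advance: no more results
            break
        cursor = next_cursor


def count_all(fetch: Callable[[Dict[str, str]], dict],
              workers: int = 16) -> Dict[str, int]:
    """Run the 16 prefix dumps in parallel and count docs per prefix."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(
            lambda p: sum(1 for _ in dump_prefix(fetch, p)), HEX_PREFIXES)
        return dict(zip(HEX_PREFIXES, counts))
```

A real dump would write each doc out as JSON inside the loop instead of counting. Taking the fetch function as a parameter keeps the paging logic testable without a running Solr.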