Curiouser and curiouser. So two possibilities remain: the time it takes to assemble the response packet and the time it takes to send it back. Three more experiments, then.
1> Change the returned doc to return a single docValues=true field. My claim: the response will be very close to the 400-600 ms range back at the client. This is kind of a sanity check; it should collect the field to return without visiting the stored data, decompressing it, etc.

2> Change the returned doc to return a single docValues=false stored=true field. That’ll exercise the whole fetch-from-disk-and-decompress cycle, because all the stored values for a doc need to be decompressed if you access even one. If that comes back in, say, under 1 second, then the speed issue is either GC thrashing or your network would be my guess. If it’s in the same 10-second range, then I’d be looking at GC and the like.

3> Change the returned rows to, say, 100 while still returning all the fields. If you see a pretty linear relationship between the number of docs and the response time, then at least we know where to dig.

But the 1% CPU utilization makes me suspect transmission. If it were read-from-disk-and-decompress I’d expect more CPU due to the decompression phase, unless you have a very slow disk.

So apparently CursorMark was a red herring? It is _vaguely_ possible that bumping the documentCache higher might help, but frankly I doubt it. That would help if you had a situation where you were having to re-access the stored data from disk for several different components in a single query, but I don’t think that pertains.

But then I’m surprised you’re seeing this at all, so what I think is, as an author I once read had a character say, “ain’t worth a goldfish fart” in the face of evidence.

Best,
Erick

> On Feb 11, 2020, at 12:33 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> Good questions. Here is the QTime for rows=1000. Looks pretty reasonable. I’d
> blame the slowness on the VPN connection, but the median response time of
> 10,000 msec is measured at the server.
> 
> The client is in Python, using wt=json. Average document size in JSON is 5132
> bytes. The system should not be IO bound, but I’ll check. The instances have
> 31 GB of memory, shards are 40 GB on SSD. I don’t think I set up JVM
> monitoring on this cluster, so I can’t see if the GC is thrashing.
> 
> 2020-02-11T08:51:33 INFO QTime=401
> 2020-02-11T08:51:34 INFO QTime=612
> 2020-02-11T08:51:34 INFO QTime=492
> 2020-02-11T08:51:35 INFO QTime=513
> 2020-02-11T08:51:36 INFO QTime=458
> 2020-02-11T08:51:36 INFO QTime=419
> 2020-02-11T08:51:46 INFO QTime=477
> 2020-02-11T08:51:47 INFO QTime=479
> 2020-02-11T08:51:47 INFO QTime=457
> 2020-02-11T08:51:50 INFO QTime=553
> 2020-02-11T08:51:50 INFO QTime=658
> 2020-02-11T08:51:52 INFO QTime=379
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
> 
>> On Feb 11, 2020, at 6:28 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> Wow, that’s pretty horrible performance.
>> 
>> Yeah, I was conflating a couple of things here. Now it’s clear.
>> 
>> If you specify rows=1, what do you get in response time? I’m wondering if
>> your time is spent just assembling the response rather than searching. You’d
>> have to have massive docs for that to be the case, kind of a shot in the dark.
>> The assembly step requires the docs be read off disk, decompressed and then
>> transmitted, but 10 seconds is ridiculous for that. I’m starting to wonder about
>> being I/O bound either disk-wise or network, but I’m pretty sure you’ve already
>> thought about that.
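[Editor's note: the rows=1 check and the QTime-vs-client-time comparison above are easy to script. A minimal sketch, not from the thread; the Solr URL and collection name are placeholders, and the query mirrors the id:0* prefix search described below.]

```python
# Hedged sketch: compare Solr's reported QTime with the wall-clock time the
# client sees. The URL and collection name are placeholders, not from the thread.
import time
import requests

SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"  # assumed URL

def timed_query(rows, fl="*"):
    params = {"q": "id:0*", "sort": "id asc", "rows": rows, "fl": fl, "wt": "json"}
    start = time.monotonic()
    body = requests.get(SOLR_SELECT, params=params).json()
    wall_ms = (time.monotonic() - start) * 1000
    qtime_ms = body["responseHeader"]["QTime"]  # search time only, excludes assembly/transport
    print(f"rows={rows} fl={fl}: QTime={qtime_ms} ms, client wall clock={wall_ms:.0f} ms")

# rows=1 isolates the search cost; rows=1000 adds the fetch/decompress/transmit cost.
timed_query(rows=1)
timed_query(rows=1000)
```

[If QTime stays in the few-hundred-millisecond range while the wall clock jumps to ten seconds at rows=1000, the time is going into document retrieval, response assembly, or the network rather than the search itself, which is the clue Erick describes below.]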
>> 
>> You are transmitting things around your servers given your statement that you
>> are seeing the searches distributed, which is also a waste, but again I
>> wouldn’t expect it to be that bad.
>> 
>> Hmmm, quick thing to check: what are the QTimes reported? Those are
>> exclusive of assembling the return packet. If they were a few milliseconds
>> and your response back at the client was 10 seconds, that’d be a clue.
>> 
>> Best,
>> Erick
>> 
>>> On Feb 11, 2020, at 2:13 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>> sort=“id asc”
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>> 
>>>> On Feb 10, 2020, at 9:50 PM, Tim Casey <tca...@gmail.com> wrote:
>>>> 
>>>> Walter,
>>>> 
>>>> When you do the query, what is the sort of the results?
>>>> 
>>>> tim
>>>> 
>>>> On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood <wun...@wunderwood.org> wrote:
>>>> 
>>>>> I’ll back up a bit, since it is sort of an X/Y problem.
>>>>> 
>>>>> I have an index with four shards and 17 million documents. I want to dump
>>>>> all the docs in JSON, label each one with a classifier, then load them back
>>>>> in with the labels. This is a one-time (or rare) bootstrap of the
>>>>> classified data. This will unblock testing and relevance work while we get
>>>>> the classifier hooked into the indexing pipeline.
>>>>> 
>>>>> Because I’m dumping all the fields, we can’t rely on docValues.
>>>>> 
>>>>> It is OK if it takes a few hours.
>>>>> 
>>>>> Right now, it is running about 1.7 Mdoc/hour, so roughly 10 hours. That is
>>>>> 16 threads searching id:0* through id:f*, fetching 1000 rows each time,
>>>>> using cursorMark and distributed search. Median response time is 10 s. CPU
>>>>> usage is about 1%.
>>>>> 
>>>>> It is all pretty grubby and it seems like there could be a better way.
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)
>>>>> 
>>>>>> On Feb 10, 2020, at 3:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>> 
>>>>>> Any field that’s unique per doc would do, but yeah, that’s usually an ID.
>>>>>> 
>>>>>> Hmmm, I don’t see why separate queries for 0-f are necessary if you’re
>>>>>> firing at individual replicas. Each replica should have multiple UUIDs
>>>>>> that start with 0-f.
>>>>>> 
>>>>>> Unless I misunderstand and you’re just firing off, say, 16 threads at the
>>>>>> entire collection rather than individual shards, which would work too. But
>>>>>> for individual shards I think you need to look for all possible IDs...
>>>>>> 
>>>>>> Erick
>>>>>> 
>>>>>>> On Feb 10, 2020, at 5:37 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 10, 2020, at 2:24 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>>> 
>>>>>>>> Not sure if range queries work on a UUID field, ...
>>>>>>> 
>>>>>>> A search for id:0* took 260 ms, so it looks like they work just fine.
>>>>>>> I’ll try separate queries for 0-f.
>>>>>>> 
>>>>>>> wunder
>>>>>>> Walter Underwood
>>>>>>> wun...@wunderwood.org
>>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
> 
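[Editor's note: for reference, a minimal sketch of the kind of cursorMark dump Walter describes above, with one worker per leading hex digit of the UUID key, rows=1000, and a sort on the uniqueKey. The Solr URL, output file names, and field list are placeholders; setting fl to a single docValues-only or stored-only field turns this into experiments 1 and 2 from the top of the thread.]

```python
# Hedged sketch of a cursorMark export: 16 threads, one per hex prefix of id.
# The Solr URL and output paths are placeholders, not from the thread.
import json
import requests
from concurrent.futures import ThreadPoolExecutor

SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"  # assumed URL

def dump_prefix(prefix, rows=1000, fl="*"):
    cursor = "*"
    with open(f"dump_{prefix}.jsonl", "w") as out:
        while True:
            params = {
                "q": f"id:{prefix}*",
                "sort": "id asc",        # cursorMark requires a sort on the uniqueKey
                "rows": rows,
                "fl": fl,
                "wt": "json",
                "cursorMark": cursor,
            }
            body = requests.get(SOLR_SELECT, params=params).json()
            for doc in body["response"]["docs"]:
                out.write(json.dumps(doc) + "\n")
            next_cursor = body["nextCursorMark"]
            if next_cursor == cursor:    # unchanged cursor means this prefix is exhausted
                return
            cursor = next_cursor

# One worker per hex prefix 0-f, mirroring the 16-thread setup described above.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(dump_prefix, "0123456789abcdef"))
```

[If each worker were pointed at an individual shard core with distrib=false (a standard Solr request parameter), every page would skip the distributed fan-out Erick mentions; whether that changes the 10-second median is exactly what the experiments above should reveal.]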