Re: cursorMark and shards? (6.6.2)

2020-02-11 Thread Erick Erickson
Curiouser and curiouser. So the two possibilities are just the time it takes to assemble the packet and the time it takes to send it back. Three more experiments, then. 1> Change the returned doc to a single docValues=true field. My claim: the response will be very close to the 400-600 ms
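A minimal sketch of that first experiment (the collection URL and the docValues field name modified_dt are assumptions, not from the thread): fetch one cursorMark page returning whole documents, then the same page returning only a single docValues field, and compare the server-side QTime with client wall-clock time.

# Sketch of the first experiment; my_collection and modified_dt are hypothetical.
# Fetch one page with the full document vs. a single docValues field and compare
# QTime (server-side) with wall-clock time (client-side).
import time
import requests

SELECT_URL = "http://localhost:8983/solr/my_collection/select"  # hypothetical

def timed_page(fl):
    params = {
        "q": "*:*",
        "sort": "id asc",
        "rows": 1000,
        "fl": fl,
        "cursorMark": "*",
        "wt": "json",
    }
    start = time.time()
    rsp = requests.get(SELECT_URL, params=params).json()
    wall_ms = int((time.time() - start) * 1000)
    return rsp["responseHeader"]["QTime"], wall_ms

for fl in ("*", "modified_dt"):   # whole doc vs. a single docValues field
    qtime, wall = timed_page(fl)
    print(f"fl={fl}: QTime={qtime} ms, wall-clock={wall} ms")

If the wall-clock gap closes when only the docValues field is returned, the time is going into assembling and shipping the stored fields rather than into the search itself.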

Re: cursorMark and shards? (6.6.2)

2020-02-11 Thread Walter Underwood
Good questions. Here is the QTime for rows=1000. Looks pretty reasonable. I’d blame the slowness on the VPN connection, but the median response time of 10,000 msec is measured at the server. The client is in Python, using wt=json. Average document size in JSON is 5132 bytes. The system should

Re: cursorMark and shards? (6.6.2)

2020-02-11 Thread Erick Erickson
Wow, that’s pretty horrible performance. Yeah, I was conflating a couple of things here. Now it’s clear. If you specify rows=1, what do you get in response time? I’m wondering if your time is spent just assembling the response rather than searching. You’d have to have massive docs for that to

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
sort=“id asc” wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 10, 2020, at 9:50 PM, Tim Casey wrote: > > Walter, > > When you do the query, what is the sort of the results? > > tim > > On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood >

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Tim Casey
Walter, When you do the query, what is the sort of the results? tim On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood wrote: > I’ll back up a bit, since it is sort of an X/Y problem. > > I have an index with four shards and 17 million documents. I want to dump > all the docs in JSON, label

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
I’ll back up a bit, since it is sort of an X/Y problem. I have an index with four shards and 17 million documents. I want to dump all the docs in JSON, label each one with a classifier, then load them back in with the labels. This is a one-time (or rare) bootstrap of the classified data. This
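For reference, the basic shape of that kind of dump is a single cursorMark loop; this is a minimal sketch assuming a hypothetical collection URL and output path, not the actual script from the thread.

# Minimal cursorMark dump loop (collection URL and output path are assumptions).
# Each request uses a sort ending on the uniqueKey and passes the cursor returned
# by the previous page; the loop ends when the cursor stops changing.
import json
import requests

SELECT_URL = "http://localhost:8983/solr/my_collection/select"  # hypothetical

def dump_all(out_path, rows=1000):
    cursor = "*"
    with open(out_path, "w") as out:
        while True:
            params = {
                "q": "*:*",
                "sort": "id asc",
                "rows": rows,
                "cursorMark": cursor,
                "wt": "json",
            }
            rsp = requests.get(SELECT_URL, params=params).json()
            for doc in rsp["response"]["docs"]:
                out.write(json.dumps(doc) + "\n")
            if rsp["nextCursorMark"] == cursor:   # no more pages
                break
            cursor = rsp["nextCursorMark"]

dump_all("dump.jsonl")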

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Michael Gibney
Possibly worth mentioning, although it might not be appropriate for your use case: if the fields you're interested in are configured with docValues, you could use streaming expressions (or directly handle thread-per-shard connections to the /export handler) and get everything in a single shot
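A hedged sketch of the /export route Michael describes, assuming every requested field (here id and a hypothetical label_field) has docValues and using a made-up collection URL. /export writes the entire sorted result set in one pass, so there is no paging or cursorMark to manage.

# Single /export pull; my_collection and label_field are assumptions.
# Every field in fl and sort must have docValues for /export to work.
import requests

EXPORT_URL = "http://localhost:8983/solr/my_collection/export"  # hypothetical

params = {
    "q": "*:*",
    "sort": "id asc",
    "fl": "id,label_field",   # docValues fields only
    "wt": "json",
}

with requests.get(EXPORT_URL, params=params, stream=True) as rsp, \
        open("export.json", "wb") as out:
    for chunk in rsp.iter_content(chunk_size=1 << 20):  # stream straight to disk
        out.write(chunk)

The thread-per-shard variant would open one such connection per shard; a streaming expression drives the same export machinery from the server side.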

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Erick Erickson
Any field that’s unique per doc would do, but yeah, that’s usually an ID. Hmmm, I don’t see why separate queries for 0-f are necessary if you’re firing at individual replicas. Each replica should have multiple UUIDs that start with 0-f. Unless I misunderstand and you’re just firing off, say, 16

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
> On Feb 10, 2020, at 2:24 PM, Walter Underwood wrote: > > Not sure if range queries work on a UUID field, ... A search for id:0* took 260 ms, so it looks like they work just fine. I’ll try separate queries for 0-f. wunder Walter Underwood wun...@wunderwood.org
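A sketch of the 0-f idea, assuming the ids are hex UUIDs and using a hypothetical collection URL: one cursorMark loop per leading hex digit, run in parallel, so each worker covers a disjoint slice of the id space.

# Parallel dump partitioned by the first hex digit of the UUID id field
# (the hex-UUID assumption and the collection URL are mine, not from the thread).
import json
import requests
from concurrent.futures import ThreadPoolExecutor

SELECT_URL = "http://localhost:8983/solr/my_collection/select"  # hypothetical

def dump_prefix(prefix, rows=1000):
    cursor = "*"
    with open(f"dump_{prefix}.jsonl", "w") as out:
        while True:
            params = {
                "q": f"id:{prefix}*",   # id:0*, id:1*, ... id:f*
                "sort": "id asc",
                "rows": rows,
                "cursorMark": cursor,
                "wt": "json",
            }
            rsp = requests.get(SELECT_URL, params=params).json()
            for doc in rsp["response"]["docs"]:
                out.write(json.dumps(doc) + "\n")
            if rsp["nextCursorMark"] == cursor:
                break
            cursor = rsp["nextCursorMark"]

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(dump_prefix, "0123456789abcdef"))   # surface any worker errors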

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
I’ll give that a shot. Not sure if range queries work on a UUID field, but I have thought of segmenting the ID space and running parallel queries on those. Right now it is sucking over 1.6 million docs per hour, so that is bearable. Making it 4X or 16X faster would be nice, though. wunder

Re: cursorMark and shards? (6.6.2)

2020-02-10 Thread Erick Erickson
Not sure whether cursorMark respects distrib=false, although I can easily see there being “complications” here. Hmmm, whenever I try to use distrib=false, I usually fire the query at the specific replica rather than use the shards parameter. IDK whether that’ll make any difference.
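Roughly what “fire the query at the specific replica” looks like, with a hypothetical node and core name: the request addresses the replica’s own core rather than the collection, and distrib=false asks it to search only that core. Whether cursorMark then behaves per-core is exactly the open question here.

# Hitting one replica core directly with distrib=false (node and core name are
# assumptions). Addressing the core itself, instead of the collection plus a
# shards parameter, keeps the query from being fanned back out to other shards.
import requests

CORE_URL = "http://solr-node1:8983/solr/mycoll_shard1_replica1/select"  # hypothetical

params = {
    "q": "*:*",
    "sort": "id asc",
    "rows": 1000,
    "cursorMark": "*",    # whether cursorMark behaves per-core is the open question
    "distrib": "false",
    "wt": "json",
}
rsp = requests.get(CORE_URL, params=params).json()
print(rsp["responseHeader"]["QTime"], len(rsp["response"]["docs"]))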

cursorMark and shards? (6.6.2)

2020-02-10 Thread Walter Underwood
I tried to get fancy and dump our content with one thread per shard, but it did distributed search anyway. I specified the shard using the “shards” param and set distrib=false. Is this a bug or expected behavior in 6.6.2? I did not see it mentioned in the docs. It is working fine with a
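For context, the attempt described above looks roughly like this (collection URL and shard name are assumptions); per this message, the combination still ran a distributed search on 6.6.2.

# One shard named via the shards parameter plus distrib=false (my_collection and
# shard1 are hypothetical). This is the request shape that still went distributed.
import requests

SELECT_URL = "http://localhost:8983/solr/my_collection/select"  # hypothetical

params = {
    "q": "*:*",
    "sort": "id asc",
    "rows": 1000,
    "cursorMark": "*",
    "shards": "shard1",
    "distrib": "false",
    "wt": "json",
}
rsp = requests.get(SELECT_URL, params=params).json()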