Curiouser and curiouser. So two possibilities are just the time it takes to
assemble the packet and the time it takes to send it back. Three more
experiments then.
1> Change the returned doc to return a single docValues=true field. My claim:
the response will be very close to the 400-600 ms range.
Good questions. Here is the QTime for rows=1000. Looks pretty reasonable. I’d
blame the slowness on the VPN connection, but the median response time of
10,000 msec is measured at the server.
The client is in Python, using wt=json. Average document size in JSON is 5132
bytes. The system should
Wow, that’s pretty horrible performance.
Yeah, I was conflating a couple of things here. Now it’s clear.
If you specify rows=1, what do you get in response time? I’m wondering if
your time is spent just assembling the response rather than searching. You’d
have to have massive docs for that to
sort="id asc"
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Walter,
When you do the query, what is the sort of the results?
tim
On Mon, Feb 10, 2020 at 8:44 PM Walter Underwood
wrote:
I’ll back up a bit, since it is sort of an X/Y problem.
I have an index with four shards and 17 million documents. I want to dump all
the docs in JSON, label each one with a classifier, then load them back in with
the labels. This is a one-time (or rare) bootstrap of the classified data. This
Possibly worth mentioning, although it might not be appropriate for
your use case: if the fields you're interested in are configured with
docValues, you could use streaming expressions (or directly handle
thread-per-shard connections to the /export handler) and get
everything in a single shot
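A rough sketch of that per-shard /export approach, in Python since the client here is Python. The host names, core names, and field list below are placeholders, and /export only returns fields that have docValues and requires a sort on a docValues field:

```python
# Sketch: pull docs straight from each shard's /export handler, one
# thread per shard. Hosts, core names, and fields are made-up examples.
import threading
import urllib.parse
import urllib.request

def export_url(core_url, fields, sort="id asc"):
    """Build an /export request URL for a single replica core."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": ",".join(fields),   # every field here must have docValues
        "sort": sort,             # /export requires a sort on a docValues field
        "wt": "json",
        "distrib": "false",
    })
    return f"{core_url}/export?{params}"

def dump_shard(core_url, fields, out_path):
    """Stream one shard's export response to a local file."""
    with urllib.request.urlopen(export_url(core_url, fields)) as resp, \
         open(out_path, "wb") as out:
        for chunk in iter(lambda: resp.read(1 << 16), b""):
            out.write(chunk)

if __name__ == "__main__":
    # Hypothetical replica core URLs, one per shard.
    cores = [
        "http://solr1:8983/solr/mycollection_shard1_replica1",
        "http://solr2:8983/solr/mycollection_shard2_replica1",
    ]
    threads = [
        threading.Thread(target=dump_shard,
                         args=(c, ["id", "title"], f"shard{i}.json"))
        for i, c in enumerate(cores, 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```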
Any field that’s unique per doc would do, but yeah, that’s usually an ID.
Hmmm, I don’t see why separate queries for 0-f are necessary if you’re firing
at individual replicas. Each replica should have multiple UUIDs that start with
0-f.
Unless I misunderstand and you’re just firing off, say, 16
> On Feb 10, 2020, at 2:24 PM, Walter Underwood wrote:
>
> Not sure if range queries work on a UUID field, ...
A search for id:0* took 260 ms, so it looks like they work just fine. I’ll try
separate queries for 0-f.
wunder
Walter Underwood
wun...@wunderwood.org
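Those separate queries for 0-f might look like the sketch below; it assumes the uniqueKey field is literally named `id` and that one wildcard per leading hex digit covers the UUID space (the fetch itself is stubbed out):

```python
# Sketch: split the UUID id space by first hex character and run the
# sixteen slices in parallel. The field name "id" is an assumption.
from concurrent.futures import ThreadPoolExecutor

def prefix_queries(field="id"):
    """One wildcard query per leading hex digit: id:0* .. id:f*."""
    return [f"{field}:{c}*" for c in "0123456789abcdef"]

def fetch_slice(query):
    # Placeholder for the real HTTP fetch; each slice would still page
    # through its own results (start/rows or cursorMark).
    return query

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(fetch_slice, prefix_queries()))
    print(len(results))  # 16 independent slices
```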
I’ll give that a shot.
Not sure if range queries work on a UUID field, but I have thought of
segmenting the ID space and running parallel queries on those.
Right now it is sucking over 1.6 million docs per hour, so that is bearable.
Making it 4X or 16X faster would be nice, though.
wunder
Not sure whether cursormark respects distrib=false, although I can easily see
there being “complications” here.
Hmmm, whenever I try to use distrib=false, I usually fire the query at the
specific replica rather than use the shards parameter. IDK whether that’ll make
any difference.
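For what it's worth, a cursorMark loop aimed directly at one replica core might look like this sketch. The core URL and field names are placeholders, and whether 6.6 actually honors distrib=false on /select this way is exactly the open question:

```python
# Sketch: cursorMark paging fired at a specific replica core rather
# than through the shards parameter. URL and field names are made up.
import json
import urllib.parse
import urllib.request

def cursor_params(cursor, rows=1000):
    """Query params for one cursorMark page against a single core."""
    return {
        "q": "*:*",
        "sort": "id asc",      # cursorMark requires the uniqueKey in the sort
        "rows": str(rows),
        "wt": "json",
        "distrib": "false",
        "cursorMark": cursor,
    }

def cursor_pages(core_url, rows=1000):
    """Yield pages of docs until the cursor stops advancing."""
    cursor = "*"               # cursorMark always starts at "*"
    while True:
        qs = urllib.parse.urlencode(cursor_params(cursor, rows))
        with urllib.request.urlopen(f"{core_url}/select?{qs}") as resp:
            page = json.load(resp)
        yield page["response"]["docs"]
        if page["nextCursorMark"] == cursor:   # no more results
            return
        cursor = page["nextCursorMark"]
```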
I tried to get fancy and dump our content with one thread per shard, but it did
distributed search anyway. I specified the shard using the “shards” param and
set distrib=false.
Is this a bug or expected behavior in 6.6.2? I did not see it mentioned in the
docs.
It is working fine with a