Hi

Any distributed lookup is basically composed of two stages: a first stage
that collects the matching documents from every shard, and a second that
fetches additional information for specific ids (e.g. stored fields,
termVectors).

This can be seen in the logs of each shard (isShard=true): a first
request logs the number of hits the query matched on that specific
shard, and a second request contains the ids (ids=...) for the
additional fetch.
At the end of both I get the total QTime of the query and the total
number of hits.
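
To make the flow concrete, here is a rough sketch of the two stages as I
understand them. It is a toy model, not Solr code: the shard names, the
in-memory data and the function names are all made up, every document is
treated as a match, and it assumes the first stage returns some
per-document key plus a score (whether that key is the uniqueKey or an
internal docId is exactly my question 2 below).

```python
import heapq

# Hypothetical in-memory "shards": document key -> (score, stored fields).
# Every document is treated as matching the query, to keep the toy small.
SHARDS = {
    "shard1": {"a": (0.9, {"title": "A"}), "b": (0.4, {"title": "B"})},
    "shard2": {"c": (0.7, {"title": "C"}), "d": (0.2, {"title": "D"})},
}


def stage_one(shard, rows):
    """Stage 1: a shard reports its hit count and its top `rows` (score, key) pairs."""
    docs = SHARDS[shard]
    top = heapq.nlargest(rows, ((score, key) for key, (score, _) in docs.items()))
    return len(docs), top


def stage_two(shard, keys):
    """Stage 2: fetch the stored fields for the requested keys from one shard."""
    return {key: SHARDS[shard][key][1] for key in keys}


def distributed_query(rows):
    total_hits = 0
    candidates = []  # (score, key, shard) from every shard
    for shard in SHARDS:
        hits, top = stage_one(shard, rows)
        total_hits += hits
        candidates.extend((score, key, shard) for score, key in top)

    # Merge the per-shard results and keep the global top `rows`.
    winners = heapq.nlargest(rows, candidates)

    # Second round trip: ask each owning shard only for its winning keys.
    keys_by_shard = {}
    for _, key, shard in winners:
        keys_by_shard.setdefault(shard, []).append(key)
    fields = {}
    for shard, keys in keys_by_shard.items():
        fields.update(stage_two(shard, keys))

    return total_hits, [(key, fields[key]) for _, key, _ in winners]
```

In this model the second round trip adds nothing when only the key itself
is wanted, since the first stage already carries it. That is the situation
I am asking about.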

My question is about the case where only ids are requested (fl=id). Such
a query should need only one request against each shard, yet it actually
makes both of them.
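
For reference, the kind of request I mean can be built as in the snippet
below. The host, port, collection name and the q=*:* query are
placeholders I made up for illustration; only fl=id and rows=1000 come
from my actual case.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and collection name are hypothetical.
BASE_URL = "http://localhost:8983/solr/mycollection/select"

params = {
    "q": "*:*",    # placeholder query, not the real one
    "fl": "id",    # only the id field is requested
    "rows": 1000,  # the high rows value from my case
}

# urlencode percent-escapes the values so the URL is safe to send as-is.
request_url = BASE_URL + "?" + urlencode(params)
print(request_url)
```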

It looks like the response builder has to go through these two stages no
matter what kind of query it is.

My questions:
1. Is it normal that the response builder has to go through both stages?
2. Does the first request get internal Lucene docIds or the actual
uniqueKey id?
3. For a query as above (fl=id), where is the id read from? Is it fetched
from the stored fields file, or from the docValues file if one exists?
If it is fetched from the stored fields, a high rows param (say 1000 in
my case) would need 1000 lookups, which could badly hurt performance.

Thanks
Manuel
