The responses are collected by node, so subsequent responses from the same
node overwrite previous ones. Definitely a bug. Please open an issue.
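To illustrate what I mean, here is a minimal sketch of the keying pattern (not
the actual Collections API code): if per-shard status responses are stored in
a map keyed by node name, a node that hosts more than one shard keeps only the
last shard's response.

import java.util.HashMap;
import java.util.Map;

public class StatusKeyedByNode {
    public static void main(String[] args) {
        // Hypothetical aggregation of per-shard responses, keyed by node name.
        Map<String, String> byNode = new HashMap<>();

        // Two shards hosted on the same node report back...
        byNode.put("node1:8983_solr", "shard1: backup completed");
        byNode.put("node1:8983_solr", "shard7: backup completed"); // clobbers shard1's entry

        // ...but only one response per node survives, so the status can look
        // "complete" after as many responses as there are nodes, not shards.
        System.out.println(byNode); // {node1:8983_solr=shard7: backup completed}
    }
}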
On Mon, Oct 15, 2018 at 6:24 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/14/2018 6:25 PM, dami...@gmail.com wrote:
> > I had an issue with async backup on solr 6.5.1 reporting that the backup
> > was complete when clearly it was not. I was using 12 shards across 6
> > nodes. I only noticed this issue when one shard was much larger than the
> > others. There were no answers here
> > http://lucene.472066.n3.nabble.com/async-backup-td4342776.html
>
> One detail I thought I had written but isn't there: The backup did
> fully complete -- all 30 shards were in the backup location. Not a lot
> in each shard backup -- the collection was empty. It would be easy
> enough to add a few thousand documents to the collection before doing
> the backup.
>
> If the backup process reports that it's done before it's ACTUALLY done,
> that's a bad thing. It's hard to say whether that problem is related to
> the problem I described. Since I haven't dived into the code, I cannot
> say for sure, but it honestly would not surprise me to find they are
> connected. Every time I try to understand Collections API code, I find
> it extremely difficult to follow.
>
> I'm sorry that you never got resolution on your problem. Do you know
> whether that is still a problem in 7.x? Setting up a reproduction where
> one shard is significantly larger than the others will take a little
> bit of work.
>
> > I was focusing on the STATUS returned from the REQUESTSTATUS command,
> > but looking again now I can see a response from only 6 shards, and each
> > shard is from a different node. So this fits with what you're seeing. I
> > assume your shards 1, 7, 9 are all on different nodes.
>
> I did not actually check, and the cloud example I was using isn't around
> any more, but each of the shards in the status response were PROBABLY on
> separate nodes. The cloud example was 3 nodes. It's an easy enough
> scenario to replicate, and I provided enough details for anyone to do it.
>
> The person on IRC that reported this problem had a cluster of 15 nodes,
> and the status response had ten shards (out of 30) mentioned. It was
> shards 1-9 and shard 20. The suspicion is that there's something
> hard-coded that limits it to 10 responses ... because without that, I
> would expect the number of shards in the response to match the number of
> nodes.
>
> Thanks,
> Shawn

--
Regards,
Shalin Shekhar Mangar.
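P.S. For anyone who wants to try reproducing this on 7.x, a rough SolrJ sketch
along these lines should do it (the ZK address, collection name, backup name
and location below are placeholders, and it assumes the 6.x/7.x SolrJ API):
create a collection with more shards than nodes, index some documents, run an
async BACKUP, then poll REQUESTSTATUS and compare the shards listed in its
response against the real shard count.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.RequestStatusState;

public class AsyncBackupRepro {
    public static void main(String[] args) throws Exception {
        // ZK address, collection, backup name and location are all placeholders.
        try (CloudSolrClient client =
                 new CloudSolrClient.Builder().withZkHost("localhost:9983").build()) {

            // Kick off the backup asynchronously; "backup-1" is the async request id.
            CollectionAdminRequest.backupCollection("gettingstarted", "mybackup")
                .setLocation("/tmp/solr-backups")
                .processAsync("backup-1", client);

            // Poll REQUESTSTATUS until the overseer reports a terminal state.
            RequestStatusState state;
            do {
                Thread.sleep(2000);
                state = CollectionAdminRequest.requestStatus("backup-1")
                        .process(client)
                        .getRequestStatus();
                System.out.println("REQUESTSTATUS state: " + state);
            } while (state == RequestStatusState.RUNNING
                    || state == RequestStatusState.SUBMITTED);

            // Once it reports COMPLETED, look at the raw REQUESTSTATUS response
            // (e.g. in a browser) and count how many shards are actually listed.
        }
    }
}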