I have a situation where I am trying to set up a once-daily cron job on the master node to delete old documents from the index, based on our retention policy.
I delete only one day's worth of data per run, based on my schema, which amounts to a couple of thousand documents and no more. This is a test cluster, and the document counts and index size are not very high:

    Num Docs: 515727
    Max Doc: 591322
    Heap Memory Usage: -1
    Deleted Docs: 75595

                          Index Version    Gen     Size
    Master (Searching)    1548694802284    51396   969.28 MB
    Master (Replicable)   1548694802284    51396   -
    Slave (Searching)     1548694802284    51396   969.28 MB

Sometimes I notice that replication hangs. There are no errors: the slave is trying to download a segments_* file (e.g. segments_1bnx7) and just sits there, with nothing in the logs. Once it reaches this state I am unable to stop replication using abortfetch. Disabling polling (the poll interval is set to 60 seconds) works, but that doesn't help. The only thing that helps is restarting Solr (service solr stop, then service solr start). After that the next poll works, and the slave's version, generation, size, doc count and deleted-doc count match the master's.

Not every execution of the delete cron job hangs. The segments file I see being downloaded during the “hung” state is no longer available on the master; the master has already created a newer segments_* file.

The cron job basically does this (MINDATE and MAXDELDATE span a one-day range):

    DELETE="\"started:[${MINDATE} TO ${MAXDELDATE}]\""
    /opt/solr/bin/post -c <corename> -type application/json -out yes -commit yes -d {delete:{query:"$DELETE"}}

Any ideas?

Thanks.
Ravi
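For anyone trying to reproduce this, here is a minimal sketch, not a fix for the hang, of the same nightly delete sent straight to the core's /update handler with curl instead of bin/post, plus a helper that builds ReplicationHandler URLs (details, indexversion, abortfetch, disablepoll, enablepoll) for inspecting the slave while it is stuck. SOLR_URL, the core name and the function names build_delete_body, send_delete and replication_url are hypothetical placeholders; the started field and the one-day range come from the cron job above.

```shell
#!/usr/bin/env bash
# Sketch only. SOLR_URL, the core name and these function names are
# hypothetical placeholders; "started" is the date field from the cron job.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Print the JSON body for a delete-by-query over started:[min TO max].
build_delete_body() {
  local min="$1" max="$2"
  printf '{"delete":{"query":"started:[%s TO %s]"}}' "$min" "$max"
}

# Send the delete and a commit in a single request to the core's /update handler.
# With DRY_RUN=1 the target URL and body are printed instead of sent.
send_delete() {
  local core="$1" min="$2" max="$3"
  local body url
  body=$(build_delete_body "$min" "$max")
  url="${SOLR_URL}/${core}/update?commit=true"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s\n%s\n' "$url" "$body"
    return 0
  fi
  curl -sS -H 'Content-Type: application/json' --data-binary "$body" "$url"
}

# Print a ReplicationHandler URL for a command such as details,
# indexversion, abortfetch, disablepoll or enablepoll.
replication_url() {
  local core="$1" cmd="$2"
  printf '%s/%s/replication?command=%s&wt=json' "$SOLR_URL" "$core" "$cmd"
}
```

For example, with SOLR_URL pointing at the slave, curl -sS "$(replication_url corename details)" shows what the slave reports while a fetch is stuck, and DRY_RUN=1 send_delete corename "$MINDATE" "$MAXDELDATE" prints the delete request on the master without sending it.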