Resume scroll-scan query?

2014-10-23 Thread Roger de Cordova Farias
I'm reindexing an ElasticSearch base with 50m docs using the scroll-scan request to retrieve all docs, but my reindexer program stopped at 30m. Is there a way to redo the query to retrieve the remaining docs? Like using an offset? Would the internal order of the scan query be the same with a second

Re: Resume scroll-scan query?

2014-10-23 Thread John Smith
The scroll stays available for the timeout value you give it. Every time you scroll, you restart the countdown. You could track the last scroll id you used and try it again from there? On Thursday, 23 October 2014 12:47:02 UTC-4, Roger de Cordova Farias wrote: I'm reindexing an ElasticSearch
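To make the timeout point concrete, here is a minimal sketch of the two request URLs involved, based on the scan/scroll REST endpoints of Elasticsearch releases from that period. The host, the index name `docs`, and the helper names are assumptions for illustration, not anything from the thread. The `scroll` parameter is the keep-alive that gets renewed on every call, and the scroll id itself is sent along with the follow-up request.

```python
from urllib.parse import urlencode


def scan_start_url(base, index, scroll="1m", size=50):
    """URL that opens a scan-type scroll over `index` (hypothetical helper)."""
    params = urlencode({"search_type": "scan", "scroll": scroll, "size": size})
    return f"{base}/{index}/_search?{params}"


def scroll_next_url(base, scroll="1m"):
    """URL for each follow-up scroll call; `scroll` renews the keep-alive."""
    params = urlencode({"scroll": scroll})
    return f"{base}/_search/scroll?{params}"


if __name__ == "__main__":
    # Assumed local node and index name, for illustration only.
    print(scan_start_url("http://localhost:9200", "docs"))
    print(scroll_next_url("http://localhost:9200", scroll="5m"))
```

The keep-alive only has to cover the gap between two consecutive scroll calls, not the whole reindexing run.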

Re: Resume scroll-scan query?

2014-10-23 Thread Roger de Cordova Farias
Hmm, I was using a small ttl, just enough to process each scroll call, but I could try using a longer time to live and resuming from the last scroll_id in case of error. That is a good idea, thanks. 2014-10-23 17:12 GMT-02:00 John Smith java.dev@gmail.com: The scroll is available based on a

Re: Resume scroll-scan query?

2014-10-23 Thread John Smith
A small ttl is OK (well, adjusted properly for your process) because every time you call scroll it resets the ttl. So you don't need to set a 60m scroll time. It just has to be long enough to process the next scroll id. I'm curious if you can re-use the scroll id. It's not specifically

Re: Resume scroll-scan query?

2014-10-23 Thread Roger de Cordova Farias
I know it resets the ttl on each scroll call, but since I don't have an automatic resuming process, I need to manually check the last scroll_id (I will log it to a file) and restart the reindexing program using it. That is why I need a longer ttl. I just tested the re-use of the scroll_id. Looks
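The "log the last scroll_id to a file and restart from it" idea could be sketched roughly as below. This is only an illustration under stated assumptions: `fetch_page` and `index_docs` are hypothetical callables standing in for the scroll request and the bulk-index step, and `start_scroll_id` is the id returned by the initial scan request. Whether re-sending a saved scroll_id returns the same batch again or the next one is not settled in this thread, so the batch that was in flight when the program died may end up repeated or skipped.

```python
import os


def load_checkpoint(path):
    """Return the saved scroll_id, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read().strip() or None


def save_checkpoint(path, scroll_id):
    """Write the scroll_id via a temp file so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(scroll_id)
    os.replace(tmp, path)


def reindex(fetch_page, index_docs, start_scroll_id, checkpoint_path):
    """Drain a scroll, saving the id to use for the next request after each batch.

    fetch_page(scroll_id) -> (docs, next_scroll_id)   # hypothetical scroll wrapper
    index_docs(docs)                                   # hypothetical indexing step
    """
    # Prefer a saved id from an earlier run; otherwise start from the fresh one.
    scroll_id = load_checkpoint(checkpoint_path) or start_scroll_id
    total = 0
    while True:
        docs, next_id = fetch_page(scroll_id)
        if not docs:  # an empty batch means the scroll is exhausted
            break
        index_docs(docs)
        total += len(docs)
        scroll_id = next_id
        save_checkpoint(checkpoint_path, scroll_id)
    return total
```

A resumed run only works while the scroll context is still alive on the server, which is why the keep-alive has to be long enough to cover the time it takes to notice the failure and restart by hand.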