Hi guys, sorry for the delay, I meant to catch up on this issue earlier. I’ve two points that I would like to bring up here:
1. (Less important) What causes the delays? I think you should definitely investigate that. The quickest thing to do on your own is to keep a log of the execution times of the code that is directly chained to the data source (that would be at least the mapper). I know doing that by hand is tedious, but tools for that won’t be available in the very near future.

2. (Much more important) The delays should not be an issue! HBase can restart a scan at any point within a region at fairly low cost, as long as you know the key from which you want to start reading. So the idea would be to catch exactly the kind of timeouts you are experiencing (maybe log a warning) and directly create a new scanner that is configured to start at the position of the last successfully retrieved tuple. This approach means we would need to keep a copy of the key of the freshest tuple returned by each scanner in the input format. Of course, that comes with a certain cost, but my guess is that HBase keys are usually not overly large, so performance should not drop significantly.

I have an unstable and outdated implementation of that approach somewhere in an old Stratosphere branch, and I could try to polish it up so Flavio can try it out.

tl;dr If you can’t prevent the timeout, embrace it and simply start a new scan from where you left off.

Best,
Marcus

> On 27 Nov 2014, at 20:16, Flavio Pompermaier <[email protected]> wrote:
>
> Thanks Stephan for the support! Unfortunately we are not able to understand
> what lineage of operators cause this problem..
> in our case we set the scan timeout to 15 minutes so I think we can exclude
> garbage collection thus, probably, this is caused by the first option
> (unfortunately HBase cannot block scans indefinitely..).
>
> What can we do to debug this problem? can you give us more detail or links to
> the internals of such situations?
> it is not very clear to me the relation
> between buffers, actions and pauses between two consecutive nextRecord() on
> the same split of the inputFormat..
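
P.S. To make point 2 a bit more concrete, here is a minimal sketch of the restart-on-timeout idea. It is not the implementation from the old branch and it does not use the HBase client API: `FlakyTable`, `ScanTimeoutException`, `openScanner` and `readAll` are all made-up stand-ins, and the "table" is just a sorted in-memory map whose scanners expire after a fixed number of reads. The only point is the control flow: remember the key of the last delivered row and, on a timeout, open a new scanner that starts right after it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Sketch only: every type here is a hypothetical stand-in, not the HBase client API. */
class ResumableScan {

    /** Hypothetical stand-in for a scanner timeout. */
    static class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(String msg) { super(msg); }
    }

    /** Hypothetical sorted table whose scanners expire after a fixed number of reads. */
    static class FlakyTable {
        private final NavigableMap<String, String> rows;
        private final int readsPerScanner; // must be >= 1, otherwise no scanner makes progress
        int scannersOpened = 0;

        FlakyTable(NavigableMap<String, String> rows, int readsPerScanner) {
            this.rows = rows;
            this.readsPerScanner = readsPerScanner;
        }

        /** Opens a scanner over keys strictly greater than afterKey (null means from the start). */
        Iterator<Map.Entry<String, String>> openScanner(String afterKey) {
            scannersOpened++;
            final Iterator<Map.Entry<String, String>> it =
                (afterKey == null ? rows : rows.tailMap(afterKey, false)).entrySet().iterator();
            return new Iterator<Map.Entry<String, String>>() {
                private int reads = 0;
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public Map.Entry<String, String> next() {
                    // Simulated timeout: fail before consuming the next row.
                    if (reads++ >= readsPerScanner) {
                        throw new ScanTimeoutException("scanner expired");
                    }
                    return it.next();
                }
            };
        }
    }

    /** Reads all keys, reopening the scanner after the last delivered key on each timeout. */
    static List<String> readAll(FlakyTable table) {
        List<String> keys = new ArrayList<>();
        String lastKey = null; // copy of the key of the freshest row returned so far
        Iterator<Map.Entry<String, String>> scanner = table.openScanner(null);
        while (scanner.hasNext()) {
            try {
                Map.Entry<String, String> row = scanner.next();
                lastKey = row.getKey();
                keys.add(lastKey);
            } catch (ScanTimeoutException e) {
                System.err.println("WARN: " + e.getMessage() + ", restarting after key " + lastKey);
                scanner = table.openScanner(lastKey);
            }
        }
        return keys;
    }
}
```

In the input format, the same bookkeeping would sit in whatever method hands out the next record. A real version would probably also cap the number of consecutive restarts so that a permanently failing region does not loop forever.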
