In production I use short Pig scripts and schedule them with Azkaban,
with dependencies set up between the jobs, so that I can use Azkaban
to restart long data pipelines at the point of failure. I edit the
failing Pig script, usually towards the end of the data pipeline, and
restart the Azkaban job. This saves hours of repeated processing.

I wish Pig could do this itself: resume at its point of failure when
re-run from the command line. Is this feasible?
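
To make the request concrete, here is a minimal sketch of the resume
behaviour described above, written in Python. It is not an existing Pig
or Azkaban feature. The function name run_pipeline, the checkpoint
directory, and the ".done" marker files are all hypothetical, and each
step is assumed to be a plain callable (in practice it might launch one
short Pig script).

```python
import os


def run_pipeline(steps, checkpoint_dir):
    """Run (name, func) steps in order, skipping steps that already succeeded.

    Hypothetical helper: a step counts as finished once its marker file
    exists in checkpoint_dir, so a re-run resumes at the first step
    that has no marker.
    """
    os.makedirs(checkpoint_dir, exist_ok=True)
    executed = []
    for name, func in steps:
        marker = os.path.join(checkpoint_dir, name + ".done")
        if os.path.exists(marker):
            # Completed in an earlier run; do not repeat the work.
            continue
        # An exception here aborts the run before the marker is written,
        # so this step is retried on the next invocation.
        func()
        with open(marker, "w") as f:
            f.write("ok\n")
        executed.append(name)
    return executed
```

After a failure, fixing the failing step and calling run_pipeline again
with the same checkpoint directory would re-run only that step and the
ones after it. A real version would also need to clean up any partial
output left behind by the failed step.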

Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com
