In production I use short Pig scripts and schedule them with Azkaban, with dependencies set up between the jobs, so that I can use Azkaban to restart a long data pipeline at its point of failure. I edit the failing Pig script, which is usually towards the end of the pipeline, and restart the Azkaban job. This saves hours and hours of repeated processing.
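For anyone who has not used Azkaban this way, here is a rough sketch of the kind of setup I mean. The job and script names (load, aggregate, report and the matching .pig files) are made up for illustration, and I am assuming plain command-type jobs that shell out to pig. Each section would live in its own .job file:

```properties
# load.job (hypothetical first step of the pipeline)
type=command
command=pig -f load.pig

# aggregate.job (runs only after load has succeeded)
type=command
command=pig -f aggregate.pig
dependencies=load

# report.job (last step; the one that tends to fail and get edited)
type=command
command=pig -f report.pig
dependencies=aggregate
```

Because each Pig script is its own job, the output of the earlier jobs is already stored when a later one fails, so after fixing report.pig only that job needs to run again.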
I wish Pig could do this: resume at its point of failure when re-run from the command line. Is this feasible?

Russell Jurney
twitter.com/rjurney [email protected] datasyndrome.com
