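You could create a one-time job that rewrites the historical output of job "A" so that it is partitioned by "col1", matching the updated format. Job "B" would then read a single, consistent partition layout.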
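
A rough PySpark sketch of such a backfill, under several assumptions not stated in the thread: the data is Parquet, it lives on a local filesystem (so os.walk can list it), the legacy data files sit directly in their leaf directories, the new layout uses Spark's usual "col1=<value>" directory names, and "col1" is present as a column inside the old files. The names find_legacy_dirs, backfill, and staging_path are hypothetical helpers for illustration only.

```python
import os


def find_legacy_dirs(base_path, partition_col="col1"):
    """Return directories that hold data files directly, i.e. output that
    has not yet been split into `<partition_col>=...` subdirectories."""
    legacy = []
    for root, dirs, files in os.walk(base_path):
        # Do not descend into already-partitioned output or into
        # hidden/temporary directories such as _temporary.
        dirs[:] = [
            d for d in dirs
            if not d.startswith(("_", ".", partition_col + "="))
        ]
        # Ignore marker files such as _SUCCESS when looking for data.
        if any(not f.startswith(("_", ".")) for f in files):
            legacy.append(root)
    return sorted(legacy)


def backfill(spark, base_path, staging_path, partition_col="col1", fmt="parquet"):
    """Rewrite each legacy directory under staging_path, partitioned by
    partition_col, keeping the same relative day=/filling= layout."""
    written = []
    for src in find_legacy_dirs(base_path, partition_col):
        dst = os.path.join(staging_path, os.path.relpath(src, base_path))
        # Assumption: partition_col exists as a regular column in the old files.
        (spark.read.format(fmt).load(src)
              .write.partitionBy(partition_col)
              .format(fmt).mode("overwrite").save(dst))
        written.append(dst)
    return written
```

With an existing SparkSession this would be called as backfill(spark, base, staging), where base is the job "A" output directory and staging is a separate, hypothetical location. Writing to a staging path rather than in place avoids overwriting directories that are still being read; once the result is verified, the legacy directories can be swapped for the migrated ones. On HDFS or S3 the directory listing would need the corresponding filesystem API instead of os.walk.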

On Tue, Mar 21, 2017 at 8:53 AM, Aditya Borde <bordec...@gmail.com> wrote:

> Hello,
>
> I'm currently blocked with this issue:
>
> I have job "A" whose output is partitioned by one of the field - "col1"
> Now job "B" reads the output of job "A".
>
> Here comes the problem. my job "A" output previously not been partitioned
> by "col1" (this is recent change).
> But the thing is now, all my previous data has not been partitioned by
> "col1" for job "A".
> If I want to run my job "B" without any issue with previous as well as
> current data - it is failing as because : "inconsistent partition column
> names"
>
> *Reading Path is something like - "file://path1/name/sample/"* ---> but
> further it has directories *"day=2017-02-15/filling=5/xyz1"*
>
> Currently it is generating one more deeper directory input path -->
> *"/day=2017-02-15/filling=5/col1/xyz2"*
>
> "mergeSchema" - is not working here because my base path has multiple
> directories under which files are residing.
>
> Can someone suggest me some effective solution here?
>
> Regards,
> Aditya Borde
>

--
Regards,

Matt

Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/