Hi,

We are receiving data in HDFS in the following layout:
               yyyy/mm/dd/hh/mm/abc.xml
                                            /bcd.xml
                                            ..
                                            .. 4 more
Each XML file has a different schema.

e.g.: kfs/2011/05/1/9/01/abc.xml
                         /bcd.xml .... 4 more
      kfs/2011/05/1/9/02/abc.xml
                         /bcd.xml .. 4 more

Each minute we get 6 different kinds of XML data files, for a total of 5
hours. I have written 6 Pig scripts to process these XML files (to convert
them to CSV format). Processing a single minute of data is straightforward
(each Pig script just needs one input path and one output path).
We would like to process, say, 10 minutes of data as a batch, then the next
10 minutes, and so on until the last 10 minutes of the day. I was wondering
how to specify the input and output paths for the Pig scripts dynamically
for each batch (each 10 minutes) of data. Manual parameter substitution will
work, but I am looking for a method where the input and output paths are
changed automatically for each batch.
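
To make the question concrete, here is a rough sketch (not something we have
running) of the kind of driver I have in mind. It assumes each Pig script
reads its paths through parameter substitution, i.e. LOAD '$INPUT' and
STORE ... INTO '$OUTPUT', and that the LOAD path may contain a Hadoop-style
{a,b,c} glob. The function names, the INPUT/OUTPUT parameter names, the
"csv" output root and the abc.pig script name are all made up for
illustration; the directory layout is copied from the example above.

```python
def batch_paths(root, year, month, day, hour, first_minute, size=10,
                out_root="csv"):
    """Return (input_glob, output_dir) for one batch of `size` minutes.

    Assumes the batch stays within a single hour directory, which holds
    for 10-minute batches starting at minute 0, 10, 20, ...
    """
    # Minute directories are zero-padded in the example (01, 02, ...).
    minutes = ",".join(f"{m:02d}"
                       for m in range(first_minute, first_minute + size))
    # Month is zero-padded, day and hour are not (kfs/2011/05/1/9/01).
    hour_dir = f"{root}/{year}/{month:02d}/{day}/{hour}"
    input_glob = f"{hour_dir}/{{{minutes}}}"
    output_dir = f"{out_root}/{year}/{month:02d}/{day}/{hour}/{first_minute:02d}"
    return input_glob, output_dir


def pig_command(script, xml_name, input_glob, output_dir):
    """Build the pig command line for one XML kind in one batch."""
    base = xml_name.rsplit(".", 1)[0]  # abc.xml -> abc
    return [
        "pig",
        "-param", f"INPUT={input_glob}/{xml_name}",
        "-param", f"OUTPUT={output_dir}/{base}",
        "-f", script,
    ]


if __name__ == "__main__":
    # Print the commands for every 10-minute batch of hour 9 on 2011/05/1.
    for first_minute in range(0, 60, 10):
        inp, out = batch_paths("kfs", 2011, 5, 1, 9, first_minute)
        print(" ".join(pig_command("abc.pig", "abc.xml", inp, out)))
```

Each printed command could then be handed to subprocess.run (as a list, so
the braces are not expanded by a shell), once per XML kind and per batch.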


Any help is greatly appreciated.


Thanks,
Srinivas
