Hi all,

Our dataset consists of multiple files, and the name of each file reflects the date the file was created (e.g. 20101031.dat, 20101101.dat, etc.). We need this date information for all relations inside each file, but the files themselves contain no date field.
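To make the goal concrete, here is a minimal sketch of the derivation we would like to perform once a file path is available. It assumes the naming convention above (an 8-digit yyyyMMdd stem followed by .dat); the class and method names (FileNameDate, dateFromPath) are hypothetical and not part of Pig.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameDate {
    // Assumed convention: the last path component is <yyyyMMdd>.dat
    private static final Pattern DATED_FILE =
            Pattern.compile("(?:^|/)(\\d{8})\\.dat$");

    /**
     * Returns the yyyyMMdd stem of the file name, or null if the path
     * does not end in an 8-digit name followed by ".dat".
     */
    public static String dateFromPath(String path) {
        Matcher m = DATED_FILE.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(dateFromPath("/test/data/20101031.dat")); // prints 20101031
        System.out.println(dateFromPath("/test/data/readme.txt"));   // prints null
    }
}
```

The missing piece is how to obtain the path of the file currently being read, so that a value like this can be appended to every tuple.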
We first considered getting the file name through a UDF that extends LoadFunc, but that doesn't appear to be possible. In particular, 'location' in setLocation(String location, Job job) only gives the original glob expression used in LOAD (such as '/test/data/*.dat'), and 'reader' in prepareToRead(RecordReader reader, PigSplit split) doesn't expose a method for accessing the file name.

Before we add the date field to every single file ourselves (which we want to keep as a last resort, considering the number of files we deal with), we were wondering whether there is any way to access the file name within a Pig script (including UDFs), especially when loading multiple files at the same time.

Any help would be greatly appreciated.

FYI, we are on Pig 0.7.0 running on top of Hadoop 0.20.2.

Thanks,
Sang