I asked a similar question before. Please see this thread:

http://mail-archives.apache.org/mod_mbox/pig-user/201103.mbox/%[email protected]%3E

Shawn

On Tue, May 31, 2011 at 11:08 AM, Jonathan Coveney <[email protected]> wrote:
> Context: I have a bunch of files living in HDFS, and I think my jobs are
> failing on one of them... I want to output the files that the job is failing
> on.
>
> I thought that I could just make my own LoadFunc that followed the same
> methodology as PigStorage, but caught exceptions and logged the file that
> was given...this isn't working, however. I tried returning loadLocation, but
> that is the globbed input, not the input to the mapper. I also tried reading
> mapreduce.map.file.input and map.file.input from the Job given to
> setLocation, but both were null... I think this is where some of my
> ignorance as to pig's internal workings is coming into play, as I'm not sure
> when files are deglobbed and the splits are actually read. I tried using
> getLocations() from the PigSplit passed to prepareToRead but that was just
> the glob as well...
>
> My next thought would be to make a RecordReader that outputs the file
> associated with its splits (as I assume it must have access to the
> specific files it is processing?), but I thought I'd ask if there was a
> cleaner way before doing that...
>
> Thanks!
> Jon
>
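For reference, one common approach is to unwrap the PigSplit in prepareToRead rather than reading job properties: PigSplit wraps the underlying Hadoop InputSplit, and for file-based input that wrapped split is a FileSplit carrying the de-globbed path. A minimal sketch, assuming the wrapped split is a FileSplit (the FileTrackingStorage class name is hypothetical; method names are from the Pig/Hadoop APIs):

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.builtin.PigStorage;
import org.apache.pig.data.Tuple;

// Hypothetical loader that remembers which concrete file each split
// came from, so read failures can report the offending file.
public class FileTrackingStorage extends PigStorage {
    private String currentFile = "unknown";

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        // PigSplit wraps the actual InputSplit; for file-based input
        // this is a FileSplit holding the de-globbed file path,
        // unlike loadLocation, which still holds the glob.
        if (split.getWrappedSplit() instanceof FileSplit) {
            currentFile =
                ((FileSplit) split.getWrappedSplit()).getPath().toString();
        }
        super.prepareToRead(reader, split);
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            return super.getNext();
        } catch (Exception e) {
            // Surface the specific file that the job is failing on.
            throw new IOException("Failed while reading " + currentFile, e);
        }
    }
}
```

This avoids writing a custom RecordReader entirely; the split already knows its file, so the loader only needs to ask for it.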
