And to include the filename in the tuple with the data, I copied PigStorage (I'm loading CSV), added a private PigSplit field, set that field in "prepareToRead", and added this code just before returning the tuple in "getNext":

    if (mSplit != null) {
        FileSplit fs = (FileSplit) mSplit.getWrappedSplit();
        Path p = fs.getPath();
        mProtoTuple.add(p.toString());
    }

And it works! Thanks again :-)

-Kim

On Thu, Feb 3, 2011 at 9:43 PM, Dexin Wang <wangde...@gmail.com> wrote:

> wow, I almost got it right. Double quote, fails. Single quote, works.
>
> Thanks.
>
> On Thu, Feb 3, 2011 at 9:40 PM, Kim Vogt <k...@simplegeo.com> wrote:
>
> > This should work:
> >
> > grunt> B = FOREACH A GENERATE f1, 'filename-2011-02-03';
> >
> > or
> >
> > grunt> B = FOREACH A GENERATE f1, '$paramName';
> >
> > -Kim
> >
> > On Thu, Feb 3, 2011 at 8:32 PM, Dexin Wang <wangde...@gmail.com> wrote:
> >
> > > Similarly, is it possible to insert some literal values to a tuple
> > > stream?
> > >
> > > For example, when I invoke my Pig script, I already know what data
> > > source is (say, it's from filename_2011-02-03), so I can just pass it
> > > to Pig using -param, and I want to insert this known file name to the
> > > tuple stream. How can I do that?
> > >
> > > Example, I have:
> > >
> > > grunt> A = LOAD 'aa' AS (f1, f2);
> > > grunt> DUMP A;
> > > (aa,bb)
> > > (cc,dd)
> > >
> > > I want to do something like:
> > >
> > > grunt> B = FOREACH A GENERATE f1, "filename-2011-02-03";
> > >
> > > Thanks.
> > >
> > > On Thu, Feb 3, 2011 at 7:49 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >
> > > > In pig 6, you can hook into bindTo() and save the file name.
> > > >
> > > > In pig 8 you have to find your way to the underlying InputSplit via
> > > > PigSplit.getWrappedSplit(), cast it as FileSplit, and call getPath()
> > > > on it.. I think. Haven't done this.
> > > >
> > > > This will totally break if you have splitCombination turned on, of
> > > > course, as pig can silently move to a different file under you, so
> > > > you'd have to turn that off.
> > > >
> > > > D
> > > >
> > > > On Thu, Feb 3, 2011 at 3:52 PM, Kim Vogt <k...@simplegeo.com> wrote:
> > > > > Hey,
> > > > >
> > > > > I have a bunch of files where the filename is significant. I'm
> > > > > loading the files by supplying the top level directory that
> > > > > contains the files. Is there a way to capture the filename of the
> > > > > file and append to the tuple of data that's in that file?
> > > > >
> > > > > -Kim
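
For anyone reading the thread later, the loader change described in the top message (remember the split in "prepareToRead", append its path in "getNext") can be modelled in plain Java. This is only a sketch of the control flow, not the actual Pig code: FilenameTaggingLoader and Split are hypothetical stand-ins invented here so the example runs without Pig or Hadoop on the classpath. In a real loader the split would be a PigSplit and the path would come from getWrappedSplit() cast to FileSplit, as in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of the loader change; none of these types are Pig or Hadoop classes. */
class FilenameTaggingLoader {

    /** Hypothetical stand-in for a file-backed input split: a path plus the file's lines. */
    static final class Split {
        final String path;
        final List<String> lines;

        Split(String path, List<String> lines) {
            this.path = path;
            this.lines = lines;
        }
    }

    // Remembered in prepareToRead, playing the role of the private PigSplit field.
    private Split split;
    private Iterator<String> reader;

    /** Plays the role of prepareToRead: remember which split is being read. */
    void prepareToRead(Split s) {
        this.split = s;
        this.reader = s.lines.iterator();
    }

    /** Plays the role of getNext: parse one CSV line, then append the file path. Returns null at end of input. */
    List<Object> getNext() {
        if (reader == null || !reader.hasNext()) {
            return null;
        }
        List<Object> tuple = new ArrayList<>();
        for (String field : reader.next().split(",", -1)) {
            tuple.add(field);
        }
        if (split != null) {
            tuple.add(split.path); // same idea as mProtoTuple.add(p.toString())
        }
        return tuple;
    }

    public static void main(String[] args) {
        FilenameTaggingLoader loader = new FilenameTaggingLoader();
        // The path and rows below are made-up sample data.
        loader.prepareToRead(new Split("/data/filename-2011-02-03", Arrays.asList("aa,bb", "cc,dd")));
        for (List<Object> t = loader.getNext(); t != null; t = loader.getNext()) {
            System.out.println(t);
        }
    }
}
```

The path is appended as the last field so the existing columns keep their positions. As Dmitriy points out, in real Pig this only holds if split combination is turned off, since otherwise one reader can move across files.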