I switched to using the CSVLoader in piggybank, and appended the filepath to the current RecordReader instead.
-Kim

On Thu, Feb 3, 2011 at 10:11 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> There's a CSV loader in the piggybank that does proper CSV escaping,
> if you are interested.
>
> On Thu, Feb 3, 2011 at 9:53 PM, Kim Vogt <k...@simplegeo.com> wrote:
> > And to include the filename in the tuple with the data, I copied PigStorage
> > (I'm loading csv), created a private PigSplit object, set this object in
> > "prepareToRead", and added this code before returning the tuple in
> > "getNext":
> >
> > if (mSplit != null) {
> >     FileSplit fs = (FileSplit) mSplit.getWrappedSplit();
> >     Path p = fs.getPath();
> >     mProtoTuple.add(p.toString());
> > }
> >
> > And it works! Thanks again :-)
> >
> > -Kim
> >
> > On Thu, Feb 3, 2011 at 9:43 PM, Dexin Wang <wangde...@gmail.com> wrote:
> >
> >> Wow, I almost got it right. Double quote fails, single quote works.
> >>
> >> Thanks.
> >>
> >> On Thu, Feb 3, 2011 at 9:40 PM, Kim Vogt <k...@simplegeo.com> wrote:
> >>
> >> > This should work:
> >> >
> >> > grunt> B = FOREACH A GENERATE f1, 'filename-2011-02-03';
> >> >
> >> > or
> >> >
> >> > grunt> B = FOREACH A GENERATE f1, '$paramName';
> >> >
> >> > -Kim
> >> >
> >> > On Thu, Feb 3, 2011 at 8:32 PM, Dexin Wang <wangde...@gmail.com> wrote:
> >> >
> >> > > Similarly, is it possible to insert some literal values into a tuple stream?
> >> > >
> >> > > For example, when I invoke my Pig script, I already know what the data
> >> > > source is (say, it's from filename_2011-02-03), so I can just pass it to
> >> > > Pig using -param, and I want to insert this known file name into the
> >> > > tuple stream. How can I do that?
> >> > >
> >> > > Example, I have:
> >> > >
> >> > > grunt> A = LOAD 'aa' AS (f1, f2);
> >> > > grunt> DUMP A;
> >> > > (aa,bb)
> >> > > (cc,dd)
> >> > >
> >> > > I want to do something like:
> >> > >
> >> > > grunt> B = FOREACH A GENERATE f1, "filename-2011-02-03";
> >> > >
> >> > > Thanks.
> >> > >
> >> > > On Thu, Feb 3, 2011 at 7:49 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> >> > >
> >> > > > In pig 6, you can hook into bindTo() and save the file name.
> >> > > >
> >> > > > In pig 8 you have to find your way to the underlying InputSplit via
> >> > > > PigSplit.getWrappedSplit(), cast it as FileSplit, and call getPath()
> >> > > > on it... I think. Haven't done this.
> >> > > >
> >> > > > This will totally break if you have splitCombination turned on, of
> >> > > > course, as pig can silently move to a different file under you, so
> >> > > > you'd have to turn that off.
> >> > > >
> >> > > > D
> >> > > >
> >> > > > On Thu, Feb 3, 2011 at 3:52 PM, Kim Vogt <k...@simplegeo.com> wrote:
> >> > > > > Hey,
> >> > > > >
> >> > > > > I have a bunch of files where the filename is significant. I'm
> >> > > > > loading the files by supplying the top-level directory that
> >> > > > > contains the files. Is there a way to capture the filename of the
> >> > > > > file and append it to the tuple of data that's in that file?
> >> > > > >
> >> > > > > -Kim