I switched to using the CSVLoader in piggybank, and appended the filepath to
the current RecordReader instead.

-Kim

On Thu, Feb 3, 2011 at 10:11 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> There's a CSV loader in the piggybank that does proper CSV escaping,
> if you are interested.
>
> On Thu, Feb 3, 2011 at 9:53 PM, Kim Vogt <k...@simplegeo.com> wrote:
> > And to include the filename in the tuple with the data, I copied PigStorage
> > (I'm loading csv), created a private PigSplit object, set this object in
> > "prepareToRead", and added this code before returning the tuple in
> > "getNext":
> >
> > if (mSplit != null) {
> >    FileSplit fs = (FileSplit) mSplit.getWrappedSplit();
> >    Path p = fs.getPath();
> >    mProtoTuple.add(p.toString());
> > }
> >
> > And it works!  Thanks again :-)
> >
> > -Kim
> >
> > On Thu, Feb 3, 2011 at 9:43 PM, Dexin Wang <wangde...@gmail.com> wrote:
> >
> >> wow, I almost got it right. Double quote, fails. Single quote, works.
> >>
> >> Thanks.
> >>
> >> On Thu, Feb 3, 2011 at 9:40 PM, Kim Vogt <k...@simplegeo.com> wrote:
> >>
> >> > This should work:
> >> >
> >> > grunt> B = FOREACH A GENERATE f1, 'filename-2011-02-03';
> >> >
> >> > or
> >> >
> >> > grunt> B = FOREACH A GENERATE f1, '$paramName';
> >> >
> >> > -Kim
> >> >
> >> > On Thu, Feb 3, 2011 at 8:32 PM, Dexin Wang <wangde...@gmail.com> wrote:
> >> >
> >> > > Similarly, is it possible to insert some literal values into a tuple
> >> > > stream?
> >> > >
> >> > > For example, when I invoke my Pig script, I already know what the data
> >> > > source is (say, it's from filename_2011-02-03), so I can just pass it to
> >> > > Pig using -param, and I want to insert this known file name into the
> >> > > tuple stream. How can I do that?
> >> > >
> >> > > For example, I have:
> >> > >
> >> > > grunt> A = LOAD 'aa' AS (f1, f2);
> >> > > grunt> DUMP A;
> >> > > (aa,bb)
> >> > > (cc,dd)
> >> > >
> >> > > I want to do something like:
> >> > >
> >> > > grunt> B = FOREACH A GENERATE f1, "filename-2011-02-03";
> >> > >
> >> > > Thanks.
> >> > >
> >> > > On Thu, Feb 3, 2011 at 7:49 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> >> > >
> >> > > > In Pig 0.6, you can hook into bindTo() and save the file name.
> >> > > >
> >> > > > In Pig 0.8 you have to find your way to the underlying InputSplit via
> >> > > > PigSplit.getWrappedSplit(), cast it to FileSplit, and call getPath()
> >> > > > on it, I think. I haven't done this myself.
> >> > > >
> >> > > > This will totally break if you have splitCombination turned on, of
> >> > > > course, as pig can silently move to a different file under you, so
> >> > > > you'd have to turn that off.
> >> > > >
> >> > > > D
> >> > > >
> >> > > > On Thu, Feb 3, 2011 at 3:52 PM, Kim Vogt <k...@simplegeo.com> wrote:
> >> > > > > Hey,
> >> > > > >
> >> > > > > I have a bunch of files where the filename is significant.  I'm
> >> > > > > loading the files by supplying the top level directory that
> >> > > > > contains the files.  Is there a way to capture the filename of
> >> > > > > each file and append it to the tuples of data that come from
> >> > > > > that file?
> >> > > > >
> >> > > > > -Kim
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
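
For readers following the thread, the loader change Kim describes (keep
the split handed to "prepareToRead", then append its path to each tuple in
"getNext") can be modeled in plain Java. This is only a sketch under
stated assumptions: it uses no Pig or Hadoop classes, the class and method
names are hypothetical, a String stands in for the PigSplit/FileSplit
path, and a List<String> stands in for the tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the loader change described in the thread. No Pig or
// Hadoop classes are used, and every name here is hypothetical.
class PathTaggingSketch {

    // Stand-in for the split that the thread's loader stores in
    // prepareToRead(). A null value models "no split has been set".
    private final String splitPath;

    PathTaggingSketch(String splitPath) {
        this.splitPath = splitPath;
    }

    // Stand-in for getNext(): parse one line into fields, then append the
    // path of the file the line came from, if it is known.
    List<String> next(String line) {
        // Naive comma split; it does not handle quoted or escaped fields.
        List<String> fields = new ArrayList<>(Arrays.asList(line.split(",", -1)));
        if (splitPath != null) {
            fields.add(splitPath);
        }
        return fields;
    }

    public static void main(String[] args) {
        PathTaggingSketch loader = new PathTaggingSketch("/data/filename-2011-02-03");
        System.out.println(loader.next("aa,bb"));
        System.out.println(loader.next("cc,dd"));
    }
}
```

The model holds one path per loader instance, which is the assumption
Dmitriy warns about: with split combination turned on, records from
several files can pass through the same reader, so a stored path may no
longer match the record being read.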
