What exact command would you use? It should be just one line, right?

On Tue, Nov 30, 2010 at 12:58 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> An easier approach would be to just use PigStorage('|') to get the
> pipe-delimited fields, and use STRSPLIT to break up the third column into
> multiple columns.
>
> -D
>
> On Tue, Nov 30, 2010 at 9:26 AM, John Hui <john.m....@gmail.com> wrote:
>
> > You can try using a custom storage parser.
> >
> > You can see a bunch of examples here:
> >
> > pig-0.7.0/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage
> >
> > I wrote one for JSON.
> >
> > On Tue, Nov 30, 2010 at 12:16 PM, Yves Roy <yves....@cossette.com> wrote:
> >
> > > Hello:
> > >
> > > I hope this is not a double post.
> > >
> > > I want to do something simple:
> > >
> > > I have a data file, mydata.log, formatted like this:
> > >
> > > a1 | b1 | c=foo&d=bar | e1
> > > a2 | b2 | c=john&d=doe | e2
> > > a3 | b3 | c=foo&d=doe | e3
> > > ...
> > >
> > > and I want to LOAD the data USING <something> so that the AS clause is
> > > (A, B, C, D, E), i.e. extract two fields from the third one.
> > >
> > > For example:
> > >
> > > data = LOAD 'mydata.log' USING <something> AS (A, B, C, D, E);
> > >
> > > i.e. I want the third field (the one formatted as 'cx=foox&dx=barx')
> > > to be parsed into the C and D in my AS list of fields, so that later on
> > > I can do things like:
> > >
> > > data_cfoo = FILTER data BY C == 'foo';
> > > data_cfoo_ddoe = FILTER data_cfoo BY D == 'doe';
> > >
> > >
> > > There has to be a simple way to do that, right? Passing a regex, a Ruby
> > > script, or something else as a parameter to PigStorage, or using something
> > > other than PigStorage?
> > >
> > > Many thanks
> > >
> > > Yves
> > >
> > >   YVES ROY, SOFTWARE DEVELOPER, DE FJORD
> > > 2100, RUE DRUMMOND, MONTRÉAL, QUÉBEC H3G 1X1 CANADA
> > > T 514 270 8782 #4572 / F 514 270 4162 / cossette.com
> > >
> >
>
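A minimal, untested sketch of Dmitriy's suggestion (PigStorage('|') plus STRSPLIT). It assumes a Pig release that provides STRSPLIT, TRIM and REGEX_EXTRACT as builtins, that c always comes before d in the third column, and that the spaces around each '|' should be dropped. The intermediate names (raw, kv, cd, c_kv, d_kv) are illustrative only. It takes a LOAD plus a couple of FOREACH statements rather than a single line:

```pig
-- Load the four pipe-delimited columns. PigStorage splits only on '|',
-- so a value such as ' c=foo&d=bar ' keeps its surrounding spaces.
raw = LOAD 'mydata.log' USING PigStorage('|')
      AS (a:chararray, b:chararray, cd:chararray, e:chararray);

-- Trim every column and split the third one on '&' into at most two
-- 'key=value' strings (assumes the order is always c first, then d).
kv = FOREACH raw GENERATE
        TRIM(a) AS A,
        TRIM(b) AS B,
        FLATTEN(STRSPLIT(TRIM(cd), '&', 2)) AS (c_kv:chararray, d_kv:chararray),
        TRIM(e) AS E;

-- Keep only the value part of each 'key=value' string.
data = FOREACH kv GENERATE
        A, B,
        REGEX_EXTRACT(c_kv, '^c=(.*)$', 1) AS C,
        REGEX_EXTRACT(d_kv, '^d=(.*)$', 1) AS D,
        E;

-- Field names are case-sensitive, so filter on C and D as declared above.
data_cfoo = FILTER data BY C == 'foo';
data_cfoo_ddoe = FILTER data_cfoo BY D == 'doe';
DUMP data_cfoo_ddoe;
```

If the real file has no spaces around the '|' characters, the TRIM calls are harmless and can be dropped.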
