What would be the exact command to use? It should be just one line, right?

On Tue, Nov 30, 2010 at 12:58 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> An easier approach would be to just use PigStorage('|') to get the
> pipe-delimited fields, and use STRSPLIT to break up the third column into
> multiple columns.
>
> -D
>
> On Tue, Nov 30, 2010 at 9:26 AM, John Hui <john.m....@gmail.com> wrote:
>
> > You can try using a custom storage parser.
> >
> > You can see a bunch of examples here:
> >
> > pig-0.7.0/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage
> >
> > I wrote one for JSON.
> >
> > On Tue, Nov 30, 2010 at 12:16 PM, Yves Roy <yves....@cossette.com> wrote:
> >
> > > Hello:
> > >
> > > I hope this is not double posting.
> > >
> > > I want to do something simple:
> > >
> > > I have a data file, mydata.log, formatted like this:
> > >
> > > a1 | b1 | c=foo&d=bar | e1
> > > a2 | b2 | c=john&d=doe | e2
> > > a3 | b3 | c=foo&d=doe | e3
> > > ...
> > >
> > > and I want to LOAD the data USING <something> so that the AS list is
> > > (A, B, C, D, E), i.e. extract 2 fields from the third one.
> > >
> > > For example:
> > >
> > > data = LOAD 'mydata.log' USING <something> AS (A, B, C, D, E);
> > >
> > > i.e. I want the third field (the one formatted as 'cx=foox&dx=barx')
> > > to be parsed to yield the C and D in my AS list of fields,
> > > so that later on I can do things like:
> > >
> > > data_cfoo = FILTER data BY C == 'foo';
> > > data_cfoo_ddoe = FILTER data_cfoo BY D == 'doe';
> > >
> > > There has to be a simple way to do that?
> > > Passing a regex, a Ruby script or something else as a parameter to
> > > PigStorage, or using something other than PigStorage?
> > >
> > > Many thanks
> > >
> > > Yves
> > >
> > > YVES ROY DÉVELOPPEUR LOGICIEL DE FJORD
> > > 2100, RUE DRUMMOND, MONTRÉAL, QUÉBEC H3G 1X1 CANADA
> > > T 514 270 8782 #4572 / F 514 270 4162 / cossette.com
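
For reference, a minimal sketch of the PigStorage('|') approach suggested in the quoted thread. It is not a single line: it takes a LOAD plus one FOREACH. Note that it swaps STRSPLIT for REGEX_EXTRACT, since STRSPLIT alone would leave the "c=" and "d=" prefixes on the values. It assumes a Pig version where TRIM and REGEX_EXTRACT are built in (0.8 or later, as far as I know), that every third field looks like 'c=...&d=...', and that values contain no '&' or spaces. The relation name raw and the field name cd are made up for illustration.

```pig
-- Split each line on '|'; every field still carries the padding spaces.
-- 'raw' and 'cd' are illustrative names, not from the original post.
raw = LOAD 'mydata.log' USING PigStorage('|')
      AS (a:chararray, b:chararray, cd:chararray, e:chararray);

-- Trim the padding and pull the c and d values out of the third field.
-- Each regex captures everything after 'c=' / 'd=' up to the next '&' or space.
data = FOREACH raw GENERATE
         TRIM(a) AS a,
         TRIM(b) AS b,
         REGEX_EXTRACT(cd, 'c=([^& ]*)', 1) AS c,
         REGEX_EXTRACT(cd, 'd=([^& ]*)', 1) AS d,
         TRIM(e) AS e;

-- The filters from the original question then work on plain fields.
data_cfoo      = FILTER data BY c == 'foo';
data_cfoo_ddoe = FILTER data_cfoo BY d == 'doe';
```

If the key=value pairs can appear in a different order or with other keys, a custom loader like the piggybank ones mentioned in the thread is probably the more robust route.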