You can try writing a custom storage parser (a custom load function).

You can see a bunch of examples here:

pig-0.7.0/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage

I wrote one for JSON.
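
To give an idea of what such a parser has to do for the line format in your message, here is a minimal sketch of just the parsing step. It is written in Python for brevity and is not Pig code; the function name parse_line is made up for illustration, and I am assuming every line has exactly four '|'-separated fields with the third one being '&'-separated key=value pairs. In a real loader the same logic would live in Java, in the method that turns one input line into a tuple.

```python
def parse_line(line):
    """Split 'a1 | b1 | c=foo&d=bar | e1' into a 5-tuple (A, B, C, D, E).

    Hypothetical helper: assumes four '|'-separated fields, where the
    third field holds '&'-separated key=value pairs with keys 'c' and 'd'.
    Returns None for lines that do not have exactly four fields.
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 4:
        return None  # malformed line; a loader could skip or log it

    a, b, pairs, e = fields
    # Turn 'c=foo&d=bar' into {'c': 'foo', 'd': 'bar'}, ignoring bare tokens.
    params = dict(p.split("=", 1) for p in pairs.split("&") if "=" in p)
    # Missing keys come back as None, roughly a null field in Pig.
    return (a, b, params.get("c"), params.get("d"), e)


if __name__ == "__main__":
    print(parse_line("a1 | b1 | c=foo&d=bar | e1"))
```

Looking the keys up by name rather than by position means a line written as 'd=bar&c=foo' still puts the values in the right columns.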

On Tue, Nov 30, 2010 at 12:16 PM, Yves Roy <yves....@cossette.com> wrote:

> Hello:
>
> I hope this is not double posting.
>
> I want to do something simple:
>
> I have a data file, mydata.log, formatted like this:
>
> a1 | b1 | c=foo&d=bar | e1
> a2 | b2 | c=john&d=doe | e2
> a3 | b3 | c=foo&d=doe | e3
> ...
>
> and I want to LOAD the data USING <something> so that the AS clause can be
> (A, B, C, D, E), i.e. extract two fields from the third one.
>
> For example :
>
> data = LOAD 'mydata.log' USING <something> AS (A, B, C, D, E);
>
> i.e. I want the third field (the one formatted as 'cx=foox&dx=barx') to be
> parsed to yield the C and D in my AS list of fields,
> so that later on I can do things like:
>
> data_cfoo = FILTER data BY C == 'foo';
> data_cfoo_ddoe = FILTER data_cfoo BY D == 'doe';
>
>
> There has to be a simple way to do that?
> Passing a regex, a Ruby script, or something else as a parameter to
> PigStorage, or using something other than PigStorage?
>
> Many thanks
>
> Yves
>
>   YVES ROY DÉVELOPPEUR LOGICIEL DE FJORD
> 2100, RUE DRUMMOND, MONTRÉAL, QUÉBEC H3G 1X1 CANADA
> T 514 270 8782 #4572 / F 514 270 4162 / cossette.com
>