I think the content at the end of this link will help you:
http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pig
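
In case it is useful, here is a rough, untested sketch of the approach Subir
describes below: load two fields split at '?', pull out the host with
REGEX_EXTRACT, pull out the parameters with REGEX_EXTRACT_ALL, then build a
map. It assumes every input line looks like http://abc.com/?a=v1&b=v2, with
exactly the parameters a and b in that order. The relation and field names
(raw, parsed, host, params) are made up for illustration; the paths are the
ones from Mohit's script.

```pig
-- Assumption: one URL per line, shaped like http://abc.com/?a=v1&b=v2
-- Split each line at '?' into the part before it (uri) and the query string.
raw = LOAD '/examples/map/input/params.dat' USING PigStorage('?')
      AS (uri:chararray, query:chararray);

-- host:  capture group 1 is everything between '://' and the next '/'.
-- a, b:  REGEX_EXTRACT_ALL must match the whole query string, so this
--        pattern only fits queries of exactly the form a=...&b=...
parsed = FOREACH raw GENERATE
    REGEX_EXTRACT(uri, '^https?://([^/]+).*$', 1) AS host,
    FLATTEN(REGEX_EXTRACT_ALL(query, 'a=([^&]*)&b=([^&]*)'))
        AS (a:chararray, b:chararray);

-- Build the map from the extracted values and keep the host next to it.
result = FOREACH parsed GENERATE host, TOMAP('a', a, 'b', b) AS params;

STORE result INTO '/examples/map/output/';
```

A fixed regex like this will not cope with a variable set of parameters; in
that case a UDF is probably the simpler route.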

On Tue, Jun 19, 2012 at 10:50 PM, Subir S <[email protected]> wrote:

> I suggest you load each line as two fields (uri, query), split at the '?'
> delimiter.
>
> Then use REGEX_EXTRACT to extract abc.com and REGEX_EXTRACT_ALL to
> extract the query parameters.
>
> Use FOREACH ... GENERATE to turn the query into a map.
>
>
> On Tue, Jun 19, 2012 at 3:33 AM, Mohit Anchlia <[email protected]> wrote:
>
>> Sorry, that wasn't a link. It's my input to Pig: basically, what's inside
>> params.dat. When I run those 3 Pig lines I get empty output. What I want
>> is something like this:
>>
>> http://abc.com/?a=v1&b=v2
>>
>> broken down into a map, while also preserving abc.com. Otherwise, if
>> that's too complex, I can write UDFs.
>>
>>
>> On Mon, Jun 18, 2012 at 1:04 PM, Subir S <[email protected]>
>> wrote:
>>
>> > I think the link Mohit mentioned was his input. Not sure if I understood
>> > correctly.
>> >
>> > I suspect something related to the schema.
>> >
>> > http://pig.apache.org/docs/r0.9.1/basic.html#map-schema
>> >
>> > http://stackoverflow.com/a/8238591
>> >
>> > So when you load with delimiter '&', what will happen to the first
>> > field? And how will the second field automatically become a map? I mean,
>> > in your schema you mention only one field, not two fields (URL and QUERY).
>> >
>> > Thanks, Subir
>> >
>> > On Tue, Jun 19, 2012 at 12:20 AM, Jonathan Coveney <[email protected]>
>> > wrote:
>> >
>> > > Your link does not work, I recommend using pastebin.
>> > >
>> > > 2012/6/18 Mohit Anchlia <[email protected]>
>> > >
>> > > > I am trying to parse a URL using Pig's map type. My query string is:
>> > > >
>> > > > https://mail.google.com/mail/?tab=wm#drafts/13800c4ea3d11511&mail=123
>> > > >
>> > > > My very simple script for testing is this. But when I look at the part
>> > > > file it returns null.
>> > > >
>> > > > A = LOAD '/examples/map/input/params.dat' USING PigStorage('&') AS
>> > > > (M:map[]);
>> > > >
>> > > > rmf '/examples/map/output/';
>> > > >
>> > > > STORE A INTO '/examples/map/output/';
>> > > >
>> > > > I am working on analyzing clickstream data. For this I need to first
>> > > > parse these strings into files representing dimensions and also do
>> > > > sessionization on them before loading them into an RDBMS.
>> > > >
>> > >
>> >
>>
>
>
