On Tue, Jun 19, 2012 at 10:46 AM, Subir S <[email protected]> wrote:
> I think the content at the end of this link
>
> http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pig
>
> will help you!!
>
Thanks! I get a 404 when I click on that link.

> On Tue, Jun 19, 2012 at 10:50 PM, Subir S <[email protected]> wrote:
>
> > I suggest you load with 2 fields (uri, query), split at the '?' delimiter.
> >
> > Then use REGEX_EXTRACT to extract abc.com and REGEX_EXTRACT_ALL to
> > extract the query parameters.
> >
> > Use FOREACH ... GENERATE to turn the query into a map.
> >
> >
> > On Tue, Jun 19, 2012 at 3:33 AM, Mohit Anchlia <[email protected]> wrote:
> >
> >> Sorry, that wasn't a link. It's my input to Pig, basically what's inside
> >> params.dat. When I run those 3 Pig lines I get empty output. What I want
> >> is something like this:
> >>
> >> http://abc.com/?a=v1&b=v2
> >>
> >> broken down into a map, while also preserving abc.com. Otherwise, if
> >> it's complex, I can write UDFs.
> >>
> >>
> >> On Mon, Jun 18, 2012 at 1:04 PM, Subir S <[email protected]> wrote:
> >>
> >> > I think the link Mohit mentioned was his input. Not sure if I understood
> >> > correctly.
> >> >
> >> > I suspect it's something related to the schema.
> >> >
> >> > http://pig.apache.org/docs/r0.9.1/basic.html#map-schema
> >> >
> >> > http://stackoverflow.com/a/8238591
> >> >
> >> > So when you load with delimiter '&', what will happen to the first field?
> >> > And how will the second field automatically become a map? I mean, in your
> >> > schema you mention only one field, not two fields (URL & QUERY).
> >> >
> >> > Thanks,
> >> > Subir
> >> >
> >> > On Tue, Jun 19, 2012 at 12:20 AM, Jonathan Coveney <[email protected]> wrote:
> >> >
> >> > > Your link does not work, I recommend using pastebin.
> >> > >
> >> > > 2012/6/18 Mohit Anchlia <[email protected]>
> >> > >
> >> > > > I am trying to parse URLs using Pig's map type. My query string is:
> >> > > >
> >> > > > https://mail.google.com/mail/?tab=wm#drafts/13800c4ea3d11511&mail=123
> >> > > >
> >> > > > My very simple script for testing is this, but when I look at the part
> >> > > > file it returns null.
> >> > > >
> >> > > > A = LOAD '/examples/map/input/params.dat' USING PigStorage('&') AS (M:map[]);
> >> > > >
> >> > > > rmf '/examples/map/output/';
> >> > > >
> >> > > > STORE B INTO '/examples/map/output/';
> >> > > >
> >> > > > I am working on analyzing clickstream data. For this I need to first
> >> > > > parse these strings into files representing dimensions and also do
> >> > > > sessionization on them before loading them into an RDBMS.
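For reference, here is a minimal sketch of the split-at-'?' approach, written as plain Python rather than Pig so the logic is easy to check in isolation. The function name `parse_click_url` and the host regex are hypothetical, not from any library. Two assumptions are worth flagging: everything after the first '#' is treated as the URL fragment and split off before the query is parsed, and values are not percent-decoded. Under that fragment rule, in the Gmail example above `mail=123` ends up in the fragment rather than the query map. The same logic could be ported into a UDF if the REGEX_EXTRACT / REGEX_EXTRACT_ALL route gets awkward.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical helper regex (an assumption): captures the host part of a URL
# that has a scheme, e.g. "abc.com" from "http://abc.com/". Any ":port" is kept.
HOST_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://([^/?#]+)")


def parse_click_url(url: str) -> Tuple[Optional[str], Dict[str, str], Optional[str]]:
    """Split a URL into (host, query-parameter map, fragment).

    Sketch of the suggested approach: split at '?', pull out the host,
    then turn the 'k=v&k=v' query string into a map.
    """
    # Assumption: everything after the first '#' is the fragment, not query.
    before_fragment, hash_sign, fragment = url.partition("#")
    # Split the remainder at the first '?' into (uri, query).
    uri, _, query = before_fragment.partition("?")

    match = HOST_RE.match(uri)
    host = match.group(1) if match else None  # None if there is no scheme

    params: Dict[str, str] = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty pieces, e.g. from a trailing '&'
        key, _, value = pair.partition("=")
        params[key] = value  # values are left percent-encoded; later keys win
    return host, params, (fragment if hash_sign else None)


if __name__ == "__main__":
    print(parse_click_url("http://abc.com/?a=v1&b=v2"))
    # ('abc.com', {'a': 'v1', 'b': 'v2'}, None)
    print(parse_click_url(
        "https://mail.google.com/mail/?tab=wm#drafts/13800c4ea3d11511&mail=123"))
    # ('mail.google.com', {'tab': 'wm'}, 'drafts/13800c4ea3d11511&mail=123')
```

If the parameters after '#' matter for the clickstream dimensions, the fragment string could be run through the same '&' / '=' loop as a second step.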
