RE: Ideas for data processing

Sameer Tilak Wed, 05 Feb 2014 07:23:55 -0800

Steve, thanks. Will try that now.


> From: steve.bernst...@deem.com
> To: user@pig.apache.org
> Subject: RE: Ideas for data processing
> Date: Tue, 4 Feb 2014 17:57:44 +0000
> 
> Sameer, did you check out the TOMAP function in the documentation?  The 
> example is close to yours.  I think a nested FOREACH in combination with 
> TOMAP would get you there, though I haven't tried it myself.
> SB
> 
> ______________________
> Steve Bernstein
> VP/Analytics
> 
> 408.499.0961 MOBILE
> deem.com
> 
> -----Original Message-----
> From: Sameer Tilak [mailto:ssti...@live.com] 
> Sent: Monday, February 03, 2014 2:00 PM
> To: user@pig.apache.org
> Subject: Ideas for data processing
> 
> Hi everyone,
> We have a data set in the following format:
> user1     item1    value
> user2     item1    value
> user3     item1    value
> ...
> user1     item2    value
> user20    item2    value
> user35    item2    value
> ...
> user2     item3    value
> user25    item3    value
> ...
> We have around 20 items and millions of users, and not all users have entries 
> for all the items. We would like to transform this into:
> user1    item1 value, item2 value, item3 value, ...
> user2    item4 value, item18 value, item19 value, ...
> ...
> I can think of a couple of ways for doing this in Pig Latin. For example, one 
> way would be to create a map (where key is item name and value is the 
> associated value) and then fill out that map as you read the data. Then write 
> it out to a file. I am not sure how efficient that would be. I would love to 
> get suggestions for doing this in Pig Latin.
> 
>
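
The map-per-user idea from the original question can be sketched as below. This is only an illustration of the target shape, written in Python rather than Pig Latin, and it is not taken from the thread: the names `pivot_by_user` and `format_line` and the sample rows are hypothetical. It also holds everything in memory, which may not suit millions of users; a grouping operator in a distributed engine would do the same per-key collection.

```python
from collections import defaultdict


def pivot_by_user(rows):
    """Collect (user, item, value) rows into one {item: value} map per user."""
    per_user = defaultdict(dict)
    for user, item, value in rows:
        # A later row with the same (user, item) overwrites the earlier value.
        per_user[user][item] = value
    return dict(per_user)


def format_line(user, items):
    """Render one output line: the user, then comma-separated 'item value' pairs."""
    pairs = ", ".join(f"{item} {value}" for item, value in sorted(items.items()))
    return f"{user} {pairs}"


# Hypothetical sample rows; the values are made up for illustration.
rows = [
    ("user1", "item1", "5"),
    ("user2", "item1", "3"),
    ("user1", "item2", "7"),
    ("user2", "item3", "1"),
]

for user, items in sorted(pivot_by_user(rows).items()):
    print(format_line(user, items))
```

Users with no entry for an item simply have no key for it, so the sparse case (not every user has every item) needs no special handling.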
