You can:
1. Load all the data.
2. Use STRSPLIT (http://pig.apache.org/docs/r0.9.2/func.html#strsplit) to
split your values into a tuple.
3. Convert your tuples into a bag (I used a UDF in Python instead of the
TOBAG UDF).
4. Flatten your bag (http://pig.apache.org/docs/r0.9.2/basic.html#flatten).
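Step 3 could look roughly like the following Python (Jython) UDF. This is only a sketch, not the UDF I actually used: the function name tuple_to_bag and the schema string are made up for illustration, and I'm assuming Pig makes the outputSchema decorator available when it registers the script, so there is a no-op fallback for running the file outside Pig.

```python
# Assumption: Pig provides `outputSchema` when it registers a Jython script.
# The fallback below only exists so the file also runs as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


# Hypothetical UDF name and schema, chosen for illustration only.
@outputSchema("values:bag{t:tuple(value:chararray)}")
def tuple_to_bag(fields):
    """Turn a tuple of fields into a bag, i.e. a list of one-field tuples."""
    if fields is None:
        return None
    return [(field,) for field in fields]
```

Called on the tuple ("1", "2", "3", "4"), it returns a list of four one-field tuples, which FLATTEN can then turn into one row per value.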
I'm not sure there is a built-in way in Pig to ignore keys. Maybe you can
load the data as comma-delimited and then, in a FOREACH statement, strip
everything up to and including the tab from the first field. You can use
TOKENIZE or SUBSTRING to achieve that.
Maybe there is a better way that I'm not aware of.
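To make the string handling concrete, here is the same idea in plain Python. It is a sketch of the logic only, not Pig code: the function name strip_key is made up, and it assumes the key is separated from the values by a single tab and the values by commas.

```python
def strip_key(line, key_delim="\t", value_delim=","):
    """Drop everything up to and including the first key delimiter,
    then split the remainder into individual values."""
    _key, sep, rest = line.partition(key_delim)
    if not sep:
        # No key delimiter found: treat the whole line as values.
        rest = line
    return rest.split(value_delim)


print(strip_key("ABC\t1,2,3,4"))  # ['1', '2', '3', '4']
```

The same function could be wrapped as a UDF if TOKENIZE or SUBSTRING turn out to be awkward for this.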
I have something like:
ABC1,2,3,4
I think it's tab-delimited, with ABC being the key and 1,2,3,4 as the values.
I need to ignore ABC and then load with PigStorage(',') to parse the
comma-separated values into separate fields. Is there an easy way to do this?
On Thu, Mar 8, 2012 at 6:20 PM, Prashant Kom
How are you loading it in Pig? Can you just ignore the first field (key)
with a positional reference? What is the key-value delimiter used in your MR
job?
On Thu, Mar 8, 2012 at 2:56 PM, Mohit Anchlia wrote:
I am trying to process the output of a map-reduce job, which has the key in
it. Is there a way I can ignore the key when I load data from that file?
When I load the data into an alias, I don't want the key in that alias.