Hi All,
I use Pig to process some of my data, and I have run into an issue. I have a lot of data that I want sorted and also grouped by key, then written to files for other Java programs to process later.

For example, my data has the columns col-k1, col-k2, col-v1. I want the data ordered by col-k1 and col-k2, and at the same time I want the output files split by the key col-k1 only.

The ORDER BY behavior I found is shown below:

[image: image001.png]

I like the idea of sampling, but how can I still force all of the e records (in the example above) into one file? I want a balanced result set, but I don't want a single key to go to different reducers. Is there a simple way to do this in Pig? I know I could do it with my own MapReduce code, but that would be quite troublesome because I have many similar requirements. Does anyone have any ideas?

BTW: I recall that this sampling was added in a later release (earlier versions did not have it, and the result was always grouped by key). Is there any parameter I can use to tune or disable this feature in Pig?

Regards,
Shuai