Yes, it is strongly recommended to use 0.8.1, in which we fixed quite a few important bugs.

Daniel

On Fri, Jul 22, 2011 at 6:30 AM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

> Hello again,
>
> I have a relation with the following schema:
>
> regrouped: {group: (artistid: int,country: int,week:
> chararray),projected_joined_albums: {key: (artistid: int,country:
> int,week: chararray),timestamp: long,albumid: int,numtracks:
> long,reach: int,title_len: long}}
>
> having grouped the projected_joined_albums relation on key.
>
> However, when I store it using the default storage format:
>
> store regrouped into 'dupetest/regrouped';
>
> The resulting file looks like this:
>
> (1000062,83,2011-06-13T00:00:00.000Z)
>
> {(1000062,1308268800,274377251,,1,11),(1000062,1308268800,275105079,,7,13),(1000062,1308268800,270919728,1,67,4)}
>
> The first column is the grouping field ('key'), this is correct.
> However the second column is a bag of *flat* tuples, each having just
> the artistid (an integer) as the initial element, where I would have
> expected to find the entire 'key' tuple.
>
> The rest of the fields of each tuple are exactly as I would expect
> them -- timestamp, albumid, numtracks, reach, title_len.
>
> Is this a bug? (Pig 0.8.0 from Cloudera CDH3u0 BTW)
>
> Also, it occurs to me that this may relate to the other question I
> posted, about "foreach regrouped" with an inner order-by failing with
> the following error:
>
> java.lang.ClassCastException: java.lang.Integer cannot be cast to
> org.apache.pig.data.Tuple
>
> Assuming Pig's temp-file version of regrouped looks the same as the
> the one I got from store, I could see how foreach might fall over, if
> it was expecting the first element to be the key tuple but instead got
> the artistid!
>
> Thanks again,
>
> Andrew.
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>