On 6 July 2012 15:53, Duckworth, Will <[email protected]> wrote:
> Not sure of your desired "final output", but below is the pseudocode for how I
> solved a similar problem with Pig and Python.
>
> Use PigStorage with newline as the delimiter (or whatever you are using to
> denote a new line) in order to throw Pig a "fakie" and have it load the whole
> line as the tuple.
>
>     tv_in = load '$tv_in_path' using PigStorage('\n') as (line:chararray);
>
> Pass each line to a Python UDF:
>
>     tv_in2 = foreach tv_in generate udf.explode_tv(line);
>
> That gets the whole line into the Python UDF so that you can do your custom
> parsing.
>
> Since you don't know the total number of item:minute pairs, you are going to
> have to decide what you want to return.
>
> You could return a bag of item:minute pairs, something like:
>
>     R:bag{T:tuple(timestamp, userid, channeled, total_duration,
>                   itemids:bag{iT:tuple(itemid, minutes)})}
>
> or you could create a tuple for each item:minute pair:
>
>     R:bag{T:tuple(timestamp, userid, channeled, total_duration,
>                   itemid, minutes)}
>
> Hope this helps.

Very much so. Thanks, Will!

Dan
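
For reference, the two return shapes Will describes could be sketched in Python roughly as follows. This is only an illustration, not the code from the thread: the input layout (tab-separated fields, four fixed fields followed by any number of `itemid:minutes` tokens) is an assumption, and `parse_line`, `explode_tv_nested`, and `explode_tv_flat` are hypothetical names. Under Pig's Jython engine each function would normally also carry an `@outputSchema(...)` decorator matching one of the schemas above; it is left out here so the sketch runs as plain Python.

```python
# Hypothetical sketch: the line layout below is an assumption, not the real data format.
# Assumed line: timestamp \t userid \t channel \t total_duration \t item:min \t item:min ...


def parse_line(line, field_sep="\t", pair_sep=":"):
    """Split one raw line into the four fixed fields plus a list of (itemid, minutes)."""
    fields = line.rstrip("\r\n").split(field_sep)
    if len(fields) < 4:
        raise ValueError("expected at least 4 fields, got %d" % len(fields))
    timestamp, userid, channel, total_duration = fields[:4]
    pairs = []
    for token in fields[4:]:
        if not token:
            continue  # tolerate empty trailing fields
        itemid, minutes = token.split(pair_sep, 1)
        pairs.append((itemid, int(minutes)))
    return timestamp, userid, channel, int(total_duration), pairs


def explode_tv_nested(line):
    """Option 1: one tuple per line, with the item:minute pairs kept as an inner bag."""
    timestamp, userid, channel, total, pairs = parse_line(line)
    return [(timestamp, userid, channel, total, pairs)]


def explode_tv_flat(line):
    """Option 2: one tuple per item:minute pair, repeating the fixed fields."""
    timestamp, userid, channel, total, pairs = parse_line(line)
    return [(timestamp, userid, channel, total, itemid, minutes)
            for itemid, minutes in pairs]


if __name__ == "__main__":
    sample = "2012-07-06\tu1\tch9\t45\ta:30\tb:15"
    for row in explode_tv_flat(sample):
        print(row)
```

The flat form is usually easier to group and join on afterwards, while the nested form keeps one row per input line; which one fits depends on the "final output" you are after.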
