On 6 July 2012 15:53, Duckworth, Will <[email protected]> wrote:
> Not sure of your desired "final output", but below is pseudocode showing how I
> solved a similar problem with Pig and Python.
>
> Use PigStorage with newline as the delimiter (or whatever you are using to
> end a line) to throw Pig a "fakie": since the delimiter never appears inside
> a line, Pig loads each whole line as a single-field tuple.
>
> tv_in = load '$tv_in_path' using PigStorage('\n') as (line:chararray);
>
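> Register the UDF file, then pass each line to a Python UDF:
>
> register '$udf_path' using jython as udf; -- $udf_path: hypothetical parameter naming the .py file
> tv_in2 = foreach tv_in generate udf.explode_tv(line);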
>
> That gets the whole line into the Python UDF so that you can do your custom
> parsing.
>
> Since you don't know in advance how many item:minute pairs a line holds, you
> will have to decide what shape to return.
>
> You could return one tuple per line with an inner bag of item:minute pairs,
> something like:
>
>     R:bag{T:tuple(timestamp, userid, channelid, total_duration,
>                   itemids:bag{iT:tuple(itemid, minutes)})}
>
> or you could create a tuple for each item:minute pair:
>
>     R:bag{T:tuple(timestamp, userid, channelid, total_duration, itemid, minutes)}
>
> Hope this helps.

Very much so. Thanks, Will!

Dan
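For illustration, here is a minimal sketch of what the `explode_tv` UDF from the quoted reply might look like as a Jython file returning either of the two shapes described there. It rests on assumptions the thread does not state: each line is taken to hold whitespace-separated `timestamp userid channelid total_duration` followed by one comma-separated list of `itemid:minutes` pairs, the field types in the schemas are guesses, and `_parse_line` and `explode_tv_flat` are hypothetical names. Pig's Jython runtime normally supplies the `outputSchema` decorator; the no-op fallback only lets the file run under plain Python for testing. A returned Python list becomes a Pig bag and a Python tuple becomes a Pig tuple.

```python
# Sketch of a Pig Jython UDF file (registered in Pig as 'udf').
# ASSUMED (hypothetical) input line layout:
#   "<timestamp> <userid> <channelid> <total_duration> <itemid>:<minutes>,<itemid>:<minutes>,..."

try:
    outputSchema  # normally injected by Pig's Jython runtime
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the module can be imported and tested outside Pig."""
        def decorate(func):
            return func
        return decorate


def _parse_line(line):
    """Hypothetical helper: return (header_tuple, [(itemid, minutes), ...])."""
    fields = line.split()
    header = (fields[0], fields[1], fields[2], int(fields[3]))
    pairs = []
    # A line may carry no item:minute pairs at all.
    if len(fields) > 4:
        for pair in fields[4].split(","):
            itemid, minutes = pair.split(":")
            pairs.append((itemid, int(minutes)))
    return header, pairs


@outputSchema("R:bag{T:tuple(timestamp:chararray, userid:chararray, channelid:chararray, "
              "total_duration:int, itemids:bag{iT:tuple(itemid:chararray, minutes:int)})}")
def explode_tv(line):
    """Nested shape: one tuple per line, with the pairs carried as an inner bag."""
    header, pairs = _parse_line(line)
    return [header + (pairs,)]


@outputSchema("R:bag{T:tuple(timestamp:chararray, userid:chararray, channelid:chararray, "
              "total_duration:int, itemid:chararray, minutes:int)}")
def explode_tv_flat(line):
    """Flat shape: one tuple per item:minute pair, repeating the header fields."""
    header, pairs = _parse_line(line)
    return [header + pair for pair in pairs]
```

With the flat shape, `foreach tv_in generate flatten(udf.explode_tv_flat(line));` would yield one row per item:minute pair. The nested shape keeps one row per input line, which is handier for per-line checks such as comparing `total_duration` with the sum of `minutes`.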
