You can pass the start time to a UDF that rounds it down to the nearest 30-second interval, then group by (course, interval) to get a count for each course/interval pair.
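Here is a minimal sketch of the same idea in plain Python, outside Pig. It assumes start times are Unix timestamps in seconds, as in the sample data. `round_down` is a hypothetical stand-in for the UDF, and the `Counter` keyed on `(course, interval)` mimics a GROUP BY followed by COUNT. This buckets each session only by its start time and ignores the duration column.

```python
from collections import Counter

# Sample rows copied from the question (header line omitted).
SAMPLE = """\
1,marco,1319708213,500,math
2,ralf,1319708111,112,english
3,greg,1319708321,333,french
4,diva,1319708444,80,english
5,susanne,1319708123,2000,math
1,marco,1319708564,500,french
2,ralf,1319708789,123,french
7,fred,1319708213,5675,french
8,laura,1319708233,123,math
10,sab,1319708999,777,math
11,fibo,1319708789,565,math
6,dan,1319708456,50,english
9,marco,1319708123,60,english
12,bo,1319708456,345,math
1,marco,1319708789,673,math
"""


def round_down(ts: int, width: int = 30) -> int:
    """Hypothetical UDF stand-in: floor a Unix timestamp to its width-second bucket."""
    return ts - ts % width


def parse(text):
    """Yield (student_id, name, start_time, duration, course) tuples from CSV lines."""
    for line in text.strip().splitlines():
        sid, name, start, duration, course = line.split(",")
        yield int(sid), name, int(start), int(duration), course


def count_by_course_interval(rows, width=30):
    """Count rows per (course, interval), like GROUP BY course, interval + COUNT."""
    counts = Counter()
    for _sid, _name, start, _duration, course in rows:
        counts[(course, round_down(start, width))] += 1
    return counts


if __name__ == "__main__":
    counts = count_by_course_interval(parse(SAMPLE))
    for (course, interval), n in sorted(counts.items()):
        print(course, interval, n)
```

Sorting the output by course and then interval gives one time series per course, which is the shape needed for plotting.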
On Thursday, October 27, 2011, Marco Cadetg <ma...@zattoo.com> wrote:
> I have a problem where I don't know how or if pig is even suitable to solve
> it.
>
> I have a schema like this:
>
> student-id,student-name,start-time,duration,course
> 1,marco,1319708213,500,math
> 2,ralf,1319708111,112,english
> 3,greg,1319708321,333,french
> 4,diva,1319708444,80,english
> 5,susanne,1319708123,2000,math
> 1,marco,1319708564,500,french
> 2,ralf,1319708789,123,french
> 7,fred,1319708213,5675,french
> 8,laura,1319708233,123,math
> 10,sab,1319708999,777,math
> 11,fibo,1319708789,565,math
> 6,dan,1319708456,50,english
> 9,marco,1319708123,60,english
> 12,bo,1319708456,345,math
> 1,marco,1319708789,673,math
> ...
> ...
>
> I would like to retrieve a graph (interpolation) over time grouped by
> course. Meaning how many students are learning for a course based on a 30
> sec interval.
> The grouping by course is easy but from there I've no clue how I would
> achieve the rest. I guess the rest needs to be achieved via some UDF
> or is there any way how to this in pig? I often think that I need a "for
> loop" or something similar in pig.
>
> Thanks for your help!
> -Marco