Pig 0.8.0 is released!

2010-12-17 Thread Daniel Dai
The Pig team is happy to announce the Pig 0.8.0 release. Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can be found at http://pig.apache.org/. The highlights of this release are scalar, custom partitioner

Re: Cumulative totals in an ORDERed relation.

2010-12-17 Thread Dmitriy Ryaboy
My interpretation was that he wants something more like this: in: {2, 5, 7, 1, 1, 3} out: {2, 7, 14, 15, 16, 19} .. which you can't get using a simple group/count. -D On Fri, Dec 17, 2010 at 3:36 PM, Zach Bailey wrote: > > Forgive me but I got one thing slightly wrong. Since you're wanting to
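
The in/out pair in this message describes a running (prefix) sum. As a minimal single-process sketch in Python rather than Pig Latin, using the input from the message (the function name `running_totals` is hypothetical and not part of Pig):

```python
from itertools import accumulate


def running_totals(values):
    """Return the cumulative sums of values, preserving their order."""
    return list(accumulate(values))


# Input taken from the message above.
print(running_totals([2, 5, 7, 1, 1, 3]))  # [2, 7, 14, 15, 16, 19]
```

Each output element depends on every element before it, which is why a plain group/count does not produce this result.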

Re: Cumulative totals in an ORDERed relation.

2010-12-17 Thread Zach Bailey
Forgive me but I got one thing slightly wrong. Since you're wanting to do hourly totals and not daily totals you will want to change this line: > allDataISODates = FOREACH allData GENERATE string, > org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToDay(org.apache.pig.piggybank.evaluat

Re: Cumulative totals in an ORDERed relation.

2010-12-17 Thread Zach Bailey
I believe what you're trying to do is this. You have some sort of data, and a timestamp: What you want to figure out is how many times each possible value of "data" appears in a certain time period (say, hourly). Let's say data can have three possible string values: {'a', 'b', 'c'} Your t
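
The per-hour counting described in this message can be sketched in plain Python. This is only an illustration under assumptions: the records are taken to be `(data, timestamp)` pairs with ISO 8601 timestamps (the message is cut off before it shows the real format), and `hourly_counts` is a hypothetical name, not a Pig or piggybank function.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(records):
    """Count occurrences of each data value per hour.

    records: iterable of (data, timestamp) pairs, where timestamp is an
    ISO 8601 string (an assumption made for this sketch).
    """
    counts = Counter()
    for value, timestamp in records:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(hour.isoformat(), value)] += 1
    return counts


sample = [
    ("a", "2010-12-17T15:36:00"),
    ("a", "2010-12-17T15:59:59"),
    ("b", "2010-12-17T15:01:00"),
    ("a", "2010-12-17T16:00:00"),
]
result = hourly_counts(sample)
```

Grouping on the pair (truncated hour, value) mirrors a truncate-then-group approach; swapping the truncation for a day-level one would give daily totals instead.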

Re: Cumulative totals in an ORDERed relation.

2010-12-17 Thread Dmitriy Ryaboy
What you are suggesting seems to be a fundamentally single-threaded process (well, it can be parallelized, but it's not pretty and involves multiple passes), so it's not a good fit for the map-reduce paradigm (how would you do accumulative totals for 25 billion entries?). Pig tends to avoid implem
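
One common way a running total can be split into multiple passes is sketched below in Python. It is not necessarily the scheme the author has in mind, and it is not Pig code; `chunked_running_totals` and `chunk_size` are hypothetical names. The per-chunk steps are independent of each other and could run in parallel, while the offset step in the middle stays sequential.

```python
from itertools import accumulate


def chunked_running_totals(values, chunk_size):
    """Compute cumulative sums in two per-chunk passes plus an offset step."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

    # Pass 1: total of each chunk (independent per chunk).
    chunk_sums = [sum(chunk) for chunk in chunks]

    # Sequential step: each chunk's offset is the total of all earlier chunks.
    offsets = [0] + list(accumulate(chunk_sums))[:-1]

    # Pass 2: local running totals shifted by the offset (independent per chunk).
    result = []
    for chunk, offset in zip(chunks, offsets):
        result.extend(offset + total for total in accumulate(chunk))
    return result


print(chunked_running_totals([2, 5, 7, 1, 1, 3], chunk_size=2))
# [2, 7, 14, 15, 16, 19]
```

The sketch also assumes the input is already ordered and that chunk boundaries respect that order, which is part of what makes the approach awkward to express as a map-reduce job.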

Cumulative totals in an ORDERed relation.

2010-12-17 Thread Kris Coward
Hello, Is there some sort of mechanism by which I could cause a value to accumulate within a relation? What I'd like to do is something along the lines of having a long called accumulator, and an outer bag called hourlyTotals with a schema of (hour:int, collected:int) accumulator = 0L; -- I know

CFP: The Second International Workshop on MapReduce and its Applications (MAPREDUCE'11)

2010-12-17 Thread Gilles Fedak
We apologize in advance if you receive multiple copies of this CFP. --- CALL FOR PAPERS The Second International Workshop on MapReduce and its Applications (MAPREDUCE'11)