The Pig team is happy to announce the Pig 0.8.0 release.
Apache Pig provides a high-level data-flow language and execution
framework for parallel computation on Hadoop clusters.
More details about Pig can be found at http://pig.apache.org/.
The highlights of this release are scalar, custom partitioner
My interpretation was that he wants something more like this:
in: {2, 5, 7, 1, 1, 3}
out: {2, 7, 14, 15, 16, 19}
.. which you can't get using a simple group/count.
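
To make the in/out example concrete, here is a minimal sketch of that running total in plain Python rather than Pig Latin. The helper name running_totals is hypothetical and is not part of Pig or Piggybank. Each output element depends on every element before it, so the result is tied to input order, which is why a group/count alone does not produce it.

```python
from itertools import accumulate


def running_totals(values):
    """Return the cumulative sum of values, preserving input order."""
    # accumulate yields values[0], values[0] + values[1], and so on.
    return list(accumulate(values))


if __name__ == "__main__":
    totals = running_totals([2, 5, 7, 1, 1, 3])
    print(totals)  # [2, 7, 14, 15, 16, 19]
```
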
-D
On Fri, Dec 17, 2010 at 3:36 PM, Zach Bailey wrote:
>
> Forgive me but I got one thing slightly wrong. Since you're wanting to
Forgive me but I got one thing slightly wrong. Since you're wanting to do
hourly totals and not daily totals you will want to change this line:
> allDataISODates = FOREACH allData GENERATE string,
> org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToDay(org.apache.pig.piggybank.evaluat
I believe what you're trying to do is this. You have some sort of data, and a
timestamp:
What you want to figure out is how many times each possible value of "data"
appears in a certain time period (say, hourly).
Let's say data can have three possible string values: {'a', 'b', 'c'}
Your t
What you are suggesting seems to be a fundamentally single-threaded process
(well, it can be parallelized, but it's not pretty and involves multiple
passes), so it's not a good fit for the map-reduce paradigm (how would you
do accumulative totals for 25 billion entries?). Pig tends to avoid
implem
Hello,
Is there some sort of mechanism by which I could cause a value to
accumulate within a relation? What I'd like to do is something along the
lines of having a long called accumulator, and an outer bag called
hourlyTotals with a schema of (hour:int, collected:int)
accumulator = 0L; -- I know
We apologize in advance if you receive multiple copies of this CFP.
---
CALL FOR PAPERS
The Second International Workshop on
MapReduce and its Applications (MAPREDUCE'11)