I believe these are the ops supported in a nested foreach:
CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY.
See:
http://pig.apache.org/docs/r0.10.0/basic.html#foreach
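For reference, here is a minimal sketch of a nested foreach that sorts within each group, using ORDER BY and LIMIT from the list above. The relation names, field names, and input path are hypothetical and only for illustration; they are not taken from your script.

```pig
-- Hypothetical input: one record per line with a grouping key,
-- a sort key, and a value. All names here are assumptions.
records = LOAD 'input.tsv' AS (grp_key:chararray, sort_key:long, value:double);

-- Collect all records that share the same grouping key into one bag.
grouped = GROUP records BY grp_key;

-- Inside the nested block, the bag is referred to by the original
-- relation name (records). ORDER BY and LIMIT are applied per group.
result = FOREACH grouped {
    sorted = ORDER records BY sort_key;
    top    = LIMIT sorted 10;
    GENERATE group, top;
};

DUMP result;
```

Each output tuple holds the group key and a bag of that group's records sorted by sort_key, which can then be handed on to later steps.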
On Sep 17, 2012 1:55 PM, "Kannan Shah" wrote:
> I'm trying to group tuples by a key, sort by another key within each gro
Probably an easy one but...
After processing a file through a series of groupings, aggregations, and
projections using flatten, I end up with long concatenated names for each
field, as shown in this snippet from the JsonStorage-generated schema:
{
"name"
:"enrollments_instructor_1::enrollment
I'm trying to group tuples by a key, sort by another key within each group,
and then pass the sorted list of tuples for each group to a perl script. I
need to use the perl script because I need to compute an aggregate quantity
that is dependent on the sort order, and I'm not much of a Java programm
Ok thanks for the clarification.
I am interested in this because I am new to Pig and am used to writing
RecordReaders for MapReduce that reuse the same objects, so I thought the
same logic would apply here. I have not done any performance tests.
On 09/17/2012 01:30 AM, Dmitriy Ryaboy wrote:
An
The pie chart was generated by MemoryAnalyzer (http://www.eclipse.org/mat/) from
the heap dump taken when the OOME happened.
I've increased the parallelism everywhere and set default_parallel to 3, but it
still does not work.
I still don't know what the first MR job compiled by Pig is doing. Only 1
reducer all the time