Re: STREAM in foreach block

2012-09-17 Thread Dan Young
I believe these are the ops supported in a nested foreach: CROSS, DISTINCT, FILTER, FOREACH, LIMIT, and ORDER BY. See: http://pig.apache.org/docs/r0.10.0/basic.html#foreach On Sep 17, 2012 1:55 PM, "Kannan Shah" wrote: > I'm trying to group tuples by a key, sort by another key within each gro

Fwd: Removing unnecessary disambiguation marks

2012-09-17 Thread Robert Yerex
Probably an easy one but... After processing a file through a series of groupings, aggregations, and projections using flatten, I end up with long concatenated names for each field, as shown in this snippet from the JsonStorage-generated schema: { "name" :"enrollments_instructor_1::enrollment

Removing unnecessary disambiguation marks

2012-09-17 Thread Robert Yerex
Probably an easy one but... After processing a file through a series of groupings, aggregations, and projections using flatten, I end up with long concatenated names for each field, as shown in this snippet from the JsonStorage-generated schema: { "name" :"enrollments_instructor_1::enrollment

STREAM in foreach block

2012-09-17 Thread Kannan Shah
I'm trying to group tuples by a key, sort by another key within each group, and then pass the sorted list of tuples for each group to a Perl script. I need to use the Perl script because I need to compute an aggregate quantity that is dependent on the sort order, and I'm not much of a Java programm

Re: reuse same Tuple and ArrayList for every getNext call in LoadFunc?

2012-09-17 Thread Jim Donofrio
Ok, thanks for the clarification. I am interested in this because I am new to Pig and am used to writing RecordReaders for MapReduce that reuse the same objects, so I thought the same logic would apply here. I have not done any performance tests. On 09/17/2012 01:30 AM, Dmitriy Ryaboy wrote: An

Re: How can I split the data with more reducers?

2012-09-17 Thread Haitao Yao
The pie chart is generated by MemoryAnalyzer (http://www.eclipse.org/mat/) from the heap dump taken when the OOME happened. I've increased all the parallelisms and set default_parallel to 3. It does not work. I still don't know what the first MR job compiled by Pig is doing. Only 1 reducer all the time