Since one of my relations is small enough to fit in memory, I can force it
to use a map-side (replicated) join. Now the plan looks like this:
Map(LOAD A, LOAD B, JOIN, FILTER) -> Combine(COUNT) -> Reduce(COUNT)
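
For reference, a minimal sketch of how a cross product might be expressed as a replicated join. The thread does not show the actual script, so the paths, schemas, the constant join key `k`, and the filter predicate below are all assumptions for illustration. The only fixed requirement is that the relation to be held in memory is listed last in the JOIN.

```pig
-- Hypothetical inputs: paths and field names are assumptions.
A = LOAD 'a' AS (x:chararray, v:int);
B = LOAD 'b' AS (w:int);               -- assumed small enough to fit in memory

-- Emulate the cross product by joining on a constant key.
A1 = FOREACH A GENERATE 1 AS k, x, v;
B1 = FOREACH B GENERATE 1 AS k, w;

-- The replicated (in-memory) relation must come last.
C = JOIN A1 BY k, B1 BY k USING 'replicated';
C = FILTER C BY v < w;                 -- placeholder predicate
G = GROUP C BY x;
R = FOREACH G GENERATE group, COUNT(C);
```

Because the join and filter run in the map phase and COUNT is algebraic, a script of this shape should compile to a plan like the one above, though that can be confirmed with EXPLAIN.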
On 2/9/14 12:53 PM, "Enns, Steven" wrote:
I am trying to aggregate on the cross product of two relations. It can be
done using a single M/R job but pig is using two. The pig code looks like
this:
C = cross A, B;
C = filter C by ...;
G = group C by x;
G = foreach G generate group, COUNT(C);
The resulting M/
m a file.
>
>
>This JIRA is a work in progress, but hopefully it will be in the next major
>release.
>
>Thanks,
>Cheolsoo
>
>
>
>On Sat, Apr 27, 2013 at 3:24 PM, Enns, Steven wrote:
Resending now that I am subscribed :)
On 4/25/13 4:01 PM, "Enns, Steven" wrote:
Hi everyone,
I would like to override the input schema in AvroStorage to make a pig
script robust to schema evolution. For example, suppose a new field is
added to an avro schema with a default value of null. If the input to a
pig script using this field includes both old and new data, AvroStora