C = DISTINCT B;

STORE C INTO '$OUTPUT';
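
To check whether the duplicates are already in the raw data rather than coming from Pig, a rough sketch along these lines might help. It assumes the same schema as the LOAD statement quoted below; `$DUPES_OUTPUT` is a made-up parameter name and the alias names are arbitrary.

```pig
-- Sketch only: list every input row that occurs more than once.
-- $DUPES_OUTPUT is a hypothetical parameter, not part of the original job.
raw = LOAD '$INPUT' USING PigStorage('\t') AS (entity_id: long, lat: double, lng: double);

-- Group on the whole row so identical rows land in the same bag.
grouped = GROUP raw BY (entity_id, lat, lng);

-- COUNT_STAR counts every tuple in the bag, including ones with null fields.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (entity_id, lat, lng), COUNT_STAR(raw) AS n;

-- Keep only the rows that appear more than once.
dupes = FILTER counts BY n > 1;

STORE dupes INTO '$DUPES_OUTPUT';
```

If that output comes back empty, the input has no exact duplicate rows and the cause is somewhere else, for example the same input files being picked up more than once.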

-Kris

On Fri, May 18, 2012 at 04:55:23PM +0100, Brendan Gill wrote:
> Hi all,
> 
> We've been getting some odd output from some Pig jobs recently that
> contains a lot of duplicated data.  I'm wondering whether Pig could be
> the cause, or whether we must have duplicates in our raw data set (which
> is very possible).
> 
> We're running simple Pig jobs that are just filtering a subset of our data
> based on co-ordinates e.g.:
> 
> A =  LOAD '$INPUT' USING PigStorage('\t') as (entity_id: long, lat: double,
> lng: double);
> 
> B =  FILTER A BY (lat > 37.708) AND (lat < 37.817) AND (lng > -122.519) AND
> (lng < -122.356);
> 
> STORE B INTO '$OUTPUT';
> 
> Thanks.

-- 
Kris Coward                                     http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
