C = DISTINCT B;
STORE C INTO '$OUTPUT';
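
To tell whether the duplicates are already in the raw data rather than being introduced by Pig, a rough sketch along these lines might help. It assumes the same schema as the quoted script, and the aliases G, N, and D are just illustrative names:

```pig
-- Load the raw input with the schema from the original script.
A = LOAD '$INPUT' USING PigStorage('\t') AS (entity_id: long, lat: double, lng: double);

-- Group identical rows together and count how often each one occurs.
G = GROUP A BY (entity_id, lat, lng);
N = FOREACH G GENERATE FLATTEN(group) AS (entity_id, lat, lng), COUNT_STAR(A) AS n;

-- Keep only rows that appear more than once in the input.
D = FILTER N BY n > 1;
DUMP D;
```

If D comes back empty, the input has no exact duplicate rows and the duplication is happening somewhere else.
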

-Kris

On Fri, May 18, 2012 at 04:55:23PM +0100, Brendan Gill wrote:
> Hi all,
>
> We've been getting some funny outputs to some Pig jobs recently that
> contain a lot of duplicated data. I'm wondering if the cause of this
> could be Pig, or if we must have duplicates in our raw data set (which is
> very possible).
>
> We're running simple Pig jobs that are just filtering a subset of our data
> based on co-ordinates e.g.:
>
> A = LOAD '$INPUT' USING PigStorage('\t') as (entity_id: long, lat: double,
> lng: double);
>
> B = FILTER A BY (lat > 37.708) AND (lat < 37.817) AND (lng > -122.519) AND
> (lng < -122.356);
>
> STORE B INTO '$OUTPUT';
>
> Thanks.

-- 
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3