How can I do a distinct with foreach? Are those two separate statements, like the ones I posted, or something different?
On Thu, Apr 12, 2012 at 7:49 AM, Gianmarco De Francisci Morales <[email protected]> wrote:

> Hi,
>
> Distinct with the foreach is more efficient than grouping. As long as you
> don't need the rest of the data, you are better off with this solution.
>
> With the syntax A.FORM_ID, A.SET_ID you are invoking scalar projection,
> that is, you are telling Pig to treat the value as a scalar. The right
> syntax is the first one (without the "A." in front).
>
> Cheers,
> --
> Gianmarco
>
> On Wed, Apr 11, 2012 at 23:06, Mohit Anchlia <[email protected]> wrote:
>
> > Thanks, I tried something like this and it worked, but I have one more
> > question:
> >
> > grunt> B = foreach A GENERATE FORM_ID, SET_ID;
> > grunt> C = DISTINCT B;
> >
> > What's the difference between foreach A GENERATE FORM_ID, SET_ID; and
> > foreach A GENERATE A.FORM_ID, A.SET_ID;? To me they look the same, but
> > the results are different.
> >
> > On Wed, Apr 11, 2012 at 1:57 PM, Prashant Kommireddi <[email protected]> wrote:
> >
> > > You are doing a distinct on a Tuple, and not a Bag?
> > >
> > > In your example, DISTINCT on a field name on each record/tuple would
> > > not make sense, as it's always a single value. You need to group by
> > > a certain key before a distinct.
> > >
> > > On Wed, Apr 11, 2012 at 1:53 PM, Mohit Anchlia <[email protected]> wrote:
> > >
> > > > I am trying to get the distinct values of 2 fields in a record,
> > > > something like select distinct a, b from c; So I wrote this in Pig,
> > > > which is actually not working. I did:
> > > >
> > > > A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t') AS
> > > > (FILE_NAME:chararray,FORM_ID:chararray,SET_ID:chararray);
> > > >
> > > > B = foreach A {dist = DISTINCT A.FORM_ID, A.SET_ID; GENERATE dist;}
> > > >
> > > > ERROR 1000: Error during parsing. Invalid alias: A in {FILE_NAME:
> > > > chararray
> > > > ...
> > > >
> > > > But this doesn't seem to be working. I thought A is a tuple and
> > > > form_id and set_id are fields that I can do DISTINCT on. I saw a
> > > > similar example online, but not exactly the same.
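
For anyone reading this thread later, here is a minimal sketch of the two approaches discussed above, using the LOAD statement from the original post. The first part is the two-statement form that was reported to work (project with FOREACH, then DISTINCT as a separate statement). The second part illustrates the "group by a key before a distinct" suggestion. Grouping by FILE_NAME, the COUNT, and the aliases G, D, pairs and uniq are assumptions made for illustration only; they are not taken from the thread.

```pig
A = LOAD '/examples/form_out/part-m-00000' USING PigStorage('\t')
    AS (FILE_NAME:chararray, FORM_ID:chararray, SET_ID:chararray);

-- Approach 1: roughly "select distinct FORM_ID, SET_ID".
-- Project the two fields (no "A." prefix), then DISTINCT the relation
-- in a separate statement.
B = FOREACH A GENERATE FORM_ID, SET_ID;
C = DISTINCT B;
DUMP C;

-- Approach 2 (assumed use case): distinct pairs within each group.
-- After GROUP, A is a bag inside each record of G, so DISTINCT can be
-- applied to a projection of that bag inside a nested FOREACH.
G = GROUP A BY FILE_NAME;
D = FOREACH G {
    pairs = A.(FORM_ID, SET_ID);   -- bag projection, valid because A is a bag here
    uniq  = DISTINCT pairs;
    GENERATE group, COUNT(uniq);
};
DUMP D;
```

If only the overall distinct pairs are needed, Approach 1 is the form recommended in the reply above; Approach 2 only makes sense when a per-key result is wanted.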
