You can solve this with the DISTINCT operator: it keeps only the unique
entries, and then you can count them.

Example:

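-- The sample data is comma-separated, so pass ',' to PigStorage (its default delimiter is tab).
data = LOAD '...' USING PigStorage(',') AS (id:int, field1:chararray,
field2:chararray);
-- Drop duplicate rows.
unique_data = DISTINCT data;
-- Group everything into a single bag and count the tuples in it.
unique_count = FOREACH (GROUP unique_data ALL) GENERATE COUNT(unique_data);
DUMP unique_count;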
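
If you also want to see how often each entry occurs, a variation is to group
on all the fields instead of using DISTINCT. This is only a sketch: it assumes
the same comma-delimited input and schema as the example above, and the
relation names grouped and occurrences and the alias n are made up for
illustration.

```pig
-- Assumption: same input as above; '...' stands for your own HDFS path.
data = LOAD '...' USING PigStorage(',') AS (id:int, field1:chararray, field2:chararray);
-- Use all three fields as the grouping key, so identical rows land in one group.
grouped = GROUP data BY (id, field1, field2);
-- Emit one row per unique entry together with its number of occurrences.
occurrences = FOREACH grouped GENERATE FLATTEN(group) AS (id, field1, field2), COUNT_STAR(data) AS n;
DUMP occurrences;
```

The number of rows in occurrences is the number of unique entries, and for
your sample data the 1,2,3 row should come back with n = 2.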


On Tue, Apr 2, 2013 at 2:05 PM, jamal sasha <jamalsha...@gmail.com> wrote:

> Hi,
>  I have data in hdfs like:
>
> id1,field1,field2
> 1,2,3
> 1,2,3
> 1,2,4
> 1,2,5
> I want to find the number of unique entries using pig..
> So here, number of unique entries are 3 ( as 1,2,3 is repeated twice)
>
> How do i find this?
>
> Thanks
>
