Tested on Pig 0.15 using your data in local mode; I could not reproduce
the issue.
==================================================
final_by_lsn_g = GROUP final_by_lsn BY screen_name;
(Ian_hoch,{(en,Ian_hoch)})
(gwenshap,{(en,gwenshap)})
(p2people,{(en,p2people)})
(DoThisBest,{(en,DoThisBest)})
(wesleyyuhn1,{(en,wesleyyuhn1)})
(GuitartJosep,{(en,GuitartJosep)})
(Komalmittal91,{(en,Komalmittal91)})
(LornaGreenNWC,{(en,LornaGreenNWC)})
(W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
(innovatesocialm,{(en,innovatesocialm)})
==================================================
final_by_lsn_g = GROUP final_by_lsn BY language;
(en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
==================================================
Suggestions:
> Try to reproduce the issue in local mode (if you have not already done so).
> Close all old sessions and open a new one (I know it's dumb, but it has
helped me a few times).
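
One more check that might help narrow it down. This is only a sketch, assuming
final_by_lsn has just the two fields shown in your dump (language,
screen_name); the aliases counts, all_rows and total are names made up for
illustration. In your grouped output the bag sizes add up to 10 (1 + 3 + 3 + 3),
so comparing per-key counts with the total may show whether rows are ending up
under the wrong key rather than being dropped:

```pig
-- Assumed schema: final_by_lsn: {language: chararray, screen_name: chararray}

-- Rows per screen_name. COUNT_STAR also counts tuples with null fields.
final_by_lsn_g = GROUP final_by_lsn BY screen_name;
counts = FOREACH final_by_lsn_g GENERATE group AS screen_name,
                                         COUNT_STAR(final_by_lsn) AS n;
DUMP counts;

-- Total number of rows before grouping, for comparison.
all_rows = GROUP final_by_lsn ALL;
total = FOREACH all_rows GENERATE COUNT_STAR(final_by_lsn) AS n;
DUMP total;
```

With the output I got above (one tuple per screen_name), each count would be 1.
If your run shows counts of 3 for some names and the others missing, running
the same statements in a fresh session started with "pig -x local" would tell
us whether the behavior depends on the execution mode.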
*Cheers !!*
Arvind
On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <[email protected]> wrote:
> Hi,
>
> I reproduced the issue with fewer columns as well.
>
> dump final_by_lsn;
>
> (en,LornaGreenNWC)
> (en,GuitartJosep)
> (en,gwenshap)
> (en,innovatesocialm)
> (en,Komalmittal91)
> (en,Ian_hoch)
> (en,p2people)
> (en,W4_Jobs_in_ARZ)
> (en,wesleyyuhn1)
> (en,DoThisBest)
>
> grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>
>
> grunt> dump final_by_lsn_g;
>
> (gwenshap,{(en,gwenshap)})
> (p2people,{(en,p2people),(en,p2people),(en,p2people)})
> (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
>
> (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
>
>
> Steps I tried to find the root-cause:
> - Removing special characters from the data
> - Setting the loglevel to 'Debug'
> However, I couldn't find a clue about the problem.
>
>
>
> Can someone please help me troubleshoot the issue?
>
> Thanks,
> Joel
>
> On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <[email protected]> wrote:
>
> > Please try reproducing the problem with the smallest amount of data
> > possible. Use as few rows and the smallest strings possible that still
> > demonstrate the discrepancy, and then repost your problem. Doing so
> > will make your request easier for readers of the group to digest, and
> > you might even discover a problem in your original data if you cannot
> > reproduce it on a smaller scale.
> >
> > Thanks,
> > Steve
> >
> > On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am trying to group a table (final) containing 10 records, by a
> > > column screen_name using the following command.
> > >
> > > final_by_sn = GROUP final BY screen_name;
> > >
> > > > When I dump the final_by_sn table, only 4 records are returned, as
> > > > shown below:
> > >
> > > grunt> dump final_by_sn;
> > >
> > > (gwenshap,{(.@bigdata used this photo in his blog post and made me
> > realize
> > > how much I miss Japan:
> > https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943
> > > )
> > > })
> > > (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills
> > > http://t.co/UBAni5DPrw
> > http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437
> > > ),(6
> > > new @p2pLanguages jobs w/ #BigData #Hadoop skills
> http://t.co/UBAni5DPrw
> > > http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new
> @p2pLanguages
> > > jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw
> > > http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
> > > (GuitartJosep,{(#BigData: What it can and can't do!
> > > http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it
> > can
> > > and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140
> > > ),(#BigData:
> > > What it can and can't do!
> > > http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
> > > (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona.
> > > #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big
> #Data
> > > #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch
> > > http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead
> > > Phoenix
> > > AZ (#job) wanted in #Arizona. #TechFetch
> > > http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
> > >
> > > dump final;
> > >
> > > (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper'
> > Big
> > > Data, Smart Cities, Internet of Things & more! #TechNorth
> > > http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000
> > > 2014,654395184428515332)
> > > (#BigData: What it can and can't do!
> > > http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02
> +0000
> > > 2015,654395189595869184)
> > > (.@bigdata used this photo in his blog post and made me realize how
> much
> > I
> > > miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15
> > > 20:49:39 +0000 2007,654395195581009920)
> > > ("Global Release [Big Data Book] Profit From Science" on @LinkedIn
> > > http://t.co/WnJ2HwthYF Congrats to George
> > > Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000
> > > 2012,654395207065034752)
> > > (Hi, BesPardon Don't Forget to follow -->>
> http://t.co/Dahu964w5U
> > > Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12
> > 16:44:50
> > > +0000 2015,654395216208752641)
> > > (On Google Books, language, and the possible limits of big data
> > > https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000
> > > 2012,654395216057659392)
> > > (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills
> > > http://t.co/UBAni5DPrw
> > > http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09
> +0000
> > > 2009,654395220373729280)
> > > (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch
> > > http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31
> +0000
> > > 2014,654395236718911488)
> > > (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1
> > > http://t.co/85P6vEJg08 #MarTech #automation
> > > http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12
> > +0000
> > > 2014,654395243975065600)
> > > (Best Cloud Hosting and CDN services for Web Developers
> > > http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters
> > > #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000
> > > 2012,654395246025904128)
> > > grunt>
> > >
> > >
> > > Could you please help me understand why 6 records are eliminated while
> > > doing a group by?
> > >
> > > Thanks,
> > > Joel
> > >
> >
>