Hi Arvind,

You are right. It works fine in local mode. No records are eliminated.
I now need to find out why some records are getting eliminated in MapReduce mode. Any suggestions on troubleshooting steps for finding the root cause in MapReduce mode? Which logs should be checked, etc.? I appreciate any help!

Thanks,
Joel

On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <[email protected]> wrote:

> Tested on Pig 0.15 using your data in local mode; could not reproduce the
> issue.
> ==================================================
> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>
> (Ian_hoch,{(en,Ian_hoch)})
> (gwenshap,{(en,gwenshap)})
> (p2people,{(en,p2people)})
> (DoThisBest,{(en,DoThisBest)})
> (wesleyyuhn1,{(en,wesleyyuhn1)})
> (GuitartJosep,{(en,GuitartJosep)})
> (Komalmittal91,{(en,Komalmittal91)})
> (LornaGreenNWC,{(en,LornaGreenNWC)})
> (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
> (innovatesocialm,{(en,innovatesocialm)})
> ==================================================
> final_by_lsn_g = GROUP final_by_lsn BY language;
>
> (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
> ==================================================
>
> Suggestions:
>
> Try in local mode to reproduce the issue (if you have not already done so).
>
> Close all old sessions and open a new one (I know it's dumb, but it has
> helped me a few times).
>
> *Cheers !!*
> Arvind
>
> On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <[email protected]> wrote:
>
> > Hi,
> >
> > I reproduced the issue with fewer columns as well.
> > dump final_by_lsn;
> >
> > (en,LornaGreenNWC)
> > (en,GuitartJosep)
> > (en,gwenshap)
> > (en,innovatesocialm)
> > (en,Komalmittal91)
> > (en,Ian_hoch)
> > (en,p2people)
> > (en,W4_Jobs_in_ARZ)
> > (en,wesleyyuhn1)
> > (en,DoThisBest)
> >
> > grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> >
> > grunt> dump final_by_lsn_g;
> >
> > (gwenshap,{(en,gwenshap)})
> > (p2people,{(en,p2people),(en,p2people),(en,p2people)})
> > (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
> > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
> >
> > Steps I tried to find the root cause:
> > - Removing special characters from the data
> > - Setting the log level to 'Debug'
> > However, I couldn't find a clue about the problem.
> >
> > Can someone please help me troubleshoot the issue?
> >
> > Thanks,
> > Joel
> >
> > On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <[email protected]> wrote:
> >
> > > Please try reproducing the problem with the smallest amount of data
> > > possible. Use as few rows and the smallest strings possible that
> > > still demonstrate the discrepancy, and then repost your problem.
> > > Doing so will make your request easier to digest for the readers of
> > > the group, and you might even discover a problem in your original
> > > data if you cannot reproduce it on a smaller scale.
> > >
> > > Thanks,
> > > Steve
> > >
> > > On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to group a table (final) containing 10 records by a
> > > > column screen_name using the following command.
> > > >
> > > > final_by_sn = GROUP final BY screen_name;
> > > >
> > > > When I dump the final_by_sn table, only 4 records are returned, as
> > > > shown below:
> > > >
> > > > grunt> dump final_by_sn;
> > > >
> > > > (gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
> > > > (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
> > > > (GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
> > > > (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
> > > >
> > > > dump final;
> > > >
> > > > (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things & more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
> > > > (#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
> > > > (.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
> > > > ("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
> > > > (Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
> > > > (On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
> > > > (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
> > > > (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
> > > > (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
> > > > (Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
> > > > grunt>
> > > >
> > > > Could you please help me understand why 6 records are eliminated
> > > > while doing a GROUP BY?
> > > >
> > > > Thanks,
> > > > Joel
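
One way to narrow the discrepancy down is to count records before and after the GROUP. The sketch below uses the relation and field names from the thread and is untested against this data set; the expected counts in comments follow from the 10-row sample, not from a verified run:

```pig
-- Count input records reaching the GROUP, and records surviving it.
-- COUNT_STAR is used rather than COUNT so that tuples with a null
-- first field are not silently skipped by the counting itself.
inp_g   = GROUP final_by_lsn ALL;
inp_cnt = FOREACH inp_g GENERATE COUNT_STAR(final_by_lsn);
DUMP inp_cnt;    -- should be (10) for the sample in this thread

by_sn   = GROUP final_by_lsn BY screen_name;
flat    = FOREACH by_sn GENERATE FLATTEN(final_by_lsn);
out_g   = GROUP flat ALL;
out_cnt = FOREACH out_g GENERATE COUNT_STAR(flat);
DUMP out_cnt;    -- anything below the input count means records are
                 -- lost between the map and reduce stages

-- Null or empty group keys are a common culprit; check explicitly:
bad     = FILTER final_by_lsn BY screen_name IS NULL OR screen_name == '';
DUMP bad;
```

If the two counts match, the problem is more likely in how the grouped bag is consumed downstream than in the GROUP itself.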

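On the "which logs" question: in MapReduce mode the useful detail is usually in the per-task logs and job counters, not the grunt console. A sketch of where to look, assuming Pig on YARN (the application id below is a placeholder, not a real value from this thread):

```pig
-- In grunt, inspect the compiled plan first: if the optimizer inserted
-- a combiner, that is one stage at which records can change count, so
-- it is worth knowing whether one is present.
EXPLAIN final_by_lsn_g;

-- Turn on verbose client-side logging for the next run:
SET debug 'on';

-- From the shell, after the job finishes (placeholder application id):
--   yarn logs -applicationId application_1447740000000_0001 | less
-- Also compare the job counters (map input records vs. reduce output
-- records) in the ResourceManager/JobHistory UI, and check the client
-- log Pig writes in the launch directory (pig_<timestamp>.log).
```

Comparing the map-input and reduce-output counters tells you which side of the shuffle is dropping the records, which in turn tells you which task logs to read.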