Hi .. if you are reading JSON, then ensure that the file content is parsed correctly by Pig before you do the grouping. A simple dump sometimes does not show whether the JSON was parsed into multiple columns or the entire line was read as one string into the first column only.
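A quick way to check is to load with an explicit schema and inspect what Pig actually sees before grouping. A minimal sketch, assuming the tweets sit in a file tweets.json with the three field names below (file name and field names are hypothetical):

raw = LOAD 'tweets.json'
      USING JsonLoader('text:chararray, language:chararray, screen_name:chararray');

DESCRIBE raw;    -- shows the schema Pig applied to the relation
ILLUSTRATE raw;  -- walks a sample record through the plan, which makes a
                 -- line that landed entirely in the first column easy to spot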
*Cheers !!*
Arvind

On Wed, Nov 18, 2015 at 4:59 AM, Sam Joe <[email protected]> wrote:

> Hi Arvind,
>
> You are right. It works fine in local mode. No records are eliminated.
>
> I now need to find out why some records are getting eliminated when
> using mapreduce mode.
>
> Any suggestions on troubleshooting steps for finding the root cause in
> mapreduce mode? Which logs should be checked, etc.?
>
> Appreciate any help!
>
> Thanks,
> Joel
>
> On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <[email protected]> wrote:
>
> > Tested on Pig 0.15 using your data and in local mode .. could not
> > reproduce the issue ..
> > ==================================================
> > final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> >
> > (Ian_hoch,{(en,Ian_hoch)})
> > (gwenshap,{(en,gwenshap)})
> > (p2people,{(en,p2people)})
> > (DoThisBest,{(en,DoThisBest)})
> > (wesleyyuhn1,{(en,wesleyyuhn1)})
> > (GuitartJosep,{(en,GuitartJosep)})
> > (Komalmittal91,{(en,Komalmittal91)})
> > (LornaGreenNWC,{(en,LornaGreenNWC)})
> > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
> > (innovatesocialm,{(en,innovatesocialm)})
> > ==================================================
> > final_by_lsn_g = GROUP final_by_lsn BY language;
> >
> > (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
> > ==================================================
> >
> > Suggestions ..
> > - try local mode to reproduce the issue .. (if you have not already done so)
> > - close all old sessions and open a new one .. (I know it's dumb .. but it has helped me sometimes)
> >
> > *Cheers !!*
> > Arvind
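On Joel's question about troubleshooting the mapreduce side: one option is to run the identical script through both launchers and diff the outputs, so the only variable is the execution mode. A minimal sketch; the script name and paths are hypothetical:

-- repro.pig: same statements for both runs; only the launch mode differs
final_by_lsn = LOAD 'tweets_small.tsv'
               AS (language:chararray, screen_name:chararray);
final_by_lsn_g = GROUP final_by_lsn BY screen_name;
STORE final_by_lsn_g INTO 'grouped_out';

-- launch with:  pig -x local repro.pig    vs.  pig -x mapreduce repro.pig
-- if only the mapreduce run misbehaves, start with the pig_*.log file Pig
-- writes to the working directory on failure, and the task logs of the
-- MapReduce job it launched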
> > On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I reproduced the issue with fewer columns as well.
> > >
> > > dump final_by_lsn;
> > >
> > > (en,LornaGreenNWC)
> > > (en,GuitartJosep)
> > > (en,gwenshap)
> > > (en,innovatesocialm)
> > > (en,Komalmittal91)
> > > (en,Ian_hoch)
> > > (en,p2people)
> > > (en,W4_Jobs_in_ARZ)
> > > (en,wesleyyuhn1)
> > > (en,DoThisBest)
> > >
> > > grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> > > grunt> dump final_by_lsn_g;
> > >
> > > (gwenshap,{(en,gwenshap)})
> > > (p2people,{(en,p2people),(en,p2people),(en,p2people)})
> > > (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
> > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
> > >
> > > Steps I tried to find the root cause:
> > > - Removing special characters from the data
> > > - Setting the log level to 'DEBUG'
> > > However, I couldn't find a clue about the problem.
> > >
> > > Can someone please help me troubleshoot the issue?
> > >
> > > Thanks,
> > > Joel
> > >
> > > On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <[email protected]> wrote:
> > >
> > > > Please try reproducing the problem with the smallest amount of data
> > > > possible. Use as few rows and the smallest strings possible that still
> > > > demonstrate the discrepancy, and then repost your problem. Doing so
> > > > will make your request easier to digest for the readers of the group,
> > > > and you might even discover a problem in your original data if you
> > > > cannot reproduce it on a smaller scale.
> > > >
> > > > Thanks,
> > > > Steve
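Following Steve's advice, a two-row repro is enough to exercise the grouping. A minimal sketch with a hypothetical file tiny.tsv holding two tab-separated rows (en<TAB>userA and en<TAB>userB):

tiny = LOAD 'tiny.tsv' AS (language:chararray, screen_name:chararray);
tiny_g = GROUP tiny BY screen_name;
DUMP tiny_g;
-- expected output: (userA,{(en,userA)}) and (userB,{(en,userB)});
-- anything else points at the data or the environment, not at GROUP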
> > > > On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to group a table (final) containing 10 records by a
> > > > > column screen_name, using the following command:
> > > > >
> > > > > final_by_sn = GROUP final BY screen_name;
> > > > >
> > > > > When I dump the final_by_sn table, only 4 records are returned, as shown below:
> > > > >
> > > > > grunt> dump final_by_sn;
> > > > >
> > > > > (gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
> > > > > (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
> > > > > (GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
> > > > > (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
> > > > >
> > > > > dump final;
> > > > >
> > > > > (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things & more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
> > > > > (#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
> > > > > (.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
> > > > > ("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
> > > > > (Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
> > > > > (On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
> > > > > (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
> > > > > (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
> > > > > (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
> > > > > (Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
> > > > > grunt>
> > > > >
> > > > > Could you please help me understand why 6 records are eliminated while doing a group by?
> > > > >
> > > > > Thanks,
> > > > > Joel
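One check worth adding to the original question: GROUP never discards rows on its own; it only collapses rows that share a key into one bag, so counting the bags shows whether all of the input made it through. A minimal sketch against the relations above:

counts = FOREACH final_by_sn GENERATE group, COUNT(final);
DUMP counts;
-- the per-group counts should sum to the row count of final (10 here);
-- if they do not, the rows were lost at load time or upstream, not by GROUP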
