Hi,

HDFS. I'm processing JSON data in HDFS. For testing in local mode, I copied 2 rows to a text file.

Thanks,
Joel
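For context, a minimal sketch of the kind of load-and-project pipeline being discussed. The path, the loader, and the field names below are assumptions for illustration only; the actual script was not posted, and raw Twitter JSON often needs a third-party loader (such as elephant-bird's JsonLoader) rather than Pig's built-in one, which expects one JSON record per line in the layout written by JsonStorage.

-- Hypothetical load of line-delimited JSON from HDFS; the schema fields are guesses.
tweets = LOAD '/user/joel/tweets.json'
         USING org.apache.pig.builtin.JsonLoader(
             'text:chararray, language:chararray, screen_name:chararray');

-- Keep only the two columns used in the GROUP BY examples later in the thread.
final_by_lsn = FOREACH tweets GENERATE language, screen_name;

final_by_lsn_g = GROUP final_by_lsn BY screen_name;
DUMP final_by_lsn_g;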
On Tuesday, November 17, 2015, Andrew Oliver <[email protected]> wrote:

> What is your record source? Files or Hive or?

On Nov 17, 2015 6:29 PM, "Sam Joe" <[email protected]> wrote:

> Hi Arvind,
>
> You are right. It works fine in local mode. No records are eliminated.
>
> I now need to find out why some records are getting eliminated in mapreduce mode.
>
> Any suggestions on troubleshooting steps for finding the root cause in mapreduce mode? Which logs should be checked, etc.?
>
> Appreciate any help!
>
> Thanks,
> Joel

On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <[email protected]> wrote:

> Tested on Pig 0.15 using your data, in local mode; could not reproduce the issue.
>
> ==================================================
> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>
> (Ian_hoch,{(en,Ian_hoch)})
> (gwenshap,{(en,gwenshap)})
> (p2people,{(en,p2people)})
> (DoThisBest,{(en,DoThisBest)})
> (wesleyyuhn1,{(en,wesleyyuhn1)})
> (GuitartJosep,{(en,GuitartJosep)})
> (Komalmittal91,{(en,Komalmittal91)})
> (LornaGreenNWC,{(en,LornaGreenNWC)})
> (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
> (innovatesocialm,{(en,innovatesocialm)})
> ==================================================
> final_by_lsn_g = GROUP final_by_lsn BY language;
>
> (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
> ==================================================
>
> Suggestions:
> - Try to reproduce the issue in local mode (if you have not already done so).
> - Close all old sessions and open a new one (I know it's dumb, but it has helped me a few times).
>
> Cheers!!
> Arvind

On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <[email protected]> wrote:

> Hi,
>
> I reproduced the issue with fewer columns as well.
>
> dump final_by_lsn;
>
> (en,LornaGreenNWC)
> (en,GuitartJosep)
> (en,gwenshap)
> (en,innovatesocialm)
> (en,Komalmittal91)
> (en,Ian_hoch)
> (en,p2people)
> (en,W4_Jobs_in_ARZ)
> (en,wesleyyuhn1)
> (en,DoThisBest)
>
> grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>
> grunt> dump final_by_lsn_g;
>
> (gwenshap,{(en,gwenshap)})
> (p2people,{(en,p2people),(en,p2people),(en,p2people)})
> (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
> (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
>
> Steps I tried to find the root cause:
> - Removing special characters from the data
> - Setting the log level to 'Debug'
>
> However, I couldn't find a clue about the problem.
>
> Can someone please help me troubleshoot the issue?
>
> Thanks,
> Joel
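One generic way to narrow this down is to count records before and after the GROUP, run the same script once in local mode and once in mapreduce mode, and compare the two runs; the MapReduce-side logs for the job id that Pig prints are then the usual place to look. The sketch below only reuses the relation name from the thread and adds standard Pig diagnostics; it is illustrative rather than anything taken from the original script.

-- Count input rows and output groups so the two execution modes can be
-- compared number for number.
in_all   = GROUP final_by_lsn ALL;
in_count = FOREACH in_all GENERATE COUNT(final_by_lsn) AS input_rows;

final_by_lsn_g = GROUP final_by_lsn BY screen_name;
out_all   = GROUP final_by_lsn_g ALL;
out_count = FOREACH out_all GENERATE COUNT(final_by_lsn_g) AS output_groups;

DUMP in_count;     -- expected 10 for the sample data
DUMP out_count;    -- expected 10 distinct screen_names

-- Shows the logical, physical and MapReduce plans without running the job.
EXPLAIN final_by_lsn_g;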
On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <[email protected]> wrote:

> Please try reproducing the problem with the smallest amount of data possible. Use as few rows and the smallest strings that still demonstrate the discrepancy, and then repost your problem. Doing so will make your request easier for the readers of the group to digest, and you might even discover a problem in your original data if you cannot reproduce it on a smaller scale.
>
> Thanks,
> Steve
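Following that advice, a reproduction can be as small as a two-row, two-column tab-separated file. The file name and values below are made up for illustration; PigStorage with tab delimiters is the LOAD default.

-- small.tsv (tab-separated), two rows:
--   en<TAB>gwenshap
--   en<TAB>p2people
small   = LOAD 'small.tsv' AS (language:chararray, screen_name:chararray);
small_g = GROUP small BY screen_name;
DUMP small_g;
-- Expected: one group per distinct screen_name, each bag holding one tuple:
-- (gwenshap,{(en,gwenshap)})
-- (p2people,{(en,p2people)})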
On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <[email protected]> wrote:

> Hi,
>
> I am trying to group a table (final) containing 10 records by a column screen_name, using the following command:
>
> final_by_sn = GROUP final BY screen_name;
>
> When I dump the final_by_sn table, only 4 records are returned, as shown below:
>
> grunt> dump final_by_sn;
>
> (gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
> (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
> (GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
> (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
>
> dump final;
>
> (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things & more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
> (#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
> (.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
> ("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
> (Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
> (On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
> (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
> (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
> (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
> (Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
> grunt>
>
> Could you please help me understand why 6 records are eliminated while doing a group by?
>
> Thanks,
> Joel
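For what it's worth, GROUP by itself should never drop input rows: it emits one tuple per distinct key, and the bags together should hold every input row. A quick sanity check, sketched here with the relation names from the post above, is to count the tuples that land in each bag and in total:

-- One output tuple per distinct screen_name; the bags together should hold
-- all 10 input rows.
final_by_sn = GROUP final BY screen_name;

per_group = FOREACH final_by_sn GENERATE group, COUNT(final) AS rows_in_group;
DUMP per_group;

-- Total across all bags; anything other than 10 means rows were lost or duplicated.
total = FOREACH (GROUP per_group ALL) GENERATE SUM(per_group.rows_in_group);
DUMP total;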
