It's a Hadoop error, and (usually) a transient one. The namenode throws NotReplicatedYetException when a client asks for a new block before the previous block has reached its minimum replication; the HDFS client catches it and retries, so the job tends to keep chugging along when this happens, unless it keeps on happening.
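If it's failing over and over, though, it's worth confirming the datanodes are actually up and have space before digging further, e.g.:

    hadoop dfsadmin -report

You can also (assuming a 0.20.x/1.x-era client like yours) give the DFS client more attempts at allocating the next block before it gives up, in hdfs-site.xml:

    <!-- assumes the 0.20.x/1.x config key (default 3); verify against your version -->
    <property>
      <name>dfs.client.block.write.retries</name>
      <value>10</value>
    </property>

Worth double-checking that property name against your Hadoop version, but on a small cluster where the namenode doubles as a datanode, extra retries buy the replication pipeline time to catch up.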
On Mon, Jan 9, 2012 at 4:31 PM, Daniel Dai <[email protected]> wrote:
> This is more like a hadoop issue. Check the dfs UI to see if data nodes
> are up.
>
> On Mon, Jan 9, 2012 at 4:18 PM, Michael Lok <[email protected]> wrote:
>
>> Hi folks,
>>
>> Not sure if this is related to Pig or Hadoop in general; but I'm
>> posting this here since I'm running Pig scripts :)
>>
>> Anyway, I've been trying to perform a CROSS join between 2 files which
>> results in ~1 billion records. My Hadoop cluster has 4 data nodes.
>> The namenode also serves as one of the data nodes as well (not
>> recommended, but haven't had time to reconfigure this yet :P). After
>> executing the Pig script, it threw the following exception at around
>> 80+%:
>>
>> java.io.IOException: org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
>> replicated yet:/user/root/out/_temporary/_attempt_201201091651_0001_r_000001_3/part-r-00001
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1517)
>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
>>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> Pig script shown below:
>>
>> ============================================================
>> set job.name 'vac cross 2';
>> set default_parallel 10;
>>
>> register lib/*.jar;
>>
>> define DIST com.pig.udf.Distance();
>>
>> js = load 'js.csv' using PigStorage(',') as (ic:chararray, jsstate:chararray);
>>
>> vac = load 'vac.csv' using PigStorage(',') as (id:chararray, vacstate:chararray);
>>
>> cx = cross js, vac;
>>
>> d = foreach cx generate ic, jsstate, id, vacstate, DIST(jsstate, vacstate);
>>
>> store d into 'out' using PigStorage(',');
>> ============================================================
>>
>> Any help is greatly appreciated.
>>
>>
>> Thanks!
