Sounds like a bug in the S3 implementation of FileSystem? Does this happen with Pig 0.10 or 0.11?
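Looking at the trace, the NPE comes out of JobControlCompiler.estimateNumberOfReducers -> getTotalInputFileSize -> FileSystem.globStatus on the input path. That estimation only runs for jobs that actually have a reduce phase, which would explain why dropping the GROUP BY (leaving a map-only job) makes the problem go away. If I remember right, the estimator is also skipped when you request parallelism explicitly, so forcing a reducer count might sidestep the globStatus call on the s3:// glob. Untested sketch against your script (the reducer count of 10 is just a placeholder):

-------------------------------------------------------------------------------
-- force an explicit reducer count so Pig does not try to estimate it
-- from the total input size (which is where globStatus gets called)
SET default_parallel 10;    -- placeholder value

rbl_raw = LOAD 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS (line:chararray);
rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray, y:chararray);
seo_rbl = FILTER rbl BY x IS NOT NULL AND y == 'seo_google';
rbl1 = GROUP seo_rbl BY x PARALLEL 10;    -- PARALLEL on the GROUP should work as well
STORE rbl1 INTO '/user/hadoop/blah';
-------------------------------------------------------------------------------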
On Mon, Mar 11, 2013 at 12:11 AM, Yang <teddyyyy...@gmail.com> wrote:
> The following code gave a null pointer exception:
>
> ---------------------------------------------------------------------------------------
> rbl_raw = load 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS (line:chararray);
> rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray, y:chararray);
> seo_rbl = FILTER rbl BY x IS NOT NULL AND y == 'seo_google';
> rbl1 = GROUP seo_rbl BY x;
> STORE rbl1 INTO '/user/hadoop/blah'
> ---------------------------------------------------------------------------------------
>
> Pig Stack Trace
> ---------------
> ERROR 2017: Internal error creating job configuration.
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:750)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:267)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
>     at org.apache.pig.PigServer.execute(PigServer.java:1288)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>     at org.apache.pig.Main.run(Main.java:568)
>     at org.apache.pig.Main.main(Main.java:114)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:994)
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:967)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:798)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:773)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:611)
>     ... 17 more
> ================================================================================
>
> The version of Pig is 0.9.2:
>
> hadoop@ip-10-147-131-60:/mnt/run$ pig -version
> Apache Pig version 0.9.2-amzn (rexported)
>
> The weird thing is that if I take out the GROUP BY, it works fine; if I take out the glob in the initial LOAD statement and just load one dir, it works fine. Also, if I load both dirs with the glob, store the output of the loadrbl() UDF into an intermediate dir, and then load that intermediate result and continue the rest of the original computation all the way to the GROUP BY, it works fine too.
>
> So why does the GROUP BY have a problem with the glob above, when the two are far apart in the script and all the intermediate steps worked fine?
>
> Thanks,
> Yang
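PS: for reference, the intermediate-store workaround you described would look roughly like the sketch below (untested; the intermediate HDFS path is made up). It presumably works because the job that carries the reduce phase only ever sees the HDFS path, so globStatus is never invoked on the s3:// glob there.

-------------------------------------------------------------------------------
-- script 1: load from S3 with the glob, apply the UDF, materialize to HDFS
rbl_raw = LOAD 's3://mybucket/rbl-logs/{2013/03/06,2013/03/05}' AS (line:chararray);
rbl = FOREACH rbl_raw GENERATE FLATTEN(loadrbl(line)) AS (x:chararray, y:chararray);
STORE rbl INTO '/user/hadoop/rbl-intermediate';    -- made-up intermediate dir

-- script 2: continue the original computation from the HDFS copy
rbl2 = LOAD '/user/hadoop/rbl-intermediate' AS (x:chararray, y:chararray);
seo_rbl = FILTER rbl2 BY x IS NOT NULL AND y == 'seo_google';
rbl1 = GROUP seo_rbl BY x;
STORE rbl1 INTO '/user/hadoop/blah';
-------------------------------------------------------------------------------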