Hi there,
I am new to Hadoop and Pig. While using them recently I ran into a problem that
I cannot explain. It may be easy for someone more experienced. Can anybody help?
Thanks a lot!
The problem concerns the MAPREDUCE operator. Here is my .pig script, abbreviated:
register ../biopig/target/biopig-job.jar;
%default reads 'test.fas';
A = load '$reads' using gov.jgi.meta.pig.storage.FastaStorage as (id: chararray, d: int, seq: bytearray);
…
blabla…
…
LG = foreach LG generate group.id1, group.id2;
GAP = mapreduce 'GPartition.jar' STORE A into 'input' LOAD 'output' as (id:chararray, read:chararray);
dump GAP;
Error message:
Pig Stack Trace
---------------
ERROR 1200: null
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. null
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1725)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1420)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:364)
at org.apache.pig.PigServer.executeBatch(PigServer.java:389)
at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:747)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:608)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: Failed to parse: null
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1712)
... 18 more
Caused by: java.lang.NullPointerException
at org.apache.pig.parser.LogicalPlanBuilder.unquote(LogicalPlanBuilder.java:1329)
at org.apache.pig.parser.LogicalPlanGenerator.mr_clause(LogicalPlanGenerator.java:18238)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1911)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 19 more
================================================================================
The message gives little useful information, so I cannot figure out the cause.
Everything before the “GAP” line works fine: if I add “DUMP LG” before the “GAP”
line, I get the expected results. So I think the “GAP” line causes the error,
which is consistent with the NullPointerException being raised in mr_clause in
the stack trace.
As for GPartition.jar, I tested it with “$ hadoop jar Partition.jar” and it
works well: it reads from the file “input” and stores its results in the file
“output”. However, it is not a real MapReduce job; it defines neither a mapper
class nor a reducer class. It is a serial program that works with HDFS. Could
this be the problem? Or is it just a syntax error in my Pig script?
BTW, I use Hadoop 2.7.1 and Pig 0.15.0. Partition.jar is in the same directory
as the .pig file.
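
For comparison, the MAPREDUCE example in the Pig documentation ends with a
backtick-quoted string that passes the main class and its arguments to the jar,
and my statement has no such clause. Below is a sketch of my statement in that
form. I have not confirmed that the missing clause is what triggers the
NullPointerException, and the class name GPartition and the “input output”
arguments are assumptions for illustration only:

```
-- Sketch only: the same statement with the optional backtick-quoted parameter clause.
-- 'GPartition' (main class) and the 'input output' arguments are hypothetical.
GAP = mapreduce 'GPartition.jar'
      STORE A into 'input'
      LOAD 'output' as (id:chararray, read:chararray)
      `GPartition input output`;
dump GAP;
```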