Hi,

I'm running Pig 0.10.0 in local mode on some small text files; there is no intention to run it on Hadoop at all. We have a job that runs every 5 minutes, and about 3% of the time it fails with the error below. The failure occurs at random places within the Pig script.

2012-10-19 14:15:37,719 [Thread-15] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:286)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:360)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:330)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

In the Pig log, I get:

ERROR 2244: Job failed, hadoop does not return any error message

org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:555)
    at org.apache.pig.Main.main(Main.java:111)
================================================================================

The Pig script is attached. Any help gratefully received.

Thanks,
Malc

--Load data from input file
indata = LOAD '$input' USING PigStorage(',') AS (
    utc_ts:chararray,
    local_ts:chararray,
    timezone:chararray,
    region:chararray,
    hostname:chararray,
    stat_type:chararray,
    stat_key:chararray,
    stat_value:long);

/**********************************************************************************
 * Output: ats_stat
 * Description: Generate output file of ATS data to load into the ats_stat_tbl
 **********************************************************************************/
ats_total_errors = FILTER indata BY stat_key == 'IntegraStatistics/TotalErrors';
ats_total_txns = FILTER indata BY stat_key == 'IntegraStatistics/TotalTransactions';
ats_resp_time = FILTER indata BY stat_key == 'IntegraStatistics/UWMAResponseTime';
ats_join_data = JOIN ats_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                     ats_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                     ats_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
ats_out_data = FOREACH ats_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE ats_out_data INTO '$outdir/ats_stat.dat.$uniq_id' USING PigStorage(',');

/**********************************************************************************
 * Output: ldap_stat
 * Description: Generate output file of LDAP data to load into the ldap_stat_tbl
 **********************************************************************************/
ldap_total_errors = FILTER indata BY stat_key == 'LDAPStatistics/FailedRequests';
ldap_total_txns = FILTER indata BY stat_key == 'LDAPStatistics/TotalRequests';
ldap_resp_time = FILTER indata BY stat_key == 'LDAPStatistics/UWMAResponseTime';
ldap_join_data = JOIN ldap_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                      ldap_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                      ldap_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
ldap_out_data = FOREACH ldap_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE ldap_out_data INTO '$outdir/ldap_stat.dat.$uniq_id' USING PigStorage(',');

/**********************************************************************************
 * Output: pcrf_stat
 * Description: Generate output file of PCRF data to load into the pcrf_stat_tbl
 **********************************************************************************/
pcrf_total_errors = FILTER indata BY stat_key == 'PcrfStatistics/TotalErrors';
pcrf_total_txns = FILTER indata BY stat_key == 'PcrfStatistics/TotalRequestsSent';
pcrf_resp_time = FILTER indata BY stat_key == 'PcrfStatistics/UWMAResponseTime';
pcrf_join_data = JOIN pcrf_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                      pcrf_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                      pcrf_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
pcrf_out_data = FOREACH pcrf_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE pcrf_out_data INTO '$outdir/pcrf_stat.dat.$uniq_id' USING PigStorage(',');

/**********************************************************************************
 * Output: sess_stat
 * Description: Generate output file of Session Counts data to load into the
 *              sess_stat_tbl
 **********************************************************************************/
sess_active = FILTER indata BY stat_key == 'SessionStatistics/ActiveSessions';
sess_total = FILTER indata BY stat_key == 'SessionStatistics/TotalSessions';
sess_duration = FILTER indata BY stat_key == 'SessionStatistics/UWMASessionLength';
sess_join_data = JOIN sess_active BY (utc_ts,local_ts,timezone,region,hostname),
                      sess_total BY (utc_ts,local_ts,timezone,region,hostname),
                      sess_duration BY (utc_ts,local_ts,timezone,region,hostname);
sess_out_data = FOREACH sess_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE sess_out_data INTO '$outdir/sess_stat.dat.$uniq_id' USING PigStorage(',');

/**********************************************************************************
 * Output: radius_tps
 * Description: Generate output file of Radius TPS data to load into the
 *              radius_tps_tbl
 **********************************************************************************/
radius_tps_total_interims = FILTER indata BY stat_key == 'RadiusStatistics/RadiusInterims';
radius_tps_total_starts = FILTER indata BY stat_key == 'RadiusStatistics/RadiusStarts';
radius_tps_total_stops = FILTER indata BY stat_key == 'RadiusStatistics/RadiusStops';
radius_tps_join_data = JOIN radius_tps_total_interims BY (utc_ts,local_ts,timezone,region,hostname),
                            radius_tps_total_starts BY (utc_ts,local_ts,timezone,region,hostname),
                            radius_tps_total_stops BY (utc_ts,local_ts,timezone,region,hostname);
radius_tps_out_data = FOREACH radius_tps_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE radius_tps_out_data INTO '$outdir/radius_tps.dat.$uniq_id' USING PigStorage(',');

/**********************************************************************************
 * Output: radius_bcast
 * Description: Generate output file of Radius Broadcast data to load into
 *              the radius_bcast_tbl
 **********************************************************************************/
radius_bcast_total_errors = FILTER indata BY stat_key == 'RadiusBroadcast/TotalErrors';
radius_bcast_total_txns = FILTER indata BY stat_key == 'RadiusBroadcast/TotalTransactions';
radius_bcast_resp_time = FILTER indata BY stat_key == 'RadiusBroadcast/ResponseTime';
radius_bcast_join_data = JOIN radius_bcast_total_errors BY (utc_ts,local_ts,timezone,region,hostname),
                              radius_bcast_total_txns BY (utc_ts,local_ts,timezone,region,hostname),
                              radius_bcast_resp_time BY (utc_ts,local_ts,timezone,region,hostname);
radius_bcast_out_data = FOREACH radius_bcast_join_data GENERATE $0,$1,$2,$3,$4,$7,$15,$23;
STORE radius_bcast_out_data INTO '$outdir/radius_bcast.dat.$uniq_id' USING PigStorage(',');