Hi Lei,

It looks like something went wrong while creating the sampler. ORDER is not a trivial operation: Pig implements it by first running a sampling job, and that sample file is exactly what the error complains about:

    Input path does not exist:
    file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017

Since "pigsample" is not a name you used in your script, it looks like Pig failed to create (or later find) its own sample file. I also notice you are running against the local filesystem (file:/...). Try running the job on HDFS and we'll see what happens.
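For example, something along these lines (just a sketch — the HDFS destination and the script name `yourscript.pig` are placeholders, substitute whatever you actually use):

```shell
# Copy the log onto HDFS, then run the same script in mapreduce mode
# instead of local mode. Paths here are illustrative only.
$ hadoop fs -mkdir /user/dliu/logs
$ hadoop fs -put /home/dliu/ApacheLogAnalysisWithPig/access.log /user/dliu/logs/
$ pig -x mapreduce -param LOGS=/user/dliu/logs/access.log yourscript.pig
```

With `-x mapreduce` the sampler's intermediate file lives on HDFS rather than in your local home directory, so at a minimum we can rule out the local job runner misplacing it. If you must stay in local mode, another thing worth trying (again, just a guess) is to STORE `result` to a file and ORDER the reloaded relation, which forces a job boundary before the sampling job runs.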
Best Regards

On Sat, Apr 13, 2013 at 1:56 PM, Lei Liu <mr.good...@gmail.com> wrote:
> I am sure it's not that. The ORDER command fails the whole thing. If I
> remove the ORDER command, the same script runs just fine, except the
> result is not in order.
>
> On Sat, Apr 13, 2013 at 4:54 PM, Prasanth J <buckeye.prasa...@gmail.com> wrote:
> > From the error logs, it seems like the input file doesn't exist or is
> > not accessible.
> >
> > > Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> > > Input path does not exist:
> > > file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
> >
> > Can you please check if the input path in $LOGS is proper?
> >
> > Thanks
> > -- Prasanth
> >
> > On Apr 12, 2013, at 11:02 PM, Lei Liu <mr.good...@gmail.com> wrote:
> >
> > > Hi, I am using Pig to analyze the percentage of each user agent in an
> > > Apache log. The following program fails because of the ORDER command at
> > > the very end (the result alias is correct and can be dumped out
> > > correctly). I am relatively new to Pig and could not figure it out, so
> > > I need your help. The program and error messages follow. Thanks!
> > >
> > > logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen,
> > >     user, time, method, uri, protocol, statusCode, responseSize, referer,
> > >     userAgent);
> > >
> > > uarows = FOREACH logs GENERATE userAgent;
> > > total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
> > > dump total;
> > >
> > > gpuarows = GROUP uarows BY userAgent;
> > > result = FOREACH gpuarows {
> > >     subtotal = COUNT(uarows);
> > >     GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL,
> > >         100*(double)subtotal/(double)total.count AS percentage;
> > > };
> > > orderresult = ORDER result BY SUB_TOTAL DESC;
> > > dump orderresult;
> > >
> > > -- what's weird is that 'dump result' works just fine, so it's the
> > > -- ORDER line that makes trouble
> > >
> > > Errors:
> > > 2013-04-13 10:36:32,409 [Thread-48] INFO org.apache.hadoop.mapred.MapTask
> > >     - record buffer = 262144/327680
> > > 2013-04-13 10:36:32,437 [Thread-48] WARN
> > >     org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
> > > java.lang.RuntimeException:
> > >     org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> > >     Input path does not exist:
> > >     file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
> > >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> > >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> > >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> > >     Input path does not exist:
> > >     file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
> > >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
> > >     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> > >     at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
> > >     at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
> > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
> > >     ... 6 more
> > > 2013-04-13 10:36:32,525 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - HadoopJobId: job_local_0005
> > > 2013-04-13 10:36:32,526 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - Processing aliases orderresult
> > > 2013-04-13 10:36:32,526 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - detailed locations: M: orderresult[19,14] C: R:
> > > 2013-04-13 10:36:37,536 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - Ooops! Some job has failed! Specify -stop_on_failure if you want
> > >     Pig to stop immediately on failure.
> > > 2013-04-13 10:36:37,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - job job_local_0005 has failed! Stop running all dependent jobs
> > > 2013-04-13 10:36:37,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - 100% complete
> > > 2013-04-13 10:36:37,537 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil
> > >     - 1 map reduce job(s) failed!
> > > 2013-04-13 10:36:37,538 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats
> > >     - Script Statistics:
> > >
> > > HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
> > > 1.0.4          0.11.0      dliu    2013-04-13 10:35:50  2013-04-13 10:36:37  GROUP_BY,ORDER_BY
> > >
> > > Some jobs have failed! Stop running all dependent jobs
> > >
> > > Job Stats (time in seconds):
> > > JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias  Feature  Outputs
> > > job_local_0002  1  1  n/a  n/a  n/a  n/a  n/a  n/a  1-18,logs,total,uarows  MULTI_QUERY,COMBINER
> > > job_local_0003  1  1  n/a  n/a  n/a  n/a  n/a  n/a  gpuarows,result  GROUP_BY,COMBINER
> > > job_local_0004  1  1  n/a  n/a  n/a  n/a  n/a  n/a  orderresult  SAMPLER
> > >
> > > Failed Jobs:
> > > JobId  Alias  Feature  Message  Outputs
> > > job_local_0005  orderresult  ORDER_BY  Message: Job failed! Error - NA  file:/tmp/temp-1225021115/tmp-62411972,
> > >
> > > Input(s):
> > > Successfully read 0 records from:
> > > "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
> > >
> > > Output(s):
> > > Failed to produce result in "file:/tmp/temp-1225021115/tmp-62411972"
> > >
> > > Counters:
> > > Total records written : 0
> > > Total bytes written : 0
> > > Spillable Memory Manager spill count : 0
> > > Total bags proactively spilled: 0
> > > Total records proactively spilled: 0
> > >
> > > Job DAG:
> > > job_local_0002 -> job_local_0003,
> > > job_local_0003 -> job_local_0004,
> > > job_local_0004 -> job_local_0005,
> > > job_local_0005
> > >
> > > 2013-04-13 10:36:37,539 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > >     - Some jobs have failed! Stop running all dependent jobs
> > > 2013-04-13 10:36:37,541 [main] ERROR org.apache.pig.tools.grunt.Grunt
> > >     - ERROR 1066: Unable to open iterator for alias orderresult
> > > Details at logfile:
> > > /home/dliu/ApacheLogAnalysisWithPig/pig_1365820535568.log