Hi Lei,

It looks like something went wrong while creating the sampler. The ORDER
command is not trivial: Pig implements it by first running a sampling job,
and that sampler output seems to be missing:

Input path does not exist:
file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017

Since pigsample is not a name you used in your script, Pig itself probably
failed to create the sample file. I also see that you are running against
the local filesystem (file:/...). Try running the job on HDFS and we'll see
what happens.
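For example, a hypothetical invocation — it assumes pig and hadoop are on
your PATH, that your script is saved as analysis.pig, and an HDFS home of
/user/dliu; adjust the paths to your setup:

```shell
# Copy the input log onto HDFS (paths here are illustrative)
hadoop fs -mkdir -p /user/dliu/logs
hadoop fs -put /home/dliu/ApacheLogAnalysisWithPig/access.log /user/dliu/logs/

# Run the script in mapreduce mode instead of local mode,
# passing the HDFS path in via the $LOGS parameter
pig -x mapreduce -param LOGS=/user/dliu/logs/access.log analysis.pig
```

If the job then succeeds, the failure was specific to local mode.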

Best Regards


On Sat, Apr 13, 2013 at 1:56 PM, Lei Liu <mr.good...@gmail.com> wrote:

> I am sure it's not that. The ORDER command fails the whole thing. If I
> remove the ORDER command, the same script runs just fine except the result
> is not in order.
>
>
> > On Sat, Apr 13, 2013 at 4:54 PM, Prasanth J <buckeye.prasa...@gmail.com> wrote:
>
> > From the error logs, it seems like input file doesn't exist or not
> > accessible.
> >
> > > Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> > > Input path does not exist:
> > > file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
> >
> > Can you please check that the input path in $LOGS is correct?
> >
> > Thanks
> > -- Prasanth
> >
> > On Apr 12, 2013, at 11:02 PM, Lei Liu <mr.good...@gmail.com> wrote:
> >
> > > Hi, I am using Pig to analyze the percentage of each UserAgent in an
> > > apache log. The following program fails because of the ORDER command at
> > > the very end (the result variable is correct and can be dumped out
> > > correctly). I am relatively new to Pig and could not figure it out, so I
> > > need your help. Following is the program and the error message. Thanks!
> > >
> > > logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen,
> > > user, time, method, uri, protocol, statusCode, responseSize, referer,
> > > userAgent);
> > >
> > > uarows = FOREACH logs GENERATE userAgent;
> > > total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
> > > dump total;
> > >
> > > gpuarows = GROUP uarows BY userAgent;
> > > result = FOREACH gpuarows {
> > >       subtotal = COUNT(uarows);
> > >       GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL,
> > > 100*(double)subtotal/(double)total.count AS percentage;
> > >       };
> > > orderresult = ORDER result BY SUB_TOTAL DESC;
> > > dump orderresult;
> > >
> > > -- what's weird is that 'dump result' works just fine, so it's the ORDER
> > > line that makes trouble
> > >
> > > Errors:
> > > 2013-04-13 10:36:32,409 [Thread-48] INFO  org.apache.hadoop.mapred.MapTask
> > > - record buffer = 262144/327680
> > > 2013-04-13 10:36:32,437 [Thread-48] WARN
> > > org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
> > > java.lang.RuntimeException:
> > > org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> > > does not exist:
> > > file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
> > >    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
> > >    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> > >    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> > >    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
> > >    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
> > >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > >    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> > > Input path does not exist:
> > > file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
> > >    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> > >    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
> > >    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> > >    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
> > >    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
> > >    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
> > >    ... 6 more
> > > 2013-04-13 10:36:32,525 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - HadoopJobId: job_local_0005
> > > 2013-04-13 10:36:32,526 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - Processing aliases orderresult
> > > 2013-04-13 10:36:32,526 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - detailed locations: M: orderresult[19,14] C:  R:
> > > 2013-04-13 10:36:37,536 [main] WARN
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to
> > > stop immediately on failure.
> > > 2013-04-13 10:36:37,536 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - job job_local_0005 has failed! Stop running all dependent jobs
> > > 2013-04-13 10:36:37,536 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - 100% complete
> > > 2013-04-13 10:36:37,537 [main] ERROR
> > > org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> > > 2013-04-13 10:36:37,538 [main] INFO
> > > org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
> > >
> > > HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
> > > 1.0.4    0.11.0    dliu    2013-04-13 10:35:50    2013-04-13 10:36:37    GROUP_BY,ORDER_BY
> > >
> > > Some jobs have failed! Stop running all dependent jobs
> > >
> > > Job Stats (time in seconds):
> > > JobId    Maps    Reduces    MaxMapTime    MinMapTIme    AvgMapTime    MedianMapTime    MaxReduceTime    MinReduceTime    AvgReduceTime    MedianReducetime    Alias    Feature    Outputs
> > > job_local_0002    1    1    n/a    n/a    n/a    n/a    n/a    n/a    1-18,logs,total,uarows    MULTI_QUERY,COMBINER
> > > job_local_0003    1    1    n/a    n/a    n/a    n/a    n/a    n/a    gpuarows,result    GROUP_BY,COMBINER
> > > job_local_0004    1    1    n/a    n/a    n/a    n/a    n/a    n/a    orderresult    SAMPLER
> > >
> > > Failed Jobs:
> > > JobId    Alias    Feature    Message    Outputs
> > > job_local_0005    orderresult    ORDER_BY    Message: Job failed! Error -
> > > NA    file:/tmp/temp-1225021115/tmp-62411972,
> > >
> > > Input(s):
> > > Successfully read 0 records from:
> > > "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
> > >
> > > Output(s):
> > > Failed to produce result in "file:/tmp/temp-1225021115/tmp-62411972"
> > >
> > > Counters:
> > > Total records written : 0
> > > Total bytes written : 0
> > > Spillable Memory Manager spill count : 0
> > > Total bags proactively spilled: 0
> > > Total records proactively spilled: 0
> > >
> > > Job DAG:
> > > job_local_0002    ->    job_local_0003,
> > > job_local_0003    ->    job_local_0004,
> > > job_local_0004    ->    job_local_0005,
> > > job_local_0005
> > >
> > >
> > > 2013-04-13 10:36:37,539 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - Some jobs have failed! Stop running all dependent jobs
> > > 2013-04-13 10:36:37,541 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > > ERROR 1066: Unable to open iterator for alias orderresult
> > > Details at logfile:
> > > /home/dliu/ApacheLogAnalysisWithPig/pig_1365820535568.log
> >
> >
>
