You need to put the Hadoop conf files (the directory containing core-site.xml and mapred-site.xml) on your classpath; otherwise Pig will always connect to the local file system.
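One way to do that, sketched below, is to prepend the Hadoop conf directory to the classpath before launching the JVM. The paths and the main-class name are assumptions for a typical 0.20-era install; point them at wherever your core-site.xml and mapred-site.xml actually live:

```shell
# Assumed layout: config files under $HADOOP_HOME/conf -- adjust for your install.
export HADOOP_CONF_DIR="$HADOOP_HOME/conf"

# Put the conf directory ahead of the application jar so Hadoop's
# Configuration picks up fs.default.name from core-site.xml instead
# of falling back to the built-in file:/// default.
java -cp "$HADOOP_CONF_DIR:myapp.jar" com.example.RunPig   # hypothetical jar/class
```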
On Wed, Oct 27, 2010 at 4:25 PM, Zach Bailey <zach.bai...@dataclip.com> wrote:
>
> Hi all,
>
> Facing a weird problem and wondering if anyone has run into this before.
> I've been playing with PigServer to programmatically run some simple pig
> scripts and it does not seem to be connecting to HDFS when I pass in
> ExecType.MAPREDUCE.
>
> I am running in pseudo-distributed mode and have the tasktracker and
> namenode both running on default ports. When I run scripts by using
> "pig script.pig" or from the grunt console it connects to hdfs and works fine.
>
> Do I need to specify some additional properties in the PigServer
> constructor, or construct a custom PigContext? I had assumed that by
> passing ExecType.MAPREDUCE and using the defaults, everything would be fine.
>
> Would really appreciate any insight or anecdotes of others using PigServer
> and how they have it set up. Thanks a bunch!
>
> -Zach
>
> Here is the code I'm using:
>
> PigServer pigServer = new PigServer("mapreduce");
> pigServer.setBatchOn();
> pigServer.registerScript("/Users/zach/Desktop/test.pig");
> List<ExecJob> jobs = pigServer.executeBatch();
>
> and here is the log output:
>
> 0 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> 622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for pages
> 622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for pages
> 659 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 751 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:///output:PigStorage) - 1-70 Operator Key: 1-70)
> 789 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 790 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 815 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 822 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2534 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2582 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2590 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2746 [Thread-4] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2765 [Thread-4] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 3083 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 3084 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 3084 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 3085 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - There is no log file to write to.
> 3085 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Backend error message during job submission
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///input
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>     at java.lang.Thread.run(Thread.java:637)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/input
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
>     ... 7 more
> 3092 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:///output"
> 3092 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!

--
Best Regards

Jeff Zhang
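An alternative to putting the conf directory on the classpath is to hand the connection settings to PigServer explicitly through a Properties object. A minimal sketch, assuming a default pseudo-distributed 0.20-era setup; the host/port values here are assumptions, so copy the real fs.default.name and mapred.job.tracker values from your own core-site.xml and mapred-site.xml:

```java
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class RunPig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed defaults for a pseudo-distributed install --
        // substitute the values from your cluster's config files.
        props.setProperty("fs.default.name", "hdfs://localhost:9000");
        props.setProperty("mapred.job.tracker", "localhost:9001");

        // With these set, Pig connects to HDFS instead of file:///
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
        pigServer.setBatchOn();
        pigServer.registerScript("/Users/zach/Desktop/test.pig");
        pigServer.executeBatch();
    }
}
```

If the "Connecting to hadoop file system at:" line in the log then shows an hdfs:// URI rather than file:///, the properties took effect.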