You need to put the Hadoop conf files (the directory containing core-site.xml and mapred-site.xml) on your classpath; otherwise Pig will always connect to the local file system.
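One way to do that, sketched below, is to prepend the Hadoop conf directory to the classpath before launching the JVM. The paths and the main-class name are assumptions for a typical 0.20-era install; point them at wherever your core-site.xml and mapred-site.xml actually live:

```shell
# Assumed layout: config files under $HADOOP_HOME/conf -- adjust for your install.
export HADOOP_CONF_DIR="$HADOOP_HOME/conf"

# Put the conf directory ahead of the application jar so Hadoop's
# Configuration picks up fs.default.name from core-site.xml instead
# of falling back to the built-in file:/// default.
java -cp "$HADOOP_CONF_DIR:myapp.jar" com.example.RunPig   # hypothetical jar/class
```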
On Wed, Oct 27, 2010 at 4:25 PM, Zach Bailey <zach.bai...@dataclip.com> wrote:
>
> Hi all,
>
> Facing a weird problem and wondering if anyone has run into this before.
> I've been playing with PigServer to programmatically run some simple pig
> scripts and it does not seem to be connecting to HDFS when I pass in
> ExecType.MAPREDUCE.
>
> I am running in pseudo-distributed mode and have the tasktracker and
> namenode both running on default ports. When I run scripts by using
> "pig script.pig" or from the grunt console it connects to hdfs and works fine.
>
> Do I need to specify some additional properties in the PigServer
> constructor, or construct a custom PigContext? I had assumed that by
> passing ExecType.MAPREDUCE and using the defaults, everything would be fine.
>
> Would really appreciate any insight or anecdotes of others using PigServer
> and how they have it set up. Thanks a bunch!
>
> -Zach
>
> Here is the code I'm using:
>
> PigServer pigServer = new PigServer("mapreduce");
> pigServer.setBatchOn();
> pigServer.registerScript("/Users/zach/Desktop/test.pig");
> List<ExecJob> jobs = pigServer.executeBatch();
>
> and here is the log output:
>
> 0 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> 622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for pages
> 622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for pages
> 659 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 751 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:///output:PigStorage) - 1-70 Operator Key: 1-70)
> 789 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 790 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 815 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 822 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2534 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2582 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2590 [Thread-4] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2746 [Thread-4] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2765 [Thread-4] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 3083 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 3084 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 3084 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 3085 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - There is no log file to write to.
> 3085 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Backend error message during job submission
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///input
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>     at java.lang.Thread.run(Thread.java:637)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/input
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
>     ... 7 more
> 3092 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:///output"
> 3092 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!

--
Best Regards

Jeff Zhang
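An alternative to putting the conf directory on the classpath is to hand the connection settings to PigServer explicitly through a Properties object. A minimal sketch, assuming a default pseudo-distributed 0.20-era setup; the host/port values here are assumptions, so copy the real fs.default.name and mapred.job.tracker values from your own core-site.xml and mapred-site.xml:

```java
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class RunPig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed defaults for a pseudo-distributed install --
        // substitute the values from your cluster's config files.
        props.setProperty("fs.default.name", "hdfs://localhost:9000");
        props.setProperty("mapred.job.tracker", "localhost:9001");

        // With these set, Pig connects to HDFS instead of file:///
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
        pigServer.setBatchOn();
        pigServer.registerScript("/Users/zach/Desktop/test.pig");
        pigServer.executeBatch();
    }
}
```

If the "Connecting to hadoop file system at:" line in the log then shows an hdfs:// URI rather than file:///, the properties took effect.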