Hello Haitao,

Each time we run a MapReduce job, the job expects its output directory not to exist yet. If the output path already exists, a FileAlreadyExistsException is thrown. Since every Pig job ultimately runs as one or more MapReduce jobs, Pig has the same requirement.
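To make the rule concrete, here is a minimal local-filesystem sketch in Java, the language of the stack trace below. It is not Hadoop code. It uses java.nio.file instead of HDFS, and java.nio.file.FileAlreadyExistsException stands in for Hadoop's org.apache.hadoop.mapred.FileAlreadyExistsException. The class and method names (OutputDirCheck, checkOutputSpecs, deleteRecursively, freshOutputDir) are hypothetical, and checkOutputSpecs only borrows its name from the FileOutputFormat method in the trace. The sketch shows the refusal when the output already exists and the two ways around it: remove the old output, or write to a new directory name.

```java
// Hypothetical local-filesystem sketch; NOT Hadoop's API. It mimics the
// "output must not exist" rule using java.nio.file instead of HDFS.
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OutputDirCheck {

    // Refuse to proceed if the output directory is already present, in the
    // spirit of FileOutputFormat.checkOutputSpecs (name borrowed, not its code).
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                outputDir.toString(), null, "output directory already exists");
        }
    }

    // Remedy 1: remove the old output recursively before re-running.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts children before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Remedy 2: pick a sibling directory name that does not exist yet.
    static Path freshOutputDir(Path base) {
        Path candidate = base;
        int suffix = 1;
        while (Files.exists(candidate)) {
            candidate = base.resolveSibling(base.getFileName() + "-" + suffix);
            suffix++;
        }
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("demo");
        Path out = root.resolve("output");
        Files.createDirectories(out);
        Files.writeString(out.resolve("part-00000"), "old result\n");

        try {
            checkOutputSpecs(out);
        } catch (FileAlreadyExistsException e) {
            System.out.println("Refused: " + e.getMessage());
        }

        Path alt = freshOutputDir(out);
        checkOutputSpecs(alt); // passes: this name is unused
        System.out.println("Fresh name: " + alt.getFileName());

        deleteRecursively(out);
        checkOutputSpecs(out); // passes now that the old output is gone
        System.out.println("Old output removed");

        deleteRecursively(root); // clean up the demo directory
    }
}
```

On a real cluster the same steps would be applied to HDFS rather than the local disk, for example with the hadoop fs shell or Pig's rmf command before the STORE.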
Regards,
Mohammad Tariq

On Fri, Aug 10, 2012 at 11:18 PM, Alan Gates <ga...@hortonworks.com> wrote:
> Usually that means the directory you are trying to store to already
> exists. Pig won't overwrite existing data. You should either move or remove
> the directory or change the directory name in your store function.
>
> Alan.
>
> On Aug 9, 2012, at 7:42 PM, Haitao Yao wrote:
>
>> hi, all
>> I got this while running a Pig script:
>>
>> 997: Unable to recreate exception from backend error:
>> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 already exists
>>     at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:188)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:893)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:856)
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:830)
>>     at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>>     at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>>     at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>>     at java.lang.Thread.run(Thread.java:722)
>>
>> But I checked the script, and the directory
>> hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 is not
>> used by the script explicitly, so I think Pig uses it to store temporary
>> results. But why does it exist? Isn't it unique?
>>
>> Haitao Yao
>> yao.e...@gmail.com
>> weibo: @haitao_yao
>> Skype: haitao.yao.final