Parth, I notice in the stack trace below that the LocalJobRunner is being used instead of the JobTracker. Are you sure this is a distributed cluster? Could you please check the value of mapred.job.tracker?
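(For reference: in Hadoop 1.x, mapred.job.tracker defaults to "local", which makes JobClient submit through LocalJobRunner instead of a real JobTracker. A minimal sketch of the relevant mapred-site.xml fragment — the host and port here are placeholders, not values from this thread:)

```xml
<!-- mapred-site.xml: illustrative only; replace host/port with your JobTracker's address -->
<property>
  <name>mapred.job.tracker</name>
  <!-- the default value "local" selects LocalJobRunner, as seen in the trace below -->
  <value>jobtracker.example.com:8021</value>
</property>
```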
Thanks
Hemanth

On Tue, Oct 16, 2012 at 8:02 PM, Parth Savani <pa...@sensenetworks.com> wrote:

> Hello Hemanth,
>         I set the hadoop staging directory to an s3 location. However, it
> complains. Below is the error:
>
> 12/10/16 10:22:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://ABCD:ABCD@ABCD/tmp/mapred/staging/psavani1821193643/.staging, expected: file:///
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:410)
>         at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:322)
>         at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:79)
>         at org.apache.hadoop.mapred.LocalJobRunner.getStagingAreaDir(LocalJobRunner.java:541)
>         at org.apache.hadoop.mapred.JobClient.getStagingAreaDir(JobClient.java:1204)
>         at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:102)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:839)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
>         at com.sensenetworks.macrosensedata.ParseLogsMacrosense.run(ParseLogsMacrosense.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at com.sensenetworks.macrosensedata.ParseLogsMacrosense.main(ParseLogsMacrosense.java:121)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>
>
> On Tue, Oct 16, 2012 at 3:11 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>
>> Hi,
>>
>> I've not tried this on S3. However, the directory mentioned in the
>> exception is based on the value of this particular configuration
>> key: mapreduce.jobtracker.staging.root.dir. This defaults
>> to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3
>> location and try?
>>
>> Thanks
>> Hemanth
>>
>>
>> On Mon, Oct 15, 2012 at 10:43 PM, Parth Savani <pa...@sensenetworks.com> wrote:
>>
>>> Hello,
>>>         I am trying to run hadoop on s3 in distributed mode. However, I
>>> am having issues running my job successfully on it.
>>> I followed the instructions provided in this article -> http://wiki.apache.org/hadoop/AmazonS3
>>> I replaced the fs.default.name value in my hdfs-site.xml with s3n://ID:SECRET@BUCKET
>>> and I am running my job using the following: hadoop jar /path/to/my/jar/abcd.jar /input /output
>>> where */input* is the folder name inside the s3 bucket (s3n://ID:SECRET@BUCKET/input)
>>> and the */output* folder should be created in my bucket (s3n://ID:SECRET@BUCKET/output).
>>> Below is the error I get. It is looking for job.jar on s3, but that path
>>> is on the server from where I am launching my job.
>>>
>>> java.io.FileNotFoundException: No such file or directory '/opt/data/hadoop/hadoop-mapred/mapred/staging/psavani/.staging/job_201207021606_1036/job.jar'
>>>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:412)
>>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
>>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
>>>         at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1371)
>>>         at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1352)
>>>         at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273)
>>>         at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381)
>>>         at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371)
>>>         at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:222)
>>>         at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1372)
>>>         at java.security.AccessController.doPri
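(Editorial note: the two configuration keys discussed in this thread live in different files. A sketch of the fragments involved, using the ID:SECRET@BUCKET placeholders from the thread — note that in Hadoop 1.x fs.default.name is normally read from core-site.xml rather than hdfs-site.xml, and the staging value below is only Hemanth's suggestion, not a confirmed fix:)

```xml
<!-- core-site.xml: fs.default.name belongs here (Hadoop 1.x reads it from core-site,
     not hdfs-site); ID/SECRET/BUCKET are placeholders as in the thread -->
<property>
  <name>fs.default.name</name>
  <value>s3n://ID:SECRET@BUCKET</value>
</property>

<!-- mapred-site.xml: the staging root Hemanth suggested pointing at S3;
     by default it is ${hadoop.tmp.dir}/mapred/staging -->
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>s3n://ID:SECRET@BUCKET/tmp/mapred/staging</value>
</property>
```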