Hey Parth,

I don't think it's possible to run MR with the filesystem based entirely
on S3. You can use S3 for your job's file I/O, but your
fs.default.name (or fs.defaultFS) must point to a file:/// or hdfs://
filesystem. That way the MR framework can stage and distribute its own
files properly, while still being able to process S3 URLs passed as
input or output locations.
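
As a rough, untested sketch (the bucket name, paths and credential values
below are placeholders, and you'd still set your own Mapper/Reducer classes
as usual), the job driver can keep the default FS on HDFS and only point the
job's input and output at S3:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3IoExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.defaultFS stays on hdfs:// (from core-site.xml), so the staging
    // directory and framework files live on HDFS, where the ownership
    // checks in JobSubmissionFiles work as expected.
    // s3n credentials; these can also go into core-site.xml instead of
    // embedding KEY:SECRET in every URL.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    Job job = Job.getInstance(conf, "s3n-io-example");
    job.setJarByClass(S3IoExample.class);
    // Mapper/Reducer and key/value class setup omitted for brevity.
    // Only the job's input and output locations point at S3.
    FileInputFormat.addInputPath(job, new Path("s3n://your-bucket/input/"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://your-bucket/output/"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}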

On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com> wrote:
> Hello Everyone,
>         I am trying to run a Hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml:
>
> fs.default.name=s3n://KEY:VALUE@bucket/
> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When I run the job from EC2, I get the following error:
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by   The directory must be owned by the submitter ec2-user or by ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from
> the JobSubmissionFiles.java class:
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.getOwner();
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("The ownership on the staging directory " +
>                       stagingArea + " is not as expected. " +
>                       "It is owned by " + owner + ". The directory must " +
>                       "be owned by the submitter " + currentUser + " or " +
>                       "by " + realUser);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
>
> I think my job calls getOwner(), which returns NULL since S3 does not have
> file permissions, and that results in the IOException I am getting.
>
> Is there any workaround for this? Any idea how I could use S3 as the
> filesystem with Hadoop in distributed mode?



-- 
Harsh J
