On 23/10/12 13:32, Parth Savani wrote:
Hello Everyone,
        I am trying to run a hadoop job with s3n as my filesystem.
I changed the following properties in my hdfs-site.xml

fs.default.name=s3n://KEY:VALUE@bucket/
A good practice is to set these two properties in core-site.xml if you will use S3 often:
<property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>AWS_ACCESS_KEY_ID</value>
</property>

<property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>AWS_SECRET_ACCESS_KEY</value>
</property>
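Note that the fs.s3.* keys above are the ones read for s3:// (block store) URIs; as far as I know, when you go through s3n:// Hadoop looks at the s3n-prefixed equivalents, so it is probably worth adding those as well (same placeholder values):

<property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>AWS_ACCESS_KEY_ID</value>
</property>

<property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>AWS_SECRET_ACCESS_KEY</value>
</property>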

After that, you can access your URIs in a friendlier way:
S3:
 s3://<s3-bucket>/<s3-filepath>

S3n:
 s3n://<s3-bucket>/<s3-filepath>
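
For example, once the keys are in core-site.xml you should be able to list a bucket straight from the command line (bucket and path are placeholders):

 hadoop fs -ls s3n://<s3-bucket>/<s3-filepath>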

mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp

When I run the job from EC2, I get the following error:

The ownership on the staging directory s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned by . The directory must be owned by the submitter ec2-user or by ec2-user
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)

I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from the JobSubmissionFiles.java class:
  public static Path getStagingDir(JobClient client, Configuration conf)
      throws IOException, InterruptedException {
    Path stagingArea = client.getStagingAreaDir();
    FileSystem fs = stagingArea.getFileSystem(conf);
    String realUser;
    String currentUser;
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    realUser = ugi.getShortUserName();
    currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
    if (fs.exists(stagingArea)) {
      FileStatus fsStatus = fs.getFileStatus(stagingArea);
      String owner = fsStatus.getOwner();
      if (!(owner.equals(currentUser) || owner.equals(realUser))) {
        throw new IOException("The ownership on the staging directory " +
            stagingArea + " is not as expected. " +
            "It is owned by " + owner + ". The directory must " +
            "be owned by the submitter " + currentUser + " or " +
            "by " + realUser);
      }
      if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
        LOG.info("Permissions on staging directory " + stagingArea + " are " +
            "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
            "to correct value " + JOB_DIR_PERMISSION);
        fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
      }
    } else {
      fs.mkdirs(stagingArea,
          new FsPermission(JOB_DIR_PERMISSION));
    }
    return stagingArea;
  }


I think my job calls getOwner(), which returns NULL since S3 does not have file permissions, and that results in the IOException I am getting.
Which user are you launching the job as in EC2?
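
If you want to confirm what s3n actually reports there, you can check the owner of that staging path directly (just a sketch; the bucket and path are placeholders, and it assumes the fs.s3n.* keys are in core-site.xml):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckStagingOwner {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Open the native S3 filesystem for the bucket (placeholder name).
    FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
    // Ask for the status of the staging directory the JobClient complained about.
    FileStatus status = fs.getFileStatus(new Path("s3n://my-bucket/tmp/ec2-user/.staging"));
    // On s3n the reported owner is typically empty, which is why the ownership check fails.
    System.out.println("owner='" + status.getOwner() + "'");
  }
}

If the printed owner is empty (or not ec2-user), that matches the check in getStagingDir() above.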



Any workaround for this? Any idea how I could use S3 as the filesystem with Hadoop in distributed mode?

Look here:
http://wiki.apache.org/hadoop/AmazonS3
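
One approach that usually avoids this (just a sketch, not CDH4-specific; the namenode host/port, jar and class names are placeholders): keep fs.default.name pointing at HDFS, leave mapreduce.jobtracker.staging.root.dir at its default, and use s3n:// URIs only for the job input and output paths. That way the staging directory lives on HDFS, where ownership works as the JobClient expects:

<property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:8020</value>
</property>

Then, assuming your driver takes the input and output paths as arguments:

 hadoop jar my-job.jar MyJobClass s3n://<s3-bucket>/input s3n://<s3-bucket>/output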



