Hello Harsh,
         I am following the steps from this link:
http://wiki.apache.org/hadoop/AmazonS3
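
For context, the setup from that page boils down to something like the
following; I am showing it via the Configuration API just to keep it compact,
and the bucket name and keys below are placeholders, not my real values:

import org.apache.hadoop.conf.Configuration;

public class S3nSetupSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // s3n credentials, instead of embedding KEY:SECRET in every URI
    conf.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY");      // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY");  // placeholder
    // Default filesystem pointed at the bucket, as on the wiki page
    conf.set("fs.default.name", "s3n://my-bucket/");         // placeholder bucket
    System.out.println("default FS = " + conf.get("fs.default.name"));
  }
}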

When I run the job, I can see that Hadoop places all the jars
required for the job on S3. However, when it tries to run the job, it
complains:
The ownership on the staging directory
s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging
is not as expected. It is owned by   The directory must be owned by the
submitter ec2-user or by ec2-user
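
The blank owner in that message is the suspicious part. A quick check along
these lines should show the same thing (bucket, keys, and path below are
placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingOwnerCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY");      // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY");  // placeholder

    Path staging = new Path("s3n://my-bucket/tmp/ec2-user/.staging");  // placeholder path
    FileSystem fs = staging.getFileSystem(conf);
    FileStatus status = fs.getFileStatus(staging);
    // On s3n the owner appears to come back empty, so the
    // owner.equals(currentUser) check in JobSubmissionFiles.getStagingDir()
    // can never pass.
    System.out.println("owner = '" + status.getOwner() + "'");
  }
}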

Some people seem to have solved this permissions problem here:
https://issues.apache.org/jira/browse/HDFS-1333
but they made changes to some Hadoop Java classes, and I wonder if
there's an easier workaround.
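
If I understand your suggestion below correctly, the driver would instead
look roughly like this, with the default FS left on HDFS and s3n used only
for job input and output (the namenode host, bucket, and mapper settings are
placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3nIoJobSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Framework files (staging dir etc.) stay on HDFS ...
    conf.set("fs.default.name", "hdfs://namenode:8020/");    // placeholder namenode
    conf.set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY");      // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY");  // placeholder

    Job job = new Job(conf, "s3n-io-sketch");
    job.setJarByClass(S3nIoJobSketch.class);
    job.setMapperClass(Mapper.class);          // identity mapper, just for the sketch
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // ... while the job reads from and writes to S3 directly.
    FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/input"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://my-bucket/output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}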


On Wed, Oct 24, 2012 at 12:21 AM, Harsh J <ha...@cloudera.com> wrote:

> Hey Parth,
>
> I don't think it's possible to run MR by basing the FS over S3
> completely. You can use S3 for I/O for your files, but your
> fs.default.name (or fs.defaultFS) must be either file:/// or hdfs://
> filesystems. This way, your MR framework can run/distribute its files
> well, and also still be able to process S3 URLs passed as input or
> output locations.
>
> On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <pa...@sensenetworks.com>
> wrote:
> > Hello Everyone,
> > I am trying to run a Hadoop job with s3n as my filesystem.
> > I changed the following properties in my hdfs-site.xml
> >
> > fs.default.name=s3n://KEY:VALUE@bucket/
> > mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
> >
> > When I run the job from EC2, I get the following error:
> >
> > The ownership on the staging directory
> > s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
> > owned by   The directory must be owned by the submitter ec2-user or by
> > ec2-user
> >   at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
> >
> > I am using the Cloudera CDH4 Hadoop distribution. The error is thrown
> > from the JobSubmissionFiles class:
> > public static Path getStagingDir(JobClient client, Configuration conf)
> >     throws IOException, InterruptedException {
> >   Path stagingArea = client.getStagingAreaDir();
> >   FileSystem fs = stagingArea.getFileSystem(conf);
> >   String realUser;
> >   String currentUser;
> >   UserGroupInformation ugi = UserGroupInformation.getLoginUser();
> >   realUser = ugi.getShortUserName();
> >   currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
> >   if (fs.exists(stagingArea)) {
> >     FileStatus fsStatus = fs.getFileStatus(stagingArea);
> >     String owner = fsStatus.getOwner();
> >     if (!(owner.equals(currentUser) || owner.equals(realUser))) {
> >       throw new IOException("The ownership on the staging directory " +
> >           stagingArea + " is not as expected. " +
> >           "It is owned by " + owner + ". The directory must " +
> >           "be owned by the submitter " + currentUser + " or " +
> >           "by " + realUser);
> >     }
> >     if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
> >       LOG.info("Permissions on staging directory " + stagingArea + " are " +
> >           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
> >           "to correct value " + JOB_DIR_PERMISSION);
> >       fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
> >     }
> >   } else {
> >     fs.mkdirs(stagingArea,
> >         new FsPermission(JOB_DIR_PERMISSION));
> >   }
> >   return stagingArea;
> > }
> >
> >
> >
> > I think my job calls getOwner(), which comes back empty since S3 has no
> > notion of file ownership, and that is what triggers the IOException I am
> > getting.
> >
> > Is there any workaround for this? Any idea how I could use S3 as the
> > filesystem with Hadoop in distributed mode?
>
>
>
> --
> Harsh J
>
