Hi all,

I'm trying to get the LinuxTaskController working (on the svn trunk) on a pseudo-distributed cluster. It's proving quite frustrating.
I compiled the common, hdfs, and mapred jars with 'ant jar' and copied everything together into the same directory structure. I then ran:

  $ cd src/git/mapred/src/c++/task-controller
  $ bash ./configure
  $ make
  $ cp task-controller ~/src/git/hadoop-common/bin
  $ cd ~/src/git/hadoop-common/bin
  $ sudo chown root:root task-controller
  $ sudo chmod 6655 task-controller
  $ ls -l task-controller
  -rwSr-sr-x 1 root root 45659 2009-10-23 00:31 task-controller

(See the aside after the client output below for a note on what those mode bits mean.)

My configuration is pretty minimal; I've not set much in mapred-site.xml besides mapred.job.tracker. I enabled the task controller with:

  <property>
    <name>mapreduce.tasktracker.taskcontroller</name>
    <value>org.apache.hadoop.mapred.LinuxTaskController</value>
  </property>

core-site.xml just sets fs.default.name, and hdfs-site.xml is empty. taskcontroller.cfg looks like:

  mapreduce.cluster.local.dir=/tmp/hadoop-aaron/mapred/local
  hadoop.pid.dir=/tmp
  hadoop.log.dir=/home/aaron/src/git/hadoop-common/logs
  hadoop.indent.str=#configured HADOOP_IDENT_STR

(NB: the typo "hadoop.indent.str" was already in the template file. I can't actually find a reference to either "hadoop.indent.str" or "hadoop.ident.str" in the task-controller C source, so I don't think this matters.)

I can verify that task-controller does work for some operations. For example, I can start another process (vim, say), find its pid, and then run:

  $ `readlink -f task-controller` aaron 6 <pid-of-vim>

and task-controller will kill it. (The `readlink -f ...` is needed because task-controller expects to get its full absolute path as argv[0], or else it segfaults on a malloc() -- but that's another story.)

Here are the permissions on my mapred.local.dir:

  aa...@jargon:/tmp/hadoop-aaron/mapred/local$ ls -l
  total 8
  drwxrwxr-x 2 aaron aaron 4096 2009-10-23 01:03 jobTracker
  drwxr-xr-x 3 aaron aaron 4096 2009-10-23 01:01 taskTracker

I start Hadoop using the standard scripts:

  $ bin/start-dfs.sh
  $ bin/start-mapred.sh

All of this is running as user "aaron", btw.

... so here's the problem -- I can't actually launch tasks! I try running a trivial job, and here's the output to the client:

  09/10/23 12:08:39 INFO mapreduce.JobSubmitter: number of splits:1
  09/10/23 12:08:40 INFO mapreduce.Job: Running job: job_200910231205_0002
  09/10/23 12:08:41 INFO mapreduce.Job: map 0% reduce 0%
  09/10/23 12:08:45 INFO mapreduce.Job: Task Id : attempt_200910231205_0002_m_000002_0, Status : FAILED
  Error initializing attempt_200910231205_0002_m_000002_0:
  java.io.IOException: Not able to initialize job directories in any of the configured local directories for job job_200910231205_0002
          at org.apache.hadoop.mapreduce.server.tasktracker.Localizer.initializeJobDirs(Localizer.java:318)
          at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:904)
          at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:860)
          at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1849)
          at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:106)
          at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1814)

  <repeated several more times for the subsequent task attempts>
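(Aside on the mode bits: 6655 is setuid+setgid on top of rw-/r-x/r-x, and the capital 'S' in the ls listing just means the setuid bit is set while the owner execute bit is clear. Any user can still execute the binary through the world x bit, and setuid then makes it run as root. A quick way to sanity-check that the bits and ownership survived the copy -- purely illustrative shell, not part of the Hadoop scripts:

  $ stat -c '%a %U:%G %n' task-controller   # GNU stat: octal mode, owner, group, name
  6655 root:root task-controller

Also worth remembering: 'cp' leaves the new copy owned by the copying user, and on Linux 'chown' clears any setuid/setgid bits, so the chown/chmod pair has to be redone after every rebuild.)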
Here are the log messages that appear in tasktracker.log:

  2009-10-23 12:09:02,268 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200910231205_0002_m_000001_3 which needs 1 slots
  2009-10-23 12:09:02,268 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 4 and trying to launch attempt_200910231205_0002_m_000001_3 which needs 1 slots
  2009-10-23 12:09:02,268 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer: User-directories for the user aaron are already initialized on this TT. Not doing anything.
  2009-10-23 12:09:02,271 WARN org.apache.hadoop.mapreduce.server.tasktracker.Localizer: Not able to create job directory /tmp/hadoop-aaron/mapred/local/taskTracker/aaron/jobcache/job_200910231205_0002
  2009-10-23 12:09:02,272 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_200910231205_0002_m_000001_3:
  java.io.IOException: Not able to initialize job directories in any of the configured local directories for job job_200910231205_0002
          at org.apache.hadoop.mapreduce.server.tasktracker.Localizer.initializeJobDirs(Localizer.java:318)
          at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:904)
          at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:860)
          at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1849)
          at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:106)
          at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1814)

If I look inside the tasktracker local dir, here's what I see:

  aa...@jargon:/tmp/hadoop-aaron/mapred/local$ cd taskTracker/
  aa...@jargon:/tmp/hadoop-aaron/mapred/local/taskTracker$ ls -l
  total 4
  dr-xrws--- 4 aaron root 4096 2009-10-23 12:06 aaron
  aa...@jargon:/tmp/hadoop-aaron/mapred/local/taskTracker$ cd aaron/
  aa...@jargon:/tmp/hadoop-aaron/mapred/local/taskTracker/aaron$ ls -l
  total 8
  dr-xrws--- 2 aaron root 4096 2009-10-23 12:06 distcache
  dr-xrws--- 2 aaron root 4096 2009-10-23 12:06 jobcache

Both of those dirs are empty. The /tmp/hadoop-aaron/mapred/local/taskTracker dir was created by the TT -- I had rm -rf'd it before starting Hadoop, so all the permissions under there were set by the TT itself.

... so this means the owning user (aaron) can't write into his own directories: as the owner, only the r-x owner bits apply to me, and the group write bit never comes into play (I'm not a member of group 'root' anyway, and the owner bits would win even if I were). A minimal reproduction of this is in the P.S. below.

I tried setting u+w on distcache and jobcache. Now it can create a job dir, but the job dir it creates has these permissions:

  aa...@jargon:/tmp/hadoop-aaron/mapred/local/taskTracker/aaron/jobcache$ ls -l
  total 4
  dr-xrws--- 4 aaron root 4096 2009-10-23 12:13 job_200910231205_0003

... so it fails to make any attempt dirs.

Is there something I should be doing differently with Linux users/groups? Or is this a bug in task-controller?

Thanks,
- Aaron
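P.S. For anyone who wants to see the failure mode in isolation, here's a minimal sketch of how a dr-xrws--- dir locks out its own owner (the scratch path is made up; run it as aaron):

  $ mkdir /tmp/permtest
  $ sudo chown aaron:root /tmp/permtest
  $ sudo chmod 2570 /tmp/permtest   # 2570 = dr-xrws---, the mode the TT sets on jobcache
  $ touch /tmp/permtest/foo         # fails with "Permission denied"

Since aaron owns the directory, the kernel consults only the owner bits (r-x) and never reaches the group class, so group root's rwx is irrelevant.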