I don't have the option of setting the map heap size to 2 GB, since my
real environment is AWS EMR and the constraints are fixed there.
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html is
where I am currently reading about the meaning of io.sort.factor
and io.sort.mb.
it seems io.s
Hello,
Yes, that question on StackOverflow is mine as well. I first posted
there, but then realized that there would probably be more Hadoop
knowledge on this mailing list.
And you were right! For some reason the userlog folder in the log
folder was owned by root instead of hadoopmachine. I have no
This looks like a log dir problem:
at
org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:239)
Looking through the source for JobLocalizer, it's trying to create a
folder under ${hadoop.log.dir}/userlogs. There's a similar question (I
assume it's yours) on StackOverflow:
ps -ef | grep hadoop shows that it is indeed "hadoopmachine" that is
running hadoop.
I su'ed to the user hadoopmachine (which is also the standard user I
log in with on Debian), and I can access HDFS that way as well.
The free space should also not be a problem:
hadoopmachine@debian:~$ df -
The permissions look alright if TT too is run by 'hadoopmachine'. Can
you also check if you have adequate space free, reported by df -h
/home/hadoopmachine?
On Tue, Apr 3, 2012 at 10:28 PM, Bas Hickendorff
wrote:
> Thanks for your help!
> However, as far as I can see, the user has those rights.
>
Check which user is actually running Hadoop:
ps -ef | grep hadoop
Then su to that user and try to touch the HDFS directory.
Regards
Serge
On 4/3/12 10:44 AM, "Bas Hickendorff" wrote:
>Yes, it does, and it contains the file with input data (file is called
>"in").
>
>
>hadoopmachine@debian:~/hadoop-
Yes, it does, and it contains the file with input data (file is called "in").
hadoopmachine@debian:~/hadoop-1.0.1$ bin/hadoop fs -ls
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - hadoopmachine supergroup 0 2012-04-03 07:11 /user/hadoopmachine/input
hadoopmachine@de
Does this directory exist in HDFS?
/user/hadoopmachine/input
Serge Blazhievsky
On 4/3/12 6:28 AM, "Bas Hickendorff" wrote:
>Hello all,
>
>My map-reduce operation on Hadoop (running on Debian) is correctly
>starting and finding the input file. However, just after starting the
>map reduce
Thanks for your help!
However, as far as I can see, the user has those rights.
I have in mapred-site.xml:
<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoopmachine/hadoop_data/mapred</value>
  <final>true</final>
</property>
and the directories look like this:
hadoopmachine@debian:~$ cd /home/hadoopmachine/hadoop_data/mapred
h
Some of your TaskTrackers' mapred.local.dirs do not have proper r/w
permissions set on them. Make sure they are owned by the user that
runs the TT service and have read/write permission at least for that
user.
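One way to check this is a small standalone program, run as the user that runs the TaskTracker. The sketch below is only an illustration and is not part of Hadoop: the class name LocalDirCheck and its problems() helper are made up for this example. It compares the directory owner with an expected user name and tests read/write access for the user running the JVM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Hadoop class: checks one local directory.
class LocalDirCheck {

    // Returns a list of problems found for dir; an empty list means it looks usable.
    // Read/write access is tested for the user running this JVM, so run it as the
    // TaskTracker user to get a meaningful answer.
    static List<String> problems(Path dir, String expectedOwner) throws IOException {
        List<String> found = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            found.add("not an existing directory");
            return found;
        }
        String owner = Files.getOwner(dir).getName();
        if (!owner.equals(expectedOwner)) {
            found.add("owned by " + owner + ", expected " + expectedOwner);
        }
        if (!Files.isReadable(dir)) {
            found.add("not readable by the current user");
        }
        if (!Files.isWritable(dir)) {
            found.add("not writable by the current user");
        }
        return found;
    }

    // Arguments: the expected owner, followed by one or more directories.
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LocalDirCheck <expected-owner> <dir> [<dir> ...]");
            return;
        }
        String expectedOwner = args[0];
        for (int i = 1; i < args.length; i++) {
            List<String> found = problems(Paths.get(args[i]), expectedOwner);
            System.out.println(args[i] + ": "
                    + (found.isEmpty() ? "ok" : String.join("; ", found)));
        }
    }
}
```

For the setup in this thread it could be invoked as `java LocalDirCheck hadoopmachine /home/hadoopmachine/hadoop_data/mapred`, listing every directory configured in mapred.local.dir.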
On Tue, Apr 3, 2012 at 6:58 PM, Bas Hickendorff
wrote:
> Hello all,
>
> My map-reduce o
Hi,
I am using the following code to generate cross product in hadoop.
package com.example.hadoopexamples.joinnew;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
im
Jane,
At first glance, these settings could help:
- Increase io.sort.factor to 100
- Increase io.sort.mb to 512 MB
- Increase the map task heap size to 2 GB
If the task still stalls, try giving each mapper less input.
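To make the three suggestions concrete, here is a rough sketch that renders them as mapred-site.xml style property entries. The class SortTuningSketch is hypothetical and exists only for this illustration. It also assumes that mapred.child.java.opts is the property used for the task heap (in Hadoop 0.20/1.0 it applies to both map and reduce tasks) and that -Xmx2048m is an acceptable way to express 2 GB; adjust both to your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop.
class SortTuningSketch {

    // The three suggested settings, in the order they were listed.
    static Map<String, String> suggested() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("io.sort.factor", "100");   // streams merged at once while sorting
        props.put("io.sort.mb", "512");       // sort buffer size in MB
        // Assumption: heap is set through the child JVM options.
        props.put("mapred.child.java.opts", "-Xmx2048m");
        return props;
    }

    // Renders each entry as a <property> block in Hadoop's configuration format.
    static String toPropertyXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toPropertyXml(suggested()));
    }
}
```

Note that io.sort.mb is allocated inside the task heap, so a 512 MB sort buffer only makes sense if the heap is raised as well.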
Regards
Bejoy KS
On Tue, Apr 3, 2012 at 2:08 PM, Jane
I have a map reduce job that is generating a lot of intermediate key-value
pairs. For example, when I am 1/3 complete with my map phase, I may have
generated over 130,000,000 output records (which is about 9 gigabytes).
Getting to the 1/3 complete mark is very fast (less than 10 minutes), but at
the