Re: how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-03 Thread Jane Wayne
i don't have the option of setting the map heap size to 2 GB since my real environment is AWS EMR and the constraints are set. http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html this link is where i am currently reading on the meaning of io.sort.factor and io.sort.mb. it seems io.s

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Bas Hickendorff
Hello, Yes, that question on StackOverflow is mine as well. I first posted there, but then realized that on this mailing list there would probably be more Hadoop knowledge. And you were right! For some reason the userlogs folder in the log folder was owned by root instead of hadoopmachine. I have no

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Chris White
This looks like a log dir problem: at org.apache.hadoop.mapred.JobLocalizer.initializeJobLogDir(JobLocalizer.java:239) looking through the source for JobLocalizer, it's trying to create a folder under ${hadoop.log.dir}/userlogs. There's a similar question (i assume it's yours) on StackOverflow:

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Bas Hickendorff
ps -ef | grep hadoop shows that it is indeed "hadoopmachine" that is running hadoop. I su'ed into the user hadoopmachine (which is also the standard user I login with in debian), and I can access the hdfs that way as well. The free space should also not be a problem: hadoopmachine@debian:~$ df -

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Harsh J
The permissions look alright if TT too is run by 'hadoopmachine'. Can you also check if you have adequate space free, reported by df -h /home/hadoopmachine? On Tue, Apr 3, 2012 at 10:28 PM, Bas Hickendorff wrote: > Thanks for your help! > However, as far as I can see, the user has those rights. >

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Serge Blazhievsky
Check which user is actually running Hadoop with ps -ef | grep hadoop, then su to that user and try to touch the HDFS directory. Regards, Serge On 4/3/12 10:44 AM, "Bas Hickendorff" wrote: >Yes, it does, and it contains the file with input data (file is called >"in"). > > >hadoopmachine@debian:~/hadoop-

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Bas Hickendorff
Yes, it does, and it contains the file with input data (file is called "in"). hadoopmachine@debian:~/hadoop-1.0.1$ bin/hadoop fs -ls Warning: $HADOOP_HOME is deprecated. Found 1 items drwxr-xr-x - hadoopmachine supergroup 0 2012-04-03 07:11 /user/hadoopmachine/input hadoopmachine@de

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Serge Blazhievsky
Does this directory exist in HDFS /user/hadoopmachine/input ??? Serge Blazhievsky On 4/3/12 6:28 AM, "Bas Hickendorff" wrote: >Hello all, > >My map-reduce operation on Hadoop (running on Debian) is correctly >starting and finding the input file. However, just after starting the >map reduce

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Bas Hickendorff
Thanks for your help! However, as far as I can see, the user has those rights. I have in mapred-site.xml: mapred.local.dir /home/hadoopmachine/hadoop_data/mapred true and the directories look like this: hadoopmachine@debian:~$ cd /home/hadoopmachine/hadoop_data/mapred h

Re: How to find out what file Hadoop is looking for

2012-04-03 Thread Harsh J
Some of your TaskTrackers' mapred.local.dirs do not have proper r/w permissions set on them. Make sure they are owned by the user that runs the TT service and have read/write permission at least for that user. On Tue, Apr 3, 2012 at 6:58 PM, Bas Hickendorff wrote: > Hello all, > > My map-reduce o
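The ownership and read/write check described in this reply can be sketched as a small shell function. This is only an illustration, not something posted in the thread: the function name check_local_dir and its output strings are hypothetical, and the read/write test applies to whichever user runs it, so it is assumed to be run as the user that starts the TaskTracker.

```shell
# check_local_dir DIR OWNER
# Hypothetical helper: prints one status line and returns non-zero unless DIR
# exists, is owned by OWNER, and is readable and writable by the current user.
check_local_dir() {
    dir=$1
    expected_owner=$2

    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi

    # The third column of `ls -ld` is the owning user.
    actual_owner=$(ls -ld "$dir" | awk '{print $3}')
    if [ "$actual_owner" != "$expected_owner" ]; then
        echo "BAD_OWNER $dir (owned by $actual_owner)"
        return 1
    fi

    # -r and -w are evaluated for the user running this function.
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "NO_RW $dir"
        return 1
    fi

    echo "OK $dir"
}
```

With the names used elsewhere in this thread, a call would look like check_local_dir /home/hadoopmachine/hadoop_data/mapred hadoopmachine, repeated for each directory listed in mapred.local.dir and for the userlogs directory under the Hadoop log directory.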

Cross join/product in Map/Reduce

2012-04-03 Thread madhu phatak
Hi, I am using the following code to generate cross product in hadoop. package com.example.hadoopexamples.joinnew; import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import java.util.ArrayList; import java.util.List; import java.util.StringTokenizer; im

Re: how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-03 Thread Bejoy Ks
Jane, From my first look, properties that can help you could be - Increase io.sort.factor to 100 - Increase io.sort.mb to 512 MB - Increase map task heap size to 2 GB. If the task still stalls, try providing less input for each mapper. Regards Bejoy KS On Tue, Apr 3, 2012 at 2:08 PM, Jane
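One way the three settings suggested above might be passed for a single job is as generic -D options on the command line. This is a sketch under assumptions: the jar name, driver class, and input/output paths are hypothetical, the driver is assumed to run through ToolRunner so that -D options are parsed, and mapred.child.java.opts is the Hadoop 0.20/1.x property for task JVM options (it applies to reduce tasks as well as map tasks). On a managed cluster such as EMR these values may be constrained.

```shell
# Hypothetical jar, class, and paths; only the three -D properties
# correspond to the suggestions in the message above.
# io.sort.mb is allocated inside the task JVM, so it must fit within the -Xmx heap.
hadoop jar my-job.jar com.example.MyJob \
    -D io.sort.factor=100 \
    -D io.sort.mb=512 \
    -D mapred.child.java.opts=-Xmx2048m \
    input output
```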

how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-03 Thread Jane Wayne
i have a map reduce job that is generating a lot of intermediate key-value pairs. for example, when i am 1/3 complete with my map phase, i may have generated over 130,000,000 output records (which is about 9 gigabytes). to get to the 1/3 complete mark is very fast (less than 10 minutes), but at the