These config settings depend on the nature of your MR job and the resources available
on the node. Since increasing the heap size affected the time dramatically, I
assume your jobs "like" memory. Can you describe your machines? Also,
make sure you don't have any network issues (a slow network can cause slowness).
Hi
The cluster has 12 nodes plus the master node. I ran a new test, increasing
the child task memory to 2000m and HADOOP_HEAPSIZE to 2000m,
with mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum left at 2 (the default), and now the time
is 6 minutes, but I think it is v
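For reference (a sketch of where those knobs live, not verified against your setup):
HADOOP_HEAPSIZE in hadoop-env.sh sets the heap of the Hadoop daemons themselves,
while the per-task "child" JVM memory comes from mapred.child.java.opts in
mapred-site.xml:

# hadoop-env.sh -- heap (MB) for the daemons (JobTracker, TaskTracker, etc.)
export HADOOP_HEAPSIZE=2000

# mapred-site.xml properties (shown here as key = value)
mapred.child.java.opts = -Xmx2000m          # heap for each map/reduce task JVM
mapred.tasktracker.map.tasks.maximum = 2    # concurrent map tasks per node
mapred.tasktracker.reduce.tasks.maximum = 2 # concurrent reduce tasks per node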
In case you need to process the files separately, use one MR job for each
file; you can add a single file as input. I believe you'll need to iterate
over all the files in the input dir and start a job instance for each file. You
can do this in Java code or in a script, depending on your case.
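A rough, untested sketch of that loop with the 0.20-era API; MyMapper/MyReducer
and the paths are placeholders for your own:

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class PerFileJobs {
  public static void main(String[] args) throws Exception {
    Path inputDir = new Path(args[0]);
    FileSystem fs = inputDir.getFileSystem(new JobConf());
    for (FileStatus file : fs.listStatus(inputDir)) {
      if (file.isDir()) continue;               // skip subdirectories
      JobConf conf = new JobConf(PerFileJobs.class);
      conf.setJobName("per-file-" + file.getPath().getName());
      // conf.setMapperClass(MyMapper.class);   // your own classes here
      // conf.setReducerClass(MyReducer.class);
      FileInputFormat.setInputPaths(conf, file.getPath());   // single file as input
      FileOutputFormat.setOutputPath(conf,                   // one output dir per file
          new Path(args[1], file.getPath().getName() + "-out"));
      JobClient.runJob(conf);                   // blocks, so jobs run one by one
    }
  }
}

JobClient.runJob blocks until the job finishes; if you want the per-file jobs to
run in parallel, submit them with JobClient.submitJob instead and poll for
completion.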
Alex Baranau
How many nodes do you use for your "fully distributed" cluster?
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Wed, Nov 17, 2010 at 5:44 AM, Cornelio Iñigo
wrote:
> Hi
>
> I have a question for you:
>
> I developed a program using Hadoop, it has one
Hi,
When I set my input format to take an input directory with three files
in it, the job processes all three and produces a single output containing the
results from all of them.
Instead, I want the job to be run separately for each input file, and hence
produce a different output for each.
Eg.
FYI, I commented out the kernel version in the hadoop-ec2-env.sh script in the
c1.xlarge if statements (at the bottom).
Before it was using aki-427d952b
Now it's using aki-b51cf9dc
And I'm able to connect. It turns out the problem was a hang during boot. This
should probably be changed in th
I've been running into an issue today.
I'm trying to procure 5 c1.xlarge instances on Amazon EC2. I was able to use
the 453820947548/bixolabs-hadoop-0.20.2-i386 AMI for my previous m1.large
instances, so I figured I could use the c1.xlarge instances with the x86_64
versions.
When I start these
We were getting SIGSEGV and fixed it by upgrading the JVM. We are
using 1.6.0_21 currently.
On Nov 16, 2010, at 3:50 PM, "Greg Langmead" wrote:
Newbie alert.
I have a Pig script I tested on small data and am now running it on
a larger
data set (85GB). My cluster is two machines right
Hi
I have a question for you:
I developed a program using Hadoop; it has one map function and one reduce
function (like WordCount), and in the map function I do all the processing of my
data.
When I run this program on a single-node machine it takes about 7 minutes
(it's a small dataset); in a pseudo-dis
Hi,
As an extension to the problem statement: is it possible to fuse steps 1 and 2
into one step?
I.e., can we have the map task pick up its input from an external filesystem
instead of HDFS?
Can FTPFileSystem/RawLocalFileSystem be of any help here?
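For what it's worth, a minimal sketch of what that could look like with the
0.20-era API: the Path's scheme selects the FileSystem implementation, so the
job input can point outside HDFS. The FTP host and credentials below are
hypothetical, and the remote server has to be reachable from every node:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ExternalInput {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ExternalInput.class);
    // ftp:// resolves to FTPFileSystem, file:// to the local filesystem,
    // hdfs:// to HDFS; host and credentials here are made up.
    FileInputFormat.setInputPaths(conf,
        new Path("ftp://user:password@ftp.example.com/data/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/me/output")); // HDFS
    JobClient.runJob(conf);
  }
}

Keep in mind that every map task then reads over the wire from that single
server, so it can easily become the bottleneck.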
./zahoor
On 15-Nov-2010, at 3:10 PM, Seb
Hi,
On Wed, Nov 17, 2010 at 5:11 PM, Jaydeep Ayachit
wrote:
>
> - Configuration associated with this job
>
> - Job completion time
Perhaps JobHistory.JobInfo is the class you're looking for.
--
Harsh J
www.harshj.com
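A rough, untested sketch of reading those two values from the job's history file
with the 0.20 API (args[0] is the path to the history file, wherever
hadoop.job.history.location or the job's _logs/history dir puts it):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.DefaultJobHistoryParser;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobHistory;

public class JobInfoDump {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    FileSystem fs = FileSystem.get(conf);
    JobHistory.JobInfo job = new JobHistory.JobInfo("");  // id filled in by parsing
    DefaultJobHistoryParser.parseJobTasks(args[0], job, fs);
    // Per-job values are keyed by the JobHistory.Keys enum:
    System.out.println("submit time: " + job.get(JobHistory.Keys.SUBMIT_TIME));
    System.out.println("finish time: " + job.get(JobHistory.Keys.FINISH_TIME));
  }
}

The job's Configuration is saved alongside the history file as a *_conf.xml,
which you can load with new Configuration() plus addResource().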
Hello,
I need to retrieve some information about a submitted job. I can call
JobClient.getJob(JOBID), which returns a RunningJob. I need to get:
- Configuration associated with this job
- Job completion time
I could not see any APIs in RunningJob that return this data. Any pointe
Are all the nodes being used? Go to <jobtracker-host>:50030 in the web interface
after starting the job, and check whether the tasks are progressing together
on all nodes or not.
hari
On Wed, Nov 17, 2010 at 9:14 AM, Cornelio Iñigo
wrote:
> Hi
>
> I have a question for you:
>
> I developed a program using Hadoop,
Have you checked the suggestions/examples here:
http://hadoop.apache.org/common/docs/current/cluster_setup.html? You
probably did, but just in case. There are a lot of configuration options
explained with real-world examples.
Also useful:
http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-pe