Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-06-30 Thread Harsh J
Narayanan,

On Fri, Jul 1, 2011 at 11:28 AM, Narayanan K wrote:
> Hi all,
>
> We are basically working on a research project and I require some help
> regarding this.

Always glad to see research work being done! What're you working on? :)

> How do I submit a mapreduce job from outside the clust
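
A minimal sketch of the remote-submission setup being asked about (assuming a 0.20-era Hadoop client install whose conf/ already points at the cluster; the jar and class names below are placeholders, not from the thread):

```shell
# Verify the client machine can reach the JobTracker, then submit.
if command -v hadoop >/dev/null 2>&1; then
  result=$(hadoop job -list 2>&1 | head -n 2)   # a reachable JT returns a job listing
  # hadoop jar my-analysis.jar com.example.MyJob /input /output
else
  result="hadoop not on PATH (dry run)"
fi
echo "$result"
```

The only cluster-specific pieces live in the client's core-site.xml and mapred-site.xml (fs.default.name and mapred.job.tracker).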

Re: Dynamic Cluster Node addition

2011-06-30 Thread Harsh J
Paul,

You can inspect the data used by your new nodes after the balancer operation runs. "hadoop dfsadmin -report" should tell you detailed stats about each of the DNs, or look at /fsck. (Note: by default, the balancer operation may be bandwidth-limited for performance reasons, and may take a whil
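
A sketch of the inspect-then-rebalance sequence described above (assumes the hadoop CLI on PATH and a running cluster; it degrades to a note otherwise):

```shell
# Inspect per-DataNode usage, then rebalance to within 5% of the cluster mean.
if command -v hadoop >/dev/null 2>&1; then
  status=$(hadoop dfsadmin -report | head -n 3)  # summary lines of the report
  hadoop balancer -threshold 5
else
  status="hadoop not on PATH (dry run)"
fi
echo "$status"
```

The bandwidth cap mentioned above is dfs.balance.bandwidthPerSec in hdfs-site.xml (1 MB/s by default in this era), which is why a balance pass can take a while.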

[Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-06-30 Thread Narayanan K
Hi all,

We are basically working on a research project and I require some help regarding this. I had a few basic doubts regarding submission of Map-Reduce jobs in Hadoop.

1. How do I submit a mapreduce job from outside the cluster, i.e. from a different machine outside the Hadoop cluste

Re: Dynamic Cluster Node addition

2011-06-30 Thread Paul Rimba
Hey Matei,

What if you do bin/hadoop-daemon.sh start tasktracker and bin/hadoop-daemon.sh start datanode? Does it move the old data to the new slave? I ran that scenario a couple of times and then ran start-balancer.sh. It always says that the cluster is balanced. Does it mean that the has been s

Re: Dynamic Cluster Node addition

2011-06-30 Thread Matei Zaharia
You can have a new TaskTracker or DataNode join the cluster by just starting that daemon on the slave (e.g. bin/hadoop-daemon.sh start tasktracker) and making sure it is configured to connect to the right JobTracker or NameNode (through the mapred.job.tracker and fs.default.name properties in th
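
The two properties Matei names would look something like this on the new slave (a fragment sketch for a 0.20-era conf/ directory; hostnames and ports are placeholders):

```xml
<!-- core-site.xml on the slave: where the NameNode lives (placeholder host) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://nn-host.example.com:9000</value>
</property>

<!-- mapred-site.xml on the slave: where the JobTracker lives (placeholder host) -->
<property>
  <name>mapred.job.tracker</name>
  <value>jt-host.example.com:9001</value>
</property>
```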

Dynamic Cluster Node addition

2011-06-30 Thread Paul Rimba
Hey there,

I am trying to add a new datanode/tasktracker to a currently running cluster. Is this feasible? And if yes, how do I change the masters, slaves, and dfs.replication (in hdfs-site.xml) configuration? Can I add the new slave to the slaves configuration file while the cluster is running?
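
A sketch of the usual sequence for this (hostnames are placeholders; assumes an identical Hadoop install already on the new node):

```shell
# On the master: record the new node so the start-all/stop-all scripts know it.
NEW_SLAVE="slave-05.example.com"     # placeholder hostname
SLAVES_FILE="conf/slaves"
if [ -w "$SLAVES_FILE" ]; then
  echo "$NEW_SLAVE" >> "$SLAVES_FILE"
  note="added $NEW_SLAVE to $SLAVES_FILE"
else
  note="no writable $SLAVES_FILE here (dry run)"
fi
# On the new node itself, start the daemons directly:
# bin/hadoop-daemon.sh start datanode
# bin/hadoop-daemon.sh start tasktracker
echo "$note"
```

The slaves file is only consulted by the start/stop helper scripts, so editing it while the cluster runs is harmless, and dfs.replication does not need to change just to add capacity.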

Re: Dead data nodes during job execution and failed tasks.

2011-06-30 Thread Allen Wittenauer
On Jun 30, 2011, at 12:36 PM, David Ginzburg wrote:
> Is it possible though the server runs with vm.swappiness=5

That only controls how aggressively the system swaps. If you eat all the RAM in user space, the system is going to start paging memory regardless of swappiness.
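
A quick way to read the knob being discussed (Linux-only; it falls back gracefully elsewhere):

```shell
# Read the current swappiness setting; note a low value only delays swapping,
# it does not prevent paging once RAM is exhausted.
if [ -r /proc/sys/vm/swappiness ]; then
  swappiness=$(cat /proc/sys/vm/swappiness)
else
  swappiness="unavailable (not Linux?)"
fi
echo "vm.swappiness=$swappiness"
```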

RE: Dead data nodes during job execution and failed tasks.

2011-06-30 Thread David Ginzburg
Is it possible though the server runs with vm.swappiness=5?

> Subject: Re: Dead data nodes during job execution and failed tasks.
> From: a...@apache.org
> Date: Thu, 30 Jun 2011 11:46:25 -0700
> To: mapreduce-user@hadoop.apache.org
>
> On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:
>

Re: How does Hadoop manage memory?

2011-06-30 Thread Allen Wittenauer
On Jun 28, 2011, at 1:43 PM, Peter Wolf wrote:
> Hello all,
>
> I am looking for the right thing to read...
>
> I am writing a MapReduce Speech Recognition application. I want to run many
> Speech Recognizers in parallel.
>
> Speech Recognizers not only use a large amount of processor, they
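
The usual levers for bounding per-node task memory in this era are the child JVM heap and the slot counts; an illustrative mapred-site.xml fragment (the values are guesses to tune for your hardware, not recommendations from the thread):

```xml
<!-- Heap given to each task JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<!-- Concurrent map slots per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```

Keeping (heap x slots) plus the daemons' own footprint under physical RAM is what avoids swapping.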

Re: Emit an entire file

2011-06-30 Thread Allen Wittenauer
On Jun 28, 2011, at 6:19 AM, Jeremy Cunningham wrote:
> I have lots of binary files stored in hdfs. I read them using Apache POI and
> can search with no problems. I want to be able to search for keywords (which
> I can do) and then copy the file that has the text out to a different
> locatio
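
One way to do the copy step with the stock CLI (paths are placeholders; assumes hadoop on PATH, degrading to a note otherwise):

```shell
# Copy a matched file within HDFS, or pull it to local disk.
SRC="/data/docs/report1.doc"         # placeholder path of the matched file
DEST="/matched/"
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -cp "$SRC" "$DEST"       # HDFS-to-HDFS copy
  # hadoop fs -get "$SRC" /tmp/      # ...or to the local filesystem
  msg="copied $SRC to $DEST"
else
  msg="hadoop not on PATH (dry run)"
fi
echo "$msg"
```

From inside a job, the same copy can be done programmatically via the FileSystem API (FileSystem.get(conf) plus FileUtil.copy).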

Re: Dead data nodes during job execution and failed tasks.

2011-06-30 Thread Allen Wittenauer
On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:
> Hi,
> I am running a certain job which constantly causes dead data nodes (who come
> back later, spontaneously).

Check your memory usage during the job run. Chances are good the DataNode is getting swapped out.
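
A one-shot snapshot of memory/swap pressure on a worker, along the lines of the suggestion above (Linux-oriented, with a macOS fallback):

```shell
# Capture the headline memory and swap numbers; run this while the job executes.
if command -v free >/dev/null 2>&1; then
  snapshot=$(free -m | sed -n '1,3p')   # header, Mem, and Swap lines
elif command -v vm_stat >/dev/null 2>&1; then
  snapshot=$(vm_stat | head -n 5)       # macOS fallback
else
  snapshot="no free/vm_stat available"
fi
echo "$snapshot"
```

Rising swap-used while tasks run is the signature of the DataNode getting paged out.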