Re: Unexpected termination of a job

2010-03-04 Thread Arvind Sharma
Have you tried increasing the heap memory for your process? Arvind From: Rakhi Khatwani rkhatw...@gmail.com To: common-user@hadoop.apache.org Sent: Wed, March 3, 2010 10:38:43 PM Subject: Re: Unexpected termination of a job Hi, I tried running it

should data be evenly distributed to each (physical) node

2010-03-04 Thread openresearch
I am building a small two-node cluster following http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster) Everything seems to be working, except I notice the data are NOT evenly distributed to each physical box. E.g., when I hadoop dfs -put 6 GB of data, I am expecting

Re: should data be evenly distributed to each (physical) node

2010-03-04 Thread Edward Capriolo
On Thu, Mar 4, 2010 at 10:25 AM, openresearch qiming...@openresearchinc.com wrote: I am building a small two-node cluster following http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster) Everything seems to be working, except I notice the data are NOT evenly

Re: should data be evenly distributed to each (physical) node

2010-03-04 Thread Jean-Daniel Cryans
There's nothing like reading the manual: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps Quote: For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another

Re: can't start namenode

2010-03-04 Thread mike anderson
We have a single dfs.name.dir directory, in case it's useful the contents are: [m...@carr name]$ ls -l total 8 drwxrwxr-x 2 mike mike 4096 Mar 4 11:18 current drwxrwxr-x 2 mike mike 4096 Oct 8 16:38 image On Thu, Mar 4, 2010 at 12:00 PM, Todd Lipcon t...@cloudera.com wrote: Hi Mike, Was

Re: Hadoop as master's thesis

2010-03-04 Thread Amund Tveit
On Mon, Mar 1, 2010 at 3:01 PM, Tonci Buljan tonci.bul...@gmail.com wrote: Hello everyone,  I'm thinking of using Hadoop as a subject in my master's thesis in Computer Science. I'm supposed to solve some kind of a problem with Hadoop, but can't think of any :)). Here is an overview of

Re: Hbase VS Hive

2010-03-04 Thread Fitrah Elly Firdaus
On 03/04/2010 01:19 AM, Michael Segel wrote: Date: Thu, 4 Mar 2010 00:42:11 +0700 From: fitrah.fird...@gmail.com To: common-user@hadoop.apache.org Subject: Hbase VS Hive Hello everyone, I want to ask about HBase and Hive. What is the difference between HBase and Hive? And then what is

Re: Will interactive password authentication fail talk between namenode-datanode/jobtracker-tasktracker?

2010-03-04 Thread Allen Wittenauer
On 3/3/10 3:38 PM, jiang licht licht_ji...@yahoo.com wrote: Here's my question, I have to type my password (not PASSPHRASE for key) due to some reverse name resolution problem when I do either SSH MASTER from SLAVE or SSH SLAVE from MASTER. Since my system admin told me all ports are open

Re: Hbase VS Hive

2010-03-04 Thread Edward Capriolo
On Thu, Mar 4, 2010 at 12:13 PM, Fitrah Elly Firdaus fitrah.fird...@gmail.com wrote: On 03/04/2010 01:19 AM, Michael Segel wrote: Date: Thu, 4 Mar 2010 00:42:11 +0700 From: fitrah.fird...@gmail.com To: common-user@hadoop.apache.org Subject: Hbase VS Hive Hello Everyone I want to ask

some doubts

2010-03-04 Thread Varun Thacker
I am using Ubuntu Linux. I was able to get the standalone Hadoop cluster running and run the wordcount example. Before I start writing Hadoop programs, I wanted to compile the wordcount example on my own. So this is what I did to build the jar file myself: javac -classpath

Re: Pipelining data from map to reduce

2010-03-04 Thread Ashutosh Chauhan
Bharath, This idea has been kicking around in academia but has not made it into Apache yet: https://issues.apache.org/jira/browse/MAPREDUCE-1211 You can get a working prototype from: http://code.google.com/p/hop/ Ashutosh On Thu, Mar 4, 2010 at 09:06, E. Sammer e...@lifeless.net wrote: On 3/4/10 12:00 PM,

Re: can't start namenode

2010-03-04 Thread Todd Lipcon
Hi Mike, Since you removed the edits, you restored to an earlier version of the namesystem. Thus, any files that were deleted since the last checkpoint will have come back. But, the blocks will have been removed from the datanodes. So, the NN is complaining since there are some files that have

Re: can't start namenode

2010-03-04 Thread mike anderson
Todd, That did the trick. Thanks to everyone for the quick responses and effective suggestions. -Mike On Thu, Mar 4, 2010 at 2:50 PM, Todd Lipcon t...@cloudera.com wrote: Hi Mike, Since you removed the edits, you restored to an earlier version of the namesystem. Thus, any files that were

Re: Hadoop on Azure

2010-03-04 Thread Yi Mao
You can take a look at www.zidata.com, which provides a full Windows experience on top of Hadoop. On Thu, Mar 4, 2010 at 12:41 PM, jawaid ekram jek...@hotmail.com wrote: Is there a Hadoop implementation on the Azure cloud?

Re: Sorting

2010-03-04 Thread Arun C Murthy
Sample your input data and use the sample to drive your partitioner. Please take a look at the TeraSort example in org.apache.hadoop.examples.terasort. Arun On Mar 3, 2010, at 9:21 AM, Aayush Garg wrote: Hi, Suppose I need to sort a big file (in GB). How would I accomplish this task using
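
The sampling advice above can be illustrated with a small standalone sketch. It is not the TeraSort code and uses no Hadoop API; the names choose_split_points, partition_for, and sampled_sort are hypothetical, and the whole thing runs in memory purely to show why split points taken from a sample let independently sorted partitions concatenate into a total order.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of the input."""
    ordered = sorted(sample)
    if not ordered:
        return []
    # Evenly spaced quantiles of the sample become the partition boundaries.
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Keys below the first split point go to partition 0, and so on upward."""
    return bisect.bisect_right(split_points, key)


def sampled_sort(records, num_partitions, sample_size, seed=0):
    """Range-partition records using sampled split points, then sort each part."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    splits = choose_split_points(sample, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[partition_for(record, splits)].append(record)
    # Each partition stands in for one reducer sorting its own key range.
    return [sorted(part) for part in partitions]
```

Because every key in partition i is no larger than any key in partition i + 1, reading the sorted partitions in order yields the fully sorted data. A larger sample gives more evenly sized partitions but does not affect correctness.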

Re: Pipelining data from map to reduce

2010-03-04 Thread Jeff Hammerbacher
Also see "Breaking the MapReduce Stage Barrier" from UIUC: http://www.ideals.illinois.edu/bitstream/handle/2142/14819/breaking.pdf On Thu, Mar 4, 2010 at 11:41 AM, Ashutosh Chauhan ashutosh.chau...@gmail.com wrote: Bharath, This idea has been kicking around in academia but has not made it into Apache yet:

Re: Pipelining data from map to reduce

2010-03-04 Thread Scott Carey
Interesting article. It claims to have the same fault tolerance but I don't see any explanation of how that can be. If a single mapper fails part-way through a task when it has transmitted partial results to a reducer, the whole job is corrupted. With the current barrier between map and

Re: Pipelining data from map to reduce

2010-03-04 Thread Edward Capriolo
I guess if you emitted the key as task-id + key you would have more overhead, but if the data were replayed the reducer could detect dups. Ed On 3/4/10, Scott Carey sc...@richrelevance.com wrote: Interesting article. It claims to have the same fault tolerance but I don't see any explanation of how
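
A rough sketch of the tagging idea above, not taken from any Hadoop or HOP code. It assumes each map task re-emits the same records in the same order when it is replayed, and it adds a per-task sequence number (an assumption beyond the message, so that a legitimately repeated key is not mistaken for a replay). The names tag_output and DedupingReducer are hypothetical.

```python
def tag_output(task_id, pairs):
    """Map side: tag each emitted (key, value) with (task_id, seq)."""
    for seq, (key, value) in enumerate(pairs):
        yield (task_id, seq, key, value)


class DedupingReducer:
    """Reduce side: sum values per key, skipping records already consumed."""

    def __init__(self):
        self.seen = set()    # (task_id, seq) tags consumed so far
        self.totals = {}     # running sum per key

    def accept(self, record):
        task_id, seq, key, value = record
        if (task_id, seq) in self.seen:
            return False     # replayed record from a re-run task: ignore it
        self.seen.add((task_id, seq))
        self.totals[key] = self.totals.get(key, 0) + value
        return True
```

The extra cost mentioned in the message shows up here as the tag carried on every record plus the reducer-side set of seen tags, which grows with the number of records received.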

Data node cannot talk to name node Re: Will interactive password authentication fail talk between namenode-datanode/jobtracker-tasktracker?

2010-03-04 Thread jiang licht
Thanks Edward. Since the string of reverse mapping ... is just a warning, I guess it won't be an issue. Now, the namenode A is listening on port a. No data node sitting on a different box can talk to a...@a to join the cluster. But assigning A as a datanode as well is OK, and this datanode can join