Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Tue, Dec 13, 2011 at 11:00 PM, M. C. Srivas wrote: > (a) I wasn't aware that Bookkeeper had progressed that far. I wonder > whether it would be able to keep up with the data rates that is required in > order to hold the NN log without falling behind. It's a good question - but one which has da

Re: HDFS Backup nodes

2011-12-13 Thread Konstantin Boudnik
On Tue, Dec 13, 2011 at 11:00PM, M. C. Srivas wrote: > Suresh, > > As of today, there is no option except to use NFS. And as you yourself > mention, the first HA prototype when it comes out will require NFS. Well, in the interest of full disclosure NFS is just one of the options and not the only

Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Tue, Dec 13, 2011 at 10:42 PM, M. C. Srivas wrote: > Any simple file meta-data test will cause the NN to spiral to death with > infinite GC.  For example, try create many many files. Or even simple > "stat" a bunch of file continuously. Sure. If I run "dd if=/dev/zero of=foo" my laptop will "s

Re: HDFS Backup nodes

2011-12-13 Thread M. C. Srivas
Suresh, As of today, there is no option except to use NFS. And as you yourself mention, the first HA prototype when it comes out will require NFS. (a) I wasn't aware that Bookkeeper had progressed that far. I wonder whether it would be able to keep up with the data rates that is required in orde

Re: HDFS Backup nodes

2011-12-13 Thread M. C. Srivas
On Tue, Dec 13, 2011 at 6:19 PM, Todd Lipcon wrote: > On Sun, Dec 11, 2011 at 10:47 PM, M. C. Srivas wrote: > > But if you use a Netapp, then the likelihood of the Netapp crashing is > > lower than the likelihood of a garbage-collection-of-death happening in > the > > NN. > > This is pure FUD.

Re: More cores Vs More Nodes ?

2011-12-13 Thread He Chen
Hi Brad, This is a really interesting experiment. I am curious why you did not use 2 cores per machine with 32 nodes. That would make the number of CPU cores in the two groups equal. Chen On Tue, Dec 13, 2011 at 7:15 PM, Brad Sarsfield wrote: > Hi Prashant, > > In each case I had a single tasktracker per

Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Sun, Dec 11, 2011 at 10:47 PM, M. C. Srivas wrote: > But if you use a Netapp, then the likelihood of the Netapp crashing is > lower than the likelihood of a garbage-collection-of-death happening in the > NN. This is pure FUD. I've never seen a "garbage collection of death" ever in any NN with

RE: More cores Vs More Nodes ?

2011-12-13 Thread Brad Sarsfield
Hi Prashant, In each case I had a single tasktracker per node. I oversubscribed the total tasks per tasktracker/node by 1.5 x # of cores. So for the 64 core allocation comparison. In A: 8 cores; Each machine had a single tasktracker with 8 maps / 4 reduce slots for 12 task slots total p
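The slot arithmetic in the message above can be sketched as follows. This is only an illustration: the 1.5x oversubscription factor and the 8-core figures (8 map / 4 reduce slots, 12 total) come from the message, while the class name `SlotMath` and the rule "one map slot per core, remainder to reduce" are assumptions generalized from that single data point.

```java
public class SlotMath {
    // Total task slots per node when oversubscribing the core count
    // by a given factor (1.5 in the message above).
    public static int totalSlots(int cores, double factor) {
        return (int) Math.round(cores * factor);
    }

    // Assumption: one map slot per core. This matches the 8-core case
    // in the message but is not stated there as a general rule.
    public static int mapSlots(int cores) {
        return cores;
    }

    // Assumption: whatever remains after the map slots goes to reduce slots.
    public static int reduceSlots(int cores, double factor) {
        return totalSlots(cores, factor) - mapSlots(cores);
    }

    public static void main(String[] args) {
        int cores = 8;
        double factor = 1.5;
        System.out.println("total slots:  " + totalSlots(cores, factor));
        System.out.println("map slots:    " + mapSlots(cores));
        System.out.println("reduce slots: " + reduceSlots(cores, factor));
    }
}
```

For the 8-core configuration this reproduces the 8 map / 4 reduce / 12 total split quoted in the message; other core counts would need checking against the actual cluster settings.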

Re: How Jobtracler stores tasktracker's information

2011-12-13 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@. Please use project specific lists. Take a look at JobTracker.heartbeat -> *Scheduler.assignTasks. After the scheduler 'assigns' tasks, the JT sends the corresponding 'LaunchTaskAction' to the TaskTracker. hth, Arun On Dec 13, 2011, at 12:59 AM, had

Re: HDFS Backup nodes

2011-12-13 Thread Suresh Srinivas
Srivas, As you may already know, NFS is only being used in the first prototype for HA. Two options for the editlog store are: 1. Using BookKeeper. Work toward this has already been completed on trunk. This will replace the need for NFS to store the editlogs and is highly available. This solution will also b

remote hadoop in tomcat with jaas security

2011-12-13 Thread Avni, Itamar
Hi, Our application runs in Tomcat 5.5, Java 6.17, with JAAS. We provide our own implementation of LoginModule and start Tomcat with -Djava.security.auth.login.config. We use Hadoop 0.20.203.0. We want to execute Hadoop jobs or Hadoop FileSystem methods remotely from within our application.

Re: ArrayWritable usage

2011-12-13 Thread Brock Noland
Hi, ArrayWritable is a touch hard to use. Say you have an array of type IntWritable[]. The get() method of ArrayWritable, after serialization/deserialization, does in fact return an array of type Writable. As such you cannot cast it directly to IntWritable[]. Individual elements are of type IntWritabl
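A minimal sketch of the casting problem described above. To keep it runnable without Hadoop on the classpath, `Writable` and `IntWritable` here are hypothetical stand-ins declared inside the example, and `deserialized()` is an assumed helper that mimics a `get()` call returning an array whose runtime type is `Writable[]`. It shows that casting the whole array fails while casting element by element works.

```java
import java.util.Arrays;

public class ArrayCastSketch {
    // Hypothetical stand-ins for Hadoop's Writable and IntWritable,
    // declared here only so the sketch runs on its own.
    public interface Writable {}

    public static final class IntWritable implements Writable {
        private final int value;
        public IntWritable(int value) { this.value = value; }
        public int get() { return value; }
    }

    // Assumed helper: mimics the situation in the message, where the
    // returned array has runtime type Writable[], not IntWritable[].
    public static Writable[] deserialized() {
        return new Writable[] { new IntWritable(1), new IntWritable(2), new IntWritable(3) };
    }

    // Returns true if casting the whole array to IntWritable[] throws.
    public static boolean directCastFails() {
        try {
            Object ignored = (IntWritable[]) deserialized();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Casts each element individually, which is safe because every
    // element really is an IntWritable.
    public static int[] toInts(Writable[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = ((IntWritable) values[i]).get();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("direct cast fails: " + directCastFails());
        System.out.println("per-element cast:  " + Arrays.toString(toInts(deserialized())));
    }
}
```

The array cast fails because Java checks the runtime type of the array object itself, not the types of the elements it happens to hold.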

Re: More cores Vs More Nodes ?

2011-12-13 Thread bharath vissapragada
Hey there, I agree with Tom's response. One can decide it based on the type of jobs you run. I have been working on Hive and I realized that increasing no. of cores would give very good performance boost because joins and stuff are compute oriented and consume a lot of CPU on reduce side. This may

Re: More cores Vs More Nodes ?

2011-12-13 Thread Alexander Pivovarov
More nodes means more read I/O in the map step. If you use combiners, you might need to send only a small amount of data over the network to the reducers. Alexander On Tue, Dec 13, 2011 at 12:45 PM, real great.. wrote: > more cores might help in hadoop environments as there would be more data > locality.

Re: More cores Vs More Nodes ?

2011-12-13 Thread real great..
more cores might help in hadoop environments as there would be more data locality. your thoughts? On Tue, Dec 13, 2011 at 11:11 PM, Brad Sarsfield wrote: > Praveenesh, > > Your question is not naïve; in fact, optimal hardware design can > ultimately be a very difficult question to answer on what

RE: More cores Vs More Nodes ?

2011-12-13 Thread Tom Deutsch
It also helps to know the profile of your job in how you spec the machines. So in addition to Brad's response you should consider if you think your jobs will be more storage or compute oriented. Tom Deutsch Program Director Information Management

Re: More cores Vs More Nodes ?

2011-12-13 Thread Prashant Kommireddi
Hi Brad, how many tasktrackers did you have on each node in both cases? Thanks, Prashant On Dec 13, 2011, at 9:42 AM, Brad Sarsfield wrote: > Praveenesh, > > Your question is not naïve; in fact, optimal hardware design can ultimately > be a very difficult question to answ

RE: More cores Vs More Nodes ?

2011-12-13 Thread Brad Sarsfield
Praveenesh, Your question is not naïve; in fact, optimal hardware design can ultimately be a very difficult question to answer on what would be "better". If you made me pick one without much information I'd go for more machines. But... It all depends; and there is no right answer :) Mo

Re: Where do i see Sysout statements after building example ?

2011-12-13 Thread Mark Kerzner
For me, they go two levels deeper - under 'userlogs' in logs, then in directory that stores the run logs. Here is what I see root@ip-10-84-123-125 :/var/log/hadoop/userlogs/job_201112120200_0010/attempt_201112120200_0010_r_02_0# ls log.index stderr stdout syslog and there, in stdout, I se

Re: Where do i see Sysout statements after building example ?

2011-12-13 Thread Bejoy Ks
Adding on to Harsh's response. If your Sysouts are in mapper or reducer classes, you can get the same from the WebUI as well, http://:50030/jobtracker.jsp . You need to select your job and drill down to the individual task level. Regards Bejoy.K.S On Tue, Dec 13, 2011 at 10:30 PM, Harsh J wrote:

Re: Where do i see Sysout statements after building example ?

2011-12-13 Thread Harsh J
JobTracker sysouts would go to logs/*-jobtracker*.out On 13-Dec-2011, at 8:08 PM, ArunKumar wrote: > HI guys ! > > I have a single node set up as per > http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ > 1>I have put some sysout statements in Jobtracker an

Newbee Question: Do I must load XML files in KV store?

2011-12-13 Thread thedba
I have a constant feed of a large number of XML files that I would like to process with MapReduce and Hive. My questions are: (1) Must I load the XML files into a KV store before I can use MapReduce? (2) Must I load the XML files into a KV store before I can use Hive? Thanks TheDBA

Where do i see Sysout statements after building example ?

2011-12-13 Thread ArunKumar
HI guys ! I have a single node set up as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ 1>I have put some sysout statements in Jobtracker and wordcount (src/examples/org/..) code 2>ant build 3>Ran example jar with wordcount Where do i find the syso

How Jobtracler stores tasktracker's information

2011-12-13 Thread hadoop anis
Could anyone please tell me where the JobTracker sends a task (taskid) to a tasktracker for scheduling, i.e. where it creates the taskid and tasktracker pairs? Thanks & Regards, Mohmmadanis Moulavi Student, MTech (Computer Sci. & Engg.) Walchand college of Engg. Sangli (M.S.)