How JobTracker stores TaskTracker's information

2011-12-13 Thread hadoop anis
Can anyone please tell me where the JobTracker sends a task (taskid) to a TaskTracker for scheduling, i.e. where it creates the taskid/TaskTracker pairs? Thanks. Regards, Mohmmadanis Moulavi Student, MTech (Computer Sci. Engg.) Walchand college of Engg. Sangli (M.S.)

Where do I see sysout statements after building an example?

2011-12-13 Thread ArunKumar
Hi guys! I have a single-node setup as per http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. (1) I have put some sysout statements in the JobTracker and WordCount (src/examples/org/..) code, (2) ran an ant build, and (3) ran the example jar with WordCount. Where do I find the sysout statements?

Newbie question: Must I load XML files into a KV store?

2011-12-13 Thread thedba
I have a constant feed of a large number of XML files that I would like to process with MapReduce and Hive. My questions are: (1) Must I load the XML files into a KV store before I can run MapReduce? (2) Must I load the XML files into a KV store before I can use Hive? Thanks, TheDBA

Re: Where do I see sysout statements after building an example?

2011-12-13 Thread Harsh J
JobTracker sysouts would go to logs/*-jobtracker*.out.

Re: Where do I see sysout statements after building an example?

2011-12-13 Thread Bejoy Ks
Adding on to Harsh's response: if your sysouts are in mapper or reducer classes, you can get them from the web UI as well, at http://<JT host>:50030/jobtracker.jsp. You need to select your job and drill down to the individual task level. Regards, Bejoy.K.S

Re: Where do I see sysout statements after building an example?

2011-12-13 Thread Mark Kerzner
For me, they go two levels deeper: under 'userlogs' in the logs directory, then in the directory that stores the run logs. Here is what I see:

    root@ip-10-84-123-125:/var/log/hadoop/userlogs/job_201112120200_0010/attempt_201112120200_0010_r_02_0# ls
    log.index  stderr  stdout  syslog

and there, in stdout, I find the sysout output.
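To make the two answers above concrete, here is a minimal sketch of a mapper containing a sysout (the class name and log message are illustrative, not from the thread). Anything written to System.out from map or reduce task code ends up in the per-attempt stdout file under userlogs that Mark lists above, and is also visible via the task detail pages in the JobTracker web UI; sysouts added to the JobTracker itself go to the logs/*-jobtracker*.out file that Harsh mentions.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SysoutDemoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // This line ends up in userlogs/job_*/attempt_*/stdout on the node that ran the task.
        System.out.println("map() called for input offset " + key.get());
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);
        }
      }
    }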

RE: More cores Vs More Nodes ?

2011-12-13 Thread Brad Sarsfield
Praveenesh, Your question is not naïve; in fact, optimal hardware design can ultimately be a very difficult question to answer. If you made me pick one without much information, I'd go for more machines. But... it all depends; there is no right answer :)

Re: More cores Vs More Nodes ?

2011-12-13 Thread Prashant Kommireddi
Hi Brad, how many TaskTrackers did you have on each node in both cases? Thanks, Prashant

RE: More cores Vs More Nodes ?

2011-12-13 Thread Tom Deutsch
It also helps to know the profile of your jobs when you spec the machines. So, in addition to Brad's response, you should consider whether your jobs will be more storage- or compute-oriented. Tom Deutsch, Program Director, Information

Re: More cores Vs More Nodes ?

2011-12-13 Thread real great..
More cores might help in Hadoop environments, as there would be more data locality. Your thoughts?

Re: More cores Vs More Nodes ?

2011-12-13 Thread Alexander Pivovarov
More nodes means more IO on read in the mapper step. If you use combiners, you might only need to send a small amount of data over the network to the reducers. Alexander
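A minimal WordCount-style sketch of Alexander's point, assuming the stock TokenCounterMapper and IntSumReducer library classes from the new MapReduce API: registering the reducer class as a combiner sums counts on the map side, so only small partial sums cross the network to the reducers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class WordCountWithCombiner {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount-with-combiner");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenCounterMapper.class);  // emits (word, 1) for every token
        job.setCombinerClass(IntSumReducer.class);     // sums counts locally on each mapper's output
        job.setReducerClass(IntSumReducer.class);      // final per-word sum
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }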

Re: More cores Vs More Nodes ?

2011-12-13 Thread bharath vissapragada
Hey there, I agree with Tom's response; one can decide based on the type of jobs you run. I have been working on Hive, and I realized that increasing the number of cores gives a very good performance boost, because joins and similar operations are compute-oriented and consume a lot of CPU on the reduce side. This

Re: ArrayWritable usage

2011-12-13 Thread Brock Noland
Hi, ArrayWritable is a touch hard to use. Say you have an array of IntWritable[]. The get() method of ArrayWritable, after serialization/deserialization, does in fact return an array of type Writable[]. As such you cannot cast it directly to IntWritable[]. Individual elements are of the value type
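A minimal sketch of the usual workaround (the class below is illustrative, not from the thread): subclass ArrayWritable with a concrete value class so readFields() knows what to instantiate, and copy the Writable[] returned by get() into a typed array element by element, since casting the array itself fails.

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Writable;

    public class IntArrayWritable extends ArrayWritable {
      public IntArrayWritable() {
        super(IntWritable.class);  // lets readFields() create IntWritable elements on deserialization
      }

      public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);
      }

      // get() returns Writable[]; cast each element rather than the whole array.
      public IntWritable[] toIntWritableArray() {
        Writable[] raw = get();
        IntWritable[] typed = new IntWritable[raw.length];
        for (int i = 0; i < raw.length; i++) {
          typed[i] = (IntWritable) raw[i];
        }
        return typed;
      }
    }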

remote hadoop in tomcat with jaas security

2011-12-13 Thread Avni, Itamar
Hi, Our application runs in Tomcat 5.5, Java 6.17, with JAAS. We provide our own LoginModule implementation and start Tomcat with -Djava.security.auth.login.config. We use Hadoop 0.20.203.0. We want to execute Hadoop jobs, or Hadoop FileSystem methods, from within our application, remotely.
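As a starting point, here is a minimal sketch (hostnames, ports and paths are hypothetical, and it does not address the JAAS interaction being asked about) of driving a remote 0.20.x cluster from application code: point the Configuration at the remote NameNode and JobTracker, then use FileSystem and JobClient as usual.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class RemoteHadoopClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");  // remote HDFS
        conf.set("mapred.job.tracker", "jobtracker.example.com:9001");    // remote JobTracker

        FileSystem fs = FileSystem.get(conf);  // FileSystem calls now go to the remote cluster
        System.out.println(fs.exists(new Path("/user/itamar/input")));

        JobConf jobConf = new JobConf(conf, RemoteHadoopClient.class);
        jobConf.setJobName("remote-test");
        // ... set mapper/reducer classes and input/output paths here ...
        // JobClient.runJob(jobConf);  // would submit the job to the remote JobTracker
      }
    }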

Re: HDFS Backup nodes

2011-12-13 Thread Suresh Srinivas
Srivas, As you may know already, NFS is just being used in the first prototype for HA. Two options for the editlog store are: 1. Using BookKeeper. Work towards this has already been completed on trunk. This will replace the need for NFS to store the editlogs and is highly available. This solution will also

Re: How JobTracker stores TaskTracker's information

2011-12-13 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists. Take a look at JobTracker.heartbeat -> *Scheduler.assignTasks. After the scheduler 'assigns' tasks, the JT sends the corresponding 'LaunchTaskAction' to the TaskTracker. hth, Arun
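For orientation, a heavily simplified sketch of that flow (illustrative stand-in types only, not the actual JobTracker source): the TaskTracker's heartbeat reaches the JobTracker, the configured scheduler assigns tasks for that tracker, and each assigned task goes back in the heartbeat response wrapped as a LaunchTaskAction.

    import java.util.ArrayList;
    import java.util.List;

    class HeartbeatFlowSketch {
      interface Task { String getTaskId(); }
      interface TaskScheduler { List<Task> assignTasks(String taskTrackerName); }

      static class LaunchTaskAction {
        final Task task;
        LaunchTaskAction(Task task) { this.task = task; }
      }

      private final TaskScheduler scheduler;
      HeartbeatFlowSketch(TaskScheduler scheduler) { this.scheduler = scheduler; }

      // Roughly what JobTracker.heartbeat() does: ask the scheduler for tasks for this
      // tracker, then pair each taskid with the tracker by wrapping it in a LaunchTaskAction.
      List<LaunchTaskAction> heartbeat(String taskTrackerName) {
        List<LaunchTaskAction> actions = new ArrayList<LaunchTaskAction>();
        for (Task t : scheduler.assignTasks(taskTrackerName)) {
          actions.add(new LaunchTaskAction(t));
        }
        return actions;
      }
    }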

RE: More cores Vs More Nodes ?

2011-12-13 Thread Brad Sarsfield
Hi Prashant, In each case I had a single TaskTracker per node. I oversubscribed the total tasks per TaskTracker/node to 1.5 x the number of cores. So, for the 64-core allocation comparison: in A (8 cores), each machine had a single TaskTracker with 8 map / 4 reduce slots, for 12 task slots total

Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Sun, Dec 11, 2011 at 10:47 PM, M. C. Srivas mcsri...@gmail.com wrote: "But if you use a Netapp, then the likelihood of the Netapp crashing is lower than the likelihood of a garbage-collection-of-death happening in the NN." This is pure FUD. I've never seen a garbage collection of death ever

Re: More cores Vs More Nodes ?

2011-12-13 Thread He Chen
Hi Brad, this is a really interesting experiment. I am curious why you did not use 2 cores per machine with 32 nodes; that would make the number of CPU cores in the two groups equal. Chen

Re: HDFS Backup nodes

2011-12-13 Thread M. C. Srivas
Suresh, As of today, there is no option except to use NFS. And as you yourself mention, the first HA prototype, when it comes out, will require NFS. (a) I wasn't aware that BookKeeper had progressed that far. I wonder whether it would be able to keep up with the data rates that are required in

Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Tue, Dec 13, 2011 at 10:42 PM, M. C. Srivas mcsri...@gmail.com wrote: "Any simple file metadata test will cause the NN to spiral to death with infinite GC. For example, try creating many, many files, or even simply stat a bunch of files continuously." Sure. If I run dd if=/dev/zero of=foo my
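For reference, a minimal sketch (path and count are made up) of the kind of metadata-heavy workload being described: creating and then stat-ing many small files, where every call in the loop is a NameNode metadata operation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MetadataStress {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/tmp/metadata-stress");
        fs.mkdirs(dir);
        for (int i = 0; i < 100000; i++) {
          Path p = new Path(dir, "file-" + i);
          FSDataOutputStream out = fs.create(p);  // NameNode create (empty file, no data written)
          out.close();
          fs.getFileStatus(p);                    // NameNode stat
        }
      }
    }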

Re: HDFS Backup nodes

2011-12-13 Thread Konstantin Boudnik
On Tue, Dec 13, 2011 at 11:00 PM, M. C. Srivas wrote: "Suresh, As of today, there is no option except to use NFS. And as you yourself mention, the first HA prototype when it comes out will require NFS." Well, in the interest of full disclosure, NFS is just one of the options and not the only

Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Tue, Dec 13, 2011 at 11:00 PM, M. C. Srivas mcsri...@gmail.com wrote: "(a) I wasn't aware that BookKeeper had progressed that far. I wonder whether it would be able to keep up with the data rates that are required in order to hold the NN log without falling behind." It's a good question - but