Can anyone please tell me,
I want to know where the JobTracker sends a task (taskid) to a
TaskTracker for scheduling,
i.e. where it creates the taskid-tasktracker pairs.
Thanks & Regards,
Mohmmadanis Moulavi
Student,
MTech (Computer Sci. Engg.)
Walchand college of Engg. Sangli (M.S.)
Hi guys!
I have a single node set up as per
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
1. I have put some sysout statements in the JobTracker and wordcount
(src/examples/org/..) code
2. Did an ant build
3. Ran the example jar with wordcount
Where do I find the sysouts?
I have a constant feed of a large number of XML files that I would like to use
with MapReduce and Hive.
My questions are:
(1) Must I load the XML files into a KV store before I can use MapReduce?
(2) Must I load the XML files into a KV store before I can use Hive?
Thanks,
TheDBA
JobTracker sysouts would go to logs/*-jobtracker*.out
On 13-Dec-2011, at 8:08 PM, ArunKumar wrote:
Adding on to Harsh's response.
If your sysouts are in mapper or reducer classes, you can get the same from
the web UI as well: http://JT host:50030/jobtracker.jsp . You need
to select your job and drill down to the individual task level.
Regards
Bejoy.K.S
On Tue, Dec 13, 2011 at 10:30 PM, Harsh J
For me, they go two levels deeper - under 'userlogs' in logs, then in the
directory that stores the run logs.
Here is what I see
root@ip-10-84-123-125
:/var/log/hadoop/userlogs/job_201112120200_0010/attempt_201112120200_0010_r_02_0#
ls
log.index stderr stdout syslog
and there, in stdout, I
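To read one of those files from the shell, something like this works (a
sketch; the job and attempt IDs from the listing above will differ per run):

  cat /var/log/hadoop/userlogs/job_*/attempt_*_r_*/stdout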
Praveenesh,
Your question is not naïve; in fact, what constitutes optimal hardware design
can ultimately be a very difficult question to answer. If you made me
pick one without much information, I'd go for more machines. But...
It all depends; and there is no right answer :)
Hi Brad, how many tasktrackers did you have on each node in both cases?
Thanks,
Prashant
Sent from my iPhone
On Dec 13, 2011, at 9:42 AM, Brad Sarsfield b...@bing.com wrote:
It also helps to know the profile of your jobs when you spec the
machines. So in addition to Brad's response, you should consider whether you
think your jobs will be more storage- or compute-oriented.
Tom Deutsch
Program Director
Information
More cores might help in Hadoop environments, as there would be more data
locality.
Your thoughts?
On Tue, Dec 13, 2011 at 11:11 PM, Brad Sarsfield b...@bing.com wrote:
More nodes means more IO on read in the mapper step.
If you use combiners, you might need to send only a small amount of data over
the network to the reducers.
Alexander
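To make the combiner point concrete, here is a minimal sketch of wiring one
into a 0.20.x-style job driver. TokenizerMapper and IntSumReducer are the
stock WordCount classes from the examples jar, and the input/output paths
are placeholders:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.examples.WordCount.IntSumReducer;
  import org.apache.hadoop.examples.WordCount.TokenizerMapper;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCountWithCombiner {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "wordcount-with-combiner");
      job.setJarByClass(WordCountWithCombiner.class);
      job.setMapperClass(TokenizerMapper.class);
      // Partial sums are merged on the map side, so roughly one record
      // per distinct word per mapper crosses the network to the reducers.
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

This only works because word-count's reduce (an integer sum) is associative
and commutative; a combiner must also consume and produce the same key/value
types as the map output.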
On Tue, Dec 13, 2011 at 12:45 PM, real great.. greatness.hardn...@gmail.com
wrote:
Hey there,
I agree with Tom's response. One can decide based on the type of jobs
you run. I have been working on Hive, and I realized that increasing the no. of
cores gives a very good performance boost, because joins and the like are
compute-oriented and consume a lot of CPU on the reduce side. This
Hi,
ArrayWritable is a touch hard to use. Say you have an array of
IntWritable[]. The get() method of ArrayWritable, after
serialization/deserialization, does in fact return an array of type
Writable[]. As such you cannot cast it directly to IntWritable[]. Individual
elements are of type
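The usual workaround is to subclass ArrayWritable so the element type
survives the round trip, and then cast element-by-element rather than
casting the whole array. A minimal sketch:

  import org.apache.hadoop.io.ArrayWritable;
  import org.apache.hadoop.io.IntWritable;

  public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
      super(IntWritable.class);            // element type used on deserialization
    }
    public IntArrayWritable(IntWritable[] values) {
      super(IntWritable.class, values);    // array covariance: IntWritable[] is a Writable[]
    }
  }

Reading it back, get() still returns Writable[], so cast each element
individually instead of casting the array:

  Writable[] raw = intArray.get();
  int first = ((IntWritable) raw[0]).get();  // per-element cast, not (IntWritable[]) raw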
Hi,
Our application runs in Tomcat 5.5, Java 6.17, with JAAS.
We provide our own implementation of LoginModule, and we start Tomcat with
-Djava.security.auth.login.config.
We use Hadoop 0.20.203.0.
We want to execute Hadoop jobs or Hadoop FileSystem methods from within our
application, remotely.
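For reference, the kind of remote FileSystem call in question looks roughly
like this; the namenode host and port are placeholders, and this assumes the
0.20.203.0 client jars are on the webapp classpath:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RemoteHdfsAccess {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // 0.20.x-era property name; host and port are placeholders.
      conf.set("fs.default.name", "hdfs://namenode-host:9000");
      FileSystem fs = FileSystem.get(conf);
      fs.mkdirs(new Path("/tmp/from-tomcat"));
      fs.close();
    }
  }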
Srivas,
As you may know already, NFS is just being used in the first prototype for
HA.
Two options for the editlog store are:
1. Using BookKeeper. Work towards this has already been completed on trunk. This
will replace the need for NFS to store the editlogs and is highly available.
This solution will also
Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists.
Take a look at JobTracker.heartbeat -> *Scheduler.assignTasks.
After the scheduler 'assigns' tasks, the JT sends the corresponding
'LaunchTaskAction' to the TaskTracker.
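In rough paraphrase (a sketch from memory of the 1.x-era code, not a
verbatim quote):

  // Inside JobTracker.heartbeat(): the configured scheduler picks tasks
  // for the reporting tracker, and each picked task is wrapped in a
  // LaunchTaskAction carried back in the heartbeat response.
  List<Task> tasks = taskScheduler.assignTasks(taskTracker);
  List<TaskTrackerAction> actions = new ArrayList<TaskTrackerAction>();
  if (tasks != null) {
    for (Task task : tasks) {
      actions.add(new LaunchTaskAction(task)); // taskid paired with this tracker
    }
  }
  response.setActions(actions.toArray(new TaskTrackerAction[actions.size()]));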
hth,
Arun
On Dec 13, 2011, at 12:59 AM,
Hi Prashant,
In each case I had a single tasktracker per node. I oversubscribed the total
tasks per tasktracker/node by 1.5 x the # of cores.
So, for the 64-core allocation comparison:
In A: 8 cores; each machine had a single tasktracker with 8 map / 4
reduce slots, for 12 task slots total
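For what it's worth, those per-tasktracker slot counts would typically be
expressed in mapred-site.xml on each node, something like this (property
names from the 0.20/1.x line):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>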
On Sun, Dec 11, 2011 at 10:47 PM, M. C. Srivas mcsri...@gmail.com wrote:
But if you use a Netapp, then the likelihood of the Netapp crashing is
lower than the likelihood of a garbage-collection-of-death happening in the
NN.
This is pure FUD.
I've never seen a garbage collection of death ever
Hi Brad
This is a really interesting experiment. I am curious why you did not use 32
nodes with 2 cores each; that would make the number of CPU cores in the two
groups equal.
Chen
On Tue, Dec 13, 2011 at 7:15 PM, Brad Sarsfield b...@bing.com wrote:
Suresh,
As of today, there is no option except to use NFS. And as you yourself
mention, the first HA prototype, when it comes out, will require NFS.
(a) I wasn't aware that BookKeeper had progressed that far. I wonder
whether it would be able to keep up with the data rates that are required in
On Tue, Dec 13, 2011 at 10:42 PM, M. C. Srivas mcsri...@gmail.com wrote:
Any simple file metadata test will cause the NN to spiral to death with
infinite GC. For example, try creating many, many files. Or even simply
stat a bunch of files continuously.
Sure. If I run dd if=/dev/zero of=foo my
On Tue, Dec 13, 2011 at 11:00 PM, M. C. Srivas wrote:
Suresh,
As of today, there is no option except to use NFS. And as you yourself
mention, the first HA prototype, when it comes out, will require NFS.
Well, in the interest of full disclosure, NFS is just one of the options and
not the only
On Tue, Dec 13, 2011 at 11:00 PM, M. C. Srivas mcsri...@gmail.com wrote:
(a) I wasn't aware that BookKeeper had progressed that far. I wonder
whether it would be able to keep up with the data rates that are required in
order to hold the NN log without falling behind.
It's a good question - but