You can extract the shuffle time from the job log.
Take a look at
https://github.com/rajvish/hadoop-summary
Raj
From: Bertrand Dechoux decho...@gmail.com
To: common-user@hadoop.apache.org
Sent: Monday, August 27, 2012 12:57 AM
Subject: Re: Measuring
----- Original Message -----
From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com
Cc:
Sent: Saturday, August 25, 2012 4:02 AM
Subject: Re: doubt about reduce tasks and block writes
Raj's almost right. In times of high load or a space fill-up on a local
DN
But since node A has no TT running, it will not run map or reduce tasks. When
the reducer node writes the output file, the first block will be written on the
local node and never on node A.
So, to answer the question, node A will contain copies of blocks of all output
files. It won't contain the
You probably have speculative execution on. Extra maps and reduce tasks are run
in case some of them fail
Raj
Sent from my iPad
Please excuse the typos.
On Aug 16, 2012, at 11:36 AM, in.abdul in.ab...@gmail.com wrote:
Hi Gaurav,
The number of maps does not depend on the number of blocks. It is really
Sean
Can you paste the namenode and jobtracker logs?
It could be something as simple as disabling the firewall.
Raj
From: Barry, Sean F sean.f.ba...@intel.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Tuesday, July 31, 2012
Is there a requirement for the final reduce file to be sorted? If not, wouldn't
a map-only job (plus a combiner) and a merge-only job provide the answer?
Raj
From: Michael Segel michael_se...@hotmail.com
To: common-user@hadoop.apache.org
Sent: Tuesday, July
Could also be due to network issues. The number of sockets or the number of
threads could be too low.
Raj
From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Friday, July 20, 2012 9:06 AM
Subject: Re: Datanode error
Pablo,
These
You probably have a very low somaxconn parameter (default CentOS has it
at 128, if I remember correctly). You can check the value under
/proc/sys/net/core/somaxconn
Can you also check the value of ulimit -n? It could be low.
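A quick way to look at both values on a datanode (a sketch; the paths are standard Linux, and the sysctl name may differ on other kernels):

```shell
# Listen-backlog ceiling: connections beyond this get refused under load.
cat /proc/sys/net/core/somaxconn   # CentOS default is often 128

# Per-process open-file limit; datanodes usually need far more than 1024.
ulimit -n

# Raise the backlog for the running kernel (root required; not persistent):
# sysctl -w net.core.somaxconn=1024
```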
Raj
From: Ellis H.
Most probably you have a network problem. Check your hostname-to-IP-address
mapping.
From: Yongwei Xing jdxyw2...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thursday, June 14, 2012 10:15 AM
Subject: Map works well, but Reduce failed
Hi all
I run a
From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Tuesday, May 22, 2012 7:13 AM
Subject: Re: Map/Reduce Tasks Fails
Sandeep,
Is the same DN 10.0.25.149 reported across all failures? And do you
notice any machine patterns when
What kind of storage is attached to the data nodes? This kind of error can
happen when the CPU is really busy with I/O or interrupts.
Can you run top or dstat on some of the data nodes to see how the system is
performing?
Raj
From: Sandeep Reddy P
and the fact that it happens only on some
nodes indicates a local problem.
Raj
From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org
Cc: Raj Vishwanathan rajv...@yahoo.com
Sent: Thursday, May 10, 2012 3:57 AM
Subject: Re: High load
When you say 'load', what do you mean? CPU load or something else?
Raj
From: Darrell Taylor darrell.tay...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wednesday, May 9, 2012 9:52 AM
Subject: High load on datanode startup
Hi,
I wonder if someone could
...@gmail.com
To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com
Sent: Wednesday, May 9, 2012 2:40 PM
Subject: Re: High load on datanode startup
On Wed, May 9, 2012 at 10:23 PM, Raj Vishwanathan rajv...@yahoo.com wrote:
When you say 'load', what do you mean? CPU load or something
Keith
What is the output of ulimit -n? Your value for the number of open files is
probably too low.
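For reference, a sketch of how one might check and temporarily raise the limit; the 65536 figure is only a common rule of thumb, not an official requirement:

```shell
ulimit -n    # current soft limit on open files
ulimit -Hn   # hard ceiling for this shell

# Raise for the current shell only (must not exceed the hard limit):
# ulimit -n 65536

# A persistent change usually goes in /etc/security/limits.conf, e.g.:
#   hdfs  -  nofile  65536
```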
Raj
From: Keith Thompson kthom...@binghamton.edu
To: common-user@hadoop.apache.org
Sent: Thursday, May 3, 2012 4:33 PM
Subject: Re: Reduce Hangs at 66%
I
Sunil
You could use identity mappers, a single identity reducer, and no
output compression.
Raj
From: Sunil S Nandihalli sunil.nandiha...@gmail.com
To: common-user@hadoop.apache.org
Sent: Tuesday, April 24, 2012 7:01 AM
Subject: Re: hadoop
To verify that a server is running on a port (10,000 in this case) and to
ensure that there are no firewall issues,
run
telnet servername 10000
The connection should succeed.
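Where telnet isn't installed, bash's /dev/tcp device gives the same check (servername and port 10000 are just the example values; substitute your own):

```shell
HOST=servername   # replace with the actual server
PORT=10000        # replace with the actual port
if timeout 2 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
    echo "port $PORT on $HOST is reachable"
else
    echo "connection refused or blocked -- check the service and the firewall"
fi
```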
Raj
From: Edward Capriolo edlinuxg...@gmail.com
To:
David
Since data generation and sorting are different Hadoop jobs, you can
generate the data once and sort the same data as many times as you want.
I don't think TeraGen is deterministic (or rather, the keys are random but
the text is deterministic, if I remember correctly).
Raj
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
From: hellooperator silversl...@gmail.com
To: core-u...@hadoop.apache.org
Sent: Wednesday, April 11, 2012 11:15 AM
Subject: Map Reduce Job Help
Hello,
I'm just starting
Lots of people seem to start with this.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Raj
From: Barry, Sean F sean.f.ba...@intel.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Wednesday,
Stijn,
The first block of the data is always stored on the local node. Assuming that
you had a replication factor of 3, the node that generates the data will get
about 10GB of data and the other 20GB will be distributed among the other nodes.
Raj
From:
is the node you are copying data from,
if, let's say, you are using the -copyFromLocal option.
Regards
Serge
On 4/2/12 11:53 AM, Stijn De Weirdt stijn.dewei...@ugent.be wrote:
hi raj,
what is a local node? is it relative to the tasks that are started?
stijn
On 04/02/2012 07:28 PM, Raj
Aaron
You can get the details of how much data each mapper processed, and on which
node (IP address, actually!), from the job logs.
Raj
From: Ajay Srivastava ajay.srivast...@guavus.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Cc:
Aayush
You can use the following. Just play around with the pattern:
<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_123456_0</value>
  <description>Keep all files from tasks whose task names match the given
  regular expression. Defaults to none.</description>
</property>
Raj
Lol!
Raj
From: Mike Spreitzer mspre...@us.ibm.com
To: common-user@hadoop.apache.org
Sent: Friday, March 2, 2012 8:31 AM
Subject: Re: Hadoop pain points?
Interesting question. Do you want to be asking those who use Hadoop ---
or those who find it too
The masters and slaves files, if I remember correctly, are used to start the
correct daemons on the correct nodes from the master node.
Raj
From: Joey Echeverria j...@cloudera.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Cc:
@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com
Sent: Thursday, March 1, 2012 5:42 PM
Subject: Re: Adding nodes
Whatever Joey said is correct for Cloudera's distribution. All the same, I am
not confident about other distributions, as I haven't tried them.
Thanks,
Anil
On Thu, Mar 1, 2012 at 5:10
Manish
If you read the error message, it says connection refused. Big clue :-)
You probably have a firewall configured.
Raj
Sent from my iPad
Please excuse the typos.
On Feb 12, 2012, at 1:41 AM, Manish Maheshwari mylogi...@gmail.com wrote:
Thanks,
I tried with hadoop-1.0.0 and JRE6 and
Here is what I understand:
the RecordReader for the MTMapper takes the input split and cycles the records
among the available threads. It also ensures that the map outputs are
synchronized.
So what Bejoy says is what will happen for the wordcount program.
Raj
Harsh, All
This was one of the first questions that I asked. It is sometimes not clear
whether some parameters are site-related or job-related, or whether they belong
to the NN, JT, DN, or TT.
If I get some time during the weekend, I will try to put this into a document
and see if it helps.
Raj
Hi
Here is a piece of code that does the reverse of what you want; it takes a
bunch of compressed files (gzip, in this case) and converts them to text.
You can tweak the code to do the reverse:
http://pastebin.com/mBHVHtrm
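The conversion itself can be sketched at the shell level with plain gzip (only an illustration of the round trip, not the MapReduce code in the paste; the file names are made up):

```shell
# Make a small compressed input file.
printf 'line one\nline two\n' | gzip > part-00000.gz

# Forward direction (what the pasted code does): compressed -> text.
gunzip -c part-00000.gz > part-00000.txt

# Reverse direction (what you want): text -> compressed.
gzip -c part-00000.txt > again.gz
```

If I remember correctly, `hadoop fs -text` will also decompress files in known codecs when reading them out of HDFS.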
Raj
From: Xiaobin She
It says in the logs
org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection
refused
You have a network problem. Do you have a firewall enabled?
Raj
and please do not spam the list with the same message.
From: hadoop hive
Can you run any map/reduce jobs, such as word count?
Raj
Sent from my iPad
Please excuse the typos.
On Oct 17, 2011, at 5:18 PM, robpd robpodol...@yahoo.co.uk wrote:
Hi
I am new to Mahout and Hadoop. I'm currently trying to get the
SimpleKMeansClustering example from the Mahout in Action
Really horrible performance
Sent from my iPad
Please excuse the typos.
On Oct 8, 2011, at 12:12 AM, tom uno ltom...@gmail.com wrote:
release 0.21.0
Production System
20 nodes
read 100 megabits per second
write 10 megabits per second
performance normal?
Bejoy
You can find the job-specific logs in two places. The first one is the HDFS
output directory. The second place is under $HADOOP_HOME/logs/history
($HADOOP_HOME/logs/history/done).
Both these places have the config file and the job logs for each submitted job.
Sent from my iPad
Please