Re: Measuring Shuffle time for MR job

2012-08-27 Thread Raj Vishwanathan
You can extract the shuffle time from the job log. Take a look at  https://github.com/rajvish/hadoop-summary  Raj From: Bertrand Dechoux decho...@gmail.com To: common-user@hadoop.apache.org Sent: Monday, August 27, 2012 12:57 AM Subject: Re: Measuring
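
A minimal Java sketch of the extraction (independent of the linked scripts), assuming the Hadoop 1.x job history format in which ReduceAttempt lines carry START_TIME="..." and SHUFFLE_FINISHED="..." fields; per attempt, shuffle time is then SHUFFLE_FINISHED minus START_TIME.

    // Sums reduce-side shuffle time from a Hadoop 1.x job history file.
    // The field names are assumptions based on that format; adjust as needed.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ShuffleTime {
        private static final Pattern ATTEMPT = Pattern.compile("TASK_ATTEMPT_ID=\"([^\"]+)\"");
        private static final Pattern START   = Pattern.compile("START_TIME=\"(\\d+)\"");
        private static final Pattern SHUFFLE = Pattern.compile("SHUFFLE_FINISHED=\"(\\d+)\"");

        public static void main(String[] args) throws Exception {
            Map<String, Long> startTimes = new HashMap<String, Long>();
            long totalShuffleMs = 0;
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.startsWith("ReduceAttempt")) continue;  // only reduce attempts shuffle
                Matcher id = ATTEMPT.matcher(line);
                if (!id.find()) continue;
                Matcher s = START.matcher(line);
                if (s.find()) startTimes.put(id.group(1), Long.parseLong(s.group(1)));
                Matcher f = SHUFFLE.matcher(line);
                if (f.find() && startTimes.containsKey(id.group(1))) {
                    totalShuffleMs += Long.parseLong(f.group(1)) - startTimes.get(id.group(1));
                }
            }
            in.close();
            System.out.println("Total shuffle time (ms): " + totalShuffleMs);
        }
    }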

Re: doubt about reduce tasks and block writes

2012-08-26 Thread Raj Vishwanathan
Message - From: Harsh J ha...@cloudera.com To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com Cc: Sent: Saturday, August 25, 2012 4:02 AM Subject: Re: doubt about reduce tasks and block writes Raj's almost right. In times of high load or space fillup on a local DN

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Raj Vishwanathan
But since node A has no TT running, it will not run map or reduce tasks. When the reducer node writes the output file, the first block will be written on the local node and never on node A. So, to answer the question, Node A will contain copies of blocks of all output files. It won't contain the
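
One way to check where the output blocks actually landed is to ask the NameNode for the block locations of a part file; a sketch using the standard FileSystem API (the path argument is a placeholder):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhereAreMyBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // e.g. a reducer output file such as .../part-00000, passed on the command line
            FileStatus st = fs.getFileStatus(new Path(args[0]));
            BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
            for (int i = 0; i < blocks.length; i++) {
                // Prints the datanodes holding each block's replicas.
                System.out.println("block " + i + " -> " + Arrays.toString(blocks[i].getHosts()));
            }
        }
    }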

Re: Number of Maps running more than expected

2012-08-16 Thread Raj Vishwanathan
You probably have speculative execution on. Extra map and reduce attempts are launched in case some of them run slowly or fail. Raj Sent from my iPad Please excuse the typos. On Aug 16, 2012, at 11:36 AM, in.abdul in.ab...@gmail.com wrote: Hi Gaurav, The number of maps does not depend upon the number of blocks. It is really
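
If the duplicate attempts are unwanted, speculative execution can be switched off per job; a sketch with the old mapred API (property names as in Hadoop 1.x):

    import org.apache.hadoop.mapred.JobConf;

    public class NoSpeculation {
        public static void configure(JobConf conf) {
            conf.setMapSpeculativeExecution(false);     // mapred.map.tasks.speculative.execution
            conf.setReduceSpeculativeExecution(false);  // mapred.reduce.tasks.speculative.execution
        }
    }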

Re: Multinode cluster only recognizes 1 node

2012-08-01 Thread Raj Vishwanathan
Sean, can you paste the namenode and jobtracker logs? It could be something as simple as disabling the firewall. Raj From: Barry, Sean F sean.f.ba...@intel.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: Tuesday, July 31, 2012

Re: Merge Reducers Output

2012-07-31 Thread Raj Vishwanathan
Is there a requirement for the final reduce file to be sorted? If not, wouldn't a map-only job (plus a combiner) and a merge-only job provide the answer? Raj From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org Sent: Tuesday, July
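
A sketch of that two-step idea with the standard APIs: run the job with zero reducers, then concatenate the part files into a single HDFS file with FileUtil.copyMerge (present in Hadoop 1.x/2.x; the paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class MapOnlyThenMerge {
        public static void makeMapOnly(Job job) {
            job.setNumReduceTasks(0);  // map-only: no shuffle/sort, output stays unsorted
        }

        public static void mergeParts(Configuration conf, String srcDir, String dstFile) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            // Concatenates all files under srcDir into dstFile, in listing order (not sorted).
            FileUtil.copyMerge(fs, new Path(srcDir), fs, new Path(dstFile),
                               false /* keep the source part files */, conf, null);
        }
    }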

Re: Datanode error

2012-07-20 Thread Raj Vishwanathan
Could also be due to network issues. The number of sockets or the number of threads could be too low. Raj From: Harsh J ha...@cloudera.com To: common-user@hadoop.apache.org Sent: Friday, July 20, 2012 9:06 AM Subject: Re: Datanode error Pablo, These

Re: Error: Too Many Fetch Failures

2012-06-19 Thread Raj Vishwanathan
You probably have a very low somaxconn parameter (the CentOS default is 128, if I remember correctly). You can check the value under /proc/sys/net/core/somaxconn. Can you also check the value of ulimit -n? It could be too low. Raj From: Ellis H.

Re: Map works well, but Reduce failed

2012-06-15 Thread Raj Vishwanathan
Most probably you have a network problem. Check your hostname and IP address mapping. From: Yongwei Xing jdxyw2...@gmail.com To: common-user@hadoop.apache.org Sent: Thursday, June 14, 2012 10:15 AM Subject: Map works well, but Reduce failed Hi all I run a

Re: Map/Reduce Tasks Fails

2012-05-22 Thread Raj Vishwanathan
From: Harsh J ha...@cloudera.com To: common-user@hadoop.apache.org Sent: Tuesday, May 22, 2012 7:13 AM Subject: Re: Map/Reduce Tasks Fails Sandeep, Is the same DN 10.0.25.149 reported across all failures? And do you notice any machine patterns when

Re: Map/Reduce Tasks Fails

2012-05-22 Thread Raj Vishwanathan
What kind of storage is attached to the data nodes? This kind of error can happen when the CPU is really busy with I/O or interrupts. Can you run top or dstat on some of the data nodes to see how the system is performing? Raj From: Sandeep Reddy P

Re: High load on datanode startup

2012-05-10 Thread Raj Vishwanathan
and the fact that it happens only on some nodes indicates a local problem. Raj From: Darrell Taylor darrell.tay...@gmail.com To: common-user@hadoop.apache.org Cc: Raj Vishwanathan rajv...@yahoo.com Sent: Thursday, May 10, 2012 3:57 AM Subject: Re: High load

Re: High load on datanode startup

2012-05-09 Thread Raj Vishwanathan
When you say 'load', what do you mean? CPU load or something else? Raj From: Darrell Taylor darrell.tay...@gmail.com To: common-user@hadoop.apache.org Sent: Wednesday, May 9, 2012 9:52 AM Subject: High load on datanode startup Hi, I wonder if someone could

Re: High load on datanode startup

2012-05-09 Thread Raj Vishwanathan
...@gmail.com To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com Sent: Wednesday, May 9, 2012 2:40 PM Subject: Re: High load on datanode startup On Wed, May 9, 2012 at 10:23 PM, Raj Vishwanathan rajv...@yahoo.com wrote: When you say 'load', what do you mean? CPU load or something

Re: Reduce Hangs at 66%

2012-05-03 Thread Raj Vishwanathan
Keith, what is the output of ulimit -n? Your limit on the number of open files is probably too low. Raj From: Keith Thompson kthom...@binghamton.edu To: common-user@hadoop.apache.org Sent: Thursday, May 3, 2012 4:33 PM Subject: Re: Reduce Hangs at 66% I

Re: hadoop streaming and a directory containing large number of .tgz files

2012-04-24 Thread Raj Vishwanathan
Sunil, you could use identity mappers, a single identity reducer, and no output compression. Raj From: Sunil S Nandihalli sunil.nandiha...@gmail.com To: common-user@hadoop.apache.org Sent: Tuesday, April 24, 2012 7:01 AM Subject: Re: hadoop

Re: Hive Thrift help

2012-04-16 Thread Raj Vishwanathan
To verify that a server is running on a port (10,000 in this case) and to ensure that there are no firewall issues, run: telnet servername 10000. The connection should succeed. Raj From: Edward Capriolo edlinuxg...@gmail.com To:
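
The same check can be scripted if telnet is unavailable; a small Java sketch that simply attempts a TCP connect to the Thrift port (host, port and timeout are example values):

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
            Socket s = new Socket();
            try {
                // A refused or filtered connection throws from connect().
                s.connect(new InetSocketAddress(host, port), 3000);  // 3 second timeout
                System.out.println("Connected: something is listening on " + host + ":" + port);
            } finally {
                s.close();
            }
        }
    }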

Re: Is TeraGen's generated data deterministic?

2012-04-14 Thread Raj Vishwanathan
David, since the data generation and sorting are different Hadoop jobs, you can generate the data once and sort the same data as many times as you want. I don't think TeraGen is deterministic (or rather, the keys are random but the text is deterministic, if I remember correctly). Raj

Re: Map Reduce Job Help

2012-04-11 Thread Raj Vishwanathan
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/  From: hellooperator silversl...@gmail.com To: core-u...@hadoop.apache.org Sent: Wednesday, April 11, 2012 11:15 AM Subject: Map Reduce Job Help Hello, I'm just starting

Re: opensuse 12.1

2012-04-04 Thread Raj Vishwanathan
Lots of people seem to start with this. http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/  Raj From: Barry, Sean F sean.f.ba...@intel.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: Wednesday,

Re: data distribution in HDFS

2012-04-02 Thread Raj Vishwanathan
Stijn, the first block of the data is always stored on the local node. Assuming that you have a replication factor of 3, the node that generates the data will get about 10GB of the data and the other 20GB will be distributed among the other nodes. Raj  From:

Re: data distribution in HDFS

2012-04-02 Thread Raj Vishwanathan
is a node from where you are copying data, if, let's say, you are using the -copyFromLocal option Regards Serge On 4/2/12 11:53 AM, Stijn De Weirdt stijn.dewei...@ugent.be wrote: hi raj, what is a local node? is it relative to the tasks that are started? stijn On 04/02/2012 07:28 PM, Raj

Re: How to modify hadoop-wordcount example to display File-wise results.

2012-03-29 Thread Raj Vishwanathan
Aaron, you can get the details of how much data each mapper processed, and on which node (IP address, actually!), from the job logs. Raj From: Ajay Srivastava ajay.srivast...@guavus.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Cc:

Re: Separating mapper intermediate files

2012-03-27 Thread Raj Vishwanathan
Aayush, you can use the following; just play around with the pattern property:

    <property>
      <name>keep.task.files.pattern</name>
      <value>.*_m_123456_0</value>
      <description>Keep all files from tasks whose task names match the given
                   regular expression. Defaults to none.</description>
    </property>

Raj
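The same property can also be set per job from the driver code; a one-liner sketch (the task-ID pattern is only an illustration):

    import org.apache.hadoop.mapred.JobConf;

    public class KeepTaskFiles {
        public static void configure(JobConf conf) {
            // Keep intermediate files of map task 123456, attempt 0.
            conf.set("keep.task.files.pattern", ".*_m_123456_0");
        }
    }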

Re: Hadoop pain points?

2012-03-02 Thread Raj Vishwanathan
Lol! Raj From: Mike Spreitzer mspre...@us.ibm.com To: common-user@hadoop.apache.org Sent: Friday, March 2, 2012 8:31 AM Subject: Re: Hadoop pain points? Interesting question.  Do you want to be asking those who use Hadoop --- or those who find it too

Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
The masters and slaves files, if I remember correctly, are used to start the correct daemons on the correct nodes from the master node. Raj From: Joey Echeverria j...@cloudera.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Cc:

Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com Sent: Thursday, March 1, 2012 5:42 PM Subject: Re: Adding nodes Whatever Joey said is correct for Cloudera's distribution. For same, I am not confident about other distribution as i haven't tried them. Thanks, Anil On Thu, Mar 1, 2012 at 5:10

Re: Error in Formatting NameNode

2012-02-12 Thread Raj Vishwanathan
Manish, if you read the error message, it says connection refused. Big clue :-) You probably have a firewall configured. Raj Sent from my iPad Please excuse the typos. On Feb 12, 2012, at 1:41 AM, Manish Maheshwari mylogi...@gmail.com wrote: Thanks, I tried with hadoop-1.0.0 and JRE6 and

Re: Combining MultithreadedMapper threadpool size map.tasks.maximum

2012-02-10 Thread Raj Vishwanathan
Here is what I understand: the RecordReader for the MultithreadedMapper takes the input split and cycles the records among the available threads. It also ensures that the map outputs are synchronized. So what Bejoy says is what will happen for the wordcount program. Raj
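
A sketch of how that wiring looks with the new API: the job's mapper is the MultithreadedMapper wrapper, and the real mapper plus the thread count are set through its static helpers (the pass-through mapper and the thread count of 8 are placeholders):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class MultithreadedSetup {
        // Trivial stand-in for the real map logic; it just passes records through.
        public static class LineMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
            protected void map(LongWritable key, Text value, Context ctx)
                    throws java.io.IOException, InterruptedException {
                ctx.write(key, value);
            }
        }

        public static void configure(Job job) {
            job.setMapperClass(MultithreadedMapper.class);             // the wrapper is the job's mapper
            MultithreadedMapper.setMapperClass(job, LineMapper.class); // the class the threads actually run
            MultithreadedMapper.setNumberOfThreads(job, 8);            // threads per map task
        }
    }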

Re: reference document which properties are set in which configuration file

2012-02-10 Thread Raj Vishwanathan
Harsh, All: This was one of the first questions that I asked. It is sometimes not clear whether a parameter is site related or job related, or whether it belongs to the NN, JT, DN or TT. If I get some time during the weekend, I will try to put this into a document and see if it helps. Raj

Re: Can I write to an compressed file which is located in hdfs?

2012-02-07 Thread Raj Vishwanathan
Hi, here is a piece of code that does the reverse of what you want; it takes a bunch of compressed files (gzip, in this case) and converts them to text. You can tweak the code to do the reverse: http://pastebin.com/mBHVHtrm  Raj From: Xiaobin She
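
For the direction actually asked about (writing a gzip-compressed file into HDFS), a sketch using the CompressionCodec API; the output path and the text written are placeholders:

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class WriteCompressed {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
            Path out = new Path("/tmp/example" + codec.getDefaultExtension());  // ".gz"
            OutputStream os = codec.createOutputStream(fs.create(out));  // wrap the HDFS stream
            os.write("hello, compressed world\n".getBytes("UTF-8"));
            os.close();  // flushes the gzip trailer and closes the HDFS file
        }
    }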

Re: Problem in reduce phase(critical)

2012-02-03 Thread Raj Vishwanathan
The logs say: org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused. You have a network problem. Do you have a firewall enabled? Raj. And please do not spam the list with the same message.  From: hadoop hive

Re: SimpleKMeansCLustering - Failed to set permissions of path to 0700

2011-10-17 Thread Raj Vishwanathan
Can you run any map/reduce jobs, such as word count? Raj Sent from my iPad Please excuse the typos. On Oct 17, 2011, at 5:18 PM, robpd robpodol...@yahoo.co.uk wrote: Hi I am new to Mahout and Hadoop. I'm currently trying to get the SimpleKMeansClustering example from the Mahout in Action

Re: performance normal?

2011-10-08 Thread Raj Vishwanathan
Really horrible performance. Sent from my iPad Please excuse the typos. On Oct 8, 2011, at 12:12 AM, tom uno ltom...@gmail.com wrote: release 0.21.0 Production System 20 nodes read 100 megabits per second write 10 megabits per second performance normal?

Re: Maintaining map reduce job logs - The best practices

2011-09-23 Thread Raj Vishwanathan
Bejoy, you can find the job-specific logs in two places. The first one is in the HDFS output directory. The second place is under $HADOOP_HOME/logs/history ($HADOOP_HOME/logs/history/done). Both these places have the config file and the job logs for each submitted job. Sent from my iPad Please