Re: why FairScheduler prefer to schedule MR jobs into the same node?

2013-08-09 Thread devdoer bird
Thanks. Let me try this configuration. I use the default settings. Is it set to true by default? 2013/8/10 Karthik Kambatla > It is possible that you have assignMultiple set to true in your fair > scheduler configuration - that leads to assigning as many tasks on a single > node heart

Re: Jobtracker page hangs ..again.

2013-08-09 Thread Patai Sangbutsarakum
Appreciate your input, Bryan. I will try to reproduce and look at the namenode log before, while, and after it pauses. Wish me luck. On Fri, Aug 9, 2013 at 2:09 PM, Bryan Beaudreault wrote: > When I've had problems with a slow jobtracker, I've found the issue to be > one of the following two (so far)

Re: why FairScheduler prefer to schedule MR jobs into the same node?

2013-08-09 Thread Karthik Kambatla
It is possible that you have assignMultiple set to true in your fair scheduler configuration - that leads to assigning as many tasks on a single node heartbeat as the node can accommodate. Setting it to false would assign a single task on each heartbeat and can help spread out the tasks. O
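For reference, the MR1 FairScheduler reads this switch from mapred-site.xml. The property name below is the Hadoop 1.x one; whether it defaults to true or false varies by version and distribution, so it is safest to set it explicitly (changing it typically requires a JobTracker restart):

```xml
<!-- mapred-site.xml: at most one task per heartbeat, spreading tasks across nodes -->
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>false</value>
</property>
```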

Re: Jobtracker page hangs ..again.

2013-08-09 Thread Bryan Beaudreault
When I've had problems with a slow jobtracker, I've found the issue to be one of the following two (so far) possibilities: - a long GC pause (I'm guessing this is not it, based on your email) - HDFS is slow. I haven't dived into the code yet, but circumstantially I've found that when you submit a job

Jobtracker page hangs ..again.

2013-08-09 Thread Patai Sangbutsarakum
A while back, I was fighting with the jobtracker page hanging: when I browse to http://jobtracker:50030 the browser doesn't show job info as usual, which turned out to be caused by too much job history being kept in the jobtracker. Currently, I am setting up a new cluster with a 40g heap on the namenode and jobtracker i
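If it again turns out to be history retention, the Hadoop 1.x JobTracker knobs for it live in mapred-site.xml (property names as in mapred-default.xml; the values below are only illustrative):

```xml
<!-- Cap completed jobs kept in JobTracker memory per user -->
<property>
  <name>mapred.jobtracker.completeuserjobs.maximum</name>
  <value>25</value>
</property>
<!-- Cap retired-job entries cached for the web UI -->
<property>
  <name>mapred.jobtracker.retiredjobs.cache.size</name>
  <value>1000</value>
</property>
```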

Re: Not able to understand writing custom writable

2013-08-09 Thread Shahab Yunus
The overarching responsibility of a record reader is to return one record, which in the conventional case traditionally means one line. But as we see here, that is not always true. An XML file can have multiple physical lines that functionally map to one record or one line. For this,
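The idea that many physical lines form one logical record can be sketched in plain Java, independent of the actual Hadoop RecordReader API (the `<record>` tag name and the helper class below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration: collapse physical lines into logical records
// delimited by <record> ... </record> tags.
public class XmlRecordGrouper {
    public static List<String> group(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : physicalLines) {
            if (line.contains("<record>")) {
                current = new StringBuilder();   // start of a logical record
            }
            if (current != null) {
                current.append(line.trim());
            }
            if (current != null && line.contains("</record>")) {
                records.add(current.toString()); // one logical record from many lines
                current = null;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<record>", "  <id>1</id>", "</record>",
                "<record>", "  <id>2</id>", "</record>");
        System.out.println(group(lines).size()); // six physical lines -> 2 records
    }
}
```

A real RecordReader does the same grouping, but over byte offsets within an input split rather than an in-memory list.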

Re: Hadoop-HBase table hierarchical column scan

2013-08-09 Thread Narlin M
Correction: It's scan.addColumn(family, qualifier) and not scan.addFamily(family, qualifier) that I actually used. Thanks, Narlin M. On Fri, Aug 9, 2013 at 2:08 PM, Narlin M wrote: > I am fairly new to the hadoop-hbase environment having started working on > it very recently, so I hope I am wor
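For reference, the API distinction behind the correction, as a fragment against the HBase 0.94-era client (the qualifier string is invented, since the real ones were truncated above):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// addFamily(family) selects every qualifier in the family.
Scan familyScan = new Scan();
familyScan.addFamily(Bytes.toBytes("DFLT"));

// addColumn(family, qualifier) narrows the scan to a single column.
Scan columnScan = new Scan();
columnScan.addColumn(Bytes.toBytes("DFLT"), Bytes.toBytes("/source:x")); // hypothetical qualifier
```

Scan.addFamily takes only the family argument, which is why a two-argument addFamily(family, qualifier) does not exist.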

Hadoop-HBase table hierarchical column scan

2013-08-09 Thread Narlin M
I am fairly new to the hadoop-hbase environment, having started working on it very recently, so I hope I am wording the question correctly. I am trying to read data from a hadoop-hbase table which has only one column family, named 'DFLT'. This family contains hierarchical column qualifiers "/source:

getting errors on datanode/tasktracker logs

2013-08-09 Thread Jitendra Yadav
Hi, I'm getting the errors below in the log files while starting the datanode and tasktracker. I'm using Hadoop 1.1.2 and Java 1.7.0_21. mmap failed for CEN and END part of zip file mmap failed for CEN and END part of zip file mmap failed for CEN and END part of zip file mmap failed for CEN and END part of zi
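A workaround often suggested for these JDK 7 "mmap failed for CEN and END part of zip file" warnings is to turn off zip-file memory mapping in the JVM (assumption: your JDK honors the sun.zip.disableMemoryMapping property, which is JDK-specific and unsupported):

```
# hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -Dsun.zip.disableMemoryMapping=true"
```

It is also worth checking that no jar on the classpath is truncated or corrupt, since the messages come from the JVM's zip handling.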

Re: Hosting Hadoop

2013-08-09 Thread Nitin Pawar
Check Altiscale as well. On Fri, Aug 9, 2013 at 3:05 AM, Dhaval Shah wrote: > Thanks for the list Marcos. I will go through the slides/links. I think > that's helpful > > Regards, > Dhaval > > -- > *From:* Marcos Luis Ortiz Valmaseda > *To:* Dhaval Shah > *Cc:* us

Not able to understand writing custom writable

2013-08-09 Thread jamal sasha
Hi, I am trying to understand how to write my own writable - basically, how to process records spanning multiple lines. Can someone break down for me what needs to be considered in each method? I am trying to understand this example: https://github

missing data when reading compressed files with my custom LineReader

2013-08-09 Thread Jesse Jaggars
I've built a custom LineReader-like class to back a new InputFormat. I wrote the new LineReader class to handle escaped whitespace that the util.LineReader doesn't handle. The data I'm reading might have lines that look something like this: foo\tbar\tbaz\\tmore\\ndata with escaped stuff\n I'd lik
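The escape-aware splitting being described can be sketched with plain JDK code (illustrative logic only, not the actual util.LineReader internals, which operate on byte buffers across input splits):

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of an escape-aware line splitter: a '\' escapes the next
// character, so a backslash-escaped newline stays inside the record,
// while a bare '\n' terminates it.
public class EscapedLineSplitter {
    public static List<String> split(String data) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : data.toCharArray()) {
            if (escaped) {
                current.append(c);      // escaped char, even a newline, is kept
                escaped = false;
            } else if (c == '\\') {
                current.append(c);
                escaped = true;
            } else if (c == '\n') {
                records.add(current.toString()); // unescaped newline ends a record
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // the escaped newline stays inside record 1; the bare newline splits
        System.out.println(split("foo\tbar\\\nmore data\nsecond record").size()); // 2
    }
}
```

A real implementation also has to handle an escape character falling exactly on a split boundary, which is the hard part in an InputFormat.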

Re: Hadoop upgrade

2013-08-09 Thread Robert Dyer
Actually, 1.2.1 is out (and marked stable). I see no reason not to upgrade. http://hadoop.apache.org/docs/r1.2.1/releasenotes.html As far as performance goes, when I upgraded our cluster from 1.0.4 to 1.1.2, our small jobs (that took about 1 min each) were taking about 20-30s less time. So ther

Re: Hadoop upgrade

2013-08-09 Thread Marcos Luis Ortiz Valmaseda
Regards, Viswanathan J. Like Harsh said, the release notes describe every bug fix and every minor or major improvement, with a link to the related JIRA for each one. Just a simple question: why not upgrade directly to 1.2.0? There are a lot of good improvements and bug fixes there too. See the Release note

Re: Hadoop upgrade

2013-08-09 Thread Harsh J
The link Jitendra provided lists all the changes exhaustively. What exactly are you looking for beyond that? Performance-related changes are probably noted, so just search that page for "Performance". On Fri, Aug 9, 2013 at 7:41 PM, Viswanathan J wrote: > I have seen these release not

Re: Hadoop upgrade

2013-08-09 Thread Viswanathan J
I have seen those release notes already. Are there any other comments on this upgrade regarding MR job processing and performance improvements? On Aug 9, 2013 6:27 PM, "Jitendra Yadav" wrote: > Please refer Hadoop 1.1.2 release notes. > > http://hadoop.apache.org/docs/r1.1.2/releasenotes.html > > On Fri,

Re: MutableCounterLong metrics display in ganglia

2013-08-09 Thread Harsh J
The counter, being num-ops, should up-count and not reset. Note that your test may be at fault though - calling hsync may not always call NN#fsync(…) unless you are passing the proper flags to make it always do so. On Wed, Aug 7, 2013 at 4:27 PM, lei liu wrote: > I use hadoop-2.0.5 and config had

Re: Discrepancy in the values of consumed disk space by hadoop

2013-08-09 Thread Harsh J
There isn't a "discrepancy", but read on: DFS Used counts disk spaces across DNs. FSCK counts file lengths on HDFS. The former includes replicated data sizes, plus block checksum metadata consumed space. The latter does not. A small (but probably significant) percentage of your files are using rep
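The breakdown above can be put into rough numbers (assumptions: uniform replication and the default 4-byte CRC per 512 data bytes of checksum metadata; real clusters carry additional per-block overhead, so treat this as a lower bound):

```java
// Back-of-the-envelope: expected "DFS Used" from fsck's total file lengths.
public class DfsUsedEstimate {
    static final double CHECKSUM_OVERHEAD = 4.0 / 512.0; // 4-byte CRC per 512 bytes

    public static double estimate(double totalFileBytes, double replication) {
        return totalFileBytes * replication * (1.0 + CHECKSUM_OVERHEAD);
    }

    public static void main(String[] args) {
        // e.g. fsck reports 100 GB of file lengths at replication factor 2
        double used = estimate(100e9, 2.0);
        System.out.printf("estimated DFS Used: %.1f GB%n", used / 1e9); // ~201.6 GB
    }
}
```

If the observed gap is much larger than this estimate, look for files written at a higher-than-default replication factor.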

Re: Hadoop upgrade

2013-08-09 Thread Jitendra Yadav
Please refer Hadoop 1.1.2 release notes. http://hadoop.apache.org/docs/r1.1.2/releasenotes.html On Fri, Aug 9, 2013 at 5:41 PM, Viswanathan J wrote: > Hi, > > Planning to upgrade hadoop from 1.0.3 to 1.1.2, what are the key features > or advantages. >

Hadoop upgrade

2013-08-09 Thread Viswanathan J
Hi, I'm planning to upgrade Hadoop from 1.0.3 to 1.1.2. What are the key features or advantages?

Re: why FairScheduler prefer to schedule MR jobs into the same node?

2013-08-09 Thread devdoer bird
The hadoop version is 1.0.3 2013/8/9 Sandy Ryza > Hi devdoer, > > What version are you using? > > -Sandy > > > On Thu, Aug 8, 2013 at 4:25 AM, devdoer bird wrote: > >> HI: >> >> I configure the FairScheduler with default settings and my job has 19 >> reduce tasks. I found that all the reduce

Re: Mapreduce for beginner

2013-08-09 Thread Olivier Austina
Hi Shahab, Thanks for the book and links. Regards Olivier 2013/8/9 Shahab Yunus > Given that your questions are very broad and at high level, I would > suggest that you should pick up a book or such to go through that. The > Hadoop: Definitive Guide by Tom White is a great book to start wit

Re: MutableCounterLong and MutableCounterLong class difference in metrics v2

2013-08-09 Thread Jun Ping Du
Hi Lei, MutableCounterLong is a type of counter which can only be increased (counts are often large compared with MutableCounterInt). It is used a lot in the Hadoop metrics system, e.g. DatanodeMetrics. You can find more details on metrics v2 in the Hadoop wiki ( http://wiki.apache.org/hado
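The increase-only contract described here can be mirrored with a toy stand-in (not the real org.apache.hadoop.metrics2.lib.MutableCounterLong, whose actual API is in the metrics2 javadoc):

```java
// Toy increment-only counter with the same contract: the value can only go up.
public class IncrOnlyCounter {
    private long value;

    public synchronized void incr() {
        value++;
    }

    public synchronized void incr(long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("counter can only increase");
        }
        value += delta;
    }

    public synchronized long value() {
        return value;
    }

    public static void main(String[] args) {
        IncrOnlyCounter numOps = new IncrOnlyCounter();
        numOps.incr();
        numOps.incr(41);
        System.out.println(numOps.value()); // 42
    }
}
```

A gauge, by contrast, can move in both directions, which is the usual distinction drawn in metrics2.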

Discrepancy in the values of consumed disk space by hadoop

2013-08-09 Thread Yogini Gulkotwar
Hi All, I have a CDH4 Hadoop cluster set up with 3 datanodes and a data replication factor of 2. When I try to check the consumed DFS space, I get different values from the "hdfs dfsadmin -report" and "hdfs fsck" commands. Could anyone please help me understand the reason behind the discrepancy in