Re: Hadoop graphing tools

2013-10-15 Thread Xuri Nagarin
From: Xuri Nagarin secs...@gmail.com Reply-To: user@hadoop.apache.org Date

Re: Improving MR job disk IO

2013-10-14 Thread Xuri Nagarin
On Thu, Oct 10, 2013 at 4:50 PM, Xuri Nagarin secs...@gmail.com wrote: On Thu, Oct 10, 2013 at 1:27 PM, Pradeep Gollakota pradeep...@gmail.com wrote: I don't

Hadoop graphing tools

2013-10-14 Thread Xuri Nagarin
Hi, I am looking for some simple graphing tools to use with Hadoop (bar or line charts). Most Google searches for Hadoop graphing turn up results for much more complex graph-analysis tools like Giraph. Are there any simple rrdtool-like solutions for Hadoop? TIA, Xuri

Re: Hadoop graphing tools

2013-10-14 Thread Xuri Nagarin
...@gmail.com wrote: Do you mean a performance monitoring tool? I have not used any, but you should search for that, not "graph". On 10/14/2013 08:03 PM, Xuri Nagarin wrote: Hi, I am looking for some simple graphing tools to use with Hadoop (bar or line charts). Most Google searches for Hadoop graphing

Re: Improving MR job disk IO

2013-10-14 Thread Xuri Nagarin
of a job. Hadoop does this reliably by monitoring all instances, restarting failed ones, etc. 3) You have way too much data to fit on one computer. Same as #2. You might not need Hadoop if you can run your programs without it. Lance On 10/14/2013 08:02 PM, Xuri Nagarin wrote: Yes, I tested

Improving MR job disk IO

2013-10-10 Thread Xuri Nagarin
Hi, I have a simple Grep job (from the bundled examples) that I am running on an 11-node cluster. Each node has 2x8-core Intel Xeons (showing 32 CPUs with HT on), 64GB RAM and 8 x 1TB disks. I have mappers set to 20 per node. When I run the Grep job, I notice that CPU gets pegged to 100% on multiple

Re: Improving MR job disk IO

2013-10-10 Thread Xuri Nagarin
bound, you expect to see low CPU usage. On Thu, Oct 10, 2013 at 11:05 AM, Xuri Nagarin secs...@gmail.com wrote: Hi, I have a simple Grep job (from the bundled examples) that I am running on an 11-node cluster. Each node has 2x8-core Intel Xeons (showing 32 CPUs with HT on), 64GB RAM and 8 x 1TB disks

Re: Improving MR job disk IO

2013-10-10 Thread Xuri Nagarin
for your responses. On Thu, Oct 10, 2013 at 12:29 PM, Xuri Nagarin secs...@gmail.com wrote: Thanks Pradeep. Does this mean the job is a bad candidate for MR? Interestingly, running the command-line '/bin/grep' under a streaming job provides (1) much better disk throughput and (2) CPU load
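For context on the streaming approach mentioned in this thread: Hadoop Streaming runs an arbitrary executable as the mapper, feeding input records to it as lines on stdin and treating each stdout line as an output record. A minimal sketch of a grep-like streaming mapper in Python, assuming a map-only job and a hypothetical convention that the regex is passed as the first argument (the thread itself ran '/bin/grep' directly, not this script):

```python
#!/usr/bin/env python3
"""Grep-like mapper for Hadoop Streaming (illustrative sketch only).

Streaming feeds each input record to stdin as a text line and treats each
line written to stdout as an output record; in a map-only job these lines
go straight to the job output.
"""
import re
import sys
from typing import Iterable, Iterator


def grep_lines(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield only the lines containing a match for `pattern`, like grep."""
    regex = re.compile(pattern)  # compile once per task, not once per line
    for line in lines:
        if regex.search(line):
            yield line


def main() -> None:
    # Hypothetical convention: the regex is the first command-line argument;
    # with no argument, "." matches every non-empty line.
    pattern = sys.argv[1] if len(sys.argv) > 1 else "."
    for line in grep_lines(sys.stdin, pattern):
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Such a script would typically be shipped with the streaming jar's `-file` and `-mapper` options and run with `-numReduceTasks 0`; the jar's location differs between distributions.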

Modifying Grep to read Sequence/Snappy files

2013-10-08 Thread Xuri Nagarin
Hi, I am trying to get the Grep example bundled with CDH to read Sequence/Snappy files. By default, the program throws errors when trying to read Sequence/Snappy files: java.io.EOFException: Unexpected end of block in input stream at

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-16 Thread Xuri Nagarin
...@apache.org wrote: Errr, what's wrong with discussing these types of issues on list? Nothing public here, and as long as it's kept to facts, this should not be a problem and Apache is a fine place to have such discussions. My 2c. -----Original Message----- From: Xuri Nagarin

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-13 Thread Xuri Nagarin
kept to facts, this should not be a problem and Apache is a fine place to have such discussions. My 2c. -----Original Message----- From: Xuri Nagarin secs...@gmail.com Reply-To: user@hadoop.apache.org Date: Thursday, September 12, 2013 4:39 PM To: user

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Xuri Nagarin
I understand it can be a contentious issue, especially given that a lot of contributors to this list work for one vendor or another or have some stake in any kind of evaluation. But I see no reason why users should not be able to compare notes and share experiences. Over time, genuine pain points

TB per core sweet spot

2013-08-29 Thread Xuri Nagarin
Hi, I realize there is no perfect spec for data nodes, as a lot depends on use cases and workloads, but I am curious whether there are any rules of thumb or no-go zones in terms of how many terabytes per core is OK. So, a few questions, assuming the rule of 1 core per HDD holds: Is there a no-go zone in terms of
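As a worked example of the ratios being asked about, the arithmetic below uses the node spec from the "Improving MR job disk IO" thread (2x8-core Xeons, i.e. 16 physical cores, 8 x 1TB disks, 20 mappers per node). The `NodeSpec` class and its field names are hypothetical and the output is plain arithmetic, not an established sizing rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one data node's hardware."""
    physical_cores: int  # physical cores, not hyperthreads
    disks: int
    tb_per_disk: float
    mappers: int  # configured map slots per node


def ratios(node: NodeSpec) -> dict:
    """Return raw storage and per-core / per-disk ratios for one node."""
    raw_tb = node.disks * node.tb_per_disk
    return {
        "raw_tb": raw_tb,
        "tb_per_core": raw_tb / node.physical_cores,
        # Compare against the "1 core per HDD" rule of thumb (ratio of 1.0).
        "cores_per_disk": node.physical_cores / node.disks,
        "mappers_per_disk": node.mappers / node.disks,
    }


spec = NodeSpec(physical_cores=16, disks=8, tb_per_disk=1.0, mappers=20)
print(ratios(spec))
# prints {'raw_tb': 8.0, 'tb_per_core': 0.5, 'cores_per_disk': 2.0, 'mappers_per_disk': 2.5}
```

Counting physical cores rather than the 32 hyperthreaded CPUs the OS reports keeps the per-core figures conservative.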

Re: Hadoop Clients (Hive,Pig) and Hadoop Cluster

2013-08-29 Thread Xuri Nagarin
Yes, ideally you want to set up a fourth gateway node to run clients. http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Security-Guide/AppxG-Setting-Up-Gateway.html On Thu, Aug 29, 2013 at 3:11 PM, Raj Hadoop hadoop...@yahoo.com wrote: Hi, I am trying to set up a