Using hadoop for Matrix Multiplication in NFS?

2009-11-12 Thread Gimick
Hi, I am new to Hadoop. I am planning to do matrix multiplication (of order millions) using Hadoop. I have a few queries regarding the above. i) Will using Hadoop be a good fit for this, or should I try some other approaches? ii) I will be running it on NFS. Will using Hadoop still be a good option?

hdfs disk space

2009-11-12 Thread Y G
Hi all, when I use the command "hadoop dfsadmin -report", there are some terms that I don't fully understand: - What does "DFS Used" mean? - What does "Non DFS Used" mean? - When I delete all the content under the HDFS root, "DFS Used" is still "1.96MB", and "Non DFS Used" still takes almost x
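
In the report, "DFS Used" is the space consumed by HDFS block data on the datanode volumes, while "Non DFS Used" is everything else stored on those same volumes (logs, OS files, other applications). A common reason "DFS Used" does not drop to zero after deleting everything is that deleted files sit in each user's .Trash directory until they expire, when the trash feature is enabled. A minimal sketch of how to check, assuming a 0.20-era command line:

    # Capacity, DFS Used and Non DFS Used, per datanode
    hadoop dfsadmin -report

    # Space actually consumed by files under HDFS paths
    hadoop fs -du /

    # If trash is enabled, deleted files linger here until fs.trash.interval expires
    hadoop fs -ls /user/$USER/.Trash

    # Force-empty the trash, then re-check the report
    hadoop fs -expunge
    hadoop dfsadmin -report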

[Job Posting] Core Analytics Engineer (End-User Experience)

2009-11-12 Thread brien colwell
Hi all, We do large data analysis related to user experience and we're looking for a great engineer! If you want to apply your skills to new types of analysis on a scalable infrastructure, please read the full job description below. If you are interested please contact employm...@knoa.com wit

Re: About Hadoop pseudo distribution

2009-11-12 Thread Raymond Jennings III
If I understand you correctly, you can run "jps" and see the Java JVMs running on each machine; that should tell you whether you are running in pseudo-distributed mode or not. --- On Thu, 11/12/09, kvorion wrote: > From: kvorion > Subject: About Hadoop pseudo distribution > To: core-u...@hadoop.apache.org > D
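
As a concrete illustration of the jps check (daemon names below assume a stock 0.20-style single-node setup): in pseudo-distributed mode all five daemons run as separate JVMs on the one host, whereas on a real cluster the master and worker daemons show up on different machines.

    $ jps
    # In pseudo-distributed mode, expect something like:
    #   NameNode
    #   SecondaryNameNode
    #   DataNode
    #   JobTracker
    #   TaskTracker
    #   Jps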

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
Allen, Thanks for the insight. That is actually one of the things I liked about Ganglia: that sense of drill-down, where you can see the active statistics at the highest level and then drill down to a specific node for the rest. Just as an aside, we use Nagios for our network

Re: Hadoop Node Monitoring

2009-11-12 Thread Allen Wittenauer
On 11/11/09 9:46 PM, "John Martyniak" wrote: > Is there a good solution for Hadoop node monitoring? I know that > Cacti and Ganglia are probably the two big ones, but are they the best > ones to use? Easiest to setup? Most thorough reporting, etc. > > I started to play with Ganglia, and the ins

Re: Building Hadoop from Source ?

2009-11-12 Thread Stephen Watt
Hi Sid, Check out the "Building" section in this link - http://wiki.apache.org/hadoop/HowToRelease . It's pretty straightforward. If you choose not to remove the test targets, expect the build to take upwards of 2 hours as it runs through all the unit tests. Kind regards Steve Watt From: Sidd
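
For reference, a rough sketch of what an 0.20-era source build looks like with the stock Ant build file (targets and timings may differ by branch; the wiki page above is authoritative):

    cd /path/to/hadoop-src   # your source checkout or unpacked source tarball
    ant compile              # compile the core
    ant tar                  # package a distribution tarball under build/
    ant test                 # optional; the unit tests are what push the build past 2 hours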

Building Hadoop from Source ?

2009-11-12 Thread Siddu
Hi all, I want to build Hadoop from source rather than downloading the already-built tarball. Can someone please give me the steps or a link to any pointers? Thanks in advance -- Regards, ~Sid~ I have never met a man so ignorant that I couldn't learn something from him

Help !! Hadoop installation to One machine has 24 CPU 16 disk (Each one 2 TB)

2009-11-12 Thread dgoker
Hi, I installed Hadoop on one server which has the following configuration: 24 CPUs, 72 GB RAM, 17 disks (2 TB each). All configuration belonging to Hadoop and Pig is at the default settings. In order to run processing efficiently, what should the following configuration settings be? The settings I find on
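
As a starting point rather than a tuned answer, the settings that usually matter most on a single fat node are the ones that spread I/O across the disks and size the task slots to the cores and RAM. A sketch using 0.20-era property names; the paths and values below are placeholder guesses to adjust:

    # Sketch only: replaces conf/mapred-site.xml, so back up the existing file first
    cat > conf/mapred-site.xml <<'EOF'
    <?xml version="1.0"?>
    <configuration>
      <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>
      <property><name>mapred.local.dir</name>
        <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local</value></property>
      <property><name>mapred.tasktracker.map.tasks.maximum</name><value>16</value></property>
      <property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>8</value></property>
      <property><name>mapred.child.java.opts</name><value>-Xmx1024m</value></property>
    </configuration>
    EOF
    # hdfs-site.xml follows the same pattern: list every data disk in dfs.data.dir, e.g.
    #   <name>dfs.data.dir</name>
    #   <value>/disk1/hdfs/data,/disk2/hdfs/data,...,/disk17/hdfs/data</value>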

Re: About Hadoop pseudo distribution

2009-11-12 Thread Steve Loughran
kvorion wrote: Hi All, I have been trying to set up a hadoop cluster on a number of machines, a few of which are multicore machines. I have been wondering whether the hadoop pseudo distribution is something that can help me take advantage of the multiple cores on my machines. All the tutorials s

About Hadoop pseudo distribution

2009-11-12 Thread kvorion
Hi All, I have been trying to set up a hadoop cluster on a number of machines, a few of which are multicore machines. I have been wondering whether the hadoop pseudo distribution is something that can help me take advantage of the multiple cores on my machines. All the tutorials say that the pseu
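
On the multicore point: pseudo-distributed mode runs each daemon in its own JVM on the one host, and how many cores your jobs actually use is mostly governed by the TaskTracker's slot settings rather than by the mode itself. A sketch of the standard 0.20-era single-node startup, with the slot values shown purely as illustration:

    # Standard single-node (pseudo-distributed) startup, 0.20-era scripts:
    bin/hadoop namenode -format      # once, to initialise HDFS
    bin/start-dfs.sh                 # NameNode + DataNode (+ SecondaryNameNode)
    bin/start-mapred.sh              # JobTracker + TaskTracker
    # Concurrency on a multicore box comes from the TaskTracker slot counts in
    # conf/mapred-site.xml, e.g. for an 8-core machine:
    #   mapred.tasktracker.map.tasks.maximum    = 6
    #   mapred.tasktracker.reduce.tasks.maximum = 2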

Hadoop On Demand with different Resource Managers

2009-11-12 Thread Antonio D'Ettole
Hello everyone. I have a question on Hadoop on Demand. I know it works on clusters using the Torque resource manager (I've installed Torque on a toy cluster and run HOD successfully). I now might have the chance to work on a cluster which uses the SLURM resource manager and I'd like to run HOD on i

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
Kevin, What did you think of Cloudera Desktop? Were you able to get it running with a vanilla Hadoop install? -John On Nov 12, 2009, at 9:40 AM, Kevin Sweeney wrote: We're about in the same boat as you. We use Nagios and have Cacti for other things so I'll probably use it for Hadoop as

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
Thanks for the info. So you are saying to install both Cacti and Ganglia, which is what I was kind of thinking, to see which one I like best and which one gives the best info. The only thing is that the Ganglia install is not straightforward. Do you have any recommendations for insta
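
For what it's worth, on a Debian/Ubuntu-style host the Ganglia pieces are usually available as stock packages; a rough sketch (package names and config paths are assumptions, adjust for your distro and version):

    sudo apt-get install ganglia-monitor               # gmond: run on every Hadoop node
    sudo apt-get install gmetad ganglia-webfrontend    # collector + web UI: one host is enough
    # Point every node's gmond at the same cluster settings in /etc/ganglia/gmond.conf,
    # and list that cluster as a data_source in /etc/ganglia/gmetad.conf on the collector.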

Re: Hadoop Node Monitoring

2009-11-12 Thread Kevin Sweeney
We're about in the same boat as you. We use Nagios and have Cacti for other things, so I'll probably use it for Hadoop as well. Ganglia seems interesting but not too simple to set up. We also tried Cloudera Desktop, which gives you a nice interface to see what's happening, but it requires using Clouder

Re: Hadoop Node Monitoring

2009-11-12 Thread Edward Capriolo
Definitely check out my presentation on Cloudera's site; the link is above. Hadoop-specific counters are available. Each component (namenode, datanode, etc.) has counter objects associated with it. Hadoop allows you to push statistics to Ganglia, so this is one nice option. More or less, once you get
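
The "push statistics to Ganglia" part is normally wired up in conf/hadoop-metrics.properties. A minimal sketch for the 0.20-era metrics framework (gmond-host:8649 is a placeholder; newer Ganglia releases may need the GangliaContext31 class instead):

    # Append (or edit the corresponding existing entries in) conf/hadoop-metrics.properties
    cat >> conf/hadoop-metrics.properties <<'EOF'
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    dfs.period=10
    dfs.servers=gmond-host:8649
    mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    mapred.period=10
    mapred.servers=gmond-host:8649
    jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    jvm.period=10
    jvm.servers=gmond-host:8649
    EOF
    # Restart the Hadoop daemons so the new metrics contexts take effect.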

Re: Hadoop Node Monitoring

2009-11-12 Thread John Martyniak
I do already use Nagios, and have been monitoring the availability, etc., of the network. But I was hoping to get more insight into the load and inner workings of the Hadoop cluster, and Ganglia seemed like a good start. Do you use either Ganglia or Cacti, or something else? -John On Nov 12, 2009, at