Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Lance Norskog
OK, so the problem is that you have a machine with 16 CPUs and a computation job that uses at most 3 of them. Is that right? What is the Mahout task? Do you know whether it has good multi-Hadoop performance and tuning? What matters is that the Partitioner for the Mahout code can separate the computations.
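
To make the point about partitioning concrete, here is a minimal sketch of how Hadoop's default HashPartitioner maps a key to a reduce task: clear the sign bit of the key's hash, then take the remainder modulo the number of reduce tasks. It assumes keys hash like java.lang.String.hashCode(). Hadoop's Text and other Writable key types compute their hashes differently, so the exact reducer numbers are illustrative only. The helper names are hypothetical.

```python
# Sketch only: helper names are hypothetical, and keys are assumed to hash like java.lang.String.
from collections import Counter

INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def java_string_hash(s: str) -> int:
    """Emulate java.lang.String.hashCode(), including 32-bit signed overflow.

    Assumes every character fits in one UTF-16 code unit (no surrogate pairs).
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int arithmetic
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Same arithmetic as HashPartitioner: clear the sign bit, then take the remainder."""
    return (java_string_hash(key) & INT_MAX) % num_reduce_tasks


def reducer_load(keys, num_reduce_tasks: int) -> Counter:
    """Hypothetical helper: how many records (by key) each reduce task would receive."""
    return Counter(hash_partition(k, num_reduce_tasks) for k in keys)


if __name__ == "__main__":
    # Many distinct keys typically spread across the 16 reducers...
    print(len(reducer_load([f"doc-{i}" for i in range(1000)], 16)))
    # ...but three distinct keys can never occupy more than three of them.
    print(len(reducer_load(["a", "b", "c"] * 1000, 16)))
```

If a job emits only a few distinct keys, only that many reduce tasks receive any data, however many slots the machine offers. That is one possible reason a job leaves most of a 16-CPU box idle.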

Twitter Search + big Hadoop, Dec. 8th at Seattle Scalability Meetup

2010-11-30 Thread Bradford Stephens
Greetings, The Seattle Scalability Meetup isn't slacking for the holidays. We've got an awesome lineup for Wed, December 8 at 7pm: http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/ -Jake Mannix from Twitter will talk about the Twitter Search infrastructure (with distributed Lucene) -Chris

Help with "Hadoop common not found." error when launching bin/start-dfs.sh.

2010-11-30 Thread Greg Troyan
I am building a cluster using Michael G. Noll's instructions found here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ I have set up two single node clusters and they work fine. When I change their configurations to behave as a single cluster (by changi

Re: HDFS Rsync process??

2010-11-30 Thread hadoopman
On 11/30/2010 03:51 AM, Steve Loughran wrote: On 30/11/10 03:59, hadoopman wrote: you don't need all the files in the cluster in sync as a lot of them are intermediate and transient files. Instead use dfscopy to copy source files to the two clusters, this runs across the machines in the clu

Re: small files and number of mappers

2010-11-30 Thread Edward Capriolo
On Tue, Nov 30, 2010 at 3:21 AM, Harsh J wrote: > Hey, > On Tue, Nov 30, 2010 at 4:56 AM, Marc Sturlese wrote: >> Hey there, I am doing some tests and wondering what the best practices are for dealing with very small files that are continuously being generated (1 MB or even less).
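
Why many small files mean many mappers can be sketched by mirroring the split loop of Hadoop's FileInputFormat. Each file is cut into splits of about split_size bytes, the last split may run up to 10% over (the SPLIT_SLOP factor of 1.1), and no split ever covers two files. This sketch assumes non-empty files and a fixed 64 MB split size (the default HDFS block size of this era). It ignores CombineFileInputFormat and other input formats, and the function names are hypothetical.

```python
# Sketch only: function names are hypothetical; assumes non-empty files and a fixed split size.
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger than split_size
MB = 1024 * 1024


def splits_for_file(length: int, split_size: int) -> int:
    """Count the splits made for one non-empty file, following FileInputFormat's loop."""
    remaining = length
    count = 0
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    if remaining != 0:
        count += 1  # the tail, or the whole file if it was smaller than one split
    return count


def estimate_map_tasks(file_sizes, split_size: int = 64 * MB) -> int:
    """One map task per split; a split never covers more than one file."""
    return sum(splits_for_file(size, split_size) for size in file_sizes)


if __name__ == "__main__":
    small_files = [1 * MB] * 1000  # ~1 GB as 1,000 files of 1 MB
    one_file = [1000 * MB]         # the same bytes as one file
    print(estimate_map_tasks(small_files), estimate_map_tasks(one_file))  # prints: 1000 16
```

The same gigabyte needs 1,000 map tasks as small files but only 16 as one file, which is the heart of the small-files problem.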

Re: hadoop and ganglia without UDP multicast

2010-11-30 Thread Eric Fiala
Mark, You might want to try changing your [dfs|mapred|jvm|rpc].servers in hadoop-metrics.properties to point to your monitoring IP address (192.168.1.72?) rather than localhost. If you are relaying each node from the local gmond, then try to use the IP address to which gmond is bound (netstat -an | gre
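
Here is a minimal sketch of the change Eric describes. It rewrites each active dfs.servers, mapred.servers, jvm.servers and rpc.servers entry in hadoop-metrics.properties so it points at the monitoring host instead of localhost. It assumes entries of the form key=localhost or key=localhost:port and keeps any port unchanged. Commented-out lines are left alone. The IP is the one guessed in the message, 8649 (gmond's usual port) appears only as sample input, and the function name is hypothetical.

```python
# Sketch only: the function name is hypothetical; assumes "<ctx>.servers=localhost[:port]" entries.
import re

# Matches e.g. "dfs.servers=localhost:8649"; a leading "#" (commented line) does not match.
SERVERS_LINE = re.compile(r"^(\s*(?:dfs|mapred|jvm|rpc)\.servers\s*=\s*)localhost(:\d+)?\s*$")


def point_metrics_at(properties_text: str, monitor_ip: str) -> str:
    """Replace 'localhost' with monitor_ip in the four *.servers entries, keeping the port."""
    out = []
    for line in properties_text.splitlines():
        m = SERVERS_LINE.match(line)
        if m:
            line = f"{m.group(1)}{monitor_ip}{m.group(2) or ''}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    stock = "dfs.servers=localhost:8649\nmapred.servers=localhost:8649"
    print(point_metrics_at(stock, "192.168.1.72"))
```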

Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Matthew Foley
Here is a "recipe" for how to run multiple datanodes on a single server, posted to this list on Sept. 15: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3c8a898c33-dc4e-418c-adc0-5689d434b...@yahoo-inc.com%3e If you're having trouble getting multiple cores utili

Re: HDFS Rsync process??

2010-11-30 Thread Alejandro Abdelnur
The other approach, if the DR cluster is idle or has enough excess capacity, would be to run all the jobs on the input data in both clusters and perform checksums on the outputs to ensure everything is consistent. You could also take advantage of this and distribute ad hoc queries between the 2 clusters.
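
The checksum comparison Alejandro suggests could be sketched as follows. It assumes the output directories of the same job from both clusters have first been copied to local disk, for example with `hadoop fs -get`. Every file is hashed with SHA-256, and the sketch reports paths whose contents differ or that exist on only one side. Names starting with "_" or "." (such as _logs) are skipped, mirroring FileInputFormat's hidden-file filter. The function names are hypothetical.

```python
# Sketch only: function names are hypothetical; outputs are assumed to be copied locally first.
import hashlib
from pathlib import Path


def tree_digests(root) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip names starting with "_" or "." (e.g. _logs), as FileInputFormat's hidden-file filter does.
        if any(part.startswith(("_", ".")) for part in rel.parts):
            continue
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MB at a time
                    h.update(chunk)
            digests[rel.as_posix()] = h.hexdigest()
    return digests


def compare_outputs(primary_dir, dr_dir) -> list:
    """Relative paths whose contents differ, or that exist in only one copy."""
    a, b = tree_digests(primary_dir), tree_digests(dr_dir)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

A byte-for-byte match also assumes the job is deterministic and both runs use the same number of reducers, since part-file boundaries depend on partitioning.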

Map-Reduce Applicability With All-In Memory Data

2010-11-30 Thread Narinder Kumar
Hi All, We have a problem at hand which we would like to solve using distributed and parallel processing. Brief context: We have a Map (Entity, Associated value). The entity can have a parent, which in turn will have its parent, and so on till we reach the head. I have to traverse this tree and do
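
The parent-chain traversal Narinder describes can be sketched as a walk up an in-memory child-to-parent map. This minimal sketch assumes each entity has at most one parent and that the head has no entry or a None parent. It also guards against accidental cycles. Summing the associated values along the path is only an illustrative operation, because the message is cut off before saying what must be computed. The function names are hypothetical.

```python
# Sketch only: function names are hypothetical; the summing step is an illustrative placeholder.
def path_to_head(parent_of: dict, entity) -> list:
    """Return [entity, parent, grandparent, ..., head]."""
    path = [entity]
    seen = {entity}
    current = entity
    while parent_of.get(current) is not None:
        current = parent_of[current]
        if current in seen:  # a malformed map could otherwise loop forever
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
    return path


def sum_to_head(parent_of: dict, value_of: dict, entity):
    """Illustrative aggregate: add up associated values from entity up to the head."""
    return sum(value_of.get(e, 0) for e in path_to_head(parent_of, entity))


if __name__ == "__main__":
    parent_of = {"c": "b", "b": "a"}  # "a" is the head
    value_of = {"a": 1, "b": 10, "c": 100}
    print(path_to_head(parent_of, "c"), sum_to_head(parent_of, value_of, "c"))  # prints: ['c', 'b', 'a'] 111
```

Each walk only reads the shared map, so walks for different entities are independent. That independence is what would let them be spread across map tasks, provided the whole map fits in each task's memory.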

Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Steve Loughran
On 30/11/10 10:32, Adarsh Sharma wrote: "Is it possible to run Hadoop in VMs on production clusters so that we have 1s of nodes on 100s of servers to achieve high performance through cloud computing?" You don't achieve performance that way. You are better off with one VM per physical host, and

Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Hari Sreekumar
Physical machines are certainly better than VMs. If you are running 4 VMs on top of one machine with 128 GB RAM, each gets 32 GB. But 4 machines with 32 GB of RAM each would cost less than one machine with 128 GB, so then there's no point in going to Hadoop, right? Plus all the VMs would compe

Re: HDFS Rsync process??

2010-11-30 Thread Steve Loughran
On 30/11/10 03:59, hadoopman wrote: We have two Hadoop clusters in two separate buildings. Both clusters are loading the same data from the same sources (the second cluster is for DR). We're looking at how we can recover the primary cluster and catch it back up again as new data will continue t

Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Adarsh Sharma
Is it possible to run Hadoop in VMs on production clusters, so that we have 1s of nodes on 100s of servers, to achieve high performance through cloud computing? Or do we simply have to configure Hadoop on 1s of commodity machines? Which i

Re: Re: Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Hari Sreekumar
Try tweaking the mapred-site.xml config parameters; these parameters could help, if you haven't tried them already: mapred.job.reuse.jvm.num.tasks = -1, mapred.tasktracker.map.tasks.maximum = 32, mapred.tasktracker.reduce.tasks.maximum = 16, mapred.child.java.opts
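
For reference, Hadoop's *-site.xml files wrap each setting in a property element with name and value children inside a top-level configuration element. This sketch renders the three name/value pairs quoted above in that layout. mapred.child.java.opts is left out because its value is cut off in the message. Whether these values suit a particular machine is an assumption to check, and the function name is hypothetical.

```python
# Sketch only: the function name is hypothetical; values are the ones quoted in the message.
import xml.etree.ElementTree as ET

# mapred.child.java.opts is omitted because its value was cut off.
SUGGESTED = {
    # -1 lets a task JVM be reused for any number of tasks of the same job
    "mapred.job.reuse.jvm.num.tasks": "-1",
    # Maximum map / reduce tasks a single TaskTracker runs at the same time
    "mapred.tasktracker.map.tasks.maximum": "32",
    "mapred.tasktracker.reduce.tasks.maximum": "16",
}


def render_site_xml(props: dict) -> str:
    """Render name/value pairs in the <configuration>/<property> layout of Hadoop's *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(SUGGESTED))
```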

Re: Re: Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread rahul patodi
The last option I gave was to run Hadoop in fully distributed mode, but you can also run Hadoop in pseudo-distributed mode: http://hadoop-tutorial.blogspot.com/2010/11/running-hadoop-in-pseudo-distributed.html or standalone mode: http://hadoop-tutorial.blogspot.com/2010/11/running-hadoop-in-standalone-mode.

Re:Re: Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread beneo_7
> If you want to just use one machine, why do you want to use Hadoop? Hadoop's power lies in distributed computing. That being said, it is possible to use Hadoop on a single machine by using the pseudo-distributed mode (Read http://hadoop.apache.org/common/docs/current/single_node_setup.html and

Re: Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread Hari Sreekumar
Hi beneo, If you want to just use one machine, why do you want to use Hadoop? Hadoop's power lies in distributed computing. That being said, it is possible to use Hadoop on a single machine by using the pseudo-distributed mode (Read http://hadoop.apache.org/common/docs/current/single_node_setup.ht

Re:Re: where is example of the configuration about multi nodes on one machine?

2010-11-30 Thread beneo_7
I'm sorry, but are you sure? At 2010-11-30 15:53:58, "rahul patodi" wrote: > You can create virtual machines on your single machine: for this you have to install Sun VirtualBox (other tools, like VMware, are also available). Now you can create as many virtual machines as you want, then you can create on

Re: small files and number of mappers

2010-11-30 Thread Harsh J
Hey, On Tue, Nov 30, 2010 at 4:56 AM, Marc Sturlese wrote: > Hey there, I am doing some tests and wondering what the best practices are for dealing with very small files that are continuously being generated (1 MB or even less). Have a read: http://www.cloudera.com/blog/2009/02/the-small-fil

Re: Equivalence of MultipleInputs.addInputPath(...) without a JobConf

2010-11-30 Thread Harsh J
MultipleInputs for the new API is present in the Hadoop 0.21 releases. It resides under the org.apache.hadoop.mapreduce.* packages, as org.apache.hadoop.mapreduce.lib.input.MultipleInputs. See https://issues.apache.org/jira/browse/MAPREDUCE-369 for the issue. On Mon, Nov 29, 2010 at 10:56 PM, Alan Said wrote: > Hi all, I'm having difficulties figurin