Can I startup 2 datanodes on 1 machine?

2008-10-07 Thread Zhou, Yunqing
Here I have an existing hadoop 0.17.1 cluster. Now I'd like to add a second disk to every machine. So can I start up multiple datanodes on 1 machine? Or do I have to set up each machine with software RAID configured? (no RAID support on the mainboards) Thanks

Re: Can I startup 2 datanodes on 1 machine?

2008-10-07 Thread Zhou, Yunqing
Thanks, I will try it then. On Tue, Oct 7, 2008 at 4:40 PM, Miles Osborne [EMAIL PROTECTED] wrote: you can specify multiple data directories in your conf file dfs.data.dir Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks. If

Gets sum of all integers between map tasks

2008-10-07 Thread Edward J. Yoon
I would like to get the spam probability P(word|category) of the words from files of a category (bad/good e-mails) as described below. BTW, to compute it in the reduce, I need the sum of spamTotal across map tasks. How can I get it? Map: /** * Counts word frequency */ public void

Connect to a virtual cluster

2008-10-07 Thread Adrian Fdz.
Hi! First of all, sorry for my English. I've been working with Hadoop the last few weeks and I wonder if there is any virtual cluster I can connect to, from a single machine, in order to submit jobs. Thanks

Re: Gets sum of all integers between map tasks

2008-10-07 Thread Miles Osborne
this is a well-known problem. basically, you want to aggregate values computed at some previous step. --emit category,probability pairs and have the reducer simply sum up the probabilities for a given category (it is the same task as summing up the word counts) Miles 2008/10/7 Edward J. Yoon
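Miles's suggestion can be sketched outside Hadoop itself. The hypothetical snippet below simulates the reduce step: group the emitted (category, probability) pairs by key and sum the values, exactly the same shape as summing word counts. Class and method names are illustrative, not from the thread.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates the reduce step: sum the values emitted for each key,
// the same pattern a Hadoop reducer uses for word counts.
public class SumByCategory {
    static Map<String, Double> sumByKey(List<Map.Entry<String, Double>> pairs) {
        Map<String, Double> sums = new HashMap<String, Double>();
        for (Map.Entry<String, Double> e : pairs) {
            Double prev = sums.get(e.getKey());
            sums.put(e.getKey(), prev == null ? e.getValue() : prev + e.getValue());
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> emitted = Arrays.<Map.Entry<String, Double>>asList(
            new SimpleEntry<String, Double>("spam", 0.25),
            new SimpleEntry<String, Double>("spam", 0.25),
            new SimpleEntry<String, Double>("ham", 0.5));
        System.out.println(sumByKey(emitted));
    }
}
```

In an actual job the map tasks would call output.collect(category, probability) and the framework would deliver the grouped values to the reducer, so no cross-task communication is needed.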

graphics in hadoop

2008-10-07 Thread chandra
hi, does hadoop support graphics packages for displaying some images? -- Best Regards S.Chandravadana This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended

Re: Can I startup 2 datanodes on 1 machine?

2008-10-07 Thread Miles Osborne
you can specify multiple data directories in your conf file dfs.data.dir Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories,
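For reference, the setting Miles is quoting goes in hadoop-site.xml; the paths below are examples, one per physical disk:

```xml
<property>
  <name>dfs.data.dir</name>
  <!-- one directory per disk; the DataNode spreads new blocks across them -->
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>
```

After adding the new directory, the DataNode has to be restarted for the change to take effect.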

Re: graphics in hadoop

2008-10-07 Thread Lukáš Vlček
Hi, Hadoop is a platform for distributed computing. Typically it runs on a cluster of dedicated servers (though expensive HW is not required); as far as I know it is not meant to be a platform for applications running on a client. Hadoop is very general and not limited by the nature of the data, this

Re: graphics in hadoop

2008-10-07 Thread chandravadana
and is there any method for creating an image file in hadoop? chandravadana wrote: hi, does hadoop support graphics packages for displaying some images? -- Best Regards S.Chandravadana

Re: Gets sum of all integers between map tasks

2008-10-07 Thread Edward J. Yoon
Oh-ha, that's simple. :) /Edward J. Yoon On Tue, Oct 7, 2008 at 7:14 PM, Miles Osborne [EMAIL PROTECTED] wrote: this is a well known problem. basically, you want to aggregate values computed at some previous step. --emit category,probability pairs and have the reducer simply sum-up the

Re: Weird problem running wordcount example from within Eclipse

2008-10-07 Thread Ski Gh3
I figured out the input directory part: I just need to add the $HADOOP_HOME/conf directory to the classpath in Eclipse. However, now I've run into a new problem: the program complains that it cannot find the class files for my mapper and reducer! The error message is as follows: 08/10/07

Re: architecture diagram

2008-10-07 Thread Alex Loddengaard
Thanks for the clarification, Samuel. I wasn't aware that parts of a line might be emitted depending on the split, while using TextInputFormat. Terrence, this means that you'll have to take the approach of collecting key = column_count, value = column_contents in your map step. Alex On Mon, Oct

Re: graphics in hadoop

2008-10-07 Thread Alex Loddengaard
Hadoop runs Java code, so you can do anything that Java could do. This means that you can create and/or analyze images. However, as Lukas has said, Hadoop runs on a cluster of computers and is used for data storage and processing. If you need to display images, then you'd have to take these
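To Alex's point that plain Java imaging works inside map/reduce code, here is a minimal, hedged sketch using only the standard library (java.awt.image, javax.imageio); in a real job the bytes would be written to HDFS rather than returned. Names here are illustrative.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Renders a small gradient image and encodes it as PNG bytes in memory.
// BufferedImage works headless, so this would run inside a map or reduce task.
public class PngDemo {
    static byte[] renderPng(int w, int h) throws IOException {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                img.setRGB(x, y, (x * 255 / w) << 16);  // red gradient
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(renderPng(16, 16).length + " PNG bytes");
    }
}
```

Displaying the result, as the earlier replies note, is a job for a client application, not the cluster.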

Re: Connect to a virtual cluster

2008-10-07 Thread Alex Loddengaard
Amazon EC2 and S3 is probably the easiest way for someone without a cluster to get jobs running. Take a look: EC2: http://aws.amazon.com/ec2/ http://wiki.apache.org/hadoop/AmazonEC2 S3: http://aws.amazon.com/s3/ http://wiki.apache.org/hadoop/AmazonS3 Hope this helps. Alex On Tue, Oct 7, 2008

NoSuchMethodException when running Map Task

2008-10-07 Thread Dan Benjamin
I've got a simple hadoop job running on an EC2 cluster using the scripts under src/contrib/ec2. The map tasks all fail with the following error: 08/10/07 15:11:00 INFO mapred.JobClient: Task Id : attempt_200810071501_0001_m_31_0, Status : FAILED java.lang.RuntimeException:

Re: NoSuchMethodException when running Map Task

2008-10-07 Thread Dan Benjamin
Sorry, I should have mentioned I'm using hadoop version 0.18.1 and java 1.6. Dan Benjamin wrote: I've got a simple hadoop job running on an EC2 cluster using the scripts under src/contrib/ec2. The map tasks all fail with the following error: 08/10/07 15:11:00 INFO mapred.JobClient: Task
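One common cause of a NoSuchMethodException when Hadoop instantiates a mapper reflectively is that the class has no no-argument constructor, e.g. because it is a non-static inner class. This standalone sketch (hypothetical class names, plain Java, no Hadoop dependency) shows the difference:

```java
// Hadoop creates mappers reflectively via a no-arg constructor; a non-static
// inner class implicitly takes its enclosing instance as a constructor
// argument, so the no-arg lookup fails with NoSuchMethodException.
public class CtorDemo {
    static class StaticMapper { }  // gets an implicit no-arg constructor
    class InnerMapper { }          // constructor takes the outer CtorDemo

    static boolean hasNoArgCtor(Class<?> c) {
        try {
            c.getDeclaredConstructor();  // looks up the () constructor
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + hasNoArgCtor(StaticMapper.class));
        System.out.println("inner:         " + hasNoArgCtor(InnerMapper.class));
    }
}
```

If this is the cause, declaring the mapper and reducer as static nested (or top-level) classes fixes it; whether it matches Dan's truncated stack trace cannot be confirmed from the thread.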

dual core configuration

2008-10-07 Thread Elia Mazzawi
hello, I have some dual-core nodes, and I've noticed hadoop is only running 1 instance, and so is only using 1 of the CPUs on each node. Is there a configuration to tell it to run more than one task at a time, or do I need to turn each machine into 2 nodes? Thanks.

Questions regarding Hive metadata schema

2008-10-07 Thread Alan Gates
Hi, I've been looking over the db schema that hive uses to store its metadata (package.jdo) and I had some questions: 1. What do the field names in the TYPES table mean? TYPE1, TYPE2, and TYPE_FIELDS are all unclear to me. 2. In the TBLS (tables) table, what is sd? 3. What does the

Re: Questions regarding Hive metadata schema

2008-10-07 Thread Prasad Chakka
Hi Alan, The objects are very closely associated with the Thrift API objects defined in src/contrib/hive/metastore/if/hive_metastore.thrift . It contains descriptions of what each field is and should answer most of your questions. The ORM for this is at s/c/h/metastore/src/java/model/package.jdo. 2)

Re: nagios to monitor hadoop datanodes!

2008-10-07 Thread Stefan Groschupf
try jmx. There should also be a JMX-to-SNMP bridge available somewhere. http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp ~~~ 101tec Inc., Menlo Park, California web: http://www.101tec.com blog: http://www.find23.net On Oct 6, 2008, at 10:05 AM, Gerardo Velez wrote: Hi

Re: nagios to monitor hadoop datanodes!

2008-10-07 Thread Brian Bockelman
Hey Stefan, Is there any documentation for making JMX working in Hadoop? Brian On Oct 7, 2008, at 7:03 PM, Stefan Groschupf wrote: try jmx. There should be also jmx to snmp available somewhere. http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp ~~~ 101tec Inc., Menlo

Re: Questions regarding Hive metadata schema

2008-10-07 Thread Jeff Hammerbacher
For translation purposes, SerDe's in Hive correspond to StoreFunc/LoadFunc pairs in Pig and Producers/Extractor pairs in SCOPE. I claim SCOPE's terminology is the most elegant and we should all standardize on their terminology, in this case at least. Joy claims that SerDe is a common term in the

Re: nagios to monitor hadoop datanodes!

2008-10-07 Thread 何永强
Hadoop already integrates JMX; you can extend it to implement what you want to monitor, though that needs some code changes to add counters or the like. One thing you need to be careful about is that Hadoop does not include any JMXConnectorServer, so you need to start one
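As 何永强 notes, the stock daemons do not start a remote JMX connector on their own. One hedged way to expose one without code changes is the JVM's built-in remote JMX agent, set via the daemon options in conf/hadoop-env.sh (example port, auth and SSL disabled, so suitable only for a trusted network):

```shell
# hadoop-env.sh -- expose the DataNode's MBeans so jconsole or a
# JMX-to-SNMP bridge (and from there Nagios) can poll them
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8004 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false $HADOOP_DATANODE_OPTS"
```

The same pattern works for the other daemons via their respective *_OPTS variables.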

Re: IPC Client error | Too many files open

2008-10-07 Thread 何永强
try updating the JDK to 1.6; there is a bug in JDK 1.5's NIO. On 2008-9-26, at 7:29 PM, Goel, Ankur wrote: Hi Folks, We have developed a simple log writer in Java that is plugged into Apache custom log and writes log entries directly to our hadoop cluster (50 machines, quad core, each with 16 GB

Re: dual core configuration

2008-10-07 Thread Taeho Kang
You can have your node (tasktracker) running more than 1 task simultaneously. You may set the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties found in the hadoop-site.xml file. You should change the hadoop-site.xml file on all your slave nodes depending on how many
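A hadoop-site.xml sketch of the settings Taeho mentions, sized for a dual-core node (the values are examples; each tasktracker must be restarted to pick them up):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value> <!-- concurrent map tasks per tasktracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- concurrent reduce tasks per tasktracker -->
</property>
```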

Re: dual core configuration

2008-10-07 Thread Alex Loddengaard
Taeho, I was going to suggest this change as well, but it's documented that mapred.tasktracker.map.tasks.maximum defaults to 2. Can you explain why Elia is only having one core utilized when this config option is set to 2? Here is the documentation I'm referring to: