Here I have an existing hadoop 0.17.1 cluster. Now I'd like to add a second
disk on every machine.
So can I start up multiple datanodes on one machine? Or do I have to set up each
machine with software RAID configured? (There is no RAID support on the mainboards.)
Thanks
Thanks, I will try it then.
On Tue, Oct 7, 2008 at 4:40 PM, Miles Osborne [EMAIL PROTECTED] wrote:
you can specify multiple data directories in your conf file
dfs.data.dir: Comma-separated list of paths on the local filesystem
of a DataNode where it should store its blocks. If this is a
comma-delimited list of directories, then data will be stored in all
named directories, typically on different devices.
I would like to get the spam probability P(word|category) of the words
from a set of files per category (bad/good e-mails) as described below. BTW,
to compute it in the reduce step, I need the sum of spamTotal across map
tasks. How can I get it?
Map:
/**
* Counts word frequency
*/
public void
Hi!
First of all, sorry for my English.
I've been working with Hadoop for the last few weeks, and I wonder if there is
any virtual cluster I can connect to from a single machine in order to
submit jobs.
Thanks
this is a well-known problem. basically, you want to aggregate values
computed at some previous step.
- emit (category, probability) pairs and have the reducer simply sum up
the probabilities for a given category
(it is the same task as summing up the word counts)
Miles
2008/10/7 Edward J. Yoon
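Miles's suggestion — have the reducer sum the per-category values, exactly like summing word counts — can be sketched in plain Java. This is a minimal sketch of the aggregation logic only (no Hadoop classes; the class name, input values, and category labels are illustrative, not from the thread):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the reduce-side aggregation Miles describes: the map phase
// emits (category, probability) pairs, and the reducer sums the values
// it receives for each category key.
public class CategorySum {
    /** Sum all probabilities emitted for one category key (the reduce step). */
    public static double reduce(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Pretend the map phase emitted these (category, probability) pairs.
        Map<String, List<Double>> mapOutput = new HashMap<String, List<Double>>();
        mapOutput.put("bad", Arrays.asList(0.2, 0.3, 0.1));
        mapOutput.put("good", Arrays.asList(0.05, 0.15));
        for (Map.Entry<String, List<Double>> e : mapOutput.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```

In a real job the framework does the grouping by key, so the reducer body is just the summing loop above.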
hi
does hadoop support graphics packages for displaying some images..?
--
Best Regards
S.Chandravadana
This e-mail and any files transmitted with it are for the sole use of the
intended recipient(s) and may contain confidential and privileged information.
If you are not the intended
you can specify multiple data directories in your conf file
dfs.data.dir: Comma-separated list of paths on the local filesystem
of a DataNode where it should store its blocks. If this is a
comma-delimited list of directories, then data will be stored in all
named directories, typically on different devices.
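As a concrete illustration, the hadoop-site.xml fragment for a machine with two disks might look like this (the mount points /disk1 and /disk2 are example paths, not anything from the thread):

```xml
<!-- hadoop-site.xml: example dfs.data.dir for a DataNode with two disks -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/hadoop/dfs/data,/disk2/hadoop/dfs/data</value>
</property>
```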
Hi,
Hadoop is a platform for distributed computing. Typically it runs on a
cluster of dedicated servers (though expensive hardware is not required); as far
as I know it is not meant to be a platform for applications running on a
client.
Hadoop is very general and not limited by the nature of the data, this
and is there any method for creating an image file in hadoop..?
chandravadana wrote:
hi
does hadoop support graphics packages for displaying some images..?
--
Best Regards
S.Chandravadana
Oh-ha, that's simple. :)
/Edward J. Yoon
On Tue, Oct 7, 2008 at 7:14 PM, Miles Osborne [EMAIL PROTECTED] wrote:
this is a well-known problem. basically, you want to aggregate values
computed at some previous step.
- emit (category, probability) pairs and have the reducer simply sum up
the probabilities for a given category
(it is the same task as summing up the word counts)
I figured out the input directory part: I just need to add the
$HADOOP_HOME/conf directory to the classpath in Eclipse.
However, now I have run into a new problem: the program complains that it
cannot find the class files for my mapper and reducer! The error message is
as follows:
08/10/07
Thanks for the clarification, Samuel. I wasn't aware that parts of a line
might be emitted depending on the split, while using TextInputFormat.
Terrence, this means that you'll have to take the approach of collecting key
= column_count, value = column_contents in your map step.
Alex
On Mon, Oct
Hadoop runs Java code, so you can do anything that Java could do. This
means that you can create and/or analyze images. However, as Lukas has
said, Hadoop runs on a cluster of computers and is used for data storage and
processing.
If you need to display images, then you'd have to take these
Amazon EC2 and S3 is probably the easiest way for someone without a cluster
to get jobs running. Take a look:
EC2:
http://aws.amazon.com/ec2/
http://wiki.apache.org/hadoop/AmazonEC2
S3:
http://aws.amazon.com/s3/
http://wiki.apache.org/hadoop/AmazonS3
Hope this helps.
Alex
On Tue, Oct 7, 2008
I've got a simple hadoop job running on an EC2 cluster using the scripts
under src/contrib/ec2. The map tasks all fail with the following error:
08/10/07 15:11:00 INFO mapred.JobClient: Task Id :
attempt_200810071501_0001_m_31_0, Status : FAILED
java.lang.RuntimeException:
Sorry, I should have mentioned I'm using hadoop version 0.18.1 and java 1.6.
Dan Benjamin wrote:
I've got a simple hadoop job running on an EC2 cluster using the scripts
under src/contrib/ec2. The map tasks all fail with the following error:
08/10/07 15:11:00 INFO mapred.JobClient: Task Id :
attempt_200810071501_0001_m_31_0, Status : FAILED
hello,
I have some dual-core nodes, and I've noticed Hadoop is only running one
task instance per node, and so is only using one of the CPUs on each node.
Is there a configuration option to tell it to run more than one?
Or do I need to turn each machine into two nodes?
Thanks.
Hi,
I've been looking over the db schema that Hive uses to store its
metadata (package.jdo) and I had some questions:
1. What do the field names in the TYPES table mean? TYPE1, TYPE2,
and TYPE_FIELDS are all unclear to me.
2. In the TBLS (tables) table, what is sd?
3. What does the
Hi Alan,
The objects are very closely associated with the Thrift API objects defined
in src/contrib/hive/metastore/if/hive_metastore.thrift . It contains
descriptions of what each field is, and it should answer most of your questions.
The ORM for this is at s/c/h/metastore/src/java/model/package.jdo.
2)
Try JMX. There should also be a JMX-to-SNMP bridge available somewhere.
http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp
~~~
101tec Inc., Menlo Park, California
web: http://www.101tec.com
blog: http://www.find23.net
On Oct 6, 2008, at 10:05 AM, Gerardo Velez wrote:
Hi
Hey Stefan,
Is there any documentation for making JMX working in Hadoop?
Brian
On Oct 7, 2008, at 7:03 PM, Stefan Groschupf wrote:
Try JMX. There should also be a JMX-to-SNMP bridge available somewhere.
http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp
~~~
101tec Inc., Menlo Park, California
For translation purposes, SerDes in Hive correspond to
StoreFunc/LoadFunc pairs in Pig and Producers/Extractor pairs in
SCOPE.
I claim SCOPE's terminology is the most elegant and we should all
standardize on their terminology, in this case at least. Joy claims
that SerDe is a common term in the
Hadoop already has JMX integrated; you can extend it to implement
what you want to monitor, though that requires modifying some code to
add counters or something like that.
One thing you need to be careful about is that Hadoop does not include any
JMXConnectorServer, so you need to start one yourself.
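Starting a connector server yourself is a few lines of plain JDK code. A minimal sketch (the port and service URL below are illustrative choices, not Hadoop defaults, and in Hadoop you would hook this into daemon startup rather than a standalone main):

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hadoop registers its metrics MBeans on the platform MBeanServer, but it
// does not start a JMXConnectorServer, so a remote monitor (jconsole, an
// SNMP bridge, etc.) has nothing to connect to until you start one.
public class StartJmxServer {
    public static JMXConnectorServer start(int port) throws Exception {
        // An RMI registry is needed so the JNDI name in the URL resolves.
        LocateRegistry.createRegistry(port);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");
        JMXConnectorServer server =
            JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        JMXConnectorServer server = start(19999);
        System.out.println("JMX connector active: " + server.isActive());
        server.stop();
    }
}
```

Once this is running, jconsole (or the JMX-to-SNMP bridge mentioned above) can attach to the service URL.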
Try updating the JDK to 1.6; there is a bug in JDK 1.5 related to NIO.
On 2008-9-26, at 7:29 PM, Goel, Ankur wrote:
Hi Folks,
We have developed a simple log writer in Java that is plugged into
Apache custom log and writes log entries directly to our hadoop
cluster
(50 machines, quad core, each with 16 GB
You can have your node (tasktracker) running more than one task
simultaneously.
You may set the mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum properties found in the
hadoop-site.xml file. You should change the hadoop-site.xml file on all your
slave nodes depending on how many cores each machine has.
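For a dual-core box, the hadoop-site.xml fragment on each slave might look like this (two slots per task type is an example value, to be tuned per machine):

```xml
<!-- hadoop-site.xml on each slave: allow 2 concurrent maps and 2 reduces -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```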
Taeho, I was going to suggest this change as well, but it's documented that
mapred.tasktracker.map.tasks.maximum defaults to 2. Can you explain why
Elia is seeing only one core utilized when this config option is set to 2?
Here is the documentation I'm referring to: