On Mar 21, 2008, at 6:35 PM, Stephen J. Barr wrote:
Hello,
I am working on developing my first hadoop app from scratch. It is
a Monte-Carlo simulation, and I am using the PiEstimator code from
the examples as a reference. I believe I have what I want in
a .java file. However, I couldn't
Thank you. That worked (well, it pointed out all the bugs in my code,
which is a good start.)
朱盛凯 wrote:
Hi Stephen,
You can look at the WordCount example; it shows how to create a jar archive
of your application code.
$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d
wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .
Hi Stephen,
Here's a sample Hadoop app which has its build based on Ant:
http://code.google.com/p/ceteri-mapred/
Look in the "jyte" directory. A target called "prep.jar" simply uses
the jar task in Ant to build a JAR for Hadoop to use.
Yeah, I agree that docs and discussions seem to lean more t
Hello,
I am working on developing my first hadoop app from scratch. It is a
Monte-Carlo simulation, and I am using the PiEstimator code from the
examples as a reference. I believe I have what I want in a .java file.
However, I couldn't find any documentation on how to make that .java
file int
The namenode lazily instructs a Datanode to delete blocks. In response to
every heartbeat from a Datanode, the Namenode instructs it to delete a maximum
of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat
thread in the Datanode deletes the block files synchronously.
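The throttle described above puts a hard ceiling on how fast space can be reclaimed. A back-of-the-envelope sketch in Python (the 100-blocks-per-heartbeat and 3-second figures come from the explanation above; the 8-datanode cluster and ~1.27M block count are taken from elsewhere in this thread and are purely illustrative):

```python
# Rough bound on how fast HDFS can reclaim blocks under the
# heartbeat-throttled deletion described above.
BLOCKS_PER_HEARTBEAT = 100   # max deletions the namenode hands out per heartbeat
HEARTBEAT_SECONDS = 3        # typical heartbeat period
DATANODES = 8                # cluster size from this thread (illustrative)

per_node_rate = BLOCKS_PER_HEARTBEAT / HEARTBEAT_SECONDS   # blocks/s per datanode
cluster_rate = per_node_rate * DATANODES                   # blocks/s cluster-wide

blocks_to_reclaim = 1_271_289   # block count discussed later in the thread
hours = blocks_to_reclaim / cluster_rate / 3600
print(f"{per_node_rate:.1f} blocks/s per node, ~{hours:.1f} h to reclaim all")
```

At roughly 33 deletions per second per datanode, a backlog of a million-plus blocks takes on the order of an hour to clear, which is consistent with the "after a few hours" observation below.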
After waiting a few hours (without having any load), the block number
and "DFS Used" space seem to go down...
My question is: is the hardware simply too weak/slow to send the block
deletion requests to the datanodes in a timely manner, or do those
"crappy" HDDs simply cause the delay, since I no
The delay may be in reporting the deleted blocks as free on the web
interface as much as in actually marking them as deleted.
On 3/21/08 2:48 PM, "André Martin" <[EMAIL PROTECTED]> wrote:
> Right, I totally forgot about the replication factor... However
> sometimes I even noticed ratios of 5:1
I wouldn't call it a design feature so much as a consequence of background
processing in the NameNode to clean up the recently-closed files and reclaim
their blocks.
Jeff
Right, I totally forgot about the replication factor... However
sometimes I even noticed ratios of 5:1 for block numbers to files...
Is the delay for block deletion/reclaiming an intended behavior?
Jeff Eastman wrote:
That makes the math come out a lot closer (3*423763=1271289). I've also
noticed there is some delay in reclaiming unused blocks, so what you are
seeing in terms of block allocations does not surprise me.
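A block-to-file ratio above the replication factor is also expected once files span multiple blocks. A small sketch (assumes the default 64 MB block size and replication of 3; simplified — it ignores pending deletions and empty files):

```python
import math

def expected_blocks(file_sizes, replication=3, block_size=64 * 1024 * 1024):
    # Each non-empty file occupies ceil(size / block_size) blocks,
    # and every block is stored `replication` times.
    return sum(math.ceil(s / block_size) * replication for s in file_sizes)

MB = 1024 * 1024

# One-block files: the blocks-to-files ratio equals the replication factor.
small = [10 * MB] * 4
print(expected_blocks(small) / len(small))   # 3.0

# A multi-block file pushes the ratio past 3:1 before any delayed
# deletions are even counted: 200 MB spans ceil(200/64) = 4 blocks.
mixed = [10 * MB, 200 * MB]
print(expected_blocks(mixed) / len(mixed))   # (1 + 4) * 3 / 2 = 7.5
```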
3 - the default one...
Jeff Eastman wrote:
What's your replication factor?
Jeff
What's your replication factor?
Jeff
> -Original Message-
> From: André Martin [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 21, 2008 2:25 PM
> To: core-user@hadoop.apache.org
> Subject: Performance / cluster scaling question
>
> Hi everyone,
> I ran a distributed system that consists
Attached image can be found here:
http://www.andremartin.de/Performance-degradation.png
Hi everyone,
I ran a distributed system that consists of 50 spiders/crawlers and 8
server nodes with a Hadoop DFS cluster with 8 datanodes and a namenode...
Each spider has 5 job processing / data crawling threads and puts
crawled data as one complete file onto the DFS - additionally there are
I don't know the deep answer, but formatting your dfs creates a new
namespaceId that needs to be consistent across all slaves. Any data
directories containing old version ids will prevent the DataNode from
starting on that node. Maybe somebody who really knows the machinery can
elaborate on this.
yup, got it working with that technique.
pushed it out to 5 machines, things look good. appreciate the help.
what is it that causes this? i know i formatted the dfs more than once. is
that what does it? or just adding nodes, or... ?
-colin
On Fri, Mar 21, 2008 at 2:30 PM, Jeff Eastman <[E
I encountered this when I was starting out too, moving from a single-node
cluster to more nodes. I suggest clearing your hadoop-datastore
directory, reformatting the HDFS and restarting again. You are very close :)
Jeff
ah:
2008-03-21 14:06:05,526 ERROR org.apache.hadoop.dfs.DataNode:
java.io.IOException: Incompatible namespaceIDs in
/var/tmp/hadoop-datastore/hadoop/dfs/data: namenode namespaceID =
2121666262; datanode namespaceID = 2058961420
looks like i'm hitting this "Incompatible namespaceID" bug:
http://i
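The conflicting IDs live in plain-text VERSION files under the namenode's and datanode's storage directories (typically .../dfs/name/current/VERSION and .../dfs/data/current/VERSION — treat the exact paths as install-specific). A sketch of pulling the ID out, run here against mocked file contents matching the log excerpt above:

```python
def namespace_id(version_text):
    # VERSION files are simple key=value lines, e.g. "namespaceID=2121666262".
    for line in version_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "namespaceID":
            return int(value)
    return None

# Mocked contents using the two IDs from the log excerpt above:
namenode_version = "namespaceID=2121666262\ncTime=0\nstorageType=NAME_NODE"
datanode_version = "namespaceID=2058961420\ncTime=0\nstorageType=DATA_NODE"
print(namespace_id(namenode_version) == namespace_id(datanode_version))  # False -> mismatch
```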
Thank you guys for all that good answers, I appreciate that.
Jean-Pierre.
On Mar 21, 2008, at 12:47 PM, Ted Dunning wrote:
The default number of reducers is 4. It is unlikely that a user who doesn't
know how to set the number of reducers has changed that value.
This phenomenon of a
Check your logs. That should work out of the box with the configuration
steps you described.
Jeff
> -Original Message-
> From: Colin Freas [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 21, 2008 10:40 AM
> To: core-user@hadoop.apache.org
> Subject: Master as DataNode
>
> setting up a s
setting up a simple hadoop cluster with two machines, i've gotten to the
point where the two machines can see each other, things seem fine, but i'm
trying to set up the master as both a master and a slave, just for testing
purposes.
so, i've put the master into the conf/masters file and the conf/s
ah, yes. that worked. thanks!
On Fri, Mar 21, 2008 at 12:48 PM, Natarajan, Senthil <[EMAIL PROTECTED]>
wrote:
> I guess the following files might have a localhost entry; change it to the hostname
>
> /conf/masters
> /conf/slaves
I guess the following files might have a localhost entry; change it to the hostname
/conf/masters
/conf/slaves
-Original Message-
From: Colin Freas [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 12:25 PM
To: core-user@hadoop.apache.org
Subject: NFS mounted home, host RSA keys, localhost, s
The default number of reducers is 4. It is unlikely that a user who doesn't
know how to set the number of reducers has changed that value.
This phenomenon of apparently having only a single reducer often happens if
you have a very skewed distribution of keys for the reduce phase.
Imagine
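Ted's skew point is easy to see with a toy model of the default hash partitioner (this Python sketch only mimics the behavior of Hadoop's HashPartitioner; in a real job the reducer count is set via JobConf.setNumReduceTasks or the mapred.reduce.tasks property):

```python
from collections import Counter

def partition(key, num_reducers):
    # Simplified stand-in for Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# A skewed key distribution: one hot key dominates the input.
keys = ["hot"] * 9_000 + [f"rare-{i}" for i in range(1_000)]
load = Counter(partition(k, 4) for k in keys)
print(load.most_common(1))  # one partition carries at least 90% of the records
```

All records for the hot key hash to the same partition, so one reduce task does nearly all the work while the others sit idle — which looks, from the logs, like "reduce only runs on one machine".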
Thanks Hairong,
I've just created https://issues.apache.org/jira/browse/HADOOP-3064 for this.
Tom
On 20/03/2008, Hairong Kuang <[EMAIL PROTECTED]> wrote:
> Yes, this is a bug. This only occurs when a job's input path contains the
> closures. JobConf.getInputPaths interprets mr/input/glob/2008/
On 3/21/08 8:29 AM, "Dan Tamowski" <[EMAIL PROTECTED]> wrote:
> -Does Hadoop/MR offer a clean abstraction for both consuming and producing a
> large number of files? (I know it can handily consume a large number of
> files, but all examples of output seem to form a single file)
Yes.
It works v
i'm working to set up a cluster across several machines where users' home
dirs are on an nfs mount.
i set up key authentication for the hadoop user, install all the software on
one node, get everything running, and move on to another node.
once there, however, my sshd complains because the host ke
Hello,
Forgive me if I am missing something in the documentation, but nothing is
jumping out at me.
I am exploring the use of Hadoop for image analysis and/or image
vectorization and have a few questions. I anticipate that there will be a
large collection of image files as input with an equal num
On Fri, 21 Mar 2008, Jean-Pierre OCALAN wrote:
> Hi,
>
> I'm currently working on a project that involves massive log parsing. I have
> one master and 6 slaves.
> By looking at each slave's logs I've noticed that the REDUCE operation just runs
> on one machine.
> So does that mean that reduce just runs
Hi,
I'm currently working on a project that involves massive log parsing. I
have one master and 6 slaves.
By looking at each slave's logs I've noticed that the REDUCE operation
just runs on one machine.
So does that mean that reduce just runs on one machine? And if that
is true how can I specif
On Fri, Mar 21, 2008 at 12:42 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Rong-en Fan wrote:
> > I have two questions regarding the mapfile in hadoop/hdfs. First, when
> using
> > MapFileOutputFormat as reducer's output, is there any way to change
> > the index interval (i.e., able to call se
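For background on what the index interval controls: a MapFile keeps only every Nth key of its sorted data in memory, and a lookup seeks to the nearest indexed key and scans forward from there. A toy model of that trade-off (plain Python, not the Hadoop API; a larger interval means a smaller in-memory index but a longer sequential scan per lookup):

```python
import bisect

class SparseIndex:
    # Toy model of a MapFile-style sparse index: only every Nth key is
    # indexed, and a lookup seeks to the nearest indexed key, then scans.
    def __init__(self, sorted_keys, interval=128):
        self.keys = sorted_keys
        self.index = sorted_keys[::interval]          # every `interval`-th key
        self.positions = list(range(0, len(sorted_keys), interval))

    def lookup(self, key):
        i = bisect.bisect_right(self.index, key) - 1  # nearest indexed key <= key
        start = self.positions[i] if i >= 0 else 0
        scanned = 0
        for pos in range(start, len(self.keys)):      # sequential scan from there
            scanned += 1
            if self.keys[pos] == key:
                return pos, scanned
            if self.keys[pos] > key:
                break
        return None, scanned

keys = [f"k{i:06d}" for i in range(10_000)]
idx = SparseIndex(keys, interval=128)
pos, scanned = idx.lookup("k005000")
print(pos, scanned)  # 5000 9 -- found after scanning only 9 entries
```

With 10,000 keys and an interval of 128, the in-memory index holds just 79 entries, and no lookup ever scans more than 128 keys.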