On Fri, Jul 3, 2009 at 4:36 PM, Marcus Herou wrote:
> I understand what you are saying, but the theory does not really get into my
> head... You mean that the latency of the CPU + disk I/O is something like
> 1 times less (or perhaps more) than the latency of calling a remote
> system via s
Anyway, why would it slow things down if it converges, let's say, 100 times
faster (in terms of iterations) and you are able to have memcached or
whatever shared system (Voldemort) with as many instances as there are MR
hosts, i.e. a memcached server on each one of them?
That is when speaking in terms of Hadoop, I guess...? But when running
normally in a single JVM, this is the case, right?
/M
No. It should not want that.
On Fri, Jul 3, 2009 at 2:13 PM, Marcus Herou wrote:
> Should not N2 be wanting to be aware of the freshest possible state of N1 ?
>
That doesn't actually speed things up. Generally, in fact, it slows things
down.
This is a case of sequential update. Batch update converges more slowly in
terms of the total number of operations, but because of the economies
available in map-reduce programs (due to sequential reading, merge sort
Hi.
I think I am confusing you guys by talking about various things at the
same time. I am mostly (99.9%) after sequential throughput, but sometimes I
need massively fast random access. And I would never ever pay for the BIG
kahuna of machine(s) that would be needed to give me great R
On Fri, 2009-06-26 at 10:55 -0500, Mark Kerzner wrote:
> Tom, this is so right on time! Bravo, Karmasphere.
> I installed the plugins, and nothing crashed; in fact, I get the same
> screens as the manual promises.
>
> It is worth reading this group - they released the plugin two days ago.
Hi David,
I'm unaware of any issue that would cause memory leaks when a file is open
for read for a long time.
There are some issues currently with write pipeline recovery when a file is
open for writing for a long time and the datanodes to which it's writing
fail. So, I would not recommend having
Do you want random access for web presentation? What is your required
update time? What about search index delay?
Or batch sequential access for large-scale computation like PageRank?
These are very different answers.
The first is likely to be a standard sharded profile database with
associate
Not my baby.
I designed it out of my system at about the same time you did. With 0.20,
however, we are re-evaluating it.
I still think you are thinking about random access which is a mistake for
batch computations like PageRank.
On Fri, Jul 3, 2009 at 12:28 AM, Marcus Herou wrote:
> Ted:
> Don
For computing PageRank, however, I bet that memcached would actually slow you
down by forcing you to have a smaller cluster.
For a batch program, latency is not the issue; aggregate throughput is. If
you have a 50-node MR cluster, you should be able to very easily sustain a
few GB/s in reading you
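The throughput-versus-latency point can be made concrete with a back-of-the-envelope calculation. The figures below (per-disk read rate, disks per node, RPC latency, value size) are assumed for illustration, not numbers from this thread:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        int nodes = 50;              // MR cluster size from the thread
        int disksPerNode = 2;        // assumed hardware
        double mbPerSecPerDisk = 60; // assumed sequential read rate per disk

        // Aggregate sequential read bandwidth of the whole cluster.
        double aggregateMBps = nodes * disksPerNode * mbPerSecPerDisk;
        System.out.printf("Aggregate sequential: %.2f GB/s%n", aggregateMBps / 1024);

        // Compare with fetching the same data one remote lookup at a time:
        // at ~0.2 ms per RPC and ~1 KB per value, each client stream sees
        // about 5000 values/s, i.e. under 5 MB/s, regardless of disk speed.
        double rpcLatencySec = 0.0002; // assumed memcached round trip
        double valueMB = 1.0 / 1024.0; // assumed 1 KB value
        double randomMBps = valueMB / rpcLatencySec;
        System.out.printf("Per-stream random access: %.2f MB/s%n", randomMBps);
    }
}
```

With these assumptions the cluster streams almost six GB/s sequentially, while a latency-bound lookup stream manages a few MB/s, which is the "memcached would slow you down" argument in numbers.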
Hi Uri,
The script start-mapred.sh has two commands: one of them is used to start the
jobtracker and the other is used to start the tasktrackers listed in the slaves
file.
I made a copy of start-mapred.sh and removed the start-jobtracker command
line. I changed the slaves file according to w
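A minimal sketch of that kind of stripped-down script, assuming the stock layout where start-mapred.sh calls hadoop-daemon.sh for the jobtracker and hadoop-daemons.sh for the slaves (the script name here is illustrative):

```shell
#!/usr/bin/env bash
# start-tasktrackers.sh -- a copy of start-mapred.sh with the jobtracker
# start line removed, so only the tasktrackers in the slaves file come up.

bin=$(cd "$(dirname "$0")" && pwd)
. "$bin"/hadoop-config.sh

# The original script also contained a line like:
#   "$bin"/hadoop-daemon.sh --config "$HADOOP_CONF_DIR" start jobtracker
# Deleting it, as described above, leaves only the tasktracker startup:
"$bin"/hadoop-daemons.sh --config "$HADOOP_CONF_DIR" start tasktracker
```

Point conf/slaves at only the node(s) you want a tasktracker on before running it; the jobtracker is then started separately on its own host.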
I have been told that it is not a good idea to keep HDFS files open for
a long time. The reason sounded like a memory leak in the name node -
that over time, the resources absorbed by an open file will increase.
Is this still an issue with Hadoop 0.19.x and 0.20.x? Was it ever an
issue?
I have
I don't understand this statement. Basic PageRank in map-reduce is
normally a simple undergraduate class assignment:
http://www.ics.uci.edu/~abehm/class.../uci/.../Behm-Shah_PageRank.ppt
http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/pagerank.html
What is it about your problem that m
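For reference, the dataflow of one such PageRank iteration can be sketched without a cluster at all. The following is a self-contained, in-memory simulation of the map/shuffle/reduce steps (the three-page graph, damping factor, and class name are illustrative, not from the thread):

```java
import java.util.*;

public class PageRankIteration {
    // "map": each page emits rank/outDegree to every out-link.
    // "shuffle": contributions are grouped by target page.
    // "reduce": newRank = (1 - d)/N + d * sum(contributions)
    public static Map<String, Double> iterate(Map<String, List<String>> links,
                                              Map<String, Double> ranks,
                                              double d) {
        int n = links.size();
        Map<String, Double> contrib = new HashMap<>();
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            double share = ranks.get(e.getKey()) / e.getValue().size();
            for (String target : e.getValue())
                contrib.merge(target, share, Double::sum);   // group-by-key
        }
        Map<String, Double> next = new HashMap<>();
        for (String page : links.keySet())
            next.put(page, (1 - d) / n + d * contrib.getOrDefault(page, 0.0));
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("C"),
            "C", List.of("A"));
        Map<String, Double> ranks = new HashMap<>(
            Map.of("A", 1.0 / 3, "B", 1.0 / 3, "C", 1.0 / 3));
        for (int i = 0; i < 20; i++)
            ranks = iterate(links, ranks, 0.85);
        System.out.println(ranks); // the three ranks sum to ~1.0
    }
}
```

In a real Hadoop job the `iterate` body splits into a Mapper and a Reducer, the adjacency lists are re-emitted alongside the rank shares so the structure survives the shuffle, and the driver reruns the job until the ranks converge.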
Hadoop Fans,
I just wanted to drop the community a quick note about a new tool we just
released called MRUnit: http://bit.ly/J0AjZ
MRUnit helps bridge the gap between MapReduce programs and JUnit by
providing a set of interfaces and test harnesses, which allow MapReduce
programs to be more easily
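The kind of test MRUnit streamlines can be sketched in plain Java. The stand-in harness below is illustrative only and is not the MRUnit API (see the link above for the real drivers); it shows the underlying idea of driving map logic with known input and asserting on exact output, with no cluster and no HDFS:

```java
import java.util.*;

public class WordCountMapperTest {
    // The "mapper" under test: splits a line into (word, 1) pairs,
    // mirroring the classic WordCount map function.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+"))
            if (!token.isEmpty())
                out.add(Map.entry(token, 1));
        return out;
    }

    public static void main(String[] args) {
        // Drive the map function with one record and check its exact output,
        // the same shape of test that MRUnit's drivers package up for real
        // Mapper and Reducer classes.
        List<Map.Entry<String, Integer>> out = map("hadoop  mrunit hadoop");
        if (!out.equals(List.of(Map.entry("hadoop", 1),
                                Map.entry("mrunit", 1),
                                Map.entry("hadoop", 1))))
            throw new AssertionError("unexpected mapper output: " + out);
        System.out.println("mapper test passed");
    }
}
```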
It's unnecessary to run the NN and JT daemons on separate machines in small
clusters with more than three nodes. You'll only have performance benefits
by putting these daemons on separate machines if you have a large (100s of
nodes) cluster. It makes sense to separate the NN and JT daemons in a t
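When you do split them across machines, the placement boils down to two addresses in the configuration. A sketch with illustrative host names (the property names are the 0.18/0.20-era ones; ports are assumptions):

```xml
<!-- core-site.xml / hadoop-site.xml: where the NameNode lives -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>

<!-- mapred-site.xml: where the JobTracker lives -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:9001</value>
</property>
```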
Hi,
It's unclear exactly what the problem is, so you should try and follow the
getting started guide more closely:
<http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)>
You should get a single-node cluster working before you try and get a
multi-node cluster.
Goo
Hi.
Comments inline
Cheers
//Marcus
On Fri, Jul 3, 2009 at 4:48 PM, Steve Loughran wrote:
> Marcus Herou wrote:
>
>> Hi.
>>
>> This is my company so I reveal what I like, even though the board would
>> shoot me but hey do you think they are scanning this mailinglist ? :)
>>
>> The PR algo is ve
Mark Kerzner wrote:
That's awesome information, Marcus.
I am working on a project which would require a similar architectural
solution (although unlike you I can't broadcast the details), so that was
very useful. One thing I can say though is that mine is in no way a
competitor, being in a differ
Michael Basnight wrote:
I have a java app that runs in tomcat and now needs to talk to my hadoop
infrastructure. Typically, all the testing I've done / examples show
starting something that uses Hadoop via the 'bin/hadoop -jar' cmd, but
as you can imagine this is no good for an existing tomcat ap
Hi,
I wonder which is better: should the Namenode and JobTracker run on
different servers or not?
--
View this message in context:
http://www.nabble.com/Wich-is-better-Namenode-and-JobTracker-run-in-different-server-or-not--tp24321039p24321039.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Hello everyone,
I have installed Hadoop 0.18.3 on three Linux machines, and I am trying to
run the WordCount v1.0 example on a cluster. But I guess I have a problem
somewhere.
Problem
After formatting the name node:
I am getting several STARTUP_MSG and at the end a "SHUTDOWN_MSG: shutting
do
Hello, every time I try sending you an email explaining my problem
in Hadoop, the email does not reach you and I get the following error:
Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient
domain.
We recommend contacting the other emai
Man, scanned through the slides, looks very promising.
Great work!
//Marcus
On Fri, Jul 3, 2009 at 9:28 AM, Marcus Herou wrote:
> Hi.
>
> This is my company so I reveal what I like, even though the board would
> shoot me but hey do you think they are scanning this mailinglist ? :)
>
> The PR a
I followed this thread and am happy it finally worked for you. Could you
summarise for our benefit what the final working alteration of the
code in your initial thread note was?
Thanks!
From: akhil1988
To: core-u...@hadoop.apache.org
Date: 03/07/2009 06:59 AM
Subject: Re: Using addCacheArchive
Hi.
This is my company so I reveal what I like, even though the board would
shoot me but hey do you think they are scanning this mailinglist ? :)
The PR algo is very simple (but clever) and can be found on wikipedia:
http://en.wikipedia.org/wiki/PageRank
What is painful is to calculate it in a di
So, what exactly did you do?
From: Iman E
To: common-user@hadoop.apache.org
Date: 03/07/2009 04:34 AM
Subject: Re: starting a tasktracker on a specific node in the cluster
The method I described below is now working! The jobtracker takes
some time to update its list of available task tracke