Python DFS/DHT project

2010-02-18 Thread Darren Govoni
Hi, I'm developing a Python DFS/DHT and software RAID file system that resembles Hadoop (among others). I wanted to convey the traits of my filesystem and see how it compares to HDFS, but my aim is to develop different capabilities, not exactly the same ones. Basically, what my DFS can do now is: - zero
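[Editorial sketch, not the poster's actual code: one common way a DHT decides which server owns a given file block is a consistent-hash ring. The node names and the virtual-node count below are assumptions for illustration only.]

import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping block keys to storage nodes."""
    def __init__(self, nodes, virtual_nodes=3):
        self.virtual_nodes = virtual_nodes   # virtual points per physical node
        self.ring = {}                       # hash -> node name
        self.sorted_hashes = []
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            h = self._hash(f"{node}:{i}")
            self.ring[h] = node
            self.sorted_hashes.append(h)
        self.sorted_hashes.sort()

    def node_for(self, block_key):
        """Return the node responsible for a given block key (clockwise walk)."""
        h = self._hash(block_key)
        idx = bisect_right(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("myfile.bin:block-0"))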

Re: Python DFS/DHT project

2010-02-18 Thread Darren Govoni
is not exactly like this, but the point of it in RAID and in my system is the same as in HDFS, which is fault-tolerance through redundancy and distribution across servers. Cheers, Darren On Thu, 2010-02-18 at 16:52 -0500, Edward Capriolo wrote: > On Thu, Feb 18, 2010 at 4:29 PM, Darren Govoni wr
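[Illustrative sketch only, assuming made-up node names and a replication factor of 3: the redundancy idea common to RAID mirroring, HDFS, and the system described here is simply that each block lives on several servers, and a read succeeds as long as any replica survives.]

import random

def place_replicas(block_id, nodes, replication=3):
    """Pick `replication` distinct servers for a block; losing up to
    replication-1 of them still leaves a readable copy."""
    return random.sample(nodes, min(replication, len(nodes)))

def read_block(block_id, placement, alive):
    """Read from the first replica that is still alive."""
    for node in placement:
        if node in alive:
            return f"read {block_id} from {node}"
    raise IOError(f"all replicas of {block_id} are unavailable")

nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas("block-7", nodes)
print(read_block("block-7", placement, alive={placement[-1]}))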

Re: Hadoop: Divide and Conquer Algorithms

2010-02-28 Thread Darren Govoni
I'm not sure this sort of problem will be efficient in Hadoop, but it's the kind of problem WaveFS[1] is designed for. It propagates intermediate values across the cluster, allowing algorithms to run in parallel while coalescing shared products from distributed calculations. Without the need to for
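[Editorial sketch, not WaveFS itself: the divide-and-conquer pattern being discussed — workers compute intermediate values in parallel, then their partial products are coalesced into a final result — shown with a plain process pool and a toy summation.]

from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """Each worker computes an intermediate product over its slice of the data."""
    return sum(chunk)

def divide_and_conquer_sum(data, workers=4):
    """Divide the input, conquer the pieces in parallel, coalesce the partials."""
    step = max(1, len(data) // workers)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))   # intermediate values
    return sum(partials)                                 # coalesce shared products

if __name__ == "__main__":
    print(divide_and_conquer_sum(list(range(1_000_000))))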

Re: Big-O Notation for Hadoop

2010-03-01 Thread Darren Govoni
Theoretically, O(n). All other variables being equal across all nodes, it should...m.reduce to n. The part that really can't be measured is the cost of Hadoop's bookkeeping chores as the data set grows, since some things in Hadoop involve synchronous/serial behavior. On Mon, 2010-03-01 at 12:
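[Editorial illustration with invented numbers, not a measured model: the claim is that the parallel work scales as O(n) split across nodes, while the framework's bookkeeping adds a serial term that grows with the data set and spoils the clean curve.]

def estimated_job_time(n_records, nodes, per_record_cost,
                       overhead_per_record=0.0, fixed_overhead=30.0):
    """Toy cost model: O(n) work divided across nodes, plus bookkeeping
    (job setup, task scheduling, shuffle coordination) that is not parallel."""
    parallel_work = (n_records * per_record_cost) / nodes
    bookkeeping = fixed_overhead + n_records * overhead_per_record
    return parallel_work + bookkeeping

# Doubling the data roughly doubles the parallel part, but the serial
# bookkeeping term keeps measured runtime from being a pure n/nodes curve.
print(estimated_job_time(10_000_000, nodes=20, per_record_cost=1e-4,
                         overhead_per_record=1e-6))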

Re: Big-O Notation for Hadoop

2010-03-01 Thread Darren Govoni
1, 2010 at 4:13 PM, Darren Govoni wrote: > > Theoretically. O(n) > > > > All other variables being equal across all nodes > > should...m.reduce to n. > > > > That part that really can't be measured is the cost of Hadoop's > > bookkeep

Re: Import the results into SimpleDB

2010-05-11 Thread Darren Govoni
Might as well not use Hadoop then... On Tue, 2010-05-11 at 21:02 -0500, Mark Kerzner wrote: > Hi, > > I want a Hadoop job that will simply take each line of the input text file > and store it (after parsing) in a database, like SimpleDB. > > Can I put this code into Mapper, make no call to "col
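[Editorial sketch of what the original question describes, assuming Hadoop Streaming with zero reducers; store_record and the tab-delimited parsing are hypothetical stand-ins for the poster's SimpleDB client call and record format.]

#!/usr/bin/env python
import sys

def store_record(record):
    # placeholder: replace with your actual SimpleDB put call
    pass

def parse(line):
    # hypothetical parsing: split a tab-delimited line into named fields
    key, _, rest = line.partition("\t")
    return {"id": key, "payload": rest}

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.rstrip("\n")
        if line:
            store_record(parse(line))
        # nothing is emitted to a reducer: this is a map-only ingest job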

Re: Sanity check re: value of 10GbE NICs for Hadoop?

2011-06-28 Thread Darren Govoni
Hadoop, like other parallel networked computation architectures, is predominantly I/O bound. This means any increase in network bandwidth is "A Good Thing" and can have drastic positive effects on performance. All your points stem from this simple realization. Although I'm confused by your #6.
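[Back-of-envelope illustration with an invented shuffle volume, just to show why extra bandwidth matters for an I/O-bound job: transfer time is data moved divided by usable link speed.]

def transfer_seconds(data_gb, link_gbps, efficiency=0.8):
    """Rough estimate: gigabits to move divided by usable bandwidth."""
    return (data_gb * 8) / (link_gbps * efficiency)

shuffle_gb = 500  # hypothetical shuffle volume
print(f"1GbE : {transfer_seconds(shuffle_gb, 1):.0f} s")
print(f"10GbE: {transfer_seconds(shuffle_gb, 10):.0f} s")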

Re: Sanity check re: value of 10GbE NICs for Hadoop?

2011-06-28 Thread Darren Govoni
-availability or data management benefits for use with Hadoop? Saqib -Original Message- From: Darren Govoni [mailto:dar...@ontrenet.com] Sent: Tuesday, June 28, 2011 10:21 AM To: common-user@hadoop.apache.org Subject: Re: Sanity check re: value of 10GbE NICs for Hadoop? Hadoop, like other

RE: Image Processing in Hadoop

2012-04-02 Thread Darren Govoni
This doesn't sound like a mapreduce[1] sort of problem. Now, of course, you can store files in HDFS and retrieve them. But it's up to your application to interpret them. MapReduce cannot "display the corresponding door image"; it is a computation scheme and performs calculations that you provide. [
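[Editorial sketch of the point being made, assuming a locally installed Hadoop client and a hypothetical image path: the application pulls the file out of HDFS with the standard `hadoop fs -get` command and then decides for itself how to display it; MapReduce is not involved.]

import os
import subprocess
import tempfile

def fetch_from_hdfs(hdfs_path):
    """Copy a file out of HDFS into a temp directory and return its local path."""
    local_dir = tempfile.mkdtemp()
    subprocess.check_call(["hadoop", "fs", "-get", hdfs_path, local_dir])
    return os.path.join(local_dir, os.path.basename(hdfs_path))

# hypothetical path; rendering the bytes is the application's job
local_image = fetch_from_hdfs("/images/doors/door_0421.jpg")
print("fetched to", local_image)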