Currently, Hadoop does round-robin allocation of blocks and data
across multiple JBOD disks. We did some testing and found that there
weren't significant differences between RAID-0 and JBOD. We went with
JBOD because we figured that RAID-0 has a higher failure rate than
JBOD -- any disk failure takes out the entire RAID-0 stripe set,
whereas with JBOD only the data on the failed disk is lost.
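Some back-of-the-envelope arithmetic makes the risk difference concrete (the per-disk failure probability here is assumed for illustration, not a figure from the thread):

```python
# Assumed numbers: p is a per-disk failure probability over some window,
# n is the number of drives per node (5, as in the cluster discussed below).
p = 0.05
n = 5

# RAID-0 stripes every block across all disks, so one failure loses the set.
raid0_loss_prob = 1 - (1 - p) ** n

# JBOD loses only the failed disk's share; the expected fraction lost is p.
jbod_expected_loss = p

print(f"RAID-0 node-loss probability: {raid0_loss_prob:.3f}")
print(f"JBOD expected fraction of node data lost: {jbod_expected_loss:.3f}")
```

With these numbers the RAID-0 node loses everything with probability ~0.226, while JBOD expects to lose only ~5% of its data.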
of Computer Science & Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
TEL : +82-2-3290-3580
-
On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <[EMAIL PROTECTED]> wrote:
Hi Edward,
At Metaweb, we're experimenting with storing raw triples in HDFS flat
files, and have written a simple query language and planner that
executes the queries with chained map-reduce jobs. This approach works
well for warehousing triple data, and doesn't require HBase. Queries
may ta
The trick is to amortize your computation over the whole set. So DFS
for a single node will always be faster on an in-memory graph, but
Hadoop is a good tool for computing all-pairs shortest paths in one shot
if you re-frame the algorithm as a belief propagation and message
passing algorithm.
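As a toy illustration of that re-framing (my own sketch, not Freebase's code), each map-reduce round can propagate tentative distances as messages and keep the per-node minimum, iterating until nothing changes:

```python
from collections import defaultdict

INF = float("inf")

def shortest_paths(edges, source):
    """Single-source shortest paths; each while-iteration is one MR round.

    edges: {node: [(neighbour, weight), ...]}
    """
    dist = {u: (0 if u == source else INF) for u in edges}
    changed = True
    while changed:
        # Map phase: every node emits its tentative distance to neighbours.
        msgs = defaultdict(list)
        for u, d in dist.items():
            msgs[u].append(d)              # keep own current value
            if d < INF:
                for v, w in edges[u]:
                    msgs[v].append(d + w)
        # Reduce phase: each node keeps the minimum distance it received.
        changed = False
        for u, ds in msgs.items():
            best = min(ds)
            if best < dist.get(u, INF):
                dist[u] = best
                changed = True
    return dist

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(shortest_paths(graph, "a"))   # {'a': 0, 'b': 1, 'c': 2}
```

All-pairs is the same computation run with every node as a source, which is where amortizing over the whole set pays off.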
At Freebase, we're mapping our large graphs into very large files of
triples in HDFS and running large queries over them.
Hadoop is optimized for processing streaming data off of disk, and we've
found that trying to load a multi-GB graph and then access it in a
Hadoop task has scaling problems
Unfortunately, setting those environment variables did not help my
issue. It appears that the "HADOOP_LZO_LIBRARY" variable is not
defined in both LzoCompressor.c and LzoDecompressor.c. Where is this
variable supposed to be set?
On Sep 30, 2008, at 12:33 PM, Colin Evans wrote:
Hi N
adoop/io/compress/lzo/LzoCompressor.c:135:
error: syntax error before ',' token
[exec] make[2]: *** [LzoCompressor.lo] Error 1
[exec] make[1]: *** [all-recursive] Error 1
[exec] make: *** [all] Error 2
Any ideas?
On Sep 30, 2008, at 11:53 AM, Colin Evans wrote:
There's a patch to get the native targets to build on Mac OS X:
http://issues.apache.org/jira/browse/HADOOP-3659
You probably will need to monkey with LDFLAGS as well to get it to work,
but we've been able to build the native libs for the Mac without too
much trouble.
Doug Cutting wrote:
A
Freebase is finally open-sourcing our Jython-based framework for writing
map-reduce jobs on Hadoop. Happy tightly embeds Jython into the Hadoop
APIs, files off a lot of the sharp edges, and makes writing map-reduce
programs a breeze. This is the 0.1 release, but we've been using Happy
at Free
ning financial
data from the SEC in Freebase, a talk by Kurt Bollacker on data mining
Wikipedia, and a talk by Kirrily Robert on new features in Freebase.
Sign up if you're planning on coming - space is limited.
http://upcoming.yahoo.com/event/760574
Thanks
Colin Evans
We're building a cluster of 40 machines with 5 drives each, and I'm
curious what people's experiences have been for using RAID-0 for HDFS
vs. configuring separate partitions (JBOD) and having the datanode
balance between them.
I took a look at the datanode code, and datanodes appear to write b
At Metaweb, we did a lot of comparisons between streaming (using Python)
and native Java, and in general streaming performance was not much
slower than the native java -- most of the slowdown was from Python
being a slow language.
The main problems with streaming apps that we found are that th
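For context, a streaming job is just a mapper and a reducer speaking tab-separated key/value lines over stdin/stdout; a minimal word-count pair in Python (a generic sketch, not Metaweb's actual jobs) looks like:

```python
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" line per word, as Hadoop Streaming expects.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Streaming delivers the reducer's input grouped and sorted by key.
    for key, group in groupby(sorted_lines, key=lambda l: l.split("\t")[0]):
        total = sum(int(l.split("\t")[1]) for l in group)
        yield f"{key}\t{total}"

if __name__ == "__main__":
    # Local simulation of the shuffle: map, sort, reduce.
    mapped = sorted(mapper(["b a", "a c a"]))
    print(list(reducer(mapped)))   # ['a\t3', 'b\t1', 'c\t1']
```

Since the framework still does the shuffle, sort, and I/O in Java, the Python layer mostly adds per-record interpretation overhead, which matches the observation above.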
Here's the code. If folks are interested, I can submit it as a patch as
well.
Prasan Ary wrote:
Colin,
Is it possible that you share some of the code with us?
thx,
Prasan
Colin Evans <[EMAIL PROTECTED]> wrote:
We ended up subclassing TextInputFormat and adding a custom RecordReader
that starts and ends record reads on tags. The
StreamXmlRecordReader class is a good reference for this.
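The splitting logic itself is simple; here is a pure-Python sketch of the same idea (names and tags are illustrative -- the real version subclasses TextInputFormat and works on HDFS splits):

```python
def read_records(text, start_tag, end_tag):
    """Yield each span from start_tag through end_tag as one record."""
    pos = 0
    while True:
        begin = text.find(start_tag, pos)
        if begin == -1:
            return
        end = text.find(end_tag, begin)
        if end == -1:
            return                      # ignore a trailing partial record
        end += len(end_tag)
        yield text[begin:end]
        pos = end

doc = "<doc><rec>a</rec>junk<rec>b</rec></doc>"
print(list(read_records(doc, "<rec>", "</rec>")))
# ['<rec>a</rec>', '<rec>b</rec>']
```

The extra complexity in the real RecordReader is handling records that straddle split boundaries, which is what StreamXmlRecordReader demonstrates.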
Prasan Ary wrote:
Hi All,
I am writing a java implementation for my map/reduce function on hadoop.
Input to th
On 2/12/08 12:19 PM, "Colin Evans" <[EMAIL PROTECTED]> wrote:
The big question for me is how well a dual-CPU 4-core (8 cores per box)
configuration will do. Has anyone tried out this configuration with
Intel or AMD CPUs? Is the memory throughput sufficient?
Because we acquired servers of different capacities at different times,
we have 2 servers with 1TB of disk each, and 11 servers with ~300GB
each. The 1TB servers tend to be under-utilized by HDFS given their
capacity. This makes sense, as block replicas need to be relatively
evenly distributed.
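A quick illustration with the numbers from the post (the 50% overall fill level is assumed):

```python
# 2 nodes at 1 TB and 11 nodes at ~300 GB; an even per-node block spread
# puts about the same amount of data on every datanode.
capacities_gb = [1000, 1000] + [300] * 11
cluster_used_gb = 0.5 * sum(capacities_gb)        # suppose 50% full overall
per_node_gb = cluster_used_gb / len(capacities_gb)

for cap in (1000, 300):
    print(f"{cap} GB node: {per_node_gb / cap:.0%} utilized")
```

With an even spread the 300 GB nodes run roughly three times fuller than the 1 TB nodes, which is exactly the under-utilization described above.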
It looks like a bunch of blocks
got allocated on the datanodes that the namenode doesn't know about,
and the datanodes are refusing to work with new blocks that have the
same id. Does this sound likely? What's a good fix for this?
Thanks!
Colin Evans
Hi Ted,
I've been building out a similar framework in JavaScript (Rhino) for
work that I've been doing at MetaWeb, and we've been thinking about open
sourcing it too. It's pretty clear that there are major benefits to
using a dynamic scripting language with Hadoop.
I'd love to see how you'r