Re: Reducer-side join example

2010-04-06 Thread Ed Kohlwey
Hi, your question sounds academic, so I'll give it an academic answer ;). Unfortunately, there aren't really any good generalized methods for doing joins in MapReduce (i.e., cross-joining a large matrix with a large matrix). The fundamental reason for this is that in the general case you're
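For contrast with the general case, the easy case (an equi-join on a shared key) is usually handled with the reduce-side, or repartition, join pattern. The sketch below simulates the map, shuffle, and reduce phases in plain Python rather than using the Hadoop API; the function names and the "L"/"R" source tags are illustrative assumptions, not code from this thread. It also shows one reason the general case is hard: the reducer has to buffer every value for a key from at least one side, so a key shared by many records on both sides (or a full cross join, where effectively every record shares one key) puts a quadratic amount of work on a single reducer.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[int, str]

def map_left(record: Record) -> Iterator[Tuple[int, Tuple[str, str]]]:
    # Tag values from the left relation with "L" (hypothetical tag).
    key, value = record
    yield key, ("L", value)

def map_right(record: Record) -> Iterator[Tuple[int, Tuple[str, str]]]:
    # Tag values from the right relation with "R" (hypothetical tag).
    key, value = record
    yield key, ("R", value)

def shuffle(pairs: Iterable[Tuple[int, Tuple[str, str]]]) -> Dict[int, List[Tuple[str, str]]]:
    # Group all tagged values by join key, as the framework's shuffle would.
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_join(key: int, tagged_values: List[Tuple[str, str]]) -> Iterator[Tuple[int, str, str]]:
    # Split the group by source, then emit every left/right combination.
    lefts = [v for tag, v in tagged_values if tag == "L"]
    rights = [v for tag, v in tagged_values if tag == "R"]
    for left in lefts:
        for right in rights:
            yield key, left, right

def reduce_side_join(left: List[Record], right: List[Record]) -> List[Tuple[int, str, str]]:
    mapped = chain(
        (p for rec in left for p in map_left(rec)),
        (p for rec in right for p in map_right(rec)),
    )
    groups = shuffle(mapped)
    result: List[Tuple[int, str, str]] = []
    for key in sorted(groups):  # reducers see keys in sorted order
        result.extend(reduce_join(key, groups[key]))
    return result
```

With `users = [(1, "alice"), (2, "bob")]` and `orders = [(1, "book"), (1, "pen"), (3, "lamp")]`, `reduce_side_join(users, orders)` returns `[(1, "alice", "book"), (1, "alice", "pen")]`; keys 2 and 3 have no partner and are dropped, as in an inner join.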

Re: Help on processing large amount of videos on hadoop

2009-12-22 Thread Ed Kohlwey
Hi Huazhong, sounds like an interesting application. Here are a few tips. 1. If the frames are not independent, you should find a way to key them according to their order before dumping them into Hadoop so that they can be sorted as part of your MapReduce job. BTW, the video won't appear split
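A minimal sketch of tip 1, assuming frames are stored under string keys that sort byte-wise (as Hadoop's `Text` keys do): zero-padding the frame index makes lexicographic order match numeric order, so the framework's sort delivers each video's frames in sequence. Python's `sorted` stands in for the shuffle here, and `frame_key`, `sort_and_group`, and the tab-separated key layout are hypothetical names chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

def frame_key(video_id: str, frame_index: int, width: int = 10) -> str:
    """Build a sortable key: video ID, a tab, then the zero-padded frame index.

    Assumes video IDs contain no tab characters.
    """
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return f"{video_id}\t{frame_index:0{width}d}"

def sort_and_group(records: Iterable[Tuple[str, bytes]]) -> Iterator[Tuple[str, List[bytes]]]:
    """Simulate the shuffle: sort by key, then group frames by video ID."""
    ordered = sorted(records, key=itemgetter(0))
    # Keys sharing the "video_id\t" prefix are contiguous after sorting.
    for video_id, group in groupby(ordered, key=lambda kv: kv[0].split("\t", 1)[0]):
        yield video_id, [frame for _, frame in group]
```

Without the padding, frame 10 would sort before frame 9, because the string "10" is less than "9".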

Re: Can Hadoop 0.20.1 programs run on Amazon Elastic MapReduce?

2009-12-16 Thread Ed Kohlwey
Last time I checked, EMR only ran 0.18.3. You can use EC2 though, which winds up being cheaper anyway. On Wed, Dec 16, 2009 at 8:51 PM, 松柳 lamfeeli...@gmail.com wrote: Hi all, I'm wondering whether Amazon has started supporting the newest stable version of Hadoop, or whether we can still only use 0.18.3?

Re: multiple file input

2009-12-08 Thread Ed Kohlwey
One important thing to note is that, with cross products, you'll almost always get better performance if you can fit both files on a single node's disk rather than distributing the files. On Tue, Dec 8, 2009 at 9:18 AM, laser08150815 la...@laserxyz.de wrote: pmg wrote: I am evaluating
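A sketch of the locality point, assuming the smaller input has already been copied to the node's local disk (for example, via the distributed cache): each task streams its share of the larger input and pairs every record with the locally held rows, so the smaller file is read from local disk rather than over the network. The function names and the one-record-per-line file layout are assumptions for illustration. The output still contains len(A) * len(B) pairs, so this reduces I/O, not the size of the result.

```python
from typing import Iterable, Iterator, List, Tuple

def load_local_rows(path: str) -> List[str]:
    """Read the smaller input fully from local disk (path is hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def cross_product(streamed: Iterable[str], local_rows: List[str]) -> Iterator[Tuple[str, str]]:
    """Pair every streamed record with every locally held row."""
    for a in streamed:
        for b in local_rows:
            yield a, b
```

A task would then iterate over something like `cross_product(split_records, load_local_rows("/local/path/small.txt"))`, where both the variable and the path are placeholders.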

Re: RE: Using Hadoop in non-typical large scale user-driven environment

2009-12-02 Thread Ed Kohlwey
As far as replication goes, you should look at a project called Pastry. Apparently some people have used Hadoop MapReduce on top of it. You will need to be clever, however, in how you structure your MapReduce jobs, because you probably won't want a job to eat all of the users' CPU time. On Dec 2, 2009 5:11 PM,

Re: New graphic interface for Hadoop - Contains: FileManager, Daemon Admin, Quick Stream Job Setup, etc

2009-11-18 Thread Ed Kohlwey
The tool looks interesting. You should consider providing the source for it. Is it written in a language that can run on platforms besides Windows? On Nov 17, 2009 10:40 AM, Cubic cubicdes...@gmail.com wrote: Hi list. This tool is a graphic interface for Hadoop. It may improve your productivity

Re: About Distribute Cache

2009-11-15 Thread Ed Kohlwey
Hi, what you can fit in the distributed cache generally depends on the available disk space on your nodes. With most clusters 300 MB will not be a problem, but it depends on the cluster and the workload you're processing. On Sat, Nov 14, 2009 at 10:34 PM, 于凤东 fengdon...@gmail.com wrote: I have a
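To make "depends on the available disk space" concrete, here is a hedged pre-flight check one might run on a node before adding a large file to the distributed cache. It uses Python's `shutil.disk_usage`; the directory argument and the 10% headroom are assumptions (the relevant directory is wherever your cluster keeps task-local files), and a real cluster also applies its own cache-size limits, which this does not model.

```python
import shutil

def fits_in_local_disk(file_size_bytes: int, cache_dir: str, headroom_fraction: float = 0.10) -> bool:
    """Return True if the file fits in free space while keeping headroom_fraction of the disk unused.

    cache_dir is a hypothetical local directory on the node; headroom_fraction is an assumed safety margin.
    """
    usage = shutil.disk_usage(cache_dir)
    reserve = int(usage.total * headroom_fraction)
    return file_size_bytes <= usage.free - reserve
```

For the 300 MB case in the question, the call would be `fits_in_local_disk(300 * 1024 * 1024, "/path/to/local/dir")`, with the path replaced by the node's actual local directory.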