Re: writing output files in hadoop streaming

2008-01-16 Thread John Heidemann
andard Hadoop output stream best leverages the Hadoop mechanisms. I even labeled it the "proper" way, and talked about serialization (without using that word): >> On 15/01/2008, John Heidemann <[EMAIL PROTECTED]> wrote: >>... >>> There's a second way, which i

Re: writing output files in hadoop streaming

2008-01-15 Thread John Heidemann
On Tue, 15 Jan 2008 09:09:07 PST, Ted Dunning wrote: > >Regarding the race condition, hadoop builds task specific temporary >directories in the output directory, one per reduce task, that hold these >output files (as long as you don't use absolute path names). When the >process completes successf

Re: Appropriate use of Hadoop for non-map/reduce tasks?

2007-12-21 Thread John Heidemann
On Fri, 21 Dec 2007 12:24:57 PST, John Heidemann wrote: >On Thu, 20 Dec 2007 18:46:58 PST, Kirk True wrote: >>Hi all, >> >>A lot of the ideas I have for incorporating Hadoop into internal projects >>revolve around distributing long-running tasks over multiple machine

Re: Appropriate use of Hadoop for non-map/reduce tasks?

2007-12-21 Thread John Heidemann
un multiple systems, but then again they have some resources to invest. Or maybe other cluster compute systems are now easier to deploy and maintain, and optimize, and control interactions with Hadoop, and... But I'd guess not :-) ) -John Heidemann

Re: Using hadoop for distributed rendering

2007-10-17 Thread John Heidemann
nefit from HDFS or other file systems---the input needed for rendering (compositing, textures, models, etc.) is not an obvious fit for map/reduce, at least to me. -John Heidemann

using hadoop to map the internet

2007-10-04 Thread John Heidemann
Hadoop folks might be interested that we've used Hadoop to render some maps of the Internet address space. Aggregated maps are at ; we've rendered these both with and without Hadoop. The more interesting map that required Hadoop is at

hardware specs for hadoop nodes

2007-09-10 Thread John Heidemann
nts anyone? And what about, say on the namenode? People talk about it being a memory bottleneck, but ours is underutilized. Should we start a wiki page about this? -John Heidemann

Re: "Broken pipe" in hadoop streaming...looking for debugging hints

2007-07-06 Thread John Heidemann
On Thu, 05 Jul 2007 11:10:31 PDT, John Heidemann wrote: > >I'm running hadoop streaming from svn (version 552930, reasonably >recent). My map/reduce job maps ~1M records, but then a few reduces >succeed and many fail, eventually terminating the job unsuccessfully. >I'm l

"Broken pipe" in hadoop streaming...looking for debugging hints

2007-07-06 Thread John Heidemann
get any input and it eventually times out and kills me.) (When I go look in the raw reduce directories, e.g., task_0004_m_000455_0, I do see reasonable-looking stuff, including my input.) The other strange thing is I don't get 100% reduce failures, but maybe 490/503 fail. Any suggestions? -John Heidemann

"Broken pipe" in hadoop streaming...looking for debugging hints

2007-07-05 Thread John Heidemann
ke the reduces that succeed legitimately have no input and so have no output. Another oddity. Any suggestions? -John Heidemann