Re: What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-13 Thread Dieter Plaetinck
On Thu, 12 May 2011 09:49:23 -0700 (PDT) Aman aman_d...@hotmail.com wrote: The creation of files part-n is atomic. When you run a MR job, these files are created in the directory output_dir/_temporary and moved to output_dir after the file is closed for writing. This move is atomic, hence as

What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-12 Thread Dieter Plaetinck
Hi, I'm running some experiments using Hadoop streaming. I always get an output_dir/part-0 file at the end, but I wonder: when exactly will this filename show up? When it's completely written, or will it already show up while the MapReduce software is still writing to it? Is the write atomic?

Re: What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-12 Thread Aman
The creation of files part-n is atomic. When you run a MR job, these files are created in the directory output_dir/_temporary and moved to output_dir after the file is closed for writing. This move is atomic, hence as long as you don't try to read these files from the temporary directory (which I see
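
The write-then-move pattern described in this reply can be illustrated with a minimal sketch on a local POSIX filesystem. This is an analogy only, not Hadoop's actual implementation: the function names `write_part_atomically` and `visible_parts` are hypothetical, and the sketch assumes the temporary and final paths live on the same filesystem so that the rename is atomic.

```python
import os


def write_part_atomically(output_dir, name, lines):
    """Write a part file under output_dir/_temporary, then move it into place.

    Hypothetical helper: mimics the pattern described in the thread on a
    local filesystem; it does not call any Hadoop API.
    """
    tmp_dir = os.path.join(output_dir, "_temporary")
    os.makedirs(tmp_dir, exist_ok=True)

    tmp_path = os.path.join(tmp_dir, name)
    final_path = os.path.join(output_dir, name)

    # The file is written and closed while it is still in _temporary.
    with open(tmp_path, "w") as f:
        for line in lines:
            f.write(line + "\n")

    # Only after the file is closed is it moved into output_dir.
    # os.replace is a single rename, atomic on the same POSIX filesystem.
    os.replace(tmp_path, final_path)
    return final_path


def visible_parts(output_dir):
    """List part files a reader would see, ignoring the _temporary directory."""
    return sorted(n for n in os.listdir(output_dir) if n.startswith("part-"))
```

A reader that only looks at `visible_parts(output_dir)` either sees no `part-00000` at all or sees the complete file, which is the property the original question asks about.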