Re: When reduce tasks start in MapReduce Streaming?

2013-01-16 Thread Pedro Sá da Costa
So why is it called Hadoop Streaming if it doesn't behave like a streaming application? (The reducers don't receive data as it is produced by the map tasks.) On 16 January 2013 05:41, Jeff Bean wrote: > me property. The reduce method is not called until the mappers are done, and > the redu

Re: When reduce tasks start in MapReduce Streaming?

2013-01-16 Thread Jeff Bean
It's called Hadoop Streaming because keys and values are streamed into the stdin of the script you specify for Hadoop Streaming, and the script's output is then captured from its stdout. On Wed, Jan 16, 2013 at 1:04 AM, Pedro Sá da Costa wrote: > So why it's called hadoop streaming, if it doesn't behave like a > streaming appli
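
To make the stdin/stdout point concrete, here is a minimal word-count sketch in Python. It is an illustration rather than anything posted in the thread: the names `mapper`, `reducer`, and `run` are hypothetical, and it assumes the usual Streaming convention of one tab-separated key/value record per line, with reducer input already sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word read."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per key. Assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run(stage, infile=sys.stdin, outfile=sys.stdout):
    """Read records from infile, apply a stage, write results to outfile."""
    for record in stage(infile):
        print(record, file=outfile)
```

A mapper script would end with `run(mapper)` and a reducer script with `run(reducer)`. The same pair can be tried locally without a cluster by feeding the sorted mapper output to the reducer, for example `list(reducer(sorted(mapper(["a b a"]))))`.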

RE: Limitation of key-value pairs for a particular key.

2013-01-16 Thread Utkarsh Gupta
Hi, Thanks for the response. There were some issues with my code, which I have now checked in detail. All the values emitted by the map are present in the reducer, but not in sorted order. This happens when the number of values for a key is very large. Thanks Utkarsh From: Vinod Kumar Vavilapalli [mailto:vino...@h

RE: Limitation of key-value pairs for a particular key.

2013-01-16 Thread Harsh J
We don't sort values (only keys) nor apply any manual limits in MR. Can you post a reproducible test case to support your suspicion? On Jan 16, 2013 4:34 PM, "Utkarsh Gupta" wrote: > Hi, > > Thanks for the response. There was some issues with my code. I have > checked that in detail.
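
The keys-only ordering can be illustrated with a small local simulation in Python. This is a sketch under stated assumptions, not Hadoop code: `shuffle` and `reduce_sorted_values` are hypothetical names, and the stable sort only mimics one possible arrival order, whereas a real job makes no guarantee about the order of values within a key.

```python
from itertools import groupby


def shuffle(records):
    """Simulate the shuffle: order records by key only, never by value."""
    # sorted() is stable, so values keep whatever order they arrived in.
    return sorted(records, key=lambda kv: kv[0])


def reduce_sorted_values(records):
    """Group by key, then sort each key's values inside the reducer."""
    for key, group in groupby(shuffle(records), key=lambda kv: kv[0]):
        # Buffers all values for one key in memory before sorting them.
        yield key, sorted(value for _, value in group)
```

Sorting inside the reducer holds every value for a key in memory, so it only suits keys with a modest number of values. For very large groups, the usual alternative is a secondary sort, where the value is folded into a composite key so that the framework's key sort does the ordering.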

Re: MPI and hadoop on same cluster

2013-01-16 Thread Harsh J
The patch has not been contributed yet. Upstream at open-mpi there does seem to be a branch that makes some reference to Hadoop, but I think the features are yet to be made available there too. Apparently waiting on some form of a product release first? That's all I could gather from some sleuthing