The behavior is exactly what I expected. Thanks Akhil and Tathagata!
bit1...@163.com

From: Akhil Das
Date: 2015-02-24 13:32
To: bit1129
CC: Tathagata Das; user
Subject: Re: Re: About FlumeUtils.createStream

That depends on how many machines you have in your cluster. Say you have 6 workers; the receivers will most likely be distributed across all of them (assuming your topic has 6 partitions). Now when you have more than 6 partitions, say 12, then these 6 receivers will each consume from 2 partitions at a time. And when you have fewer partitions, say 3, then 3 of the receivers will be idle.

On 24 Feb 2015 10:16, "bit1...@163.com" <bit1...@163.com> wrote:

Hi Akhil, Tathagata,

This leads me to another question. For the Spark Streaming and Kafka integration, if there is more than one receiver in the cluster, such as

    val streams = (1 to 6).map(_ => KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2))

will these receivers stay on one cluster node, or will they be distributed among the cluster nodes?

bit1...@163.com

From: Akhil Das
Date: 2015-02-24 12:58
To: Tathagata Das
CC: user; bit1129
Subject: Re: About FlumeUtils.createStream

I see, thanks for the clarification TD.

On 24 Feb 2015 09:56, "Tathagata Das" <t...@databricks.com> wrote:

Akhil, that is incorrect. Spark will listen on the given port for Flume to push data into it. When in local mode, it will listen on localhost:9999. When in some kind of cluster, instead of localhost you will have to give the hostname of the cluster node where you want Flume to forward the data. Spark will launch the Flume receiver on that node (assuming the hostname matches correctly) and listen on port 9999 for receiving data from Flume. So only the configured machine will listen on port 9999.

I suggest trying the other stream, FlumeUtils.createPollingStream. More details here:
http://spark.apache.org/docs/latest/streaming-flume-integration.html

On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

Spark won't listen on 9999, mate. It basically means you have a Flume source running at port 9999 of your localhost, and when you submit your application in standalone mode, the workers will consume data from that port.

Thanks
Best Regards

On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com <bit1...@163.com> wrote:

Hi,

In the Spark Streaming application, I write the code

    FlumeUtils.createStream(ssc, "localhost", 9999)

which means Spark will listen on port 9999 and wait for the Flume sink to write to it. My question is: when I submit the application to the Spark Standalone cluster, will port 9999 be opened only on the driver machine, or will all the workers also open port 9999 and wait for the Flume data?
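
For reference, a minimal, self-contained sketch of the multi-receiver Kafka pattern discussed above, using the Spark 1.x receiver-based API; the ZooKeeper quorum, consumer group, topic name, and batch interval are placeholder values, not anything from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object MultiReceiverKafka {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("MultiReceiverKafka")
        val ssc = new StreamingContext(conf, Seconds(10))

        val zkQuorum = "zk1:2181,zk2:2181"  // placeholder ZooKeeper quorum
        val group = "my-consumer-group"     // placeholder consumer group
        val topicMap = Map("my-topic" -> 1) // topic -> threads per receiver

        // Six receivers; the scheduler spreads them over the available
        // executors, so with 6 workers each typically lands on its own node.
        val streams = (1 to 6).map(_ =>
          KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2))

        // Union the six DStreams into one for downstream processing.
        val lines = ssc.union(streams)
        lines.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }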
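
Similarly, a sketch contrasting the push-based stream with the polling stream that TD suggests; the hostnames and ports are placeholders, and the polling variant assumes the Flume agent runs the custom Spark sink described on the linked integration page:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    object FlumeStreams {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("FlumeStreams")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Push-based: Spark launches one receiver on the node whose hostname
        // matches "worker-node-1" and listens there on port 9999; Flume's
        // Avro sink pushes events to that single host:port.
        val pushed = FlumeUtils.createStream(ssc, "worker-node-1", 9999)

        // Pull-based: the Flume agent buffers events in its Spark sink and
        // Spark polls them, so no Spark-side listening port is needed.
        val polled = FlumeUtils.createPollingStream(ssc, "flume-agent-host", 9988)

        pushed.count().print()
        polled.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

The polling stream reverses the direction of the connection: Spark connects out to the Flume agent rather than waiting for Flume to push, which is why no worker has to open a listening port at all.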