That depends on how many machines you have in your cluster. Say you have 6 workers: the receivers will most likely be distributed across all of them (assuming your topic has 6 partitions). If you have more than 6 partitions, say 12, then each of the 6 receivers will consume from 2 partitions at a time. And if you have fewer partitions, say 3, then 3 of the receivers will sit idle.
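For concreteness, a minimal, self-contained sketch of that pattern: one receiver per partition of a 6-partition topic, unioned into a single stream. It is based on the createStream call quoted below; the ZooKeeper quorum, group id, topic name, and app name are placeholders, not values from this thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("MultiReceiverKafka")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Placeholder connection details -- substitute your own.
    val zkQuorum = "zk1:2181,zk2:2181"
    val group = "my-consumer-group"
    val topicMap = Map("my-topic" -> 1) // topic -> consumer threads per receiver

    // One receiver per partition of a 6-partition topic.
    val streams = (1 to 6).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    }

    // Union the per-receiver streams so downstream logic sees one DStream.
    val unified = ssc.union(streams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()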
On 24 Feb 2015 10:16, "bit1...@163.com" <bit1...@163.com> wrote:

> Hi, Akhil, Tathagata,
>
> This leads me to another question. For the Spark Streaming and Kafka
> integration, if there is more than one receiver in the cluster, such as
>
>     val streams = (1 to 6).map ( _ => KafkaUtils.createStream(ssc,
>       zkQuorum, group, topicMap).map(_._2) )
>
> will these receivers stay on one cluster node, or will they be
> distributed among the cluster nodes?
>
> ------------------------------
> bit1...@163.com
>
>
> *From:* Akhil Das <ak...@sigmoidanalytics.com>
> *Date:* 2015-02-24 12:58
> *To:* Tathagata Das <t...@databricks.com>
> *CC:* user <user@spark.apache.org>; bit1129 <bit1...@163.com>
> *Subject:* Re: About FlumeUtils.createStream
>
> I see, thanks for the clarification TD.
>
> On 24 Feb 2015 09:56, "Tathagata Das" <t...@databricks.com> wrote:
>
>> Akhil, that is incorrect.
>>
>> Spark will listen on the given port for Flume to push data into it.
>> When in local mode, it will listen on localhost:9999.
>> When in some kind of cluster, instead of localhost you will have to give
>> the hostname of the cluster node where you want Flume to forward the data.
>> Spark will launch the Flume receiver on that node (assuming the hostname
>> matching is correct) and listen on port 9999 for data from Flume.
>> So only the configured machine will listen on port 9999.
>>
>> I suggest trying the other stream, FlumeUtils.createPollingStream. More
>> details here:
>> http://spark.apache.org/docs/latest/streaming-flume-integration.html
>>
>> On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Spark won't listen on 9999, mate. It basically means you have a flume
>>> source running at port 9999 of your localhost. And when you submit your
>>> application in standalone mode, workers will consume data from that port.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sat, Feb 21, 2015 at 9:22 AM, bit1...@163.com <bit1...@163.com>
>>> wrote:
>>>
>>>> Hi,
>>>> In the Spark Streaming application, I wrote the code
>>>> FlumeUtils.createStream(ssc, "localhost", 9999), which
>>>> means Spark will listen on port 9999 and wait for the Flume sink to
>>>> write to it.
>>>> My question is: when I submit the application to the Spark standalone
>>>> cluster, will 9999 be opened only on the driver machine, or will all
>>>> the workers also open port 9999 and wait for the Flume data?
>>>>
>>>> ------------------------------
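For anyone following TD's suggestion to try the pull-based receiver, here is a minimal sketch. It assumes a Flume agent is already running the Spark sink (org.apache.spark.streaming.flume.sink.SparkSink) on a known host and port; the host name "flume-host", the port, and the app name are placeholders, not values from this thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    val conf = new SparkConf().setAppName("FlumePollingExample")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Pull events from the Flume agent's Spark sink instead of having
    // Flume push to a Spark-managed port. With this approach the receiver
    // connects out to Flume, so only the Flume agent needs a fixed address.
    val flumeStream = FlumeUtils.createPollingStream(
      ssc, "flume-host", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)

    // Count the events received in each batch.
    flumeStream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()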