Hi Tobias,
I tried to use 10 as numPartitions. The number of executors allocated is the
number of DStreams. Therefore, it seems the parameter does not spread data
into many partitions. In order to do that, it seems we have to do
repartition. If numPartitions will distribute the data to multiple
Hi Tathagata,
I am currently creating multiple DStreams to consume from different topics.
How can I let each consumer consume from different partitions? I found the
following parameters in the Spark API:
createStream[K, V, U <: Decoder[_], T <: Decoder[_]](jssc: JavaStreamingContext
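The signature above is cut off; for context, here is a minimal sketch of the receiver-based API it belongs to, creating one DStream per topic. This assumes Spark Streaming 1.x, and the ZooKeeper address, group id, and topic names are placeholders, not values from this thread:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-multi-topic")
val ssc = new StreamingContext(conf, Seconds(10))

// One receiver-based DStream per topic. The Int in the topic map is the
// number of consumer threads inside that receiver, not the number of
// Spark partitions the data ends up in.
val topics = Seq("topicA", "topicB")
val streams = topics.map { topic =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map(topic -> 1))
}
```

With the old high-level consumer, consumers that share a group id divide a topic's partitions among themselves, so each receiver ends up reading a different subset of partitions rather than being assigned one explicitly.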
Bill,
numPartitions means the number of Spark partitions that the data received
from Kafka will be split into. It has nothing to do with Kafka partitions, as
far as I know.
If you create multiple Kafka consumers, it doesn't seem like you can
specify which consumer will consume which Kafka
Hi Tobias,
It seems that repartition can create more executors for the stages
following data receiving. However, the number of executors is still far
less than what I require (I specify one core for each executor). Based on
the indexes of the executors in the stage, I find that many numbers are missing
Thanks Tathagata,
It would be awesome if Spark Streaming could support controlling the receiving
rate in general. I tried to explore the link you provided but could not find
any specific JIRA related to this. Do you have the JIRA number for this?
On Thu, Jul 17, 2014 at 9:21 PM, Tathagata Das
Oops, wrong link!
JIRA: https://github.com/apache/spark/pull/945/files
Github PR: https://github.com/apache/spark/pull/945/files
On Fri, Jul 18, 2014 at 7:19 AM, Chen Song chen.song...@gmail.com wrote:
Thanks Tathagata,
That would be awesome if Spark streaming can support receiving rate in
Dang! Messed it up again!
JIRA: https://issues.apache.org/jira/browse/SPARK-1341
Github PR: https://github.com/apache/spark/pull/945/files
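For reference, the rate limit introduced by SPARK-1341 / PR 945 is exposed as a configuration property on the receiver side; a sketch follows (the property name is taken from that PR, and the numeric value is an arbitrary example, not a recommendation):

```scala
import org.apache.spark.SparkConf

// Cap each receiver at roughly 10,000 records per second; when the
// property is unset, receiving is unthrottled.
val conf = new SparkConf()
  .setAppName("throttled-streaming")
  .set("spark.streaming.receiver.maxRate", "10000")
```

Because the throttle sits at the receiver, a backlog accumulated on Kafka drains gradually across batches instead of flooding the first batch after a restart.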
On Fri, Jul 18, 2014 at 11:35 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:
Oops, wrong link!
JIRA:
Thanks Luis and Tobias.
On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Wed, Jul 2, 2014 at 1:57 AM, Chen Song chen.song...@gmail.com wrote:
* Is there a way to control how far Kafka Dstream can read on
topic-partition (via offset for example). By setting
I also have an issue consuming from Kafka. When I consume from Kafka, there
is always a single executor working on this job. Even when I use repartition,
it seems that there is still a single executor. Does anyone have an idea how
to add parallelism to this job?
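For context, the repartition being discussed here is applied to the DStream after the data is received; a hedged sketch (the `ssc` context, host, group id, and topic name are placeholders):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("mytopic" -> 1))

// Shuffle the received records into 40 partitions so the stages after
// receiving can run up to 40 tasks in parallel. Receiving itself still
// goes through the single receiver created above.
val spread = stream.repartition(40)
spread.count().print()
```

The caveat implicit in this thread: repartition widens the processing parallelism, but it does nothing for the single receiver doing the ingesting.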
On Thu, Jul 17, 2014 at 2:06 PM, Chen
Bill,
are you saying that, after repartition(400), you have 400 partitions on one host
and the other hosts receive none of the data?
Tobias
On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
I also have an issue consuming from Kafka. When I consume from Kafka,
there
You can create multiple Kafka streams to partition your topics across them,
which will run multiple receivers on multiple executors. This is covered in
the Spark Streaming guide.
http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving
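The pattern from the linked guide section can be sketched like this (the receiver count, topic name, and ZooKeeper address are placeholders, and an existing `ssc` StreamingContext is assumed):

```scala
import org.apache.spark.streaming.kafka.KafkaUtils

// Start several receiver-based streams for the same topic; each runs as
// its own receiver, typically on a different executor.
val numReceivers = 4
val kafkaStreams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("mytopic" -> 1))
}

// Union the receiver streams into one DStream, then optionally widen the
// processing parallelism beyond the number of receivers.
val unified = ssc.union(kafkaStreams).repartition(16)
```

Since all receivers share one consumer group, the old high-level Kafka consumer splits the topic's partitions among them, which is what spreads the ingest across hosts.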
And for the
In my use case, if I need to stop Spark Streaming for a while, a lot of data
accumulates on the Kafka topic-partitions. After I restart the Spark Streaming
job, the worker's heap goes out of memory on the fetch of the first batch.
I am wondering if
* Is there a way to throttle reading from kafka in
Maybe reducing the batch duration would help :\
2014-07-01 17:57 GMT+01:00 Chen Song chen.song...@gmail.com:
In my use case, if I need to stop spark streaming for a while, data would
accumulate a lot on kafka topic-partitions. After I restart spark streaming
job, the worker's heap will go
Hi,
On Wed, Jul 2, 2014 at 1:57 AM, Chen Song chen.song...@gmail.com wrote:
* Is there a way to control how far a Kafka DStream can read on a
topic-partition (via offsets, for example)? By setting this to a small
number, it would force the DStream to read less data initially.
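The receiver-based API has no per-batch offset limit, but the kafkaParams map passes properties through to the high-level consumer. One hedged option, assuming the consumer honors fetch.message.max.bytes (this caps the bytes pulled per fetch request, a blunt brake rather than a true offset bound):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
  "zookeeper.connect" -> "zkhost:2181",   // placeholder address
  "group.id"          -> "my-group",      // placeholder group id
  // Limit the size of each fetch request to ~1 MB so a large backlog
  // cannot blow up the worker heap in one pull.
  "fetch.message.max.bytes" -> (1024 * 1024).toString)

val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Map("mytopic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)
```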
Please see the post at