I did try a test with spark 2.0 + spark-streaming-kafka-0-10-assembly. Partition was increased using "bin/kafka-topics.sh --alter" after spark job was started. I don't see messages from new partitions in the DStream.
KafkaUtils.createDirectStream[Array[Byte], Array[Byte]] ( > ssc, PreferConsistent, Subscribe[Array[Byte], Array[Byte]](topics, > kafkaParams) ) > .map(r => (r.key(), r.value())) Also, no.of partitions did not increase too. > dataStream.foreachRDD( (rdd, curTime) => { > logger.info(s"rdd has ${rdd.getNumPartitions} partitions.") Should I be setting some parameter/config? Is the doc for new integ available? Thanks, Srikanth On Fri, Jul 22, 2016 at 2:15 PM, Cody Koeninger <c...@koeninger.org> wrote: > No, restarting from a checkpoint won't do it, you need to re-define the > stream. > > Here's the jira for the 0.10 integration > > https://issues.apache.org/jira/browse/SPARK-12177 > > I haven't gotten docs completed yet, but there are examples at > > https://github.com/koeninger/kafka-exactly-once/tree/kafka-0.10 > > On Fri, Jul 22, 2016 at 1:05 PM, Srikanth <srikanth...@gmail.com> wrote: > > In Spark 1.x, if we restart from a checkpoint, will it read from new > > partitions? > > > > If you can, pls point us to some doc/link that talks about Kafka 0.10 > integ > > in Spark 2.0. > > > > On Fri, Jul 22, 2016 at 1:33 PM, Cody Koeninger <c...@koeninger.org> > wrote: > >> > >> For the integration for kafka 0.8, you are literally starting a > >> streaming job against a fixed set of topicapartitions, It will not > >> change throughout the job, so you'll need to restart the spark job if > >> you change kafka partitions. > >> > >> For the integration for kafka 0.10 / spark 2.0, if you use subscribe > >> or subscribepattern, it should pick up new partitions as they are > >> added. > >> > >> On Fri, Jul 22, 2016 at 11:29 AM, Srikanth <srikanth...@gmail.com> > wrote: > >> > Hello, > >> > > >> > I'd like to understand how Spark Streaming(direct) would handle Kafka > >> > partition addition? > >> > Will a running job be aware of new partitions and read from it? > >> > Since it uses Kafka APIs to query offsets and offsets are handled > >> > internally. > >> > > >> > Srikanth > > > > >