[Spark Streaming][Problem with DataFrame UDFs]

2016-01-20 Thread jpocalan
Hi, I have an application that creates a Kafka Direct Stream from one topic with 5 partitions. As a result, each batch consists of an RDD with 5 partitions. To apply transformations to each batch, I decided to convert the RDD to a DataFrame (DF) so that I can easily add column to the

Re: Kafka - streaming from multiple topics

2015-12-16 Thread jpocalan
Never mind, I found the answer to my questions. The following Spark configuration property allows multiple Kafka direct streams to be processed in parallel: --conf spark.streaming.concurrentJobs=
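
For anyone who prefers to set this in code rather than on the command line, here is a rough sketch of how the same property might be set on the SparkConf. The value "2", the application name and the 10-second batch interval are my own placeholders, not something given in the message above, so adjust them for your job. Note that spark.streaming.concurrentJobs is not listed in the official configuration docs, so treat it as experimental.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical app name; the master is expected to come from spark-submit.
val conf = new SparkConf()
  .setAppName("MultiTopicDirectStreams")
  // Assumed value: allow up to 2 streaming jobs to run at the same time.
  // The original message does not say which value was used.
  .set("spark.streaming.concurrentJobs", "2")

// Assumed batch interval of 10 seconds.
val ssc = new StreamingContext(conf, Seconds(10))
```

With more than one concurrent job, batches are no longer guaranteed to finish in order, so this is probably only safe when the streams do not depend on each other.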