Did you try Spark 2.3 with Structured Streaming? Its watermarking and plain SQL support might be really interesting for you.

On Wed, 14 Mar 2018 at 14:57, Aakash Basu <aakash.spark....@gmail.com> wrote:
> Hi,
>
> *Info (Using): Spark Streaming Kafka 0.8 package*
>
> *Spark 2.2.1*
> *Kafka 1.0.1*
>
> As of now, I am feeding paragraphs in Kafka console producer and my Spark,
> which is acting as a receiver is printing the flattened words, which is a
> complete RDD operation.
>
> *My motive is to read two tables continuously (being updated) as two
> distinct Kafka topics being read as two Spark Dataframes and join them
> based on a key and produce the output. *(I am from Spark-SQL background,
> pardon my Spark-SQL-ish writing)
>
> *It may happen, the first topic is receiving new data 15 mins prior to the
> second topic, in that scenario, how to proceed? I should not lose any data.*
>
> As of now, I want to simply pass paragraphs, read them as RDD, convert to
> DF and then join to get the common keys as the output. (Just for R&D).
>
> Started using Spark Streaming and Kafka today itself.
>
> Please help!
>
> Thanks,
> Aakash.
>
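
For the two-topic join described in the quoted message, here is a rough sketch of how it might look with Structured Streaming in Spark 2.3, which supports stream-stream joins. Everything specific in it is an assumption for illustration: the topic names `table_a` and `table_b`, the broker address `localhost:9092`, the helper names, the use of the Kafka message key as the join key, and a 20-minute allowance chosen to cover the 15-minute lag you mention. PySpark is imported inside the functions only so the file can be loaded without a Spark installation.

```python
def join_condition(max_lag_minutes):
    """Build the SQL join predicate: equal keys, and event times that lie
    within max_lag_minutes of each other (lets Spark bound the join state)."""
    lag = f"interval {max_lag_minutes} minutes"
    return (
        "a_key = b_key"
        f" AND b_time >= a_time - {lag}"
        f" AND b_time <= a_time + {lag}"
    )


def read_topic(spark, topic, prefix, servers="localhost:9092"):
    """Read one Kafka topic as a streaming DataFrame with prefixed columns.
    Topic name and broker address are placeholders, not from the thread."""
    from pyspark.sql.functions import col

    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .load()
        .select(
            col("key").cast("string").alias(prefix + "_key"),
            col("value").cast("string").alias(prefix + "_value"),
            col("timestamp").alias(prefix + "_time"),  # Kafka record timestamp
        )
    )


def start_join(max_lag_minutes=20):
    """Join the two topics on key and print matches to the console."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.appName("two-topic-join").getOrCreate()
    delay = f"{max_lag_minutes} minutes"

    # The watermark tells Spark how late data may arrive on each side;
    # rows are buffered in state until the watermark passes them.
    left = read_topic(spark, "table_a", "a").withWatermark("a_time", delay)
    right = read_topic(spark, "table_b", "b").withWatermark("b_time", delay)

    joined = left.join(right, expr(join_condition(max_lag_minutes)))

    # Stream-stream inner joins emit results in append mode.
    return joined.writeStream.format("console").outputMode("append").start()


# To run against a real broker (blocks until the query is stopped):
# start_join(20).awaitTermination()
```

Note that this uses the `spark-sql-kafka-0-10` source rather than the Kafka 0.8 receiver package, so it would need to be submitted with something like `--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0`. Rows that arrive later than the watermark allows can be dropped, so if you must not lose data, pick the allowance comfortably larger than the worst lag you expect between the two topics.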