Did you try Spark 2.3 with Structured Streaming? Its watermarking and
plain SQL support might be really interesting for you.
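
Roughly like this: a minimal, untested sketch in Scala against the
spark-sql-kafka-0-10 source. The broker address, topic names, column
aliases and the 15-minute bound are assumptions, to be replaced with your
actual schema:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("TwoTopicJoin").getOrCreate()

// Read one topic and expose key/value/event-time under prefixed names.
def readTopic(topic: String, prefix: String) =
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // adjust brokers
    .option("subscribe", topic)
    .load()
    .selectExpr(s"CAST(key AS STRING) AS ${prefix}Key",
                s"CAST(value AS STRING) AS ${prefix}Val",
                s"timestamp AS ${prefix}Ts")

// Watermarks bound how long Spark buffers state for the faster topic
// while waiting for the slower one; 20 minutes gives some slack over
// the expected 15-minute skew, so matching rows are not dropped.
val left  = readTopic("topic1", "left").withWatermark("leftTs", "20 minutes")
val right = readTopic("topic2", "right").withWatermark("rightTs", "20 minutes")

// Inner stream-stream join on the key, with an event-time range so
// Spark knows when buffered rows can safely be evicted.
val joined = left.join(
  right,
  expr("""leftKey = rightKey AND
          rightTs BETWEEN leftTs - interval 15 minutes
                      AND leftTs + interval 15 minutes"""))

joined.writeStream.format("console").start().awaitTermination()

If you prefer plain SQL, you can also register each watermarked stream
with createOrReplaceTempView and express the same join through spark.sql.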
Aakash Basu <aakash.spark....@gmail.com> wrote on Wed., 14 Mar 2018 at
14:57:

> Hi,
>
>
>
> *Info (using): Spark Streaming Kafka 0.8 package*
>
> *Spark 2.2.1*
> *Kafka 1.0.1*
>
> As of now, I am feeding paragraphs into the Kafka console producer, and my
> Spark job, acting as a receiver, prints the flattened words, which is
> entirely an RDD operation.
>
> *My goal is to continuously read two tables (which keep being updated) as
> two distinct Kafka topics, load them as two Spark DataFrames, join them on
> a key, and produce the output.* (I come from a Spark-SQL background, so
> pardon the Spark-SQL-ish phrasing.)
>
> *It may happen that the first topic receives new data 15 minutes before
> the second one does; how should I proceed in that scenario? I must not
> lose any data.*
>
> For now, I simply want to pass paragraphs, read them as an RDD, convert it
> to a DataFrame, and then join to get the common keys as the output (just
> for R&D).
>
> I started using Spark Streaming and Kafka just today.
>
> Please help!
>
> Thanks,
> Aakash.
>
