Thanks to TD, the savior! Shall look into it.
On Thu, Mar 15, 2018 at 1:04 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Relevant: https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html
>
> This is a true stream-stream join, which will automatically buffer delayed
> data and join it appropriately with SQL join semantics. Please check it
> out :)
>
> TD
>
> On Wed, Mar 14, 2018 at 12:07 PM, Dylan Guedes <djmggue...@gmail.com> wrote:
>
>> I misread it and thought your question was whether pyspark supports
>> Kafka. Sorry!
>>
>> On Wed, Mar 14, 2018 at 3:58 PM, Aakash Basu <aakash.spark....@gmail.com> wrote:
>>
>>> Hey Dylan,
>>>
>>> Great!
>>>
>>> Could you reply to my initial mail and also the latest one?
>>>
>>> Thanks,
>>> Aakash.
>>>
>>> On 15-Mar-2018 12:27 AM, "Dylan Guedes" <djmggue...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been using Kafka with pyspark since 2.1.
>>>>
>>>> On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu <aakash.spark....@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I haven't yet.
>>>>>
>>>>> I just want to know: since when does the Spark 2.3 Kafka 0.10 package
>>>>> support Python? I read somewhere that, as of now, Scala and Java are
>>>>> the languages to be used.
>>>>>
>>>>> Please correct me if I am wrong.
>>>>>
>>>>> Thanks,
>>>>> Aakash.
>>>>>
>>>>> On 14-Mar-2018 8:24 PM, "Georg Heiler" <georg.kf.hei...@gmail.com> wrote:
>>>>>
>>>>>> Did you try Spark 2.3 with Structured Streaming? Its watermarking
>>>>>> and plain SQL might be really interesting for you.
>>>>>>
>>>>>> Aakash Basu <aakash.spark....@gmail.com> wrote on Wed., Mar 14, 2018 at 14:57:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> *Info (using):*
>>>>>>> *Spark Streaming Kafka 0.8 package*
>>>>>>> *Spark 2.2.1*
>>>>>>> *Kafka 1.0.1*
>>>>>>>
>>>>>>> As of now, I am feeding paragraphs into the Kafka console producer,
>>>>>>> and my Spark job, acting as a receiver, prints the flattened words,
>>>>>>> which is a pure RDD operation.
>>>>>>>
>>>>>>> *My goal is to continuously read two tables (being updated) as two
>>>>>>> distinct Kafka topics, read them as two Spark DataFrames, join them
>>>>>>> on a key, and produce the output.* (I am from a Spark-SQL
>>>>>>> background; pardon my Spark-SQL-ish writing.)
>>>>>>>
>>>>>>> *It may happen that the first topic receives new data 15 minutes
>>>>>>> before the second topic. How should I proceed in that scenario? I
>>>>>>> must not lose any data.*
>>>>>>>
>>>>>>> As of now, I simply want to pass paragraphs, read them as an RDD,
>>>>>>> convert it to a DataFrame, and then join to get the common keys as
>>>>>>> the output. (Just for R&D.)
>>>>>>>
>>>>>>> I started using Spark Streaming and Kafka only today.
>>>>>>>
>>>>>>> Please help!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aakash.
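For anyone finding this thread later, the stream-stream join TD points to can be sketched in PySpark roughly as below. This is a minimal sketch, not code from the thread: the broker address (`localhost:9092`), topic names (`topic_a`, `topic_b`), the key/timestamp column names, and the 15-minute bound (taken from the "15 mins prior" scenario in the question) are all illustrative assumptions. The time-range condition plus watermarks is what lets Spark buffer the early-arriving topic's rows and drop state once they can no longer match, instead of buffering forever.

```python
def build_join_condition(left_key, right_key, left_ts, right_ts,
                         max_delay="15 minutes"):
    """Build the time-bounded equi-join condition as a SQL expression.

    The event-time bound tells Spark how long a row from one stream may
    wait for its match from the other, so old buffered state can be evicted.
    """
    return (
        f"{left_key} = {right_key} AND "
        f"{right_ts} >= {left_ts} - interval {max_delay} AND "
        f"{right_ts} <= {left_ts} + interval {max_delay}"
    )

if __name__ == "__main__":
    # pyspark >= 2.3 and the spark-sql-kafka-0-10 package are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.appName("kafka-stream-join").getOrCreate()

    # Read each topic as a streaming DataFrame. The Kafka source exposes
    # `value` (binary) and `timestamp` columns, among others.
    left = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "topic_a").load()
            .selectExpr("CAST(value AS STRING) AS a_key", "timestamp AS a_ts")
            .withWatermark("a_ts", "15 minutes"))

    right = (spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "topic_b").load()
             .selectExpr("CAST(value AS STRING) AS b_key", "timestamp AS b_ts")
             .withWatermark("b_ts", "15 minutes"))

    # Inner stream-stream join on the key, bounded in event time.
    joined = left.join(
        right, expr(build_join_condition("a_key", "b_key", "a_ts", "b_ts")))

    query = (joined.writeStream.format("console")
             .outputMode("append").start())
    query.awaitTermination()
```

Without the time bound on the join condition, a stream-stream join has no way to know when a buffered row can be discarded, so state grows without limit; the bound here mirrors the 15-minute skew described in the original question.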