From: Evo Eftimov
Sent: June 15, 2015 8:38 AM
To: 'Ilove Data'; 'Tathagata Das'
Cc: 'Akhil Das'; 'user'
Subject: RE: Join between DStream and Periodically-Changing-RDD
Then go for the second option I suggested - simply turn (keep turning) your
HDFS file (Batch RDD) into a stream.
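For concreteness, a minimal sketch of that second option, assuming Kafka as the
message broker and the Spark 1.3+ direct stream API; the broker address, topic
name, checkpoint path, and socket source for the main stream are placeholders,
not from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("RefDataAsStream"), Seconds(10))
ssc.checkpoint("/tmp/refdata-ckpt") // required by updateStateByKey

// Reference data published to the topic (e.g. by a job tailing the HDFS file)
val refStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, Map("metadata.broker.list" -> "broker:9092"), Set("refdata"))
  .map { case (_, line) => (line.split(",")(0), line) }

// Keep aggregating over the lifetime of the app: latest record per key wins.
val refState = refStream.updateStateByKey[String] { (fresh, current) =>
  fresh.lastOption.orElse(current)
}

// The main event stream, keyed the same way (socket source only for the sketch)
val events = ssc.socketTextStream("localhost", 9999)
  .map(line => (line.split(",")(0), line))

// Per-batch join of live events against the accumulated reference state
events.join(refState).print()

ssc.start()
ssc.awaitTermination()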
From: Ilove Data
Subject: Re: Join between DStream and Periodically-Changing-RDD
@Akhil Das
Joining two DStreams might not be an option, since I want to join the stream
with historical data in an HDFS folder.

@Tathagata Das & @Evo Eftimov
The Batch RDD to be reloaded is considerably huge compared to the DStream data
stream.
>> You can feed your HDFS file into a Message Broker topic and consume it
>> from there in the form of DStream RDDs, which you keep aggregating over
>> the lifetime of the Spark Streaming app instance.
> From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
> Sent: Wednesday, June 10, 2015 8:36 AM
> To: Ilove Data
> Cc: user@spark.apache.org
> Subject: Re: Join between DStream and Periodically-Changing-RDD
>
> RDDs are immutable, why not join two DStreams?
From: Akhil Das
Sent: Wednesday, June 10, 2015 8:36 AM
To: Ilove Data
Cc: user@spark.apache.org
Subject: Re: Join between DStream and Periodically-Changing-RDD
RDDs are immutable, why not join two DStreams?

Not sure, but you can try something like this also:

kvDstream.foreachRDD { rdd =>
  // Reload the HDFS folder on every batch, so updates to the file
  // are picked up at batch granularity.
  val file = ssc.sparkContext.textFile("/sigmoid/")
  // Key each line by its first comma-separated field.
  val kvFile = file.map(x => (x.split(",")(0), x))
  // Note: an output action (e.g. saveAsTextFile) is still needed to
  // materialize the join.
  rdd.join(kvFile)
}
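For comparison, the same per-batch reload can be phrased with transform, so the
joined result stays a DStream for downstream operations; a minimal sketch
reusing the path and key extraction from above:

val joined = kvDstream.transform { rdd =>
  // The transform body is evaluated on the driver for every batch,
  // so the HDFS folder is re-read and updates show up per batch.
  val ref = rdd.sparkContext.textFile("/sigmoid/")
    .map(x => (x.split(",")(0), x))
  rdd.join(ref)
}
// Any output operation triggers the join, e.g.:
joined.print()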
Thanks
Best Regards