Have you taken a look at DStream.transformWith( ... ) . That allows you
apply arbitrary transformation between RDDs (of the same timestamp) of two
different streams.
So you can do something like this.
2s-window-stream.transformWith(1s-window-stream, (rdd1: RDD[...], rdd2:
RDD[...]) => {
...
// return a new RDD
})
And streamingContext.transform() extends it to N DStreams. :)
Hope this helps!
TD
On Wed, Jul 16, 2014 at 10:42 AM, Walrus theCat <[email protected]>
wrote:
> hey at least it's something (thanks!) ... not sure what i'm going to do if
> i can't find a solution (other than not use spark) as i really need these
> capabilities. anyone got anything else?
>
>
> On Wed, Jul 16, 2014 at 10:34 AM, Luis Ángel Vicente Sánchez <
> [email protected]> wrote:
>
>> hum... maybe consuming all streams at the same time with an actor that
>> would act as a new DStream source... but this is just a random idea... I
>> don't really know if that would be a good idea or even possible.
>>
>>
>> 2014-07-16 18:30 GMT+01:00 Walrus theCat <[email protected]>:
>>
>> Yeah -- I tried the .union operation and it didn't work for that reason.
>>> Surely there has to be a way to do this, as I imagine this is a commonly
>>> desired goal in streaming applications?
>>>
>>>
>>> On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez <
>>> [email protected]> wrote:
>>>
>>>> I'm joining several kafka dstreams using the join operation but you
>>>> have the limitation that the duration of the batch has to be same,i.e. 1
>>>> second window for all dstreams... so it would not work for you.
>>>>
>>>>
>>>> 2014-07-16 18:08 GMT+01:00 Walrus theCat <[email protected]>:
>>>>
>>>> Hi,
>>>>>
>>>>> My application has multiple dstreams on the same inputstream:
>>>>>
>>>>> dstream1 // 1 second window
>>>>> dstream2 // 2 second window
>>>>> dstream3 // 5 minute window
>>>>>
>>>>>
>>>>> I want to write logic that deals with all three windows (e.g. when the
>>>>> 1 second window differs from the 2 second window by some delta ...)
>>>>>
>>>>> I've found some examples online (there's not much out there!), and I
>>>>> can only see people transforming a single dstream. In conventional spark,
>>>>> we'd do this sort of thing with a cartesian on RDDs.
>>>>>
>>>>> How can I deal with multiple Dstreams at once?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>