Yeah, it's awkward: the transforms being done are fairly time-sensitive, so I
don't want them to wait 60 seconds or more.

I might have to move the code from a transform into a custom receiver instead, 
so they'll be processed outside the window length. A buffered writer is a good 
idea too, thanks.

Thanks,
Ewan

From: Ashic Mahtab [mailto:as...@live.com]
Sent: 31 December 2015 13:50
To: Ewan Leith <ewan.le...@realitymine.com>; Apache Spark 
<user@spark.apache.org>
Subject: RE: Batch together RDDs for Streaming output, without delaying 
execution of map or transform functions

Hi Ewan,
Transforms are definitions of what needs to be done - they don't execute until
an action is triggered. For what you want, I think you might need to have an
action that writes out RDDs to some sort of buffered writer.
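The buffered-writer idea could be sketched roughly like this. This is plain Python and illustrative only - the class name `BufferedBatchWriter` and the `sink` callable are invented for the example; in a real Spark Streaming job, `add()` would be called from inside a `foreachRDD` action on each short (e.g. 1-second) batch, so the transform runs every batch while the actual write happens only once the longer interval has elapsed:

```python
import time

class BufferedBatchWriter:
    """Accumulate records from frequent small batches and persist them
    only after a minimum interval has elapsed (sketch, not Spark API)."""

    def __init__(self, flush_interval_secs, sink):
        self.flush_interval_secs = flush_interval_secs
        self.sink = sink            # callable that persists a list of records
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, records):
        # Called once per micro-batch; the transform has already run by now.
        self.buffer.extend(records)
        if time.monotonic() - self.last_flush >= self.flush_interval_secs:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)  # e.g. write one large file to disk
            self.buffer = []
        self.last_flush = time.monotonic()

# Demonstration with a short interval and an in-memory sink:
written = []
writer = BufferedBatchWriter(flush_interval_secs=0.1, sink=written.append)

writer.add(["a"])       # buffered, interval not yet elapsed
time.sleep(0.15)
writer.add(["b"])       # interval elapsed, both records flushed together
print(written)          # [['a', 'b']]
```

The key point is that the per-batch action (`add`) fires every batch, forcing the transform to execute on schedule, while disk output is decoupled and batched on its own timer.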

-Ashic.
________________________________
From: ewan.le...@realitymine.com<mailto:ewan.le...@realitymine.com>
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Batch together RDDs for Streaming output, without delaying execution 
of map or transform functions
Date: Thu, 31 Dec 2015 11:35:37 +0000
Hi all,

I'm sure this must have been solved already, but I can't see anything obvious.

Using Spark Streaming, I'm trying to execute a transform function on a DStream 
at short batch intervals (e.g. 1 second), but only write the resulting data to 
disk using saveAsTextFiles in a larger batch after a longer delay (say 60 
seconds).

I thought the ReceiverInputDStream window function might help here, but
applying it to a transformed DStream causes the transform function to execute
only at the end of the window too.

Has anyone got a solution to this?

Thanks,
Ewan


