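This JIRA issue may help. It covers supplying an initial RDD to the updateStateByKey transformation, which would let you seed streaming state at startup instead of waiting for it to warm up: https://issues.apache.org/jira/browse/SPARK-3660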
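
To make the "pre-populate the window" idea from your message concrete, here is a minimal sketch in plain Python. It is not Spark API: the `SlidingWindow` class, its `restore` method, and the sample timestamps are all hypothetical names made up for illustration. The point it tries to show is that if restored records keep their original event timestamps, they expire on their normal schedule, so the window should not hold double the data after a restart.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical time-based window that evicts by event timestamp."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        # Assumes timestamps arrive in non-decreasing order.
        self._items.append((timestamp, value))
        self._evict(timestamp)

    def restore(self, saved_items, now):
        # Assumes the window is empty (fresh start after an upgrade).
        # Records are re-inserted under their ORIGINAL timestamps, so
        # anything already older than the window is dropped right away.
        for ts, value in sorted(saved_items, key=lambda item: item[0]):
            self._items.append((ts, value))
        self._evict(now)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()

    def values(self):
        return [value for _, value in self._items]


# Example: a one-day window restored from a dump taken before the upgrade.
window = SlidingWindow(window_seconds=86400)
saved = [(0, "a"), (50000, "b"), (80000, "c")]
window.restore(saved, now=90000)
after_restore = window.values()

window.add(140000, "d")
after_add = window.values()
```

In this sketch, record "a" is dropped during the restore because it is already older than one day at `now=90000`, and "b" ages out when "d" arrives. Whether something equivalent can be done inside a DStream window depends on how much control you have over batch timestamps, which is the limitation you describe.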

On Tue, Feb 24, 2015 at 2:18 AM, Matus Faro <matus.f...@kik.com> wrote:

> Hi,
>
> Our application is being designed to operate at all times on a large
> sliding window (day+) of data. The operations performed on the window
> of data will change fairly frequently and I need a way to save and
> restore the sliding window after an app upgrade without having to wait
> the duration of the sliding window to "warm up". Because it's an app
> upgrade, checkpointing will not work unfortunately.
>
> I can potentially dump the window to an outside storage periodically
> or on app shutdown, but I don't have an ideal way of restoring it.
>
> I thought about two non-ideal solutions:
> 1. Load the previous data all at once into the sliding window on app
> startup. The problem is, at one point I will have double the data in
> the sliding window until the initial batch of data goes out of scope.
> 2. Broadcast the previous state of the window separately from the
> window. Perform the operations on both sets of data until it comes out
> of scope. The problem is, the data will not fit into memory.
>
> Solutions that would solve my problem:
> 1. Ability to pre-populate sliding window.
> 2. Have control over batch slicing. It would be nice for a Receiver to
> dictate the current batch timestamp in order to slow down or fast
> forward time.
>
> Any feedback would be greatly appreciated!
>
> Thank you,
> Matus

--
Arush Kharbanda || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com