Here are a couple of ideas.

1. You can set up a Structured Streaming query to update an in-memory table. Look at the memory sink in the programming guide - http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks
You can then query the latest table using the specified table name, and also join that table with another stream. However, note that this in-memory table is maintained in the driver, so you have to be careful about the size of the table.
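As a rough sketch of idea 1 (assuming a Kafka source and an in-memory table named "latestData" - the server, topic, and column names are placeholders for your setup):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode

val spark = SparkSession.builder().appName("MemorySinkExample").getOrCreate()

// Hypothetical streaming source; replace with your actual Kafka options.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "topic1")
  .load()

// Write a running aggregate into a driver-side in-memory table
// named "latestData" via the memory sink.
val query = stream
  .groupBy("key")
  .count()
  .writeStream
  .outputMode(OutputMode.Complete())
  .format("memory")
  .queryName("latestData")   // the table name you can query / join against
  .start()

// Anywhere else on the driver, read the latest snapshot:
spark.sql("SELECT * FROM latestData").show()
```

Keep in mind the memory sink holds the whole result table on the driver, so this only works for small result sets.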
2. If you cannot define a streaming query on the slow-moving data due to the unavailability of a connector for your data source, then you can always define a batch DataFrame and register it as a view, and then run a background thread that periodically creates a new DataFrame with updated data and re-registers it as a view with the same name. Any streaming query that joins a streaming DataFrame with the view will automatically start using the most updated data as soon as the view is updated.

Hope this helps.

On Thu, Apr 20, 2017 at 1:30 PM, Hemanth Gudela <hemanth.gud...@qvantel.com> wrote:

> Thanks Georg for your reply.
>
> But I'm not sure if I fully understood your answer.
>
> If you meant to join two streams (one reading Kafka, and another reading a
> database table), then I think it's not possible, because
>
> 1. According to the documentation
> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#data-sources>,
> Structured Streaming does not support a database as a streaming source
>
> 2. Joining between two streams is not possible yet.
>
> Regards,
> Hemanth
>
> From: Georg Heiler <georg.kf.hei...@gmail.com>
> Date: Thursday, 20 April 2017 at 23.11
> To: Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Spark structured streaming: Is it possible to periodically refresh static data frame?
>
> What about treating the static data as a (slow) stream as well?
>
> Hemanth Gudela <hemanth.gud...@qvantel.com> schrieb am Do., 20. Apr. 2017 um 22:09 Uhr:
>
> Hello,
>
> I am working on a use case where there is a need to join a streaming data
> frame with a static data frame.
>
> The streaming data frame continuously gets data from Kafka topics, whereas the
> static data frame fetches data from a database table.
> However, as the underlying database table is getting updated often, I must
> somehow manage to refresh my static data frame periodically to get the
> latest information from the underlying database table.
>
> My questions:
>
> 1. Is it possible to periodically refresh the static data frame?
>
> 2. If refreshing the static data frame is not possible, is there a
> mechanism to automatically stop & restart the Spark Structured Streaming
> job, so that every time the job restarts, the static data frame gets
> updated with the latest information from the underlying database table?
>
> 3. If 1) and 2) are not possible, please suggest alternatives to
> achieve my requirement described above.
>
> Thanks,
> Hemanth
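For completeness, a minimal sketch of idea 2 above (the periodically re-registered view). The JDBC URL, table name, view name, and the 5-minute interval are all placeholders:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("RefreshViewExample").getOrCreate()

// Batch read of the slow-moving table; placeholder connection details.
def loadSnapshot(): DataFrame = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost/dbname")  // placeholder URL
  .option("dbtable", "reference_table")              // placeholder table
  .load()

// Initial registration of the batch DataFrame as a view.
loadSnapshot().createOrReplaceTempView("reference_view")

// Background thread that periodically re-reads the table and
// re-registers the view under the same name.
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(new Runnable {
  def run(): Unit = loadSnapshot().createOrReplaceTempView("reference_view")
}, 5, 5, TimeUnit.MINUTES)

// Streaming queries that join against spark.table("reference_view")
// start seeing the new data after each refresh.
```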