Hi Satyajit,

For the query/join part there are a couple of approaches:

1. Create a DataFrame from each incoming streaming batch (each batch is actually an RDD) and join it with your reference data (coming from the existing table).
2. Use Structured Streaming, where every batch carries a schema (you can think of it as a stream of DataFrames).

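To make the first approach concrete, here is a rough sketch in plain Python rather than Spark API, so it runs without a cluster. Each micro-batch is modelled as a list of dicts and joined against reference rows indexed by key. The names `ReferenceData`, `join_batch` and `load_reference` are made up for illustration, and the loader is a stand-in for reading your existing table. In Spark itself the same shape would be a DataFrame join against a reference DataFrame that you persist. The holder loads the reference rows once by default, and reloads them after `refresh_seconds` if you set it.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


class ReferenceData:
    """Holds reference rows indexed by key; reloads them once they are stale."""

    def __init__(self, loader: Callable[[], Iterable[Row]], key: str,
                 refresh_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader                    # hypothetical: reads the reference table
        self._key = key
        self._refresh_seconds = refresh_seconds  # None means static data: load once
        self._clock = clock
        self._index: Dict[object, Row] = {}
        self._loaded_at: Optional[float] = None

    def lookup(self) -> Dict[object, Row]:
        """Return the key -> row index, reloading it first if needed."""
        now = self._clock()
        never_loaded = self._loaded_at is None
        stale = (not never_loaded
                 and self._refresh_seconds is not None
                 and now - self._loaded_at >= self._refresh_seconds)
        if never_loaded or stale:
            self._index = {row[self._key]: row for row in self._loader()}
            self._loaded_at = now
        return self._index


def join_batch(batch: Iterable[Row], reference: ReferenceData, key: str) -> List[Row]:
    """Inner-join one micro-batch with the reference rows on `key`."""
    index = reference.lookup()
    joined: List[Row] = []
    for row in batch:
        match = index.get(row[key])
        if match is not None:
            joined.append({**match, **row})  # batch values win on name clashes
    return joined


def load_reference() -> List[Row]:
    # Assumption: stand-in for a query against the existing reference table.
    return [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]


reference = ReferenceData(load_reference, key="id")  # static: loaded once
batch = [{"id": 1, "value": 10}, {"id": 3, "value": 30}]
print(join_batch(batch, reference, key="id"))
```

Only the row with `id` 1 comes out of the join, since `id` 3 has no match in the reference rows. Passing something like `refresh_seconds=300` would make the holder reload the reference rows on the first batch that arrives at least five minutes after the previous load.
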
When joining with the reference data: if it is static, load it once and persist it; if it is dynamic, refresh it at a regular interval.

Best Regards,
Vikash Pareek