Hi Satyajit,

For the query/join part, there are a couple of approaches:
1. Create a DataFrame from each incoming streaming batch (each batch is
actually an RDD) and join it with your reference data (loaded from the
existing table).
2. Use Structured Streaming, which carries the schema with every batch
(you can think of it as a stream of DataFrames).

When joining with reference data: if it is static, load it once and
persist it; if it is dynamic, refresh it at a regular interval.
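
To make the load-once vs. refresh idea concrete, here is a minimal sketch
in plain Python, deliberately independent of the Spark APIs. The names
ReferenceData, join_batch, loader and ttl_seconds are hypothetical and only
for illustration; in a real job the loader would read your existing table
and the join would be a DataFrame join.

```python
import time


class ReferenceData:
    """Reference rows keyed by join key, reloaded once older than ttl_seconds.

    ttl_seconds=None models static data: load once and keep it.
    """

    def __init__(self, loader, ttl_seconds=None, clock=time.monotonic):
        self._loader = loader      # hypothetical callable returning a dict
        self._ttl = ttl_seconds    # refresh interval in seconds, or None
        self._clock = clock        # injectable so the refresh can be tested
        self._data = None
        self._loaded_at = 0.0

    def get(self):
        now = self._clock()
        expired = (
            self._ttl is not None
            and self._data is not None
            and now - self._loaded_at >= self._ttl
        )
        if self._data is None or expired:
            # First use, or the refresh interval has elapsed: reload.
            self._data = self._loader()
            self._loaded_at = now
        return self._data


def join_batch(batch, reference):
    """Inner-join one batch of (key, value) records with the reference data."""
    lookup = reference.get()
    return [(key, value, lookup[key]) for key, value in batch if key in lookup]
```

Calling join_batch once per incoming batch gives the behaviour described
above: static reference data is loaded a single time, while dynamic data is
reloaded only when the interval has passed, not on every batch.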


Best Regards,
Vikash Pareek



