Hi All,

Requirements:
I am working on a data flow that uses a view definition (the view is
already defined in the schema). The view definition uses multiple tables.
We want to stream the view data into an Elasticsearch index whenever the
data in any of the tables used in the view definition changes.


Current flow:
1. We insert the ids from the tables used in the view definition into a
common table.
2. Using Spark Structured Streaming, we read the ids from the common table
and stream the view data for them, by checking whether any incoming id is
present in the collective ids of all tables used in the view definition.
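
To make the flow above concrete, here is a simplified sketch in plain
Python rather than Spark. The tables `orders` and `customers`, their
fields, and the helpers `run_view` and `rows_for_incoming_id` are
hypothetical stand-ins, not the real schema; only the shape of the logic
follows the two steps described above.

```python
# Hypothetical base tables used by the view (stand-ins for the real schema).
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 50},
    {"order_id": 2, "customer_id": 11, "amount": 75},
]
customers = [
    {"customer_id": 10, "name": "A"},
    {"customer_id": 11, "name": "B"},
]


def run_view():
    """Evaluate the whole view: join every order with its customer."""
    by_customer = {c["customer_id"]: c for c in customers}
    return [
        {**o, "name": by_customer[o["customer_id"]]["name"]}
        for o in orders
        if o["customer_id"] in by_customer
    ]


def rows_for_incoming_id(incoming_id):
    """Step 2: evaluate the full view, then keep rows that contain the id."""
    result = []
    for row in run_view():  # reads all data from all tables on every call
        collective_ids = {row["order_id"], row["customer_id"]}
        if incoming_id in collective_ids:
            result.append(row)
    return result


# Step 1: ids of changed rows land in the common table.
common_table = [2, 10]

# Step 2: each incoming id triggers a full evaluation of the view.
to_index = {i: rows_for_incoming_id(i) for i in common_table}
```

Here `to_index` holds, per incoming id, the view rows that would be sent
to the index.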


Issue:
1. For each incoming id, we run the view definition (so it reads all the
data from all the tables) and check whether the incoming id is present in
the collective ids of the view result. Because of this, it uses a lot of
memory on the cluster driver and takes a long time to process.


I am looking for an alternative solution that avoids a full scan of the
view definition every time. If you have an alternative design that
achieves the same result, please suggest it.
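
To illustrate the kind of alternative being asked about, below is a
minimal sketch in plain Python rather than Spark. It assumes the changed
ids can be handled one batch at a time and that each base table can be
looked up by key. The tables, fields, and the function
`view_rows_for_batch` are hypothetical, and this is not a validated
design.

```python
from collections import defaultdict

# Hypothetical base tables used by the view (stand-ins for the real schema).
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 50},
    {"order_id": 2, "customer_id": 11, "amount": 75},
    {"order_id": 3, "customer_id": 10, "amount": 20},
]
customers = [
    {"customer_id": 10, "name": "A"},
    {"customer_id": 11, "name": "B"},
]

# Keyed lookups standing in for indexed or filtered reads of each table.
orders_by_id = {o["order_id"]: o for o in orders}
orders_by_customer = defaultdict(list)
for o in orders:
    orders_by_customer[o["customer_id"]].append(o)
customers_by_id = {c["customer_id"]: c for c in customers}


def view_rows_for_batch(incoming_ids):
    """Build view rows only for orders touched by the ids in one batch."""
    affected = {}
    for i in set(incoming_ids):
        if i in orders_by_id:  # the id refers to a changed order
            affected[i] = orders_by_id[i]
        for o in orders_by_customer.get(i, []):  # the id is a changed customer
            affected[o["order_id"]] = o
    rows = []
    for order_id in sorted(affected):
        o = affected[order_id]
        c = customers_by_id.get(o["customer_id"])
        if c is not None:  # inner join, as in the toy view
            rows.append({**o, "name": c["name"]})
    return rows


batch_rows = view_rows_for_batch([2, 10])
```

In Spark terms this would roughly correspond to collecting the ids of a
micro-batch (for example inside `foreachBatch`) and filtering each base
table by those ids before applying the view's join, instead of filtering
the finished view result. Whether that is feasible depends on the actual
view definition.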


Note: It would also be helpful if you could point me to a community forum
or platform for discussing this kind of design-related topic.
