Hi all,

Requirements: I am working on a data flow that uses an existing view (the view definition is already defined in the schema). The view definition uses multiple tables. We want to stream the view data into an Elasticsearch index whenever the data in any of the tables used in the view definition changes.
Current flow:
1. We insert the ids from each table used in the view definition into a common table.
2. From the common table, we use Spark Structured Streaming to stream the view data, selecting the view rows where any incoming id is present in the collective ids of all the tables used in the view definition.

Issue:
1. For each incoming id we run the view definition, so it reads all the data from all the tables, and then we check whether the incoming id is present in the collective ids of the view result. Because of this, the job uses a lot of memory on the cluster driver and takes a long time to process.

I am looking for an alternative that avoids a full scan of the view definition every time. If you have an alternate design flow that achieves the same result, please suggest it.

Note: It would also be helpful if you could point me to a community forum or platform where this kind of design-related topic can be discussed.
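
To make the current flow concrete, here is a minimal sketch of it. It is only an illustration, not the actual job: it uses Python's built-in `sqlite3` as a small stand-in for the real tables and for Spark, and every table, view, and column name in it (`orders`, `customers`, `order_view`, `changed_ids`) is hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create two hypothetical base tables, a view over them, and the common id table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        -- Stand-in for the view definition that already exists in the schema.
        CREATE VIEW order_view AS
            SELECT o.order_id, c.customer_id, c.name, o.amount
            FROM orders o
            JOIN customers c ON c.customer_id = o.customer_id;
        -- Stand-in for the common table that collects changed ids.
        CREATE TABLE changed_ids (id INTEGER);
        INSERT INTO customers VALUES (10, 'Ann'), (20, 'Bob');
        INSERT INTO orders VALUES (1, 10, 5.0), (2, 20, 7.5), (3, 10, 1.0);
        """
    )
    return conn


def record_change(conn, changed_id):
    """Step 1: insert the id of a changed row into the common table."""
    conn.execute("INSERT INTO changed_ids (id) VALUES (?)", (changed_id,))


def process_pending(conn):
    """Step 2: evaluate the whole view, then keep rows matching any pending id."""
    pending = {row[0] for row in conn.execute("SELECT id FROM changed_ids")}
    conn.execute("DELETE FROM changed_ids")
    # Full scan: every row of the view is read before any filtering happens.
    view_rows = conn.execute(
        "SELECT order_id, customer_id, name, amount FROM order_view ORDER BY order_id"
    ).fetchall()
    # Keep a row if any of its id columns matches an incoming id.
    return [row for row in view_rows if row[0] in pending or row[1] in pending]
```

In this sketch, every call to `process_pending` reads the complete view result even when only one id is pending, which is the cost described in the issue above.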