Hi, what is the recommended way to share datasets across multiple Spark Streaming applications, so that incoming data can be looked up against the shared datasets?
The shared datasets are also incrementally refreshed and stored on S3. Below is the scenario.

Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.
Streaming App-3 consumes data from Source-3, *needs to look up against DS-1 and DS-2*, and writes to DS-3 in S3.
Streaming App-4 consumes data from Source-4, *needs to look up against DS-1 and DS-2*, and writes to DS-3 in S3.
Streaming App-n consumes data from Source-n, *needs to look up against DS-1 and DS-2*, and writes to DS-n in S3.

So DS-1 and DS-2 should ideally be shared for lookup across multiple streaming apps. Any input is appreciated.

Thank you!