Hi, I have two streams, whose RDDs I'll call RDD1 and RDD2, and I want to cogroup them. The data don't arrive at the same time, and sometimes they arrive with some delay. Once I have all the data, I want to insert it into MongoDB.
For example, imagine that I get:

RDD1 --> T 0
RDD2 --> T 0.5

I cogroup them, but I can't store the result in Mongo yet because more data could arrive in the next window/slide.

RDD2' --> T 1.5

Another RDD2' arrives. I only want to save to Mongo once, so I should save only when I have all the data. What I do know is the maximum time I need to wait. Ideally, I would like to save to MongoDB in the last slide for each RDD, once I know that no more RDD2 data can arrive to join with RDD1. Is this possible, and if so, how? Maybe there is another way to solve this problem; any ideas?
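To make the behaviour I'm after concrete, here is a minimal plain-Python sketch. It is not Spark code, and the names `CogroupBuffer`, `add`, `flush`, and `MAX_WAIT` (with its value of 2.0) are made up for illustration. It assumes records carry a key, and that a key's group is complete once the maximum wait has elapsed since that key was first seen.

```python
from collections import defaultdict

MAX_WAIT = 2.0  # hypothetical upper bound on how late data can arrive


class CogroupBuffer:
    """Holds cogrouped values per key until the maximum wait has elapsed."""

    def __init__(self, max_wait):
        self.max_wait = max_wait
        self.first_seen = {}  # key -> time the key first appeared
        # key -> (values from stream 1, values from stream 2)
        self.groups = defaultdict(lambda: ([], []))

    def add(self, key, value, side, now):
        """Buffer a value; side is 0 for stream 1 and 1 for stream 2."""
        self.first_seen.setdefault(key, now)
        self.groups[key][side].append(value)

    def flush(self, now):
        """Return and forget every group whose waiting time is over."""
        ready = [k for k, t in self.first_seen.items()
                 if now - t >= self.max_wait]
        out = {}
        for k in ready:
            out[k] = self.groups.pop(k)
            del self.first_seen[k]
        return out


# The timeline from the example above.
buf = CogroupBuffer(MAX_WAIT)
buf.add("k", "rdd1", 0, now=0.0)
buf.add("k", "rdd2", 1, now=0.5)
early = buf.flush(now=1.0)   # too early: nothing is emitted yet
buf.add("k", "rdd2'", 1, now=1.5)
final = buf.flush(now=2.0)   # wait is over: the group is emitted once
print(early, final)
```

The result of `flush` is the only thing that would be written to MongoDB, so each key is saved exactly once. In Spark Streaming I assume this buffer would have to live in some per-key state (for example via a stateful operation such as `updateStateByKey`), but I'm not sure that is the right approach.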