Hi

I was looking at the new full outer join. This seems to be working fine for
my use case however I have a question regarding the state size.

I have 2 streams each will have 100's of million unique keys. Also, Each of
these will get the updated value of keys 100's of times per day.

As per my understanding in full outer join flink will keep all the values
of the keys which it has seen in the state and whenever a new value comes
from
1 of the stream. It will be joined against all of the key values which were
there for 2nd stream.It could be 1 or 100's of rows. This seems inefficient
but my question is more on the state side. Thus, I will need to keep
billion's of values in state on both side. This will be very expensive.

It is a non windowed join. A key can recieve updates for 50-60 days and
after that it wont get any updates on any of the streams.

Is there a way we could use a state such that only 1 value per key is
retained in the state to reduce the size of the state?

I am using the Table API but could use the Datastream api if needed.

Thanks

Reply via email to