Hello,

I am using Spark Structured Streaming to sink data from Kafka to AWS S3. I
am wondering if it's possible to introduce a uniquely incrementing
identifier for each record, as we do in an RDBMS (an auto-incrementing long id)?
This would greatly help with range pruning when reading based on this ID.

Any thoughts? I have looked at monotonically_increasing_id, but it seems
it is not deterministic, and it won't ensure that new records get the next
ID after the latest ID already present in storage (S3).
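For context, the gaps you are seeing are by design: per the Spark docs, monotonically_increasing_id() puts the partition ID in the upper 31 bits and the per-partition row number in the lower 33 bits, so IDs are unique and increasing within a partition but neither consecutive nor continued across batches. A minimal pure-Python sketch of that bit layout (monotonic_id here is an illustrative helper, not a Spark API):

```python
# Sketch of how Spark composes monotonically_increasing_id values:
# partition ID in the upper 31 bits, per-partition row number in the
# lower 33 bits. This is why IDs are unique but not consecutive, and
# why a new batch will not continue from the latest ID already in S3.
def monotonic_id(partition_id: int, row_number: int) -> int:
    return (partition_id << 33) | row_number

# Rows within partition 0 are consecutive: 0, 1, 2, ...
print(monotonic_id(0, 0))  # 0
print(monotonic_id(0, 1))  # 1
# The first row of partition 1 jumps to 2**33, leaving a large gap.
print(monotonic_id(1, 0))  # 8589934592
```

So even within a single batch the sequence has large gaps between partitions, and each new streaming micro-batch restarts the row numbering rather than reading back the maximum ID from S3.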

Regards,
Felix K Jose
