Hello,

I am using Spark Structured Streaming to sink data from Kafka to AWS S3. I
am wondering if it's possible to introduce a uniquely incrementing
identifier for each record, as we do in an RDBMS (an auto-incrementing long id)?
This would greatly help with range pruning when reading based on this ID.

Any thoughts? I have looked at monotonically_increasing_id, but it seems
it is not deterministic, and it won't ensure that new records get the next
ID after the latest ID already present in storage (S3).
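For context, the gaps you are seeing are by design: per the Spark docs, monotonically_increasing_id() puts the partition ID in the upper 31 bits and the per-partition row number in the lower 33 bits, so IDs are unique and increasing within a partition but neither consecutive nor continued across batches. A minimal pure-Python sketch of that bit layout (monotonic_id here is an illustrative helper, not a Spark API):

```python
# Sketch of how Spark composes monotonically_increasing_id values:
# partition ID in the upper 31 bits, per-partition row number in the
# lower 33 bits. This is why IDs are unique but not consecutive, and
# why a new batch will not continue from the latest ID already in S3.
def monotonic_id(partition_id: int, row_number: int) -> int:
    return (partition_id << 33) | row_number

# Rows within partition 0 are consecutive: 0, 1, 2, ...
print(monotonic_id(0, 0))  # 0
print(monotonic_id(0, 1))  # 1
# The first row of partition 1 jumps to 2**33, leaving a large gap.
print(monotonic_id(1, 0))  # 8589934592
```

So even within a single batch the sequence has large gaps between partitions, and each new streaming micro-batch restarts the row numbering rather than reading back the maximum ID from S3.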

Regards,
Felix K Jose
