Hi,

What's the best way to assign a truly unique row ID (rather than a hash) to
a DataFrame/Dataset?

I originally thought that functions.monotonically_increasing_id would do
this, but it has an unfortunate property: if you add it as a column to
table A, then derive tables X, Y, and Z from A and save them, the row ID
values in X, Y, and Z may differ. I assume this is because the column is
evaluated lazily, so the IDs are recomputed each time one of those tables
is computed.
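
To make my guess concrete, here is a plain-Python sketch (not Spark code).
It assumes the layout described in the Spark docs for
monotonically_increasing_id: partition ID in the upper 31 bits, record
number within the partition in the lower 33 bits. The simulate_ids helper
and the two partitionings are hypothetical, purely for illustration.

```python
# Plain-Python model of the documented ID layout; this is not Spark code.
def simulate_ids(partitions):
    """Return {row: id} with id = (partition index << 33) + position in partition."""
    ids = {}
    for part_index, rows in enumerate(partitions):
        for offset, row in enumerate(rows):
            ids[row] = (part_index << 33) + offset
    return ids


rows = ["a", "b", "c", "d"]

# Same rows, two hypothetical partitionings, as might happen if the plan
# for table A is re-evaluated separately for each derived table.
first_run = simulate_ids([rows[:2], rows[2:]])
second_run = simulate_ids([rows[:1], rows[1:]])

print(first_run["b"])   # 1
print(second_run["b"])  # 8589934592
```

If that is roughly what happens, the ID of a row depends on which partition
it lands in at evaluation time, so it would only be stable if the column
were materialized once before X, Y, and Z are derived.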
