I noticed that repartition results in non-deterministic lineage, because it
can change the order of rows each time it is computed.

So for instance, if you do things like:

val data = read(...)
val k = data.repartition(5)
val h = k.repartition(5)

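It seems that 'k' can return its rows in a different order each time it is
computed.
And because of this different ordering, 'h' can even end up with different
partitions. 'repartition' assigns rows to target partitions round-robin,
starting from an offset drawn from a random number generator seeded with the
input partition 'index', so the partition a row lands in depends on its
position within the input partition.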
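To make the effect concrete, here is a minimal pure-Scala sketch (no Spark needed). It assumes the assignment works like the round-robin step in Spark's RDD.coalesce with shuffle = true: start at a pseudo-random offset seeded by the input partition index, then hand out consecutive targets. The helpers assignTargets and changedRows are hypothetical names for illustration, not Spark APIs. The sketch feeds the same rows of one input partition in two different orders and reports which rows would move.

```scala
import scala.util.Random
import scala.util.hashing.byteswap32

object RepartitionOrderDemo {
  // Hypothetical model (assumed to mirror Spark's coalesce(shuffle = true)):
  // rows of input partition `index` are dealt out round-robin, starting from a
  // pseudo-random offset seeded by `index`. The seed is fixed, so a row's target
  // depends only on its position within the input partition.
  def assignTargets[T](index: Int, rows: Seq[T], numPartitions: Int): Map[T, Int] = {
    var position = new Random(byteswap32(index)).nextInt(numPartitions)
    rows.map { row =>
      position += 1
      row -> (position % numPartitions)
    }.toMap
  }

  // Rows whose target partition differs between two orderings of the same input.
  def changedRows[T](index: Int, before: Seq[T], after: Seq[T], numPartitions: Int): Set[T] = {
    val a = assignTargets(index, before, numPartitions)
    val b = assignTargets(index, after, numPartitions)
    a.keySet.filter(r => a(r) != b(r))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a", "b", "c", "d", "e", "f")
    // Same rows in input partition 0, but reversed, as could happen if the
    // upstream output for 'k' is recomputed and comes back in another order.
    val moved = changedRows(0, rows, rows.reverse, 5)
    println(s"rows that land in a different partition: ${moved.toSeq.sorted.mkString(", ")}")
  }
}
```

Running main prints "rows that land in a different partition: b, c, d, e", regardless of the seed: the offset is identical in both runs, so only the change in row order moves rows between partitions.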
