OK, more to the point: the "optimum" number of partitions depends, among other
things, on the size of your batch DataFrame and on the degree of parallelism
at the endpoint where you will be writing to the sink. If you require high
parallelism because your tasks are fine-grained, then
Is this the point you are trying to implement?
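To illustrate the point about matching the partition count to the sink, here is a minimal sketch (assuming a hypothetical group-key column `group_key` and a made-up partition count of 16; tune that to your sink's write parallelism) of hash-repartitioning a batch DataFrame before writing:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-sketch")
      .getOrCreate()

    // Toy batch DataFrame standing in for the state data.
    val df = spark.range(0, 1000).withColumnRenamed("id", "group_key")

    // Assumption: 16 matches the sink's degree of parallelism.
    val numPartitions = 16

    // Hash-partition on the group key so rows with the same key
    // land in the same write task.
    val repartitioned = df.repartition(numPartitions, col("group_key"))
    assert(repartitioned.rdd.getNumPartitions == numPartitions)

    spark.stop()
  }
}
```

This is the manual, caller-side equivalent of what `RequiresDistributionAndOrdering` lets the writer request declaratively.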
I have a state data source which enables the state in SS (Structured
Streaming) to be rewritten, which enables repartitioning, schema
evolution, etc. via a batch query. The writer requires hash partitioning
against the group key, with the "desired number of
Hi All,
I'm developing a DataSource on Spark 3.2 to write data to our system
using the DataSource V2 API. I want to implement the interface
RequiresDistributionAndOrdering.
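For reference, a minimal sketch of what implementing that interface looks like in Spark 3.2 (the column name `group_key`, the partition count, and the class name are hypothetical; `toBatch` is left unimplemented):

```scala
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.{Expressions, SortDirection, SortOrder}
import org.apache.spark.sql.connector.write.{BatchWrite, RequiresDistributionAndOrdering, Write}

// Sketch of a Write that asks Spark to hash-partition and sort
// its input before handing rows to the data writers.
class MyWrite extends Write with RequiresDistributionAndOrdering {

  // Request clustering (hash partitioning) by the group key column,
  // so all rows with the same key go to the same write task.
  override def requiredDistribution(): Distribution =
    Distributions.clustered(Array(Expressions.column("group_key")))

  // The "desired number of partitions"; returning 0 lets Spark decide.
  override def requiredNumPartitions(): Int = 16

  // Sort rows by the group key within each partition.
  override def requiredOrdering(): Array[SortOrder] =
    Array(Expressions.sort(Expressions.column("group_key"), SortDirection.ASCENDING))

  // The actual batch write implementation is out of scope here.
  override def toBatch: BatchWrite = ???
}
```

Spark reads these requirements during write planning and inserts the corresponding shuffle/sort before invoking the writer.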