I have two sequential 'mapInPandas' calls in my code. In the first 'mapInPandas' I yield a Pandas DataFrame of a fixed size, for example 70_000 rows. In the second 'mapInPandas' I want to receive that same DataFrame from the input iterator with its size unchanged. Is it possible to configure this? I tried setting 'spark.sql.execution.arrow.maxRecordsPerBatch' = 80_000, but in that case I get a DataFrame of size 80_000 in the second 'mapInPandas'.
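Roughly, a simplified sketch of what I'm doing (the real data, schema, and logic differ; 'first_fn' and 'second_fn' are placeholders):

    from pyspark.sql import SparkSession
    import pandas as pd

    spark = (
        SparkSession.builder
        .appName("mapInPandas-batching")
        # Caps the Arrow batch size sent to Python workers.
        .config("spark.sql.execution.arrow.maxRecordsPerBatch", "80000")
        .getOrCreate()
    )

    df = spark.range(1_000_000).repartition(4)

    def first_fn(batches):
        # Collect the partition and yield fixed 70_000-row frames
        # (placeholder logic standing in for the real job).
        pdf = pd.concat(batches, ignore_index=True)
        for start in range(0, len(pdf), 70_000):
            yield pdf.iloc[start:start + 70_000]

    def second_fn(batches):
        for pdf in batches:
            # What I observe: len(pdf) here is 80_000 (the Arrow cap),
            # not the 70_000 yielded by first_fn.
            yield pd.DataFrame({"batch_size": [len(pdf)]})

    out = (
        df.mapInPandas(first_fn, schema="id long")
          .mapInPandas(second_fn, schema="batch_size long")
    )
    out.show()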
