Hi,
I understand that Zstd compression can optionally be provided a dictionary
object to improve performance. See “training mode” here
https://facebook.github.io/zstd/
Does Spark surface a way to provide this dictionary object when writing/reading
data? What about for intermediate shuffle results?
On Sat, 17 Feb 2024 at 23:40, Saha, Daniel wrote:
Hi,
Background: I am running into executor disk space issues when running a
long-lived Spark 3.3 app with YARN on AWS EMR. The app performs back-to-back
spar