Dear Spark contributors,

I'm looking for a way to model Spark shuffle cost, and I wonder whether there are mathematical formulas for computing the "Shuffle Read" and "Shuffle Write" sizes shown in the Stages view of the Spark UI. If there aren't, are there any references that could give me a head start?

For reference, the columns of the completed-stages table in that view are: Stage Id, Description, Submitted, Duration, Tasks: Succeeded/Total, Input, Output, Shuffle Read, Shuffle Write.
Thank you for the help and the directions.

Yours sincerely,
Asma ZGOLLI
Ph.D. student in data engineering - computer science