Hi Everyone,
Spark and the RDD approach it favors assume that most applications run on
big data and need massive parallelism via sharding and concurrent
computing. But some tasks run on small data and neither need nor benefit
from RDD parallelism. How are these tasks expected to perform on Spark?
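One common answer, for what it's worth, is to run such jobs in Spark's local mode, where the driver and a single executor share one JVM and no cluster is involved. A hedged sketch (the script name `my_small_job.py` is illustrative, not from this thread):

```shell
# Run a small-data job in local mode: no cluster, no network shuffle.
# "local[1]" uses one thread; "local[*]" would use all local cores.
# Reducing shuffle partitions avoids scheduling hundreds of tiny tasks.
spark-submit \
  --master "local[1]" \
  --conf spark.sql.shuffle.partitions=1 \
  my_small_job.py
```

This avoids most of the per-task scheduling overhead, though a plain single-process program will still often beat Spark on genuinely small inputs.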
Thanks, Chris, for your guidance. Your Maven commands worked well.
I have been able to build the source and generate the binary distribution
as well, using the steps below.
>mvn -Denforcer.skip=true -DrecompileMode=all -Pkubernetes -Pvolcano
-Pscala-2.12 -DskipTests clean package