I had a chance to look at this paper, and I have reservations about the benchmark. They used Google Dataproc, on which you can create a cluster with Hadoop and Spark (they used Spark 3) and decide on the number of worker nodes.
This is the layout of their setup (the diagram did not survive the list archive). Personally, rather than taking their managed-cluster setup as given, I ran a quick local test. Parameters here:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("OOM")
  .config("spark.driver.host", "localhost")
  .config("spark.driver.maxResultSize", "0")      // "0" means no limit on collected result size
  .config("spark.sql.caseSensitive", "false")
  .config("spark.sql.adaptive.enabled", "true")   // value was cut off in the original mail; "true" assumed
  .getOrCreate()
OK Nick, let us have a look at this.

Your raw size is variable, as is typical with JSON: one row may be x bytes and another x*y bytes. It sounds like you run this batch through some cron or Airflow schedule and carry on from where checkpointLocation points to the last processed records. You end up with executors ...
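To make the checkpoint behaviour concrete, here is a minimal sketch of that pattern (paths, schema and app name are illustrative, not from your job): a file-source stream, triggered once per scheduled run, that resumes from checkpointLocation:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger
  import org.apache.spark.sql.types.{LongType, StringType, StructType}

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("CheckpointedIngest")   // hypothetical app name
    .getOrCreate()

  // Streaming JSON sources need an explicit schema
  val schema = new StructType()
    .add("id", LongType)
    .add("payload", StringType)

  val incoming = spark.readStream
    .schema(schema)
    .json("/data/incoming")          // hypothetical landing directory

  incoming.writeStream
    .format("parquet")
    .option("path", "/data/out")                        // hypothetical sink path
    .option("checkpointLocation", "/data/checkpoint")   // restart resumes after the last processed files
    .trigger(Trigger.Once())         // one batch per cron/Airflow invocation, then exit
    .start()
    .awaitTermination()

Each scheduled invocation then processes only the files that arrived since the last run, which is exactly the "carry on from the checkpoint" behaviour described above.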
Hi Sean, thx for the tip. I'm just running my app via spark-submit on the CLI, i.e.

  spark-submit --class X --master local[*] assembly.jar

so I'll now add --driver-memory to the CLI args, i.e.:

  spark-submit --class X --master local[*] --driver-memory 8g assembly.jar

etc. Unless I have this wrong? (A quick way to verify is sketched below.)
Thx
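That looks right: in local mode the executors run inside the driver JVM, and --driver-memory has to be passed on the command line because setting spark.driver.memory inside the app comes too late, after the JVM has already started. A quick sanity check from inside the app (a sketch; plain JVM API, nothing Spark-specific):

  // Print the driver JVM's max heap; with --driver-memory 8g this should be close to 8192 MB
  println(s"Driver max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")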