I had a chance to look at this paper, and I have reservations about the benchmark. They used Google Dataproc, on which you can create a cluster with Hadoop and Spark (they used Spark 3) and decide on the number of worker nodes.
This is the layout of their setup (the diagram did not survive the list archive). Personally, rather than taking their managed-cluster setup as given, I ran a quick local test. Parameters here:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("OOM")
  .config("spark.driver.host", "localhost")
  .config("spark.driver.maxResultSize", "0")      // "0" means no limit on collected result size
  .config("spark.sql.caseSensitive", "false")
  .config("spark.sql.adaptive.enabled", "true")   // value was cut off in the original mail; "true" assumed
  .getOrCreate()
OK Nick, let us have a look at this.

Your raw size is variable, as is typical with JSON: one row may be x bytes and another x*y bytes. It sounds like you run this batch through some cron or Airflow schedule and carry on from where checkpointLocation points to the last processed records. You end up with executors ...
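To make the checkpoint behaviour concrete, here is a minimal sketch of that pattern (paths, schema and app name are illustrative, not from your job): a file-source stream, triggered once per scheduled run, that resumes from checkpointLocation:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger
  import org.apache.spark.sql.types.{LongType, StringType, StructType}

  val spark = SparkSession.builder
    .master("local[*]")
    .appName("CheckpointedIngest")   // hypothetical app name
    .getOrCreate()

  // Streaming JSON sources need an explicit schema
  val schema = new StructType()
    .add("id", LongType)
    .add("payload", StringType)

  val incoming = spark.readStream
    .schema(schema)
    .json("/data/incoming")          // hypothetical landing directory

  incoming.writeStream
    .format("parquet")
    .option("path", "/data/out")                        // hypothetical sink path
    .option("checkpointLocation", "/data/checkpoint")   // restart resumes after the last processed files
    .trigger(Trigger.Once())         // one batch per cron/Airflow invocation, then exit
    .start()
    .awaitTermination()

Each scheduled invocation then processes only the files that arrived since the last run, which is exactly the "carry on from the checkpoint" behaviour described above.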
Hi Sean, thx for the tip. I'm just running my app via spark-submit on the CLI, i.e.

  spark-submit --class X --master local[*] assembly.jar

so I'll now add --driver-memory to the CLI args, i.e.:

  spark-submit --class X --master local[*] --driver-memory 8g assembly.jar

etc. Unless I have this wrong? (A quick way to verify is sketched below.)
Thx
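That looks right: in local mode the executors run inside the driver JVM, and --driver-memory has to be passed on the command line because setting spark.driver.memory inside the app comes too late, after the JVM has already started. A quick sanity check from inside the app (a sketch; plain JVM API, nothing Spark-specific):

  // Print the driver JVM's max heap; with --driver-memory 8g this should be close to 8192 MB
  println(s"Driver max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")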