Re: spark - local question

2022-11-05 Thread Bjørn Jørgensen
I am using the Jupyter docker stack with Spark. So I started a new notebook and ran this code:

    import multiprocessing
    from pyspark.sql import SparkSession
    from pyspark import SparkConf, SparkContext
    import time

    t1 = time.time()
    number_cores = int(multiprocessing.cpu_count())
    memory_gb = 4
    def
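The archive preview is cut off at the final "def". A minimal sketch of how a helper like this typically continues, sizing a local-mode SparkSession from the machine's core count and a memory budget (the function name get_spark_session and the exact config keys are assumptions, not the original code):

    import multiprocessing
    import time
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    t1 = time.time()
    number_cores = int(multiprocessing.cpu_count())  # one worker thread per CPU core
    memory_gb = 4                                    # driver memory budget in GB

    def get_spark_session(app_name: str, conf: SparkConf) -> SparkSession:
        # Assumed continuation: local mode using all cores, driver heap sized to memory_gb
        conf.setMaster("local[{}]".format(number_cores))
        conf.set("spark.driver.memory", "{}g".format(memory_gb))
        return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

    spark = get_spark_session("local-test", SparkConf())
    print("SparkSession ready after {:.1f}s".format(time.time() - t1))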

Re: [EXTERNAL] Re: Re: Re: Re: Stage level scheduling - lower the number of executors when using GPUs

2022-11-05 Thread Shay Elbaz
Consider this:
1. The application is allowed to use only 20 GPUs.
2. To ensure exactly 20 GPUs are used, I use the df.rdd.repartition(20).withResources(gpus.build).mapPartitions(func) technique (with maxExecutors >> 20); see the sketch after this list.
3. Given the volume of the input data, it takes 20 hours total to run the DL
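For readers following along, a minimal sketch of the stage-level scheduling pattern mentioned in point 2, built with PySpark's ResourceProfileBuilder. The core/GPU amounts, the input DataFrame, and the body of func are placeholders, not details from this thread, and it assumes a cluster (e.g. YARN or Kubernetes) with dynamic allocation and GPU discovery configured, since stage-level scheduling requires that:

    from pyspark.sql import SparkSession
    from pyspark.resource import ExecutorResourceRequests, TaskResourceRequests, ResourceProfileBuilder

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1_000_000)  # placeholder for the real input DataFrame

    # Request 1 GPU per executor and 1 GPU per task, so each task owns a whole GPU
    ereqs = ExecutorResourceRequests().cores(8).resource("gpu", 1)
    treqs = TaskResourceRequests().cpus(8).resource("gpu", 1)
    gpus = ResourceProfileBuilder().require(ereqs).require(treqs)

    def func(rows):
        # placeholder for the per-partition DL inference
        yield from rows

    # 20 partitions x 1 GPU per task caps the stage at 20 GPUs in use at once,
    # regardless of how large spark.dynamicAllocation.maxExecutors is.
    result = df.rdd.repartition(20).withResources(gpus.build).mapPartitions(func)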