Re: about cpu cores

2022-07-10 Thread Tufan Rakshit
Mainly depends on what your cluster manager is: YARN or Kubernetes? Best, Tufan. On Sun, 10 Jul 2022 at 14:38, Sean Owen wrote: > Jobs consist of tasks, each of which consumes a core (can be set to >1 too, but that's a different story). If there are more tasks ready to execute than available cores, some tasks simply wait.
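[A minimal sketch of Tufan's point that the total core budget is requested through the cluster manager. The executor counts below are illustrative assumptions, not recommendations:]

from pyspark.sql import SparkSession

# On YARN, total cores = executors x cores per executor (16 x 8 = 128 here).
spark = (
    SparkSession.builder
    .master("yarn")                             # cluster manager: YARN
    .config("spark.executor.instances", "16")   # number of executors
    .config("spark.executor.cores", "8")        # cores per executor
    .appName("core-budget-demo")
    .getOrCreate()
)
# On Kubernetes the master would instead be k8s://https://<apiserver>, with
# the same spark.executor.instances / spark.executor.cores settings
# controlling the core budget.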

Re: about cpu cores

2022-07-10 Thread Sean Owen
Jobs consist of tasks, each of which consumes a core (can be set to >1 too, but that's a different story). If there are more tasks ready to execute than available cores, some tasks simply wait. On Sun, Jul 10, 2022 at 3:31 AM Yong Walt wrote: > Given that my Spark cluster has 128 cores in total, what happens if the number of jobs I submit (each assigned only one core) exceeds 128?
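[A runnable sketch of Sean's point, assuming a local session capped at 4 cores: 16 tasks are created, only 4 execute at a time, and the rest simply wait for a free core rather than failing:]

from pyspark.sql import SparkSession

# Local session with 4 cores; the 4 is an arbitrary choice for the demo.
spark = (
    SparkSession.builder
    .master("local[4]")
    .appName("task-queueing-demo")
    .getOrCreate()
)

# 16 partitions -> 16 tasks. Only 4 run concurrently; the other 12 queue
# until a core frees up. Nothing fails, the job just takes longer.
rdd = spark.sparkContext.parallelize(range(16), numSlices=16)
print(rdd.map(lambda x: x * x).collect())

spark.stop()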

Re: [EXTERNAL] RDD.pipe() for binary data

2022-07-10 Thread Shay Elbaz
Yuhao, you can use PySpark as the entry point to your application. With py4j you can call Java/Scala functions from the Python application; there's no need to use the pipe() function for that. Shay
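[A rough sketch of Shay's suggestion. The py4j gateway is exposed through SparkContext's internal _jvm attribute; com.example.MyScalaLib and its process method are hypothetical stand-ins for whatever Scala code ships in your JAR:]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("py4j-bridge-demo").getOrCreate()

# sc._jvm is the py4j gateway into the driver JVM. Any class on the driver
# classpath (e.g. shipped via --jars) can be reached through it.
jvm = spark.sparkContext._jvm

# Hypothetical Scala object and method; replace with your own.
result = jvm.com.example.MyScalaLib.process("some input")
print(result)

spark.stop()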

about cpu cores

2022-07-10 Thread Yong Walt
Given that my Spark cluster has 128 cores in total, what happens if the number of jobs I submit (each assigned only one core) exceeds 128? Thank you.

reading each JSON file from dataframe...

2022-07-10 Thread Muthu Jayakumar
Hello there, I have a dataframe with the following...

+---------+---------+---------------+
|entity_id|file_path|other_useful_id|
+---------+---------+---------------+
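[The thread is truncated, so the goal is inferred; one hedged way to read the JSON file behind each row: collect the distinct paths, load them in a single spark.read.json call, tag each record with input_file_name(), and join back to recover entity_id and other_useful_id. The sample rows are made up:]

from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.appName("json-per-row-demo").getOrCreate()

# Stand-in for the dataframe shown above; rows are illustrative only.
meta_df = spark.createDataFrame(
    [(1, "/data/a.json", "x"), (2, "/data/b.json", "y")],
    ["entity_id", "file_path", "other_useful_id"],
)

paths = [r.file_path for r in meta_df.select("file_path").distinct().collect()]

# input_file_name() yields a fully qualified URI (e.g. file:///data/a.json),
# so the paths may need normalizing before the join; glossed over here.
json_df = spark.read.json(paths).withColumn("file_path", input_file_name())

# Attach entity_id / other_useful_id back onto every parsed JSON record.
meta_df.join(json_df, on="file_path", how="inner").show()

spark.stop()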