Great.
A case that I hope can be better documented, especially now that we have
the Pandas API on Spark and many potential new users coming from Pandas,
is how to start Spark with all available memory and CPU.
I use this function to do that in a notebook.

import multiprocessing
import os

from pyspark import SparkConf
from pyspark import pandas as ps  # Pandas API on Spark
from pyspark.sql import SparkSession

# Needed by the Pandas API on Spark when PyArrow is used
os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

number_cores = multiprocessing.cpu_count()

mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # e.g. 4015976448
memory_gb = int(mem_bytes / (1024.0**3))  # e.g. 3


def get_spark_session(app_name: str, conf: SparkConf) -> SparkSession:
    conf.setMaster("local[{}]".format(number_cores))
    conf.set("spark.driver.memory", "{}g".format(memory_gb))
    conf.set("spark.sql.adaptive.enabled", "true")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.sql.repl.eagerEval.maxNumRows", "10000")
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()


spark = get_spark_session("My app", SparkConf())
# setLogLevel is a SparkContext method, not a config key
spark.sparkContext.setLogLevel("ERROR")
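As a side note, the resource-detection part of the function is plain standard-library Python, so it can be sanity-checked without Spark installed. A minimal sketch of just that piece (POSIX-only, since os.sysconf is not available on Windows; the variable names are illustrative):

```python
import multiprocessing
import os

# All logical CPU cores on this machine
number_cores = multiprocessing.cpu_count()

# Total physical RAM = page size * number of physical pages
# (POSIX-only; os.sysconf does not exist on Windows)
mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
memory_gb = int(mem_bytes / (1024.0**3))  # truncated to whole gigabytes

# These strings feed straight into SparkConf, e.g. "local[8]" and "15g"
master = "local[{}]".format(number_cores)
driver_memory = "{}g".format(memory_gb)
print(master, driver_memory)
```

Because int() truncates, a machine with e.g. 15.6 GiB of RAM gets "15g", which conveniently leaves a little headroom for the OS.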

On Wed, 15 Mar 2023 at 19:27, Denny Lee <denny.g....@gmail.com> wrote:

> Thanks Mich for tackling this!  I encourage everyone to add to the list so
> we can have a comprehensive list of topics, eh?!
>
> On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Thanks to @Denny Lee <denny.g....@gmail.com> for giving access to
>>
>> https://www.linkedin.com/company/apachespark/
>>
>> and contribution from @asma zgolli <zgollia...@gmail.com>
>>
>> You will see my post at the bottom. Please add anything else on topics to
>> the list as a comment.
>>
>> We will then put them together in an article perhaps. Comments and
>> contributions are welcome.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Lead Solutions Architect/Engineering Lead,
>> Palantir Technologies Limited
>>
>>
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 14 Mar 2023 at 15:09, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Hi Denny,
>>>
>>> That Apache Spark Linkedin page
>>> https://www.linkedin.com/company/apachespark/ looks fine. It also
>>> allows a wider audience to benefit from it.
>>>
>>> +1 for me
>>>
>>>
>>>
>>> On Tue, 14 Mar 2023 at 14:23, Denny Lee <denny.g....@gmail.com> wrote:
>>>
>>>> In the past, we've been using the Apache Spark LinkedIn page
>>>> <https://www.linkedin.com/company/apachespark/> and group to broadcast
>>>> these types of events - if you're cool with this?  Or we could go through
>>>> the process of submitting and updating the current
>>>> https://spark.apache.org or request to leverage the original Spark
>>>> confluence page <https://cwiki.apache.org/confluence/display/SPARK>.
>>>>  WDYT?
>>>>
>>>> On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Well that needs to be created first for this purpose. The appropriate
>>>>> name etc. to be decided. Maybe @Denny Lee <denny.g....@gmail.com>
>>>>> can facilitate this as he offered his help.
>>>>>
>>>>>
>>>>> cheers
>>>>>
>>>>>
>>>>>
>>>>> On Mon, 13 Mar 2023 at 16:29, asma zgolli <zgollia...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Mich,
>>>>>>
>>>>>> Can you please provide the link for the confluence page?
>>>>>>
>>>>>> Many thanks
>>>>>> Asma
>>>>>> Ph.D. in Big Data - Applied Machine Learning
>>>>>>
>>>>>> On Mon, 13 Mar 2023 at 17:21, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Apologies I missed the list.
>>>>>>>
>>>>>>> To move forward I selected these topics from the thread "Online
>>>>>>> classes for spark topics".
>>>>>>>
>>>>>>> To take this further, I propose that a Confluence page be set up.
>>>>>>>
>>>>>>>
>>>>>>>    1. Spark UI
>>>>>>>    2. Dynamic allocation
>>>>>>>    3. Tuning of jobs
>>>>>>>    4. Collecting spark metrics for monitoring and alerting
>>>>>>>    5. For those who prefer to use the Pandas API on Spark since the
>>>>>>>    release of Spark 3.2: what are some important notes for those users?
>>>>>>>    For example, what additional factors affect Spark performance when
>>>>>>>    using the Pandas API on Spark, and how can they be tuned in addition
>>>>>>>    to the conventional Spark tuning methods applied to Spark SQL users?
>>>>>>>    6. Spark internals and/or comparing Spark 3 and 2
>>>>>>>    7. Spark Streaming & Spark Structured Streaming
>>>>>>>    8. Spark on notebooks
>>>>>>>    9. Spark on serverless (for example Spark on Google Cloud)
>>>>>>>    10. Spark on k8s
>>>>>>>
>>>>>>> Opinions and how-tos are welcome.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi guys
>>>>>>>>
>>>>>>>> To move forward I selected these topics from the thread "Online
>>>>>>>> classes for spark topics".
>>>>>>>>
>>>>>>>> To take this further, I propose that a Confluence page be set up.
>>>>>>>>
>>>>>>>> Opinions and how-tos are welcome.
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297
