Yes, Spark in local mode works :)
One tip: if you just start it, the default settings are one core and 1 GB of RAM.
I'm using this function to start Spark in local mode with all cores and max RAM:
import multiprocessing
import os

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Use every core and (roughly) all physical RAM for the driver.
number_cores = multiprocessing.cpu_count()
mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # e.g. 4015976448
memory_gb = int(mem_bytes / (1024.0 ** 3))  # e.g. 3 (int() of 3.74)


def get_spark_session(app_name: str, conf: SparkConf) -> SparkSession:
    conf.setMaster("local[{}]".format(number_cores))
    conf.set("spark.driver.memory", "{}g".format(memory_gb)).set(
        "spark.sql.adaptive.enabled", "True"
    ).set(
        "spark.serializer", "org.apache.spark.serializer.KryoSerializer"
    ).set(
        "spark.sql.repl.eagerEval.maxNumRows", "100"
    )
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()


spark = get_spark_session("My_app", SparkConf())
# Note: "sc.setLogLevel" is not a Spark config key; set the log level on the SparkContext instead.
spark.sparkContext.setLogLevel("ERROR")
Now when you type spark, you will see something like this:
SparkSession - in-memory
SparkContext
Spark UI
Version v3.4.0-SNAPSHOT
Master local[16]
AppName My_app
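
If you want to double check that the settings were applied, you can read them back from the session (a small sketch; the values will of course depend on your machine):

print(spark.sparkContext.master)              # e.g. local[16]
print(spark.conf.get("spark.driver.memory"))  # e.g. 15g
print(spark.conf.get("spark.serializer"))     # org.apache.spark.serializer.KryoSerializer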
Mon, 31 Oct 2022 at 14:50, Sean Owen <[email protected]> wrote:
> Sure, as stable and available as your machine is. If you don't need fault
> tolerance or scale beyond one machine, sure.
>
> On Mon, Oct 31, 2022 at 8:43 AM 张健BJ <[email protected]> wrote:
>
>> Dear developers:
>> I have a question about the PySpark local
>> mode. Can it be used in production, and will it cause unexpected problems?
>> The scenario is as follows:
>>
>> Our team wants to develop an ETL component based on Python. Data
>> can be transferred between various data sources.
>>
>> If there is no YARN environment, can we read data from Database A and write
>> it to Database B in local mode? Will this function be guaranteed to be stable
>> and available?
>>
>>
>>
>> Thanks,
>> Look forward to your reply
>>
>
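
To the original question about moving data between databases: in local mode that is just a JDBC read followed by a JDBC write. A rough sketch - the URLs, tables, users and passwords below are made up, and you need the JDBC driver jars on the classpath (spark.jars.packages or spark.jars):

# Hypothetical connection details - replace with your own databases.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-a:5432/source_db")
    .option("dbtable", "public.orders")
    .option("user", "reader")
    .option("password", "...")
    .load()
)

(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-b:3306/target_db")
    .option("dbtable", "orders_copy")
    .option("user", "writer")
    .option("password", "...")
    .mode("append")
    .save()
)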
--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297