Thanks for the tips!

I think I figured out what might be causing it. It's the checkpointing to
Microsoft Azure Data Lake Storage (ADLS).

When I use "local checkpointing" it works, but when i use fails when there's
a groupBy in the stream. Weirdly it works when there is no groupBy clause in
the stream.

It's able to create the checkpoint location and the base files on ADLS, but
it can't write any commits. That's why I see the first batch of records and
then it crashes on writing the first commit.
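
For context, a Structured Streaming checkpoint directory normally ends up
looking something like this (the state/ directory only appears for stateful
queries such as a groupBy):

<checkpoint>/metadata
<checkpoint>/offsets/0
<checkpoint>/commits/0
<checkpoint>/state/...

In my case the base files show up, but the commits never do.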

Here's how I build my checkpoint location:

checkpoint_location = "abfss://" + container_name + "@" + storage_account_name \
    + ".dfs.core.windows.net/" + file_name

and my SparkSession:

spark = pyspark.sql.SparkSession.builder \
    .appName("pyspark-stream-read") \
    .master("spark://spark-master:7077") \
    .config("spark.executor.memory", "512m") \
    .config("spark.jars.packages",
            "io.delta:delta-core_2.12:0.7.0,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,"
            "org.apache.hadoop:hadoop-azure:3.2.1") \
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .config("spark.delta.logStore.class",
            "org.apache.spark.sql.delta.storage.AzureLogStore") \
    .getOrCreate()
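
For completeness, auth to the storage account is via the account key set on
the Spark conf (a sketch; storage_account_key is just a placeholder for
however the key gets loaded):

# fs.azure.account.key.<account>.dfs.core.windows.net is the standard
# hadoop-azure property for ABFS account-key auth.
spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".dfs.core.windows.net",
    storage_account_key)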





