
Could you please set the config
to 0 and see whether it works? (NOTE: this will slow down your process, since
the cleaning phase will then happen in the foreground. The default is to clean
in the background with 1 thread; you can also try more than 1 thread.)
If that doesn't help, please turn on the DEBUG log level for the package
and grep the log messages from SourceFileArchiver & SourceFileRemover.
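
If plain grep is not convenient in your environment, the filtering step could
look roughly like the Python sketch below. It only assumes that the two class
names appear somewhere in each relevant log line; the function name
filter_cleaner_lines and the file name in the usage note are hypothetical, and
the actual layout of your log lines may differ.

```python
import re
from typing import Iterable, List

# The two class names come from the advice above. Everything else about the
# log line layout is an assumption, so we match the names anywhere in a line.
CLEANER_PATTERN = re.compile(r"SourceFileArchiver|SourceFileRemover")


def filter_cleaner_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention either source-file cleaner class."""
    return [
        line.rstrip("\n")
        for line in lines
        if CLEANER_PATTERN.search(line)
    ]
```

For example, after saving the driver log to a local file (here a hypothetical
driver.log), open it and pass the file object to filter_cleaner_lines, then
print the returned lines.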

Jungtaek Lim (HeartSaVioR)

On Thu, Jan 27, 2022 at 9:56 PM Gabriela Dvořáková
<gabri...@monthio.com.invalid> wrote:

> Hi,
> I am writing to ask for advice regarding the cleanSource option of the
> DataStreamReader. I am using PySpark with Spark 3.1 via Azure Synapse. To
> my knowledge, the cleanSource option was introduced in Spark 3. I've
> spent a significant amount of time trying to configure this option with
> both the "archive" and "delete" values, but the stream only seems to
> process the data in the source data lake storage account container and
> store it in the sink storage account data lake container. The archive
> folder is never created, and none of the processed source files are
> removed. Neither the forums nor Stack Overflow have been of any help so
> far, so I am reaching out to ask whether you have any tips on how to get
> it running. Here is my code:
> Reading:
> df = (spark
> .readStream
> .option("sourceArchiveDir", f
> dfs.core.windows.net/budget-app/budgetOutput/archived-v5')
> .option("cleanSource", "archive")
> .format('json')
> .schema(schema)
> --
> ...Processing...
> Writing:
> (
> df.writeStream
> .format("delta")
> .outputMode('append')
> .option("checkpointLocation", RAW_DATA_LAKE_CHECKPOINT_PATH)
> .trigger(once=True)
> .partitionBy("Year", "Month", "clientId")
> .awaitTermination()
> )
> Thank you very much for help,
> Gabriela
> _____________________________________
> Med venlig hilsen / Best regards
> Gabriela Dvořáková
> Developer | monthio
> M: +421902480757
> E: gabri...@monthio.com
> W: www.monthio.com
> Monthio Aps, Ragnagade 7, 2100 Copenhagen
> Create personal wealth and healthy economy
> for people by changing the ways of banking
