Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-05 Thread Mich Talebzadeh
OK, I found a workaround. Basically, stream state is not kept between the streams, and I have two streams. One is a business topic and the other was created to shut down Spark Structured Streaming gracefully. I was interested in printing the value of the most recent batch ID for the business topic called "md" here u

Data duplication and loss occur after executing 'insert overwrite...' in Spark 3.1.1

2023-03-05 Thread 周锋
Hi all, We are currently using Spark version 3.1.1 in our production environment. We have noticed that occasionally, after executing 'insert overwrite ... select', the resulting data is inconsistent, with some data being duplicated or lost. This issue does not occur all the time and seems to b

[Spark Structured Streaming] Does Spark Structured Streaming currently support a sink to AWS Kinesis, and how should Kinesis quota limits be handled?

2023-03-05 Thread hueiyuan su
*Component*: Spark Structured Streaming *Level*: Advanced *Scenario*: How-to *Problem Description* 1. I would like to use PySpark Structured Streaming to write data to Kinesis, but there does not seem to be a corresponding connector available. I would like to confirm whet