Single Spark streaming job to read incoming events with dynamic schema

2020-11-16 Thread act_coder
I am trying to create a Spark Structured Streaming job which reads from a Kafka topic, where the incoming events have different schemas (there is no standard schema for the incoming events). Sample incoming event: event1: {timestamp:2018-09-28T15:50:57.2420418+00:00, value: …
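A minimal sketch of one way to approach this, with placeholder broker and topic names: cast the Kafka value to a string and defer schema resolution to processing time, extracting individual fields with get_json_object while keeping the raw payload for schema-specific parsing downstream.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DynamicSchemaStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-schema-stream")
      .getOrCreate()

    // Read the Kafka value as a raw JSON string; no schema imposed yet.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "events")                       // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // get_json_object pulls out individual fields without a full schema;
    // fields missing from a given event simply come back as null.
    val parsed = raw.select(
      get_json_object(col("json"), "$.timestamp").alias("timestamp"),
      col("json") // keep the raw payload for downstream, per-type parsing
    )

    parsed.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```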

spark-sql on windows throws Exception in thread "main" java.lang.UnsatisfiedLinkError:

2020-11-16 Thread Mich Talebzadeh
Need to create some Hive test tables for PyCharm. SPARK_HOME is set to D:\temp\spark-3.0.1-bin-hadoop2.7 and HADOOP_HOME is c:\hadoop\. spark-shell works, but trying to run spark-sql I get the following errors: PS C:\tmp\hive> spark-sql log4j:WARN No appenders could be found for logger (org.apache.h…
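On Windows, an UnsatisfiedLinkError from spark-sql is commonly caused by a missing or mismatched winutils.exe and hadoop.dll under %HADOOP_HOME%\bin. A hedged sketch of one common workaround, with a placeholder path; the system property must be set before the first SparkSession is created, and enableHiveSupport requires the spark-hive dependency on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object WindowsHiveTest {
  def main(args: Array[String]): Unit = {
    // Assumes winutils.exe and hadoop.dll live in C:\hadoop\bin (placeholder).
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val spark = SparkSession.builder()
      .appName("windows-hive-test")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // Create a simple Hive test table and verify it is visible.
    spark.sql("CREATE TABLE IF NOT EXISTS test_tbl (id INT, name STRING)")
    spark.sql("SHOW TABLES").show()
  }
}
```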

Re: Blacklisting in Spark Stateful Structured Streaming

2020-11-16 Thread Yuanjian Li
If you use the `flatMap/mapGroupsWithState` API for a "stateful" SS job, the blacklisting structure can be put into the user-defined state. Using a 3rd-party cache would also be a good choice. Eric Beabes wrote on Wednesday, November 11, 2020 at 6:54 AM: > Currently we've a "Stateful" Spark Structured Streaming job th…
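A minimal sketch of the first suggestion, with a hypothetical Event type and blacklisting rule (a key exceeding a fixed event count): the blacklist flag lives inside the user-defined state of flatMapGroupsWithState, so it survives across micro-batches without any external store.

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical event and state types for illustration.
case class Event(userId: String, payload: String)
case class UserState(count: Long, blacklisted: Boolean)

def trackUser(
    userId: String,
    events: Iterator[Event],
    state: GroupState[UserState]): Iterator[Event] = {
  val maxEvents = 1000L // hypothetical blacklisting threshold
  val old = state.getOption.getOrElse(UserState(0L, blacklisted = false))
  val batch = events.toSeq
  val updated = UserState(
    old.count + batch.size,
    old.blacklisted || old.count + batch.size > maxEvents)
  state.update(updated)
  // Drop events for blacklisted keys; pass the rest through.
  if (updated.blacklisted) Iterator.empty else batch.iterator
}

// Usage, assuming events: Dataset[Event] and spark.implicits._ in scope:
// val filtered = events
//   .groupByKey(_.userId)
//   .flatMapGroupsWithState(OutputMode.Append, GroupStateTimeout.NoTimeout)(trackUser)
```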

Re: DStreams stop consuming from Kafka

2020-11-16 Thread liyuanjian
Maybe you can try the `foreachBatch` API in structured streaming, which allows reusing existing datasources.
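A minimal sketch of that suggestion, assuming a streamingDf already read from Kafka and placeholder JDBC connection details: foreachBatch hands each micro-batch over as an ordinary DataFrame, so any existing batch sink (JDBC here) can be reused unchanged.

```scala
import org.apache.spark.sql.DataFrame

// streamingDf is assumed to be a streaming DataFrame built elsewhere.
val query = streamingDf.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Each micro-batch is a plain DataFrame, so the regular batch
    // JDBC writer works here; connection details are placeholders.
    batch.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost/db") // placeholder
      .option("dbtable", "events")                     // placeholder
      .option("user", "user")
      .option("password", "password")
      .mode("append")
      .save()
  }
  .start()
```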