Re: Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I was able to correct the issue. It was due to the wrong version of the JAR files I had used. I removed these JAR files and copied in the correct versions, and the error has gone away. Regards

Re: Spark streaming with Kafka

2020-07-02 Thread Jungtaek Lim
I can't reproduce this. Could you please make sure you're running spark-shell from the official Spark 3.0.0 distribution? Please try changing into its directory and using a relative path like "./spark-shell".

Re: Failure Threshold in Spark Structured Streaming?

2020-07-02 Thread Jungtaek Lim
Structured Streaming basically follows SQL semantics, which have no notion of a "maximum allowance of failures". If you'd like to tolerate malformed data, read it in a raw format (string or binary), which won't fail on such data, and then try converting it; e.g. from_json() will produce null for input it cannot parse.
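For illustration, a minimal spark-shell sketch of that approach (the schema and sample rows here are made up; in practice the raw strings would come from the stream's value column):

    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
    import spark.implicits._

    // Illustrative schema and data, not from the original thread.
    val schema = new StructType().add("id", IntegerType).add("name", StringType)
    val raw = Seq("""{"id": 1, "name": "ok"}""", "not json").toDF("value")

    // from_json() returns null for rows it cannot parse instead of failing.
    val parsed = raw.select($"value", from_json($"value", schema).as("data"))
    parsed.filter($"data".isNotNull).show() // well-formed rows
    parsed.filter($"data".isNull).show()    // malformed rows, handled separately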

Announcing .NET for Apache Spark™ 0.12

2020-07-02 Thread Terry Kim
We are happy to announce that .NET for Apache Spark™ v0.12 has been released! Thanks to the community for the great feedback. The release notes include the full list of changes.

Hyperspace v0.1 is now open-sourced!

2020-07-02 Thread Terry Kim
Hi all, We are happy to announce the open-sourcing of Hyperspace v0.1, an indexing subsystem for Apache Spark™:
- Code: https://github.com/microsoft/hyperspace
- Blog Article: https://aka.ms/hyperspace-blog
- Spark Summit Talk:
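For a flavor of the API, a short sketch adapted from the linked repository's README; the DataFrame, path, index name, and columns are illustrative, so check the v0.1 docs for the exact API:

    import com.microsoft.hyperspace._
    import com.microsoft.hyperspace.index._

    val df = spark.read.parquet("/data/events") // placeholder path
    val hs = new Hyperspace(spark)

    // Create a covering index on "id" that also stores "value".
    hs.createIndex(df, IndexConfig("eventsIdx", indexedColumns = Seq("id"), includedColumns = Seq("value")))

    // Let the optimizer consider Hyperspace indexes for subsequent queries.
    spark.enableHyperspace()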

Failure Threshold in Spark Structured Streaming?

2020-07-02 Thread Eric Beabes
Currently my job fails on even a single failure. In other words, if even one incoming message is malformed, the job fails. I believe there's a property that allows us to set an acceptable number of failures. I Googled but couldn't find the answer. Can someone please help? Thanks.

Re: File Not Found: /tmp/spark-events in Spark 3.0

2020-07-02 Thread Xin Jinhan
Hi, First, '/tmp/spark-events' is the default storage location for the Spark event log, but the log is written there only when 'spark.eventLog.enabled' is true, which your Spark 2.4.6 may have set to false. So you can try setting it to false and the error may disappear. Second, I suggest enabling the event log and making sure the log directory exists.
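For concreteness, a minimal sketch of both options (the log directory below is a placeholder and must already exist before Spark starts):

    import org.apache.spark.sql.SparkSession

    // Option 1: set spark.eventLog.enabled to "false" so nothing is written.
    // Option 2 (shown): enable logging and point it at an existing directory.
    val spark = SparkSession.builder()
      .config("spark.eventLog.enabled", "true")
      .config("spark.eventLog.dir", "file:///var/log/spark-events") // placeholder
      .getOrCreate()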

Re: File Not Found: /tmp/spark-events in Spark 3.0

2020-07-02 Thread Zero
This could be the result of not setting the event log location properly. By default it is /tmp/spark-events, and since files in the /tmp directory are cleaned up regularly, you could run into this problem.

Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I am trying to stream a Kafka topic from the Spark shell but I am getting the following error. I am using Spark 3.0.0 / Scala 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_212): [spark@hdp-dev ~]$ spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 Ivy Default Cache
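For reference, a minimal sketch of what such a streaming read looks like once the package resolves; the broker address and topic name are placeholders, not from the original post:

    // Run inside spark-shell, where `spark` is predefined.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "hdp-dev:9092") // placeholder broker
      .option("subscribe", "my-topic")                   // placeholder topic
      .load()

    // Kafka delivers binary key/value columns; cast them for inspection.
    val query = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()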

How does Spark Streaming handle late data?

2020-07-02 Thread lafeier
Hi All, I am using Spark Streaming for real-time data, but the data arrives late. My batch interval is set to 15 minutes, so Spark Streaming triggers computation at 15, 30, 45, and 60 minutes, but my data is delayed by 5 minutes. What should I do?
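One common way to handle late data in Structured Streaming (a different API from the DStream one described above) is an event-time window plus a watermark. A minimal sketch, assuming a streaming DataFrame `events` with an `eventTime` timestamp column; both names are assumptions:

    import org.apache.spark.sql.functions.window
    import spark.implicits._

    // Accept events up to 10 minutes late into their 15-minute window;
    // later-arriving rows are dropped once the watermark passes.
    val counts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window($"eventTime", "15 minutes"))
      .count()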