Re: ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception

2021-03-18 Thread Mich Talebzadeh
Thanks. I see this error in Google Dataproc with Spark 3.1.1. It used to be Spark 3.1.1-rc2 but has now been upgraded to 3.1.1. Spark is provided as a service there. Unfortunately, the error still comes back occasionally while the job is running: *** price for ticker MKS is 703.00 >= 676.96 SELL ***

Re: ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception

2021-03-18 Thread Jungtaek Lim
We've fixed the single case for "onJobStart"; please check SPARK-34731 [1]. The patch will be available in Spark 3.1.2 / 3.2.0, but if someone reports the same problem on lower version lines, I think we could backport it to those as well. [1] https://issues.apache.org/jira/browse/SPARK-34731

Coalesce vs reduce operation parameter

2021-03-18 Thread Pedro Tuero
I was reviewing a Spark Java application running on AWS EMR. The code looked like this: RDD.reduceByKey(func).coalesce(number).saveAsTextFile() That stage took hours to complete. I changed it to: RDD.reduceByKey(func, number).saveAsTextFile() It now takes less than 2 minutes, and the final output is

Re: ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception

2021-03-18 Thread Mich Talebzadeh
Recall that this was the error: 21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception java.util.ConcurrentModificationException at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) I resolved this error message by setting:

Re: ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception

2021-03-18 Thread Mich Talebzadeh
This is an intermittent error. The full error is: 21/03/18 17:35:12 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception java.util.ConcurrentModificationException at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) at

ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception

2021-03-18 Thread Mich Talebzadeh
Hi, does anyone know the cause of this error in Spark Structured Streaming? Spark version 3.1.1. 21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception java.util.ConcurrentModificationException at
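
The traces in this thread point at java.util.Hashtable$Enumerator.next, which is where a Hashtable's fail-fast iterator detects that the table was structurally modified after iteration began. The sketch below reproduces that mechanism with the plain JDK only. It is not Spark's code and not the SPARK-34731 patch; the class HashtableCmeDemo, its methods, and the sample keys are hypothetical names made up for illustration, and the mutation happens on the same thread to keep the example deterministic, whereas in a real job it would presumably come from another thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class HashtableCmeDemo {

    // Hypothetical helper: a small table standing in for some shared key/value state.
    static Hashtable<String, String> sampleTable() {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("a", "1");
        table.put("b", "2");
        table.put("c", "3");
        return table;
    }

    // Iterates the live table and adds one key during the walk.
    // Returns true if the iterator failed fast with ConcurrentModificationException.
    static boolean failsWhenMutatedDuringIteration() {
        Hashtable<String, String> table = sampleTable();
        try {
            boolean mutated = false;
            for (Map.Entry<String, String> entry : table.entrySet()) {
                if (!mutated) {
                    // Structural change while an iterator over the same table is open.
                    table.put("added-during-iteration", entry.getValue());
                    mutated = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Copies the table while holding its monitor, then iterates the copy.
    // Later changes to the original cannot invalidate the copy's iterator.
    // Returns the number of entries visited.
    static int visitSnapshotWhileMutating() {
        Hashtable<String, String> table = sampleTable();
        Map<String, String> snapshot;
        synchronized (table) { // Hashtable methods lock on the table itself
            snapshot = new HashMap<>(table);
        }
        int visited = 0;
        for (Map.Entry<String, String> entry : snapshot.entrySet()) {
            table.put("added-" + entry.getKey(), entry.getValue());
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("live iteration failed fast: " + failsWhenMutatedDuringIteration());
        System.out.println("snapshot entries visited: " + visitSnapshotWhileMutating());
    }
}
```

Because the check only fires when a modification happens to land between two iterator steps, a race of this kind would show up intermittently. Iterating a snapshot is one common way to avoid it, at the cost of a copy per iteration.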

Spark version verification

2021-03-18 Thread Mich Talebzadeh
Hi, what signature in the Spark version or binaries would confirm that the release is built on Spark 3.1.1 as opposed to 3.1.1-RC-1 or RC-2? Thanks, Mich

Re: FlatMapGroupsWithStateFunction is called thrice - Production use case.

2021-03-18 Thread Kuttaiah Robin
Hi Jungtaek, Thanks for looking into it. We use spark-2.4.3. I removed most of our code and pasted here just to understand the flow. Sorry for the delay. I would try to provide a simple reproducer when I find time, but this is really hurting us. Another observation I see is basically only if I

Spark 3.1.1 availability in Google Cloud

2021-03-18 Thread Mich Talebzadeh
For those interested, the Spark version in Google Cloud Dataproc clusters under image 2.0-debian10 has been upgraded to Spark 3.1.1: echo $SPARK_HOME /usr/lib/spark spark-shell Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use