Re: compile spark 3.1.1 error

2021-05-10 Thread jason_xu
Hi Jiahong, I got the same failure when building Spark 3.1.1 with Hadoop 2.8.5. Any chance you found a solution?

Re: Updating spark-env.sh per application

2021-05-10 Thread Mich Talebzadeh
Hi Renu, SPARK_DIST_CLASSPATH is specific to the Hadoop API and is not part of conf/spark-env.sh, although you can add it to this file as per the doc here. However, two things will throw light on this: 1. Which version of Spark are you using?

Re: Calculate average from Spark stream

2021-05-10 Thread Lalwani, Jayesh
You don’t need to “launch batches” every 5 minutes. You can launch batches every 2 seconds and aggregate over a 5-minute window. Spark will read data from the topic every 2 seconds and keep the data in memory for 5 minutes. You need to make a few decisions: 1. Do you want a tumbling window or a
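
To illustrate the idea of small, frequent batches feeding a 5-minute aggregate, here is a minimal plain-Python sketch of a tumbling-window average. It is not Spark code and not the poster's implementation: the names `window_start` and `TumblingAverage` and the sample readings are hypothetical, and timestamps are assumed to be epoch seconds. In Structured Streaming the same shape is usually expressed by grouping on `window(...)` and setting a processing-time trigger.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling window (assumed size)


def window_start(ts, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains timestamp ts."""
    return ts - (ts % size)


class TumblingAverage:
    """Keeps a running sum and count per window, updated one batch at a time."""

    def __init__(self, size=WINDOW_SECONDS):
        self.size = size
        # window start -> [sum of values, number of values]
        self.state = defaultdict(lambda: [0.0, 0])

    def add_batch(self, batch):
        """Fold one small batch of (timestamp, value) pairs into the state."""
        for ts, value in batch:
            acc = self.state[window_start(ts, self.size)]
            acc[0] += value
            acc[1] += 1

    def averages(self):
        """Return {window start: average value}, ordered by window start."""
        return {start: s / n for start, (s, n) in sorted(self.state.items())}


# Hypothetical data: one reading every 2 seconds for 10 minutes.
# Readings are 10.0 in the first 5 minutes and 20.0 in the second.
events = [(t, 10.0 * (t // WINDOW_SECONDS + 1)) for t in range(0, 600, 2)]

agg = TumblingAverage()
for event in events:
    agg.add_batch([event])  # each "batch" here holds a single 2-second reading

print(agg.averages())
```

Each small batch only updates the per-window sum and count, so nothing has to wait 5 minutes before being read; the window average is simply complete once no more readings fall into that window.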

Re: Calculate average from Spark stream

2021-05-10 Thread Mich Talebzadeh
Hi Giuseppe, I just looked over your PySpark code. You are doing Spark Structured Streaming (SSS). Your Kafka topic sends messages every two seconds, and regardless you want to enrich the data every 5 minutes. In other words, wait for 5 minutes to build the batch. You can either run wait for 5 minut