Re: Shutting down spark structured streaming when the streaming process completed current process

2021-04-23 Thread Mich Talebzadeh
I would like to hear comments on this: basically, the ability to shut down a running Spark Structured Streaming process gracefully. It may be something worth integrating into Spark Structured Streaming, much like the Kafka team is working to get rid of ZooKeeper and replace it with a system type

Accelerating Spark SQL / Dataframe using GPUs & Alluxio

2021-04-23 Thread Bin Fan
Hi Spark users, we have been working on GPU acceleration for Apache Spark SQL / DataFrame using the RAPIDS Accelerator for Apache Spark and the open source project Alluxio

Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2021-04-23 Thread Sean Owen
This means you compiled with Java 11, but are running on Java < 11. It's not related to Spark.

On Fri, Apr 23, 2021 at 10:23 AM chansonzhang wrote:
> I just updated the spark-* version in my pom.xml to match my spark and scala
> environment, and this solved the problem
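To check which Java release a compiled class targets, you can read the version field from the first 8 bytes of the .class file. A minimal sketch follows; the function name `class_file_java_version` is made up for illustration, and the header is built by hand here rather than read from a real file.

```python
import struct


def class_file_java_version(header: bytes) -> int:
    """Return the Java release encoded in the first 8 bytes of a .class file."""
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    # From major 49 (Java 5) onward the release is major - 44,
    # so 52 is Java 8 and 55 is Java 11.
    return major - 44


# Hand-built header for illustration: magic, minor 0, major 55.
sample = struct.pack(">IHH", 0xCAFEBABE, 0, 55)
print(class_file_java_version(sample))  # 11
```

For a real class you would pass the first 8 bytes read from the file opened in binary mode.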

pyspark sql load with path of special character

2021-04-23 Thread Regin Quinoa
Hi, I am using PySpark SQL to load files into a table, following https://spark.apache.org/docs/latest/sql-ref-syntax-dml-load.html: ```LOAD DATA LOCAL INPATH '/user/hive/warehouse/students' OVERWRITE INTO TABLE test_load;``` It complains: pyspark.sql.utils.AnalysisException: load data input path does

Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2021-04-23 Thread chansonzhang
I just updated the spark-* version in my pom.xml to match my Spark and Scala environment, and this solved the problem.

Re: Spark Streaming with Files

2021-04-23 Thread Mich Talebzadeh
Interesting. If we go back to the classic Lambda architecture on premise, you could use the Flume API to Kafka to add files to HDFS on a time-series basis. Most higher CDC vendors do exactly that. Oracle GoldenGate (OGG) classic gets data from Oracle redo logs and sends them to subscribers. One can deploy OGC

Spark Streaming with Files

2021-04-23 Thread ayan guha
Hi, in one of the Spark Summit demos, it was suggested that we should think of batch jobs in a streaming pattern, using "run once" on a schedule. I find this idea very interesting and I understand how this can be achieved for sources like Kafka, Kinesis or similar. In fact we have implemented this

Sleep behavior

2021-04-23 Thread Praneeth Shishtla
Hi, we have a 6-node Spark cluster and have some PySpark jobs running on it. The job depends on an external application, and for resiliency we retry a couple of times. Will it be fine to add some wait time between two runs (using time.sleep())? Or could there be any sync issues? Wanted to
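As a rough sketch of the retry-with-wait pattern being asked about, assuming the retries happen in the driver-side Python code: `run_with_retries` and `flaky` are hypothetical names, not part of PySpark, and the zero wait is only there to keep the demo fast.

```python
import time


def run_with_retries(task, attempts=3, wait_seconds=30.0, sleep=time.sleep):
    """Call task() up to `attempts` times, sleeping between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            # Only wait if another attempt is still to come.
            if attempt < attempts:
                sleep(wait_seconds)
    raise last_error


# Stand-in for the external dependency: fails twice, then succeeds.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("external application not ready")
    return "done"


print(run_with_retries(flaky, attempts=3, wait_seconds=0))  # done
```

time.sleep() here only blocks the Python thread that calls it, so the wait itself does not touch any data; whether the job is safe to re-run after a partial failure is a separate question.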

Shutting down spark structured streaming when the streaming process completed current process

2021-04-23 Thread Mich Talebzadeh
Hi, this is the design that I came up with: how to shut down the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic. I thought about this and looked at options. Using sensors to implement this like Airflow would be