Question regarding Projection PushDown

2021-08-27 Thread satyajit vegesna
Hi All, Please help with the question below. I am trying to build my own data source to connect to CustomAerospike. I am almost done with everything, but I am still not sure how to implement projection pushdown when selecting nested columns. Spark does implicit for column projection pushdown, but

Re: Processing Multiple Streams in a Single Job

2021-08-27 Thread Sean Owen
That is something else. Yes, you can create a single, complex stream job that joins different data sources, etc. That is not different than any other Spark usage. What are you looking for w.r.t. docs? We are also saying you can simply run N unrelated streaming jobs in parallel on the driver,

Re: Processing Multiple Streams in a Single Job

2021-08-27 Thread Artemis User
Thanks Mich. I now understand how to deal with multiple streams in a single job, but the responses I got before were very abstract and confusing, so I had to go back to the Spark docs and figure out the details. This is what I found out: 1. The standard and recommended way to do multi-stream

Spark submit on openshift

2021-08-27 Thread Markus Gierich
Hi! I created a Spark cluster on OpenShift using radanalytics.io. I'm trying to execute the SparkPi sample using spark-submit --name sparkpi-2 \ --master spark://hans:7077 \ --deploy-mode cluster \ --class org.apache.spark.examples.SparkPi \

Performance Degradation in Spark 3.0.2 compared to Spark 3.0.1

2021-08-27 Thread Sharma, Prakash (Nokia - IN/Bangalore)
Seasonal Greetings, We're running TPC-DS query tests using Spark 3.0.2 on Kubernetes with data on HDFS, and we're observing delays in query execution time compared to Spark 3.0.1 in the same environment. We've observed that some stages fail, but it looks like it is taking some time to realise