Re: How to disable pushdown predicate in spark 2.x query

2020-06-22 Thread Xiao Li
Just turn off the JDBC option pushDownPredicate, which was introduced in Spark 2.4. https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html Xiao On Mon, Jun 22, 2020 at 11:36 AM Mohit Durgapal wrote: > Hi All, > > I am trying to read a table of a relational database using spark 2.x. > >
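
For illustration, a minimal Scala sketch of what Xiao describes, using the pushDownPredicate JDBC option added in Spark 2.4; the URL, table name, and credentials below are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-no-pushdown").getOrCreate()

// With pushDownPredicate=false the filter is NOT translated into the JDBC
// WHERE clause; Spark fetches the table and applies the filter itself.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")   // placeholder URL
  .option("dbtable", "my_table")                          // placeholder table
  .option("user", "user")
  .option("password", "password")
  .option("pushDownPredicate", "false")
  .load()
  .where("some_column = 'some_value'")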

CVE-2020-9480: Apache Spark RCE vulnerability in auth-enabled standalone master

2020-06-22 Thread Sean Owen
Severity: Important Vendor: The Apache Software Foundation Versions Affected: Apache Spark 2.4.5 and earlier Description: In Apache Spark 2.4.5 and earlier, a standalone resource manager's master may be configured to require authentication (spark.authenticate) via a shared secret. When enabled,
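
For reference, a minimal Scala sketch of the spark.authenticate shared-secret configuration the advisory refers to; the secret value is a placeholder, and this only illustrates the setting, it is not a mitigation:

import org.apache.spark.SparkConf

// Standalone shared-secret authentication (the configuration the CVE concerns).
val conf = new SparkConf()
  .setAppName("authenticated-app")
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "<shared-secret>")  // placeholder secret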

How to disable pushdown predicate in spark 2.x query

2020-06-22 Thread Mohit Durgapal
Hi All, I am trying to read a table of a relational database using Spark 2.x. I am using code like the following: sparkContext.read().jdbc(url, table, connectionProperties).select('SELECT_COLUMN').where(whereClause); Now, what's happening is Spark is actually the SQL query which Spark is runn
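
A cleaned-up Scala sketch of the read pattern in the question (URL, column, and filter values are placeholders); with default settings the where clause is pushed down into the SQL that Spark sends to the database:

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read").getOrCreate()

val connectionProperties = new Properties()
connectionProperties.put("user", "user")          // placeholder credentials
connectionProperties.put("password", "password")

// Default behaviour: the filter is pushed into the generated JDBC query.
val df = spark.read
  .jdbc("jdbc:mysql://db-host:3306/mydb", "my_table", connectionProperties)
  .select("select_column")
  .where("where_column = 'some_value'")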

Documentation on SupportsReportStatistics Outdated?

2020-06-22 Thread Micah Kornfield
I was wondering if the documentation on SupportsReportStatistics [1] about its interaction with the planner and predicate pushdowns is still accurate. It says: "Implementations that return more accurate statistics based on pushed operators will not improve query performance until the planner can
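
For readers unfamiliar with the interface, a minimal Scala sketch of a DataSource V2 scan that reports statistics (Spark 3.x package layout; the estimates are hard-coded placeholders). Whether the planner picks these up again after filter pushdown is exactly the question raised here:

import java.util.OptionalLong
import org.apache.spark.sql.connector.read.{Scan, Statistics, SupportsReportStatistics}
import org.apache.spark.sql.types.StructType

// A scan reporting (placeholder) size and row-count estimates.
class ExampleScan(schema: StructType) extends Scan with SupportsReportStatistics {

  override def readSchema(): StructType = schema

  override def estimateStatistics(): Statistics = new Statistics {
    override def sizeInBytes(): OptionalLong = OptionalLong.of(1024L * 1024L)
    override def numRows(): OptionalLong = OptionalLong.of(10000L)
  }
}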

Re: Using hadoop-cloud_2.12 jars

2020-06-22 Thread Rahij Ramsharan
Thanks for the response. If we intend consumers to be able to use this based on the docs I linked, could we publish the jar to maven central? On Mon, Jun 22, 2020 at 12:59 PM Jorge Machado wrote: > You can build it from source. > > Clone the spark git repo and run: ./build/mvn clean package -Dsk

Re: Using hadoop-cloud_2.12 jars

2020-06-22 Thread Jorge Machado
You can build it from source. Clone the spark git repo and run: ./build/mvn clean package -DskipTests -Phadoop-3.2 -Pkubernetes -Phadoop-cloud Regards > On 22. Jun 2020, at 11:00, Rahij Ramsharan wrote: > > Hello, > > I am trying to use the new S3 committers > (https://spark.apache.org/do

Re: we control spark file names before we write them - should we opensource it?

2020-06-22 Thread ilaimalka
Hey Panos, our solution allows us to analyze the full path and modify the file name. So for multiple partitions, we can extract the values of the partitions and then inject them into the file name. For example, for the following file: s3://some_bucket/some_folder/partition1=value1/partition2=valu
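
A rough Scala sketch of the idea described above (not the thread's actual implementation): pull the partition key/value pairs out of the write path and fold the values into the output file name. The example path is illustrative only:

// Extract partition values from a path and inject them into the file name.
def renameWithPartitions(path: String): String = {
  val parts = path.split("/")
  val partitionPairs = parts.collect {
    case seg if seg.contains("=") =>
      val Array(k, v) = seg.split("=", 2)
      k -> v
  }
  val fileName = parts.last
  val prefix = partitionPairs.map { case (_, v) => v }.mkString("_")
  (parts.dropRight(1) :+ s"${prefix}_$fileName").mkString("/")
}

// renameWithPartitions("s3://bucket/folder/partition1=value1/partition2=value2/part-00000.parquet")
//   => "s3://bucket/folder/partition1=value1/partition2=value2/value1_value2_part-00000.parquet"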

Reg - Why Apache Hadoop need to be Installed separately for Running Apache Spark…?

2020-06-22 Thread Praveen Kumar Ramachandran
I'm learning Apache Spark and trying to run a basic Spark program written in Java. I've installed Apache Spark *(spark-2.4.3-bin-without-hadoop)* downloaded from https://spark.apache.org/. I've created a Maven project in Eclipse and added the following dependency: org.apache.s

Using hadoop-cloud_2.12 jars

2020-06-22 Thread Rahij Ramsharan
Hello, I am trying to use the new S3 committers ( https://spark.apache.org/docs/latest/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast) in spark 3.0.0. As per https://spark.apache.org/docs/latest/cloud-integration.html#installation, I need to include "org.apache.spark:had
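
For context, a hedged Scala sketch of the committer wiring described on that page, assuming the hadoop-cloud module and its hadoop-aws dependencies are on the classpath; the class names follow the cloud-integration docs and should be treated as assumptions if your build differs:

import org.apache.spark.sql.SparkSession

// Sketch of enabling the S3A committers; adjust the committer name
// ("directory", "magic", ...) to your setup.
val spark = SparkSession.builder()
  .appName("s3a-committers")
  .config("spark.hadoop.fs.s3a.committer.name", "directory")
  .config("spark.sql.sources.commitProtocolClass",
    "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
    "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  .getOrCreate()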

RE: we control spark file names before we write them - should we opensource it?

2020-06-22 Thread ilaimalka
Hey Stefan, Thank you for your reply. May I ask for a use case or an example of how you would use this ability? I want to make sure our solution would work for you. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

RE: Spark Thrift Server in Kubernetes deployment

2020-06-22 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi, STS deployment on k8s is not supported out of the box. We had done some minor changes in spark code to get Spark Thrift Server working on k8s. Here is the PR that we had created. https://github.com/apache/spark/pull/22433 Unfortunately, this could not be merged. Thanks and Regards, Abhishek