[Spark Core] Does Spark support parquet predicate pushdown for big lists?

2021-12-16 Thread Amin Borjian
Hello all, We use Apache Spark 3.2.0, and our data is stored on Apache Hadoop in parquet format. One of the advantages of the parquet format is its predicate pushdown filter feature, which allows only the necessary data to be read. Spark supports this feature well. For
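
A minimal PySpark sketch of the feature being asked about: predicate pushdown on a parquet read with a large IN-list. The path, column name, and list contents below are hypothetical; spark.sql.parquet.filterPushdown is the standard Spark switch (on by default in 3.x), and inspecting the physical plan shows whether the filter was actually pushed.

    # Minimal sketch: exercise parquet predicate pushdown with a large IN-list.
    # The path "/data/events" and column "event_id" are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (
        SparkSession.builder
        .appName("parquet-pushdown-sketch")
        # Parquet filter pushdown is enabled by default in Spark 3.x;
        # set explicitly here only for clarity.
        .config("spark.sql.parquet.filterPushdown", "true")
        .getOrCreate()
    )

    big_list = list(range(10_000))  # a large value list, as in the question

    df = spark.read.parquet("/data/events")
    filtered = df.filter(col("event_id").isin(big_list))

    # "PushedFilters" in the FileScan node of the physical plan shows
    # which predicates were pushed down to the parquet reader.
    filtered.explain(True)

Whether a very long IN-list is pushed down can depend on optimizer thresholds (for example, Spark rewrites long In filters to InSet above spark.sql.optimizer.inSetConversionThreshold), so checking the physical plan is the reliable test.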

Re: Unsubscribe

2021-12-16 Thread Piper H
Please send an empty email to user-unsubscr...@spark.apache.org to unsubscribe yourself from the list. On Fri, Dec 17, 2021 at 11:14 AM Ankit Maloo wrote: > Please do unsubscribe me from your mailing list. >

Unsubscribe

2021-12-16 Thread Ankit Maloo
Please do unsubscribe me from your mailing list.

Re: class instance variable in PySpark used in lambda function

2021-12-16 Thread Mich Talebzadeh
Many thanks Pol. As it happens, I was using a workaround with numRows = 10. In general it is bad practice to hard-code constants within the code. For the same reason we ought not to embed URLs in the PySpark program itself. What I did was to add numRows to the yaml file, which is read at
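
A minimal sketch of the approach described here: moving numRows (and URLs) out of the code into a yaml file read at startup. The file name config.yml and the key names are hypothetical, and PyYAML is assumed to be available.

    # Sketch: read constants such as numRows from a yaml file at startup
    # instead of hard-coding them in the PySpark program.
    # "config.yml" and the key names are hypothetical.
    import yaml  # PyYAML

    with open("config.yml") as f:
        cfg = yaml.safe_load(f)

    num_rows = cfg["numRows"]      # e.g. "numRows: 10" in config.yml
    source_url = cfg["sourceUrl"]  # URLs likewise kept out of the code

    # later in the job, use the loaded value instead of a literal:
    # df.show(num_rows)

This keeps tunable values out of the program itself, so they can be changed without editing or redeploying the PySpark code.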