Hello all,
We use Apache Spark 3.2.0 and our data is stored on Apache Hadoop in Parquet
format.
One of the advantages of the Parquet format is the predicate pushdown filter
feature, which allows only the necessary data to be read. This feature is
well supported by Spark.
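For illustration, here is a minimal PySpark sketch of how the pushdown shows
up in practice (the HDFS path and column name below are made-up examples, not
taken from our actual job):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()

# Read Parquet data from HDFS (hypothetical path)
df = spark.read.parquet("hdfs:///data/events")

# This filter can be pushed down to the Parquet scan, so only the
# matching row groups/pages are read from storage.
filtered = df.filter(F.col("event_date") == "2021-12-17")

# The physical plan should list the predicate under "PushedFilters:"
filtered.explain(True)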
On Fri, Dec 17, 2021 at 11:14 AM Ankit Maloo wrote:
> Please do unsubscribe me from your mailing list.
>
Many thanks, Pol.
As it happens, I was doing a workaround with numRows = 10. In general it is
bad practice to hard-code constants within the code; for the same reason we
ought not to embed URLs in the PySpark program itself. What I did was to add
numRows to the YAML file, which is read at runtime; a sketch of that approach
follows below.
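Roughly, the idea looks like this (the file name, key names and the data path
are placeholders for illustration only). Assuming a YAML file such as
conf/app_config.yml containing:

display:
  numRows: 10

the PySpark job reads it at startup instead of hard-coding the value:

import yaml
from pyspark.sql import SparkSession

# Load runtime settings from the YAML config instead of hard-coding them
with open("conf/app_config.yml") as f:          # hypothetical config path
    conf = yaml.safe_load(f)

num_rows = int(conf["display"]["numRows"])      # hypothetical key layout

spark = SparkSession.builder.appName("yaml-config-demo").getOrCreate()
df = spark.read.parquet("hdfs:///data/events")  # hypothetical source
df.show(num_rows)

This way numRows (and similar constants) can be changed without touching the
code itself.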