Greetings everyone,
I'm trying to read a single field of a Hive table stored as Parquet in
Spark (the whole table is ~140 GB; this single field should be just a few
GB) and look at the sorted output using the following:
sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
But this simple line of code gives:
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with
more than 17179869176 bytes
Same error for:
sql("SELECT " + field + " FROM MY_TABLE).sort(field)
and:
sql("SELECT " + field + " FROM MY_TABLE).orderBy(field)
I'm running this on a machine with more than 200GB of RAM, in local
mode with spark.driver.memory set to 64g.
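Concretely, the setup is roughly the following (a sketch of my launch
command; since this is local mode, the driver JVM does all the work, so
the memory flag has to be passed at launch):

./bin/spark-shell \
  --master "local[*]" \
  --driver-memory 64g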
I do not understand why it cannot allocate a large enough page, nor why
it is trying to allocate such a large page in the first place. I did
notice that 17179869176 = (2^31 - 1) * 8, i.e. 2^31 - 1 eight-byte words
(roughly 16 GB), so it looks like an internal maximum page size, but a
few GB of data should not need a page anywhere near that large.
I hope someone with more knowledge of Spark can shed some light on this.
Thank you!
Best regards,
Babak Alipour,
University of Florida