https://spark.apache.org/docs/2.2.0/configuration.html#memory-management
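The linked page documents the `spark.memory.*` settings. Per that page, execution and storage share a unified region sized as a fraction of (heap space - 300MB), controlled by `spark.memory.fraction` (default 0.6), with `spark.memory.storageFraction` (default 0.5) setting the storage-eviction threshold inside it. A quick back-of-envelope calculator for those documented defaults (the 4 GB heap is just an example value):

```python
# Back-of-envelope for Spark's unified memory region, following the
# spark.memory.* defaults documented on the linked configuration page.
RESERVED_MB = 300  # fixed reservation noted in the docs: (heap - 300MB)

def unified_region_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified execution+storage MB, storage eviction threshold MB)."""
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction
    storage = unified * storage_fraction
    return unified, storage

# Example: a 4 GB executor heap.
unified, storage = unified_region_mb(4096)
print(f"unified: {unified:.0f} MB, storage threshold: {storage:.0f} MB")
```

Note that a single 2 GB binary record can exceed the whole unified region of a modestly sized executor, so raising executor memory (or memory overhead on YARN) may matter more than shifting the fractions.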
MARIO AMATUCCI
Senior Software Engineer
Office: +48 12 881 10 05 x 31463
Email: mario_amatu...@epam.com
Gdansk, Poland
epam.com
~do more with less~

-----Original Message-----
From: Nicolas Paris <nicolas.pa...@riseup.net>
Sent: Tuesday, July 23, 2019 6:56 PM
To: user@spark.apache.org
Subject: Avro large binary read memory problem

Hi,

I have Avro files with the schema id:Long, content:Binary. The binaries are large images, up to 2 GB each. I would like to fetch a subset of rows with "where id in (...)". Sadly, I get memory errors even when the resulting subset is empty: the reader appears to buffer the binary data until the heap fills up or the container is killed by YARN.

Any idea how to tune the memory management to avoid these errors?

Thanks

--
spark 2.4.3
--
nicolas

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
---------------------------------------------------------------------