It depends on the data format you use.

Is your data in JSON Lines format (one complete JSON object per line)? If so,
the amount of memory plays much less of a role, since Spark can split the file
and parse the records in parallel on the executors, as in the sample below.
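
For reference, a JSON Lines file looks like this; each line is an independent
record (the field names here are purely illustrative):

    {"id": 1, "payload": "a"}
    {"id": 2, "payload": "b"}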

Otherwise, if the whole file is one large JSON object or array, I would not
recommend it: Spark then has to read it with the multiLine option, the file
cannot be split, and a single task ends up parsing all 50 GB.
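
A minimal sketch of the JSON Lines path in Scala, assuming the input is JSON
Lines; the paths, column names, and schema below are placeholders, not taken
from your data:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    // Hypothetical paths -- replace with your real ones.
    val inputPath  = "hdfs:///data/events.jsonl"
    val outputPath = "hdfs:///data/events.parquet"

    val spark = SparkSession.builder().appName("read-large-json").getOrCreate()

    // Supplying an explicit schema skips schema inference, which would
    // otherwise trigger an extra pass over the whole 50 GB file.
    val schema = new StructType()
      .add("id", LongType)
      .add("payload", StringType)

    val df = spark.read
      .schema(schema)
      .json(inputPath)  // JSON Lines: splittable, parsed in parallel

    // Persist to HDFS as Parquet so the next transformation does not
    // have to re-parse the JSON.
    df.write.mode("overwrite").parquet(outputPath)

None of this needs to fit on the driver: both the parse and the write run on
the executors, and the Parquet output spares the next job from re-parsing the
JSON.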

> On 18.06.2020, at 15:12, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
> 
> 
> Hi Spark Users,
> 
> I have a 50 GB JSON file that I would like to read and persist to HDFS so it 
> can be used in the next transformation. I am trying to read it with 
> spark.read.json(path), but this gives an out-of-memory error on the driver. 
> Obviously, I can't afford 50 GB of driver memory. In general, what is 
> the best practice for reading a large JSON file like 50 GB?
> 
> Thanks
