Hi,

I am using sc.textFile("shared_dir/*") to load all the files in a directory
on a shared partition. The total size of the files in this directory is 1.2
TB. We have a 16-node cluster with 3 TB of memory (1 node is the driver, 15
nodes are workers). But the loading fails after around 1 TB of data has been
read (in the mapPartitions stage): there is no further progress in
mapPartitions after 1 TB of input. The cluster seems to have sufficient
memory, but I am not sure why the program gets stuck.
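
For reference, the loading code is roughly the sketch below. The work inside
mapPartitions is a placeholder (splitting on tabs stands in for our actual
parsing), but the structure matches what we run:

    import org.apache.spark.{SparkConf, SparkContext}

    object LoadSharedDir {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("LoadSharedDir")
        val sc = new SparkContext(conf)

        // Load every file under the shared directory (glob matches all files).
        val lines = sc.textFile("shared_dir/*")

        // Placeholder for the real per-partition work.
        val parsed = lines.mapPartitions { iter =>
          iter.map(line => line.split("\t"))
        }

        println("record count: " + parsed.count())
        sc.stop()
      }
    }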

1.2 TB of data divided across 15 worker nodes works out to about 80 GB of
memory per node. Every node in the cluster is allocated around 170 GB of
memory. According to the Spark documentation, the default storage fraction
for RDDs is 60% of the allocated memory. I have increased that to 0.8 (by
setting --conf spark.storage.memoryFraction=0.8), so each node should have
around 136 GB of memory available for storing RDDs. So I am not sure why the
program is failing in the mapPartitions stage, where it seems to be loading
the data.
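
In case the exact configuration matters, the setting above corresponds to
something like the following in code (the app name and master URL are
placeholders, and 0.6 is the default this overrides, as far as I understand):

    import org.apache.spark.SparkConf

    // Equivalent to passing --conf spark.storage.memoryFraction=0.8 to spark-submit.
    val conf = new SparkConf()
      .setAppName("LoadSharedDir")                  // placeholder app name
      .setMaster("spark://master:7077")             // placeholder master URL
      .set("spark.storage.memoryFraction", "0.8")   // default is 0.6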

I don't have a good understanding of Spark internals and would appreciate
any help in fixing this issue.

thanks
   


