load large number of files from s3

2016-11-11 Thread Shawn Wan
http://apache-spark-user-list.1001560.n3.nabble.com/load-large-number-of-files-from-s3-tp28062.html

load large number of files from s3

2016-11-11 Thread Xiaomeng Wan
Hi,

We have 30 million small files (100 KB each) on S3. I want to know how bad it is to load them directly from S3 (e.g. in terms of driver memory, I/O, executor memory, and S3 reliability) before merging them or copying them with distcp. Does anybody have experience with this?

Thanks in advance!

Regards,
Shawn
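
For a sense of scale, the figures in the question can be turned into rough totals. This is only a back-of-the-envelope sketch: it assumes "100k" means 100 KiB per file, and the 128 MiB target size for a merged file is a hypothetical choice, not something stated in the thread. The function name estimate_scale is likewise made up for illustration.

```python
import math

NUM_FILES = 30_000_000          # from the question
FILE_SIZE_BYTES = 100 * 1024    # assumption: "100k" means 100 KiB
TARGET_BYTES = 128 * 1024**2    # assumption: hypothetical 128 MiB per merged file


def estimate_scale(num_files, file_size, target_size):
    """Return (total bytes, small files per merged file, merged file count)."""
    total_bytes = num_files * file_size
    # How many whole small files fit in one merged file of the target size.
    files_per_group = max(1, target_size // file_size)
    # Number of merged files needed to hold every small file.
    merged_files = math.ceil(num_files / files_per_group)
    return total_bytes, files_per_group, merged_files


total, per_group, merged = estimate_scale(NUM_FILES, FILE_SIZE_BYTES, TARGET_BYTES)
print(f"total data: {total / 1024**4:.2f} TiB")
print(f"small files per merged file: {per_group}")
print(f"merged files: {merged}")
```

Under these assumptions the data volume itself is modest; the object count is what changes by orders of magnitude between reading the files directly and reading them after a merge.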