Hi ruben,

You may try sc.binaryFiles which is designed for lots of small files and it
can map paths into inputstreams.
Each inputstream will keep only the path and some configuration, so it would
be cheap to shuffle them.
However, I'm not sure whether spark take the data locality into account
while dealing with these inputstreams.

Hope this helps



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/use-case-reading-files-split-per-id-tp28044p28075.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]

Reply via email to