I am also interested in this. My current workaround is to flatten each file into a single line of text, i.e. delete all '\n' characters, then prepend the filename followed by a space:

[filename] [space] [content]

Does anyone have a better idea?

On 2014-1-31 12:18 AM, "Philip Ogren" <philip.og...@oracle.com> wrote:

> In my Spark programming thus far my unit of work has been a single row
> from an HDFS file, by creating an RDD[Array[String]] with something like:
>
> spark.textFile(path).map(_.split("\t"))
>
> Now, I'd like to do some work over a large collection of files in which
> the unit of work is a single file (rather than a row from a file). Does
> Spark anticipate users creating an RDD[URI] or RDD[File] or some such, and
> supporting the actions and transformations that one might want to do on
> such an RDD? Any advice and/or code snippets would be appreciated!
>
> Thanks,
> Philip
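For what it's worth, the flattening workaround described above can be sketched as a plain preprocessing step before Spark ever sees the data. This is only an illustration, not anything from the Spark API; the function name and paths are made up, and note that deleting '\n' outright (as described) will merge words across line boundaries, so substituting a space may be preferable:

```python
import os

def flatten_files(dir_path, out_path):
    """Collapse every file in dir_path to one line: '<filename> <content>'.

    All '\n' characters inside each file are deleted (as in the workaround
    above), so the whole file survives as a single record when the output
    is later read line-by-line, e.g. with spark.textFile(out_path).
    """
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(dir_path)):
            with open(os.path.join(dir_path, name)) as f:
                # Delete newlines; use .replace("\n", " ") instead if word
                # boundaries at line breaks must be preserved.
                content = f.read().replace("\n", "")
            out.write(f"{name} {content}\n")
```

Each output line is then one record, and splitting on the first space recovers the filename, e.g. `line.split(" ", 2)` in Scala.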