I am also interested in this. My solution for now is to flatten each file
into a single line of text, i.e. delete all '\n' characters, then prepend
the filename followed by a space:

[filename] [space] [content]

Anyone have better ideas?
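A minimal sketch of that workaround in plain Scala (no Spark dependency needed for the preprocessing step; the function names `fileToLine` and `parseLine` are my own, not from any library):

```scala
import scala.io.Source

// Flatten one file into a single line: "<filename> <content-without-newlines>".
// Newlines are replaced by spaces so the whole file fits in one record.
def fileToLine(path: String): String = {
  val content = Source.fromFile(path).getLines().mkString(" ")
  s"$path $content"
}

// After all such lines are written to one big input file and loaded with
// textFile, each record splits back into a (filename, content) pair on the
// first space.
def parseLine(line: String): (String, String) = {
  val idx = line.indexOf(' ')
  (line.substring(0, idx), line.substring(idx + 1))
}
```

With an RDD built from the concatenated file, `rdd.map(parseLine)` then gives you one (filename, content) element per original file.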
On 2014-1-31, 12:18 AM, "Philip Ogren" <philip.og...@oracle.com> wrote:

> In my Spark programming thus far, my unit of work has been a single row
> from an HDFS file, obtained by creating an RDD[Array[String]] with something like:
>
> spark.textFile(path).map(_.split("\t"))
>
> Now, I'd like to do some work over a large collection of files in which
> the unit of work is a single file (rather than a row from a file.)  Does
> Spark anticipate users creating an RDD[URI] or RDD[File] or some such and
> supporting actions and transformations that one might want to do on such an
> RDD?  Any advice and/or code snippets would be appreciated!
>
> Thanks,
> Philip
>
