Assume I want to make a PairRDD whose keys are S3 URLs and whose values are Strings holding the contents of those (UTF-8) files, but NOT split into lines. Are there length limits on those files/Strings? 1 MB? 16 MB? 4 GB? 1 TB? Similarly, can such a thing be registered as a table so that I can use substr() to pick out pieces of the string?
Thanks, Ron