Spark splits data across machines at the partition level, and a single record cannot span partitions, so the realistic limit is on the partition size.
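As a rough illustration of that point, here is a minimal PySpark sketch, not a tested recipe. It assumes an existing SparkContext `sc` and a hypothetical `path`; the helper names are made up for the example. `wholeTextFiles` reads each file as one (URL, contents) record, so every file has to fit inside a single partition. The 2**31 - 1 figure is the JVM array/String length ceiling and is only an upper bound; in practice executor memory will run out well before that.

```python
# Assumption: a JVM String or array holds at most 2**31 - 1 elements, so one
# whole-file value cannot exceed roughly 2 GiB no matter how large the cluster is.
MAX_VALUE_BYTES = 2**31 - 1


def fits_in_one_value(size_bytes, limit=MAX_VALUE_BYTES):
    """Return True if a file of this size could plausibly be held as one value."""
    return 0 <= size_bytes <= limit


def take_substr(text, start, length):
    """Mimic a 1-based, SQL-style substr() on a whole-file string."""
    begin = max(start - 1, 0)
    return text[begin:begin + length]


def whole_file_excerpts(sc, path, start, length):
    """Build (url, contents) pairs, then keep only a slice of each value.

    `sc` is an existing SparkContext and `path` is a hypothetical S3 or HDFS
    location; both are assumptions of this sketch.
    """
    pairs = sc.wholeTextFiles(path)  # RDD of (file URL, full file contents)
    return pairs.mapValues(lambda text: take_substr(text, start, length))
```

By that bound, the 1 MB and 16 MB cases from the question are fine, while 4 GB and 1 TB files would not fit as single String values and would need to be split first.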

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, Aug 7, 2014 at 8:41 AM, Daniel, Ronald (ELS-SDG) <[email protected]> wrote:

> Assume I want to make a PairRDD whose keys are S3 URLs and whose values
> are Strings holding the contents of those (UTF-8) files, but NOT split into
> lines. Are there length limits on those files/Strings? 1 MB? 16 MB? 4 GB? 1
> TB?
>
> Similarly, can such a thing be registered as a table so that I can use
> substr() to pick out pieces of the string?
>
> Thanks,
>
> Ron
