RE: Reading large files

2015-05-06 Thread Ulanov, Alexander
(Quoting Vijayasarathy Kannan, Wednesday, May 06, 2015 2:23 PM, To: Ulanov, Alexander, Cc: user@spark.apache.org, Subject: Re: Reading large files) Thanks. In both cases, does the driver need to have enough memory to contain the entire file? How do both these functions work when, for example, the binary file is 4G and available

Re: Reading large files

2015-05-06 Thread Vijayasarathy Kannan
Thanks. In both cases, does the driver need to have enough memory to contain the entire file? How do both these functions work when, for example, the binary file is 4G and the available driver memory is smaller? On Wed, May 6, 2015 at 1:54 PM, Ulanov, Alexander wrote: > SparkContext has two methods
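
A minimal sketch of the behavior being asked about, assuming binaryFiles is used: each element is a (path, PortableDataStream) pair, and the stream is only opened when a task reads it on an executor, so reading in chunks avoids buffering the whole file in the driver or in any single task. The path and chunk size below are hypothetical:

  import org.apache.spark.{SparkConf, SparkContext}

  object StreamCountSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("stream-count-sketch"))
      // Each element is (path, PortableDataStream); no bytes are read yet.
      val sizes = sc.binaryFiles("hdfs:///data/big/*").map { case (path, pds) =>
        val in = pds.open()                // DataInputStream, opened on the executor
        val buf = new Array[Byte](1 << 20) // read in 1 MiB chunks (hypothetical size)
        var total = 0L
        var n = in.read(buf)
        while (n >= 0) { total += n; n = in.read(buf) }
        in.close()
        (path, total)                      // file size computed without holding the file in memory
      }
      sizes.collect().foreach(println)     // only small (path, size) pairs reach the driver
      sc.stop()
    }
  }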

RE: Reading large files

2015-05-06 Thread Ulanov, Alexander
SparkContext has two methods for reading binary files: binaryFiles (reads multiple binary files into an RDD of (path, stream) pairs) and binaryRecords (reads fixed-length records from a single flat binary file into an RDD). For example, I have a big binary file split into logical parts, so I can use “binaryFiles”. The possible problem is
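
A minimal sketch of the two calls; the paths and the record length are hypothetical, and binaryRecords assumes every record in the file has the same fixed size:

  import org.apache.spark.{SparkConf, SparkContext}

  object BinaryReadSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("binary-read-sketch"))

      // binaryFiles: one (path, PortableDataStream) element per matched file;
      // streams are opened lazily on executors when a task reads them.
      val files = sc.binaryFiles("hdfs:///data/parts/*") // hypothetical path
      println("files: " + files.count())

      // binaryRecords: one Array[Byte] of exactly recordLength bytes per element,
      // read from a single flat binary file.
      val records = sc.binaryRecords("hdfs:///data/big.bin", recordLength = 512) // hypothetical
      println("records: " + records.count())

      sc.stop()
    }
  }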