Sent: Wednesday, May 06, 2015 2:23 PM
To: Ulanov, Alexander
Cc: user@spark.apache.org
Subject: Re: Reading large files
Thanks.
In both cases, does the driver need to have enough memory to contain the
entire file? How do both these functions work when, for example, the binary
file is 4G and the available driver memory is less than that?
On Wed, May 6, 2015 at 1:54 PM, Ulanov, Alexander wrote:
> SparkContext has two methods for reading binary files: binaryFiles (reads
> multiple binary files into an RDD) and binaryRecords (reads fixed-length
> records of a single binary file into an RDD). For example, I have a big
> binary file split into logical parts, so I can use “binaryFiles”. The
> possible problem is
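For concreteness, a minimal Scala sketch of the two calls (assumes Spark 1.2+,
where both methods exist; the paths and the 512-byte record length are made up
for illustration):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("binary-read-sketch"))

  // binaryFiles: one element per file, keyed by path. The PortableDataStream
  // is lazy, so bytes are only pulled when an executor task reads the stream;
  // the driver never has to hold the file contents itself.
  val files = sc.binaryFiles("hdfs:///data/parts/*")   // hypothetical path
  files.foreach { case (path, stream) =>
    val bytes = stream.toArray()  // materializes one whole file on an executor
    println(s"$path: ${bytes.length} bytes")           // prints in executor logs
  }

  // binaryRecords: splits a single flat file into fixed-length records, so a
  // task only ever holds its own split, never the whole file.
  val records = sc.binaryRecords("hdfs:///data/big.bin", recordLength = 512)
  println(s"records: ${records.count()}")

  sc.stop()

Note that with binaryFiles each file is read whole by the task that calls
toArray(), so individual files should fit in a single executor's memory, while
binaryRecords distributes even one very large file across tasks.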