Ah, looking at that InputFormat, it should just work out of the box using sc.newAPIHadoopFile ...
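Roughly something like this (an untested sketch; I'm guessing at the fully qualified class name of your InputFormat and at its key/value types, so adjust those to whatever is actually in the repo):

from pyspark import SparkContext

sc = SparkContext(appName="load-binary")

# The path and InputFormat class name below are placeholders; the key/value
# classes are assumed to be LongWritable / BytesWritable.
rdd = sc.newAPIHadoopFile(
    "hdfs:///path/to/binary/data",
    inputFormatClass="thunder.util.io.hadoop.FixedLengthBinaryInputFormat",
    keyClass="org.apache.hadoop.io.LongWritable",
    valueClass="org.apache.hadoop.io.BytesWritable",
)

# Values arrive in Python as bytearrays
key, value = rdd.first()
print(len(value))

If the InputFormat needs extra settings (e.g. a record length), you can pass them through the conf= dict argument of newAPIHadoopFile.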
Would be interested to hear if it works as expected for you (in Python you'll end up with bytearray values).

N
Sent from Mailbox

On Fri, Jun 6, 2014 at 9:38 PM, Jeremy Freeman <freeman.jer...@gmail.com> wrote:

> Oh cool, thanks for the heads up! Especially for the Hadoop InputFormat
> support. We recently wrote a custom Hadoop InputFormat so we can support
> flat binary files
> (https://github.com/freeman-lab/thunder/tree/master/scala/src/main/scala/thunder/util/io/hadoop),
> and have been testing it in Scala. So I was following Nick's progress and
> was eager to check this out when ready. Will let you guys know how it goes.
>
> -- J