Ah, looking at that InputFormat, it should just work out of the box using sc.newAPIHadoopFile ...
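Roughly something like this (an untested sketch; I'm guessing at the fully qualified class name of your InputFormat and at its key/value types, so adjust those to whatever is actually in the repo):

from pyspark import SparkContext

sc = SparkContext(appName="load-binary")

# The path and InputFormat class name below are placeholders; the key/value
# classes are assumed to be LongWritable / BytesWritable.
rdd = sc.newAPIHadoopFile(
    "hdfs:///path/to/binary/data",
    inputFormatClass="thunder.util.io.hadoop.FixedLengthBinaryInputFormat",
    keyClass="org.apache.hadoop.io.LongWritable",
    valueClass="org.apache.hadoop.io.BytesWritable",
)

# Values arrive in Python as bytearrays
key, value = rdd.first()
print(len(value))

If the InputFormat needs extra settings (e.g. a record length), you can pass them through the conf= dict argument of newAPIHadoopFile.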
Would be interested to hear if it works as expected for you (in Python you'll end up with bytearray values).

N
Sent from Mailbox

On Fri, Jun 6, 2014 at 9:38 PM, Jeremy Freeman <freeman.jer...@gmail.com> wrote:

> Oh cool, thanks for the heads up! Especially for the Hadoop InputFormat
> support. We recently wrote a custom Hadoop InputFormat so we can support
> flat binary files
> (https://github.com/freeman-lab/thunder/tree/master/scala/src/main/scala/thunder/util/io/hadoop),
> and have been testing it in Scala. So I was following Nick's progress and
> was eager to check this out when ready. Will let you guys know how it goes.
>
> -- J