How big do you expect the file to be? Spark has issues with single blocks over 
2 GB, since blocks are backed by ByteBuffers, which cannot address more than 
Integer.MAX_VALUE bytes (see https://issues.apache.org/jira/browse/SPARK-1476 
and https://issues.apache.org/jira/browse/SPARK-6235 for example).

If you don’t know, try running

df.repartition(100).write.format…

to get an idea of how big it would be. I assume it’s over 2 GB.
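
Something like this, using the write call from your message below (a sketch 
only; 100 is an arbitrary partition count, so tune it until no single 
partition comes anywhere near 2 GB):

// 100 partitions is a guess; raise it if individual partitions are still too large
df.repartition(100)
    .write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("s3://newcars.csv");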

From: Zhang, Jingyu [mailto:jingyu.zh...@news.com.au]
Sent: 16 November 2015 10:17
To: user <user@spark.apache.org>
Subject: Size exceeds Integer.MAX_VALUE on EMR 4.0.0 Spark 1.4.1


I am using spark-csv to save files to S3, and it fails with "Size exceeds 
Integer.MAX_VALUE". Please let me know how to fix it. Thanks.

df.write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("s3://newcars.csv");

java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:860)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
        at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:511)
        at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:429)
        at org.apache.spark.storage.BlockManager.get(BlockManager.scala:617)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


