I think you've hit the nail on the head. Serialization ultimately
produces a byte array, and a JVM array is indexed by an Int, so it can
hold at most Integer.MAX_VALUE (~2.1 billion) elements. A serialized
broadcast is therefore capped at roughly 2GB.

At that scale, you might consider whether you really need to broadcast
a value that large, or whether you can keep the data distributed as an
RDD and use a join instead.
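As a rough sketch of the join approach, assuming the large value is
really a keyed lookup table (all the paths and names below are
hypothetical, and note that Spark 1.2 needs the SparkContext._ import
for pair-RDD operations):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed on 1.2)

def joinInsteadOfBroadcast(sc: SparkContext): Unit = {
  // The 5GB "lookup table" stays distributed as an RDD of (key, value)
  // instead of being copied whole to every executor.
  val bigTable = sc.textFile("hdfs:///path/to/big-table")
    .map { line => val Array(k, v) = line.split('\t'); (k, v) }

  val events = sc.textFile("hdfs:///path/to/events")
    .map { line => val Array(k, payload) = line.split('\t'); (k, payload) }

  // A shuffle join replaces the per-executor broadcast copy.
  val joined = events.join(bigTable)  // RDD[(String, (String, String))]
  joined.saveAsTextFile("hdfs:///path/to/output")
}
```

You pay a shuffle instead of a broadcast, but neither side ever has to
fit in one array on one machine.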

Or consider whether you can break the value up into several smaller
broadcasts, each of which serializes to well under 2GB.
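Something like the following sketch of the splitting idea, where
bigArray, sc, and indices are made-up names, and chunkSize must be
tuned so each chunk serializes well under the 2GB ceiling:

```scala
// Hypothetical: bigArray is the oversized value, sc is the SparkContext.
val chunkSize = 64 * 1024 * 1024  // elements per chunk; tune for element size
val broadcasts = bigArray.grouped(chunkSize).toArray.map(sc.broadcast)

// On the executors, pick the chunk that holds a given index
// (indices here is an imagined RDD[Long] of lookups into bigArray):
val result = indices.map { i =>
  broadcasts((i / chunkSize).toInt).value((i % chunkSize).toInt)
}
```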


On Fri, Feb 13, 2015 at 6:24 PM, soila <[email protected]> wrote:
> I am trying to broadcast a large 5GB variable using Spark 1.2.0. I get the
> following exception when the size of the broadcast variable exceeds 2GB. Any
> ideas on how I can resolve this issue?
>
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:829)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:123)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:132)
>         at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:99)
>         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:147)
>         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
>         at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
>         at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
>         at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
>         at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
>         at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
>         at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>         at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>         at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>         at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Size-exceeds-Integer-MAX-VALUE-exception-when-broadcasting-large-variable-tp21648.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

