Works like a charm. Thanks, Reynold, for the quick and efficient response!

Alexis
2015-08-05 19:19 GMT+02:00 Reynold Xin <r...@databricks.com>:

> In Spark 1.5, we have a new way to manage memory (part of Project
> Tungsten). The default unit of memory allocation is 64MB, which is way too
> high when you have 1G of memory allocated in total and have more than 4
> threads.
>
> We will reduce the default page size before releasing 1.5. For now, you
> can just reduce the spark.buffer.pageSize variable to a lower value (e.g. 16m).
>
> https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125
>
> On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin <aseigneu...@ippon.fr> wrote:
>
>> Hi,
>>
>> I'm receiving a memory allocation error with a recent build of Spark 1.5:
>>
>> java.io.IOException: Unable to acquire 67108864 bytes of memory
>>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
>>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
>>     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
>>     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
>>     at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
>>     at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)
>>
>> The issue appears when joining two datasets: one with 6084 records, the
>> other with 200 records. I'm expecting to receive 200 records in the result.
>>
>> I'm using a homemade build prepared from "branch-1.5" with commit ID
>> "eedb996". I ran "mvn -DskipTests clean install" to generate that build.
>>
>> Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.
>>
>> I've prepared a test case that can be built and executed very easily
>> (data files are included in the repo):
>> https://github.com/aseigneurin/spark-testcase
>>
>> One thing to note is that the issue arises when the master is set to
>> "local[*]" but not when it is set to "local". Both options work without
>> problem with Spark 1.4, though.
>>
>> Any help will be greatly appreciated!
>>
>> Many thanks,
>> Alexis
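For readers hitting the same error: the workaround from the thread (lowering spark.buffer.pageSize from its 64MB default) can be passed at launch time without rebuilding anything. This is a minimal sketch assuming the job is started via spark-submit; the application class and jar names are placeholders, not part of the original thread.

```shell
# Lower the Tungsten page size to 16MB, as suggested above.
# --master "local[*]" matches the configuration that triggered the error;
# the class and jar names are hypothetical placeholders.
spark-submit \
  --master "local[*]" \
  --conf spark.buffer.pageSize=16m \
  --class com.example.SparkTestCase \
  target/spark-testcase.jar
```

The same setting can also be placed in conf/spark-defaults.conf as `spark.buffer.pageSize 16m` if the lower page size should apply to every job on the machine.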