Works like a charm. Thanks Reynold for the quick and efficient response!

Alexis

2015-08-05 19:19 GMT+02:00 Reynold Xin <r...@databricks.com>:

> In Spark 1.5, we have a new way to manage memory (part of Project
> Tungsten). The default unit of memory allocation is 64MB, which is way too
> high when you have 1G of memory allocated in total and have more than 4
> threads.
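>
> As a rough back-of-the-envelope (a sketch, assuming the default
> spark.shuffle.memoryFraction of 0.2 and an 8-core machine, and ignoring
> the safety fraction):
>
>   val executorMemory  = 1024L << 20  // 1 GB allocated in total
>   val shuffleFraction = 0.2          // default spark.shuffle.memoryFraction
>   val pageSize        = 64L << 20    // 64 MB default page = 67108864 bytes
>   val threads         = 8            // e.g. local[*] on an 8-core machine
>   val available = (executorMemory * shuffleFraction).toLong // ~204 MB shared by all tasks
>   val needed    = threads * pageSize                        // 512 MB if each task grabs a page
>   // needed > available, so some task fails with
>   // "Unable to acquire 67108864 bytes of memory"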
>
> We will reduce the default page size before releasing 1.5. For now, you
> can set the spark.buffer.pageSize configuration option to a lower value
> (e.g. 16m).
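>
> For instance, with a SparkConf (a minimal sketch; the app name is just a
> placeholder):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val conf = new SparkConf()
>     .setAppName("join-test")
>     .set("spark.buffer.pageSize", "16m")
>   val sc = new SparkContext(conf)
>
> or equivalently, pass --conf spark.buffer.pageSize=16m to spark-submit.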
>
>
> https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125
>
> On Wed, Aug 5, 2015 at 9:25 AM, Alexis Seigneurin <aseigneu...@ippon.fr>
> wrote:
>
>> Hi,
>>
>> I'm receiving a memory allocation error with a recent build of Spark 1.5:
>>
>> java.io.IOException: Unable to acquire 67108864 bytes of memory
>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
>>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
>>   at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
>>   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
>>   at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)
>>
>>
>> The issue appears when joining two datasets: one with 6084 records, the
>> other with 200 records. I'm expecting 200 records in the result.
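>>
>> For reference, the join looks roughly like this (a simplified sketch of
>> the test case; the file names and join column are placeholders, the real
>> code and data are in the repo linked below):
>>
>>   // assuming a SQLContext named sqlContext
>>   val large  = sqlContext.read.json("data/large.json")  // 6084 records
>>   val small  = sqlContext.read.json("data/small.json")  // 200 records
>>   val joined = large.join(small, large("key") === small("key"))
>>   println(joined.count())  // expecting 200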
>>
>> I'm using a custom build of "branch-1.5" at commit "eedb996", generated
>> by running "mvn -DskipTests clean install".
>>
>> Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.
>>
>> I've prepared a test case that can be built and executed very easily
>> (data files are included in the repo):
>> https://github.com/aseigneurin/spark-testcase
>>
>> One thing to note is that the issue arises when the master is set to
>> "local[*]" but not when it is set to "local". Both options work fine with
>> Spark 1.4, though.
>>
>> Any help will be greatly appreciated!
>>
>> Many thanks,
>> Alexis
>>
>
>
