[ 
https://issues.apache.org/jira/browse/PIG-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-89:
-----------------------------

    Assignee: Benjamin Francisoud

> Too many spills to files causes ArrayIndexOutOfBoundsException if new temp 
> file cant be created
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-89
>                 URL: https://issues.apache.org/jira/browse/PIG-89
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>         Environment: Linux, Local execution Mode, JDK 1.6
>            Reporter: Craig Macdonald
>            Assignee: Benjamin Francisoud
>             Fix For: 0.1.0
>
>         Attachments: databag-89-v3.patch, patch-v2.defaultdatabag, 
> patch.defaultdatabag
>
>
> Hello,
> I am experimenting, trying to perform a DISTINCT on a medium sized set of 
> URLs - about 3million (same set as I discussed previously - Utkarsh has a 
> copy), this time in local execution mode.
> Pig script:
> {{
> A = LOAD 'all_13122007.txt';
> B = DISTINCT A;
> store B into 'bla;
> }}
> Bring these errors (two lines swapped in DefaultDatabag) to find real error.
> {{
> 2008-02-04 18:09:44,756 [Low Memory Detector] INFO  org.apache.pig - low 
> memory handler called init = 29491200(28800K) used = 269834064(263509K) 
> committed = 307036160(299840K) max = 471662592(460608K)
> 2008-02-04 18:09:45,355 [Low Memory Detector] ERROR org.apache.pig - Unable 
> to spill contents to disk
> java.io.IOException: Too many open files
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.checkAndCreate(File.java:1704)
>         at java.io.File.createTempFile(File.java:1793)
>         at java.io.File.createTempFile(File.java:1830)
>         at org.apache.pig.data.DataBag.getSpillFile(DataBag.java:367)
>         at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:69)
>         at 
> org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
>         at 
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>         at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>         at 
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>         at sun.management.Sensor.trigger(Sensor.java:120)
> java.lang.ArrayIndexOutOfBoundsException: -1
>         at java.util.ArrayList.remove(ArrayList.java:390)
>         at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:84)
>         at 
> org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
>         at 
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>         at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>         at 
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>         at sun.management.Sensor.trigger(Sensor.java:120)
> Exception in thread "Low Memory Detector" java.lang.InternalError: Error in 
> invoking listener
>         at 
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:141)
>         at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>         at 
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>         at sun.management.Sensor.trigger(Sensor.java:120)
> }}
> There are a two sub-issues here: 
> 1. Pig spills too much using a default JVM (64MB) size - expected?
> Perhaps pig.pl should set a default JVM size of more than 64MB?
> 2. the line DefaultDataBag.java:84
> {{{
> mSpillFiles.remove(mSpillFiles.size() - 1);
> }}}
> line should check that mSpillFiles.size() > 0,  because if 
> File.createTempFile( ) in Databag.getSpillFile() fails, the mSpillFiles will 
> not yet have been updated. My preference would be to split try{ } catch 
> (IOException ioe) { } within DefaultDatabag.spill() into two exception 
> handlers - one for getSpillFile() errors, and one for actual writing errors 
> (when we know mSpillFiles has been added to).
> If this latter point isnt coherent, I can create patch.
> Ta muchly.
> C

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to