[ https://issues.apache.org/jira/browse/PIG-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates reassigned PIG-89:
-----------------------------

    Assignee: Benjamin Francisoud

> Too many spills to files causes ArrayIndexOutOfBoundsException if new temp
> file cant be created
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-89
>                 URL: https://issues.apache.org/jira/browse/PIG-89
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>         Environment: Linux, Local execution Mode, JDK 1.6
>            Reporter: Craig Macdonald
>            Assignee: Benjamin Francisoud
>             Fix For: 0.1.0
>
>         Attachments: databag-89-v3.patch, patch-v2.defaultdatabag,
> patch.defaultdatabag
>
>
> Hello,
> I am experimenting, trying to perform a DISTINCT on a medium-sized set of
> URLs - about 3 million (same set as I discussed previously - Utkarsh has a
> copy), this time in local execution mode.
> Pig script:
> {{
> A = LOAD 'all_13122007.txt';
> B = DISTINCT A;
> store B into 'bla';
> }}
> This produces the following errors (I swapped two lines in DefaultDataBag
> to find the real error):
> {{
> 2008-02-04 18:09:44,756 [Low Memory Detector] INFO org.apache.pig - low memory handler called init = 29491200(28800K) used = 269834064(263509K) committed = 307036160(299840K) max = 471662592(460608K)
> 2008-02-04 18:09:45,355 [Low Memory Detector] ERROR org.apache.pig - Unable to spill contents to disk
> java.io.IOException: Too many open files
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.checkAndCreate(File.java:1704)
>     at java.io.File.createTempFile(File.java:1793)
>     at java.io.File.createTempFile(File.java:1830)
>     at org.apache.pig.data.DataBag.getSpillFile(DataBag.java:367)
>     at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:69)
>     at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
>     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>     at sun.management.Sensor.trigger(Sensor.java:120)
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.util.ArrayList.remove(ArrayList.java:390)
>     at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:84)
>     at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
>     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>     at sun.management.Sensor.trigger(Sensor.java:120)
> Exception in thread "Low Memory Detector" java.lang.InternalError: Error in invoking listener
>     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:141)
>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>     at sun.management.Sensor.trigger(Sensor.java:120)
> }}
> There are two sub-issues here:
> 1. Pig spills too much when using the default JVM heap size (64MB) - expected?
> Perhaps pig.pl should set a default JVM heap size of more than 64MB?
> 2. The line DefaultDataBag.java:84
> {{{
> mSpillFiles.remove(mSpillFiles.size() - 1);
> }}}
> should check that mSpillFiles.size() > 0, because if
> File.createTempFile() in DataBag.getSpillFile() fails, mSpillFiles will
> not yet have been updated. My preference would be to split the try { } catch
> (IOException ioe) { } within DefaultDataBag.spill() into two exception
> handlers - one for getSpillFile() errors, and one for actual writing errors
> (when we know mSpillFiles has been added to).
> If this latter point isn't coherent, I can create a patch.
> Ta muchly.
> C
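
The two-handler split proposed in sub-issue 2 could look roughly like the sketch below. It is a self-contained illustration, not the actual Pig code and not the attached patch: the class `SpillSketch`, its fields, the `spillDir` parameter, and the `boolean` return value are hypothetical stand-ins for `DefaultDataBag`, `mSpillFiles`, and `DataBag.getSpillFile()`. It also assumes Java 7 or later for try-with-resources.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DefaultDataBag; names are illustrative only.
final class SpillSketch {
    private final List<String> contents = new ArrayList<>();   // in-memory tuples
    private final List<File> spillFiles = new ArrayList<>();   // plays the role of mSpillFiles
    private final File spillDir;                                // assumed temp-file directory

    SpillSketch(File spillDir) {
        this.spillDir = spillDir;
    }

    void add(String tuple) {
        contents.add(tuple);
    }

    int spillFileCount() {
        return spillFiles.size();
    }

    int inMemoryCount() {
        return contents.size();
    }

    // Returns true if the in-memory contents were written to a new spill file.
    boolean spill() {
        File f;
        // Handler 1: creating the temp file failed (e.g. "Too many open files").
        // Nothing has been added to spillFiles yet, so nothing may be removed.
        try {
            f = File.createTempFile("pigbag", ".tmp", spillDir);
        } catch (IOException ioe) {
            return false;
        }
        f.deleteOnExit();
        spillFiles.add(f);

        // Handler 2: writing failed. Here we know the list has a last entry,
        // so removing it cannot hit index -1.
        try (BufferedWriter out = new BufferedWriter(new FileWriter(f))) {
            for (String tuple : contents) {
                out.write(tuple);
                out.newLine();
            }
        } catch (IOException ioe) {
            spillFiles.remove(spillFiles.size() - 1);
            f.delete();
            return false;
        }
        contents.clear();
        return true;
    }

    public static void main(String[] args) {
        // A directory that does not exist forces the temp-file creation to fail.
        SpillSketch bag = new SpillSketch(new File("/no/such/dir/for/spill-sketch"));
        bag.add("http://example.com/");
        boolean ok = bag.spill();
        System.out.println("spilled=" + ok + ", spillFiles=" + bag.spillFileCount());
    }
}
```

Separating the two handlers means the rollback of the spill-file list only runs on the path where the list was actually extended, which is arguably clearer than guarding a single shared handler with a size check.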