We've had good luck loading data from HDFS into Phoenix using PhoenixHBaseStorage up until now. Now that we're pushing hundreds of megabytes of data at a time, though, we're seeing the OutOfMemoryError in the log at the bottom of this post.
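In case it helps, here is roughly what our script looks like. The input path is a placeholder and the column list is elided, but the alias, table, ZooKeeper quorum, batch size, and heap setting match what's in the log below:

    -- heap for the map tasks; we've tried values up to 4GB here
    SET mapred.child.java.opts '-Xmx4g';

    -- placeholder path; each row carries the full set of columns behind the
    -- UPSERT statement in the log below
    flow_mef = LOAD '/data/flow_mef' USING PigStorage('\t');

    -- Phoenix 2.2 Pig integration: the first argument is the ZooKeeper quorum,
    -- and -batchSize is the number of rows per commit (5,000 originally, 500 now)
    STORE flow_mef INTO 'hbase://FLOW_MEF' USING
        com.salesforce.phoenix.pig.PhoenixHBaseStorage('hiveapp1', '-batchSize 500');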
I've tried adding heap to the Pig map tasks, up to 4GB, and I've tried reducing the batch size from 5,000 to 500. Neither adjustment fixes the problem, and we're stuck. Anyone got any ideas? One thing I notice is that the OutOfMemoryError is thrown from ByteBuffer.allocateDirect in the HBase client's socket write path, so it may be direct (off-heap) memory rather than heap that's being exhausted; should we be raising -XX:MaxDirectMemorySize instead? We're on Phoenix 2.2, on CDH 4.4.

Here's the log from one of the failed map tasks:

2014-05-11 09:40:30,034 INFO com.salesforce.phoenix.pig.PhoenixPigConfiguration: Phoenix Upsert Statement: UPSERT INTO FLOW_MEF VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
2014-05-11 09:40:30,034 INFO com.salesforce.phoenix.pig.hadoop.PhoenixOutputFormat: Initialized Phoenix connection, autoCommit=false
2014-05-11 09:40:30,042 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2014-05-11 09:40:30,044 WARN org.apache.hadoop.conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-05-11 09:40:30,074 INFO org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2014-05-11 09:40:30,131 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: flow_mef[7,11] C: R:
2014-05-11 09:40:31,289 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
2014-05-11 09:40:31,290 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
2014-05-11 09:40:31,291 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
2014-05-11 09:40:31,292 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
2014-05-11 09:40:31,304 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f7383, negotiated timeout = 60000
2014-05-11 09:40:31,446 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f7383
2014-05-11 09:40:31,462 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f7383 closed
2014-05-11 09:40:31,462 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-05-11 09:40:31,469 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
2014-05-11 09:40:31,470 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
2014-05-11 09:40:31,471 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
2014-05-11 09:40:31,472 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
2014-05-11 09:40:31,479 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f7389, negotiated timeout = 60000
2014-05-11 09:40:31,629 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f7389
2014-05-11 09:40:31,755 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f7389 closed
2014-05-11 09:40:31,755 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-05-11 09:40:31,761 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
2014-05-11 09:40:31,762 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
2014-05-11 09:40:31,762 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
2014-05-11 09:40:31,763 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
2014-05-11 09:40:31,921 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f738c, negotiated timeout = 60000
2014-05-11 09:40:32,053 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f738c
2014-05-11 09:40:32,220 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f738c closed
2014-05-11 09:40:32,221 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-05-11 09:43:11,127 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=FLOW_MEF,\x03,1399822720276.35656409db81bc4a45384e11cec0e45b., hostname=hiveapp2, port=60020
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.OutOfMemoryError
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1571)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1423)
    at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:754)
    at com.salesforce.phoenix.execute.MutationState.commit(MutationState.java:384)
    at com.salesforce.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:249)
    at com.salesforce.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:86)
    at com.salesforce.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:51)
    at com.salesforce.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:161)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:216)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1407)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1395)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError
    at sun.misc.Unsafe.allocateMemory(Native Method)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
    at sun.nio.ch.IOUtil.write(IOUtil.java:58)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:625)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:981)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
    at com.sun.proxy.$Proxy15.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1400)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1398)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
    ... 6 more
2014-05-11 09:43:12,366 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=FLOW_MEF,\x09,1399822720277.54ea08cfff2e43e5186b26fec76f3030., hostname=hiveapp3, port=60020
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.OutOfMemoryError
    [same stack trace as above, down through "... 6 more"; trimmed here because it is identical frame for frame]
2014-05-11 09:43:13,409 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
2014-05-11 09:43:13,411 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
2014-05-11 09:43:13,411 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
2014-05-11 09:43:13,413 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
2014-05-11 09:43:13,547 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f73b5, negotiated timeout = 60000
2014-05-11 09:43:13,678 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f73b5
2014-05-11 09:43:13,895 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f73b5 closed
2014-05-11 09:43:13,895 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down

--
Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com
