Have you tried reducing the batch size (phoenix.upsert.batch.size) further? That's the right knob to dial down: it reduces the client-side buffering, which should prevent the OOM you're seeing. Also, would multiple PhoenixHBaseStorage invocations be running in parallel on the same client?
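For example, you can pass the batch size straight through the storage function in your Pig script. Rough, untested sketch (the load path and missing schema are placeholders; FLOW_MEF and the hiveapp1 quorum are taken from your log):

    -- Load the raw rows from HDFS (declare your real schema here).
    flow_mef = LOAD '/path/to/flow_mef_data' USING PigStorage('\t');

    -- First argument is the zookeeper quorum; the -batchSize option sets the
    -- upsert batch size (phoenix.upsert.batch.size). Dropping it well below
    -- the 500 you tried, e.g. 100, keeps fewer rows buffered client-side
    -- before each commit.
    STORE flow_mef INTO 'hbase://FLOW_MEF' USING
        com.salesforce.phoenix.pig.PhoenixHBaseStorage('hiveapp1', '-batchSize 100');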
Thanks,
James

On Sun, May 11, 2014 at 9:54 AM, Russell Jurney <[email protected]> wrote:

> We've had good luck loading Phoenix from HDFS using PhoenixHBaseStorage up until now. Now that we're pushing hundreds of megabytes of data at a time, we're seeing the error at the bottom of this post.
>
> I've tried adding heap space to the pig process, up to 4GB. I've also tried reducing the batch size to 500 from 5,000. Neither adjustment fixes the problem. We're stuck. Anyone got any ideas?
>
> We're on Phoenix 2.2, on CDH 4.4.
>
> 2014-05-11 09:40:30,034 INFO com.salesforce.phoenix.pig.PhoenixPigConfiguration: Phoenix Upsert Statement: UPSERT INTO FLOW_MEF VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
> 2014-05-11 09:40:30,034 INFO com.salesforce.phoenix.pig.hadoop.PhoenixOutputFormat: Initialized Phoenix connection, autoCommit=false
> 2014-05-11 09:40:30,042 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
> 2014-05-11 09:40:30,044 WARN org.apache.hadoop.conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
> 2014-05-11 09:40:30,074 INFO org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
> 2014-05-11 09:40:30,131 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: flow_mef[7,11] C: R:
> 2014-05-11 09:40:31,289 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
> 2014-05-11 09:40:31,290 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
> 2014-05-11 09:40:31,291 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-05-11 09:40:31,292 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
> 2014-05-11 09:40:31,304 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f7383, negotiated timeout = 60000
> 2014-05-11 09:40:31,446 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f7383
> 2014-05-11 09:40:31,462 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f7383 closed
> 2014-05-11 09:40:31,462 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2014-05-11 09:40:31,469 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
> 2014-05-11 09:40:31,470 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
> 2014-05-11 09:40:31,471 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-05-11 09:40:31,472 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
> 2014-05-11 09:40:31,479 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f7389, negotiated timeout = 60000
> 2014-05-11 09:40:31,629 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f7389
> 2014-05-11 09:40:31,755 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f7389 closed
> 2014-05-11 09:40:31,755 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2014-05-11 09:40:31,761 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
> 2014-05-11 09:40:31,762 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
> 2014-05-11 09:40:31,762 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-05-11 09:40:31,763 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
> 2014-05-11 09:40:31,921 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f738c, negotiated timeout = 60000
> 2014-05-11 09:40:32,053 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f738c
> 2014-05-11 09:40:32,220 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f738c closed
> 2014-05-11 09:40:32,221 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2014-05-11 09:43:11,127 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=FLOW_MEF,\x03,1399822720276.35656409db81bc4a45384e11cec0e45b., hostname=hiveapp2, port=60020
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.OutOfMemoryError
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1571)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1423)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:754)
>         at com.salesforce.phoenix.execute.MutationState.commit(MutationState.java:384)
>         at com.salesforce.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:249)
>         at com.salesforce.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:86)
>         at com.salesforce.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:51)
>         at com.salesforce.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:161)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
>         at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
>         at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:264)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError
>         at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:216)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1407)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1395)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.OutOfMemoryError
>         at sun.misc.Unsafe.allocateMemory(Native Method)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:58)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:625)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:981)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
>         at com.sun.proxy.$Proxy15.multi(Unknown Source)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1400)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1398)
>         at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
>         ... 6 more
> 2014-05-11 09:43:12,366 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=FLOW_MEF,\x09,1399822720277.54ea08cfff2e43e5186b26fec76f3030., hostname=hiveapp3, port=60020
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.OutOfMemoryError
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1571)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1423)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:754)
>         at com.salesforce.phoenix.execute.MutationState.commit(MutationState.java:384)
>         at com.salesforce.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:249)
>         at com.salesforce.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:86)
>         at com.salesforce.phoenix.pig.hadoop.PhoenixRecordWriter.write(PhoenixRecordWriter.java:51)
>         at com.salesforce.phoenix.pig.PhoenixHBaseStorage.putNext(PhoenixHBaseStorage.java:161)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
>         at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
>         at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:264)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError
>         at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:216)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1407)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1395)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.OutOfMemoryError
>         at sun.misc.Unsafe.allocateMemory(Native Method)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:58)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:625)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:981)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
>         at com.sun.proxy.$Proxy15.multi(Unknown Source)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1400)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1398)
>         at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
>         ... 6 more
> 2014-05-11 09:43:13,409 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hiveapp1:2181 sessionTimeout=180000 watcher=hconnection
> 2014-05-11 09:43:13,411 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
> 2014-05-11 09:43:13,411 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hiveapp1/10.10.30.200:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-05-11 09:43:13,413 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hiveapp1/10.10.30.200:2181, initiating session
> 2014-05-11 09:43:13,547 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hiveapp1/10.10.30.200:2181, sessionid = 0x345ba20ce0f73b5, negotiated timeout = 60000
> 2014-05-11 09:43:13,678 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x345ba20ce0f73b5
> 2014-05-11 09:43:13,895 INFO org.apache.zookeeper.ZooKeeper: Session: 0x345ba20ce0f73b5 closed
> 2014-05-11 09:43:13,895 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
