Is this an expectation problem or a legitimate concern? I have been studying the memory configurations in Cloudera Manager and I don't see where I can improve my situation.
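For reference, here is my rough reading of the RegionServer heap budget from the settings quoted below. This is just a quick arithmetic sketch, not HBase's exact internal accounting:

```python
# Rough RegionServer heap accounting using the values from the quoted
# configuration below. A sketch only -- HBase's real accounting differs.
heap_gib = 4.0                # RegionServer heap
memstore_fraction = 0.40      # max size of all memstores
block_cache_fraction = 0.40   # HFile block cache size

reserved = memstore_fraction + block_cache_fraction
remaining_gib = heap_gib * (1.0 - reserved)

print(f"reserved fraction: {reserved:.2f}")                        # 0.80
print(f"heap left for everything else: {remaining_gib:.1f} GiB")   # 0.8 GiB
```

So memstores plus block cache already claim 80% of the 4 GiB heap, leaving well under 1 GiB for RPC handlers, scan buffers, and everything else.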
On Thu, May 8, 2014 at 5:35 PM, Geovanie Marquez <[email protected]> wrote:

> sorry didn't include version
>
> CDH5 version - CDH-5.0.0-1.cdh5.0.0.p0.47
>
> On Thu, May 8, 2014 at 5:32 PM, Geovanie Marquez <[email protected]> wrote:
>
>> Hey group,
>>
>> There is one job that scans HBase contents and is really resource-intensive, using all resources available to YARN (under the ResourceManager). In my case, that is 8GB. My expectation is that a properly configured cluster would kill the application or degrade its performance, but never take a RegionServer down. This is intended to be a multi-tenant environment where developers may submit jobs at will, and I want a configuration where cluster services cannot be killed this way because of memory.
>>
>> The simple solution here is to change the way the job consumes resources so that it is not so greedy when run. But I want to understand how I can mitigate this situation in general.
>>
>> **It FAILS with the following config:**
>> The RPC client has 30 handlers
>> Write buffer of 2 MiB
>> The RegionServer heap is 4 GiB
>> Max size of all memstores is 0.40 of total heap
>> HFile block cache size is 0.40
>> Low watermark for memstore flush is 0.38
>> HBase memstore size is 128 MiB
>>
>> **Job still FAILS with the following config:**
>> Everything else the same, except the RPC client has 10 handlers.
>>
>> **Job still FAILS with the following config:**
>> Everything else the same, except the HFile block cache size is 0.10.
>>
>> When this runs I get the following error stacktrace. How do I avoid this via configuration?
>>
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
>> 2014-05-08 16:23:54,705 WARN [IPC Client (1242056950) connection to c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase] org.apache.hadoop.ipc.RpcClient: IPC Client (1242056950) connection to c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase: unexpected exception receiving call responses
>>
>> Yes, there was an RPC timeout; this is what is killing the server, because the timeout is eventually (one minute later) reached.
>>
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
>>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
>> 2014-05-08 16:23:55,319 INFO [main] org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
>>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:384)
>>     at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:194)
>>     at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
>>     at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>>     at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>
>> Probably caused by the OOME above:
>>
>> Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 5612205039322936440 number_of_rows: 10000 close_scanner: false next_call_seq: 0
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3018)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
>>     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
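One thing I notice in the trace: the failing scan requests 10000 rows per RPC (`number_of_rows: 10000`), and the client-side OOME happens while reading the scan response, with the RPC timeout hit about a minute later. So lowering the scanner caching (and possibly raising the RPC timeout) is something I may try. A sketch of the client-side hbase-site.xml I have in mind; the values here are guesses I have not tested:

```xml
<!-- Client-side hbase-site.xml sketch; values are illustrative, not tested. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <!-- Rows fetched per scanner next() RPC; the failing scan used 10000. -->
  <value>100</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <!-- Milliseconds; the default of 60000 matches the ~1 minute timeout seen. -->
  <value>120000</value>
</property>
```

Smaller batches mean each scan response has to fit in far less client heap, at the cost of more round trips.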
