Hi,

Check whether you have the Hadoop native libs in java.library.path.
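The same check can also be run from inside a JVM started with the same -Djava.library.path the collector uses. Below is a minimal sketch. It splits the path on the platform separator and reports entries that are not existing directories. It then looks for the platform file name of the library (System.mapLibraryName("hadoop"), i.e. libhadoop.so on Linux). Finally it calls System.loadLibrary("hadoop"), which is roughly what Hadoop's NativeCodeLoader attempts at startup. The class name NativeLibPathCheck and its helpers are hypothetical; this is a diagnostic sketch, not part of Ambari or Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper (not part of Ambari or Hadoop): checks whether a
// native library is reachable through a java.library.path-style string.
public class NativeLibPathCheck {

    /** Splits a library path on the platform separator (':' on Linux), skipping empty entries. */
    static List<String> entries(String libraryPath) {
        List<String> out = new ArrayList<>();
        if (libraryPath == null) {
            return out;
        }
        for (String entry : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                out.add(entry);
            }
        }
        return out;
    }

    /** Entries that do not name an existing directory, e.g. a JVM flag glued onto the path. */
    static List<String> missingDirs(String libraryPath) {
        List<String> out = new ArrayList<>();
        for (String entry : entries(libraryPath)) {
            if (!new File(entry).isDirectory()) {
                out.add(entry);
            }
        }
        return out;
    }

    /** Entries containing the platform file name for libName (libhadoop.so for "hadoop" on Linux). */
    static List<String> dirsContaining(String libraryPath, String libName) {
        String fileName = System.mapLibraryName(libName);
        List<String> out = new ArrayList<>();
        for (String entry : entries(libraryPath)) {
            if (new File(entry, fileName).isFile()) {
                out.add(entry);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path = '" + path + "'");
        for (String entry : missingDirs(path)) {
            System.out.println("  not a directory: '" + entry + "'");
        }
        System.out.println(System.mapLibraryName("hadoop") + " found in: " + dirsContaining(path, "hadoop"));
        try {
            // Roughly what Hadoop's NativeCodeLoader does when it is first loaded.
            System.loadLibrary("hadoop");
            System.out.println("System.loadLibrary(\"hadoop\") succeeded");
        } catch (UnsatisfiedLinkError e) {
            System.out.println("System.loadLibrary(\"hadoop\") failed: " + e.getMessage());
        }
    }
}
```

Compile it with javac and run it as, for example, `java -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native NativeLibPathCheck`. If libhadoop.so is found but loadLibrary still fails, the UnsatisfiedLinkError message often says why (for example, a wrong architecture).

Note that the java.library.path value in the log quoted below ends in " -Xmx3072m". If that really is part of the property value, the whole string is a single path entry that is not a directory, and a check like this would report it as missing.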
[root@c6404 cache]# ll /usr/lib/ams-hbase/lib/hadoop-native/
total 4688
-rw-r--r-- 1 root root 1319074 Dec  3 03:24 libhadoop.a
-rw-r--r-- 1 root root 1487444 Dec  3 03:24 libhadooppipes.a
-rw-r--r-- 1 root root  775455 Dec  3 03:24 libhadoop.so
-rw-r--r-- 1 root root  582760 Dec  3 03:24 libhadooputils.a
-rw-r--r-- 1 root root  366380 Dec  3 03:24 libhdfs.a
-rw-r--r-- 1 root root  230225 Dec  3 03:24 libhdfs.so
-rw-r--r-- 1 root root   19848 Dec  3 03:24 libsnappy.so.1

If not, the collector RPM hasn't been built correctly.

________________________________________
From: Eirik Thorsnes <[email protected]>
Sent: Thursday, December 03, 2015 8:16 PM
To: [email protected]
Subject: Ambari metrics collector dies in 2.1.3-snapshot

Hi,

I'm testing Ambari 2.1.3-snapshot (from Dec 1st, a830cc0) on the HDP 2.3.0 stack. In this setup the Ambari Metrics Collector dies after a few minutes with the log paste below. Note the "FATAL" error, which comes after many of the exceptions seen at the top.

Possibly related to the error pasted below: on startup it fails to load the native libraries. From the log:

2015-12-03 18:40:44,296 WARN [main] NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

This happens even though the libraries exist in the java.library.path given a few lines further down in the log:

2015-12-03 18:40:44,396 INFO [main] ZooKeeper:100 - Client environment:java.library.path=/usr/lib/ams-hbase/lib/hadoop-native -Xmx3072m

I also tried replacing the path above with a symlink to the hadoop-client/lib/native dir (which has different content), but this did not help.
=========== paste ===============
Thu Dec 03 18:26:25 CET 2015, RpcRetryingCaller{globalStartTime=1449163034289, pause=100, retries=35}, java.io.IOException: java.io.IOException: java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
	at org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:78)
	at org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService.callMethod(ServerCachingProtos.java:3200)
	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:72)
	... 10 more
Caused by: java.lang.ClassNotFoundException: org.iq80.snappy.CorruptionException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 13 more
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
	at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
	at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(CoprocessorRpcChannel.java:56)
	at org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService$Stub.addServerCache(ServerCachingProtos.java:3270)
	at org.apache.phoenix.cache.ServerCacheClient$1$1.call(ServerCacheClient.java:204)
	at org.apache.phoenix.cache.ServerCacheClient$1$1.call(ServerCacheClient.java:189)
	at org.apache.hadoop.hbase.client.HTable$16.call(HTable.java:1741)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
	at org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:78)
	at org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService.callMethod(ServerCachingProtos.java:3200)
	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:72)
	... 10 more
Caused by: java.lang.ClassNotFoundException: org.iq80.snappy.CorruptionException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 13 more
	at sun.reflect.GeneratedConstructorAccessor43.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:322)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1619)
	at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:92)
	at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:89)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
	... 10 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
	at org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:78)
	at org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService.callMethod(ServerCachingProtos.java:3200)
	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7390)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1873)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1855)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/iq80/snappy/CorruptionException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.phoenix.coprocessor.ServerCachingEndpointImpl.addServerCache(ServerCachingEndpointImpl.java:72)
	... 10 more
Caused by: java.lang.ClassNotFoundException: org.iq80.snappy.CorruptionException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 13 more
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1206)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:32675)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1615)
	... 13 more
2015-12-03 18:26:25,220 INFO [hconnection-0x33bc72d1-shared--pool2-t265] RpcRetryingCaller:132 - Call exception, tries=16, retries=35, started=188985 ms ago, cancelled=false, msg=row 'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode' on table 'METRIC_RECORD' at region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95., hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:25,539 INFO [hconnection-0x33bc72d1-shared--pool2-t155] RpcRetryingCaller:132 - Call exception, tries=27, retries=35, started=409895 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1432015934895.0f0a9816ffb93fe65176292b6ad378d1., hostname=compute-10-1.local,61320,1449162924698, seqNum=24131209
2015-12-03 18:26:25,597 INFO [hconnection-0x33bc72d1-shared--pool2-t153] RpcRetryingCaller:132 - Call exception, tries=27, retries=35, started=409953 ms ago, cancelled=false, msg=row 'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode' on table 'METRIC_RECORD' at region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95., hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:25,680 INFO [hconnection-0x33bc72d1-shared--pool2-t215] RpcRetryingCaller:132 - Call exception, tries=22, retries=35, started=309625 ms ago, cancelled=false, msg=row 'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode' on table 'METRIC_RECORD' at region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95., hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:26,085 INFO [hconnection-0x33bc72d1-shared--pool2-t228] RpcRetryingCaller:132 - Call exception, tries=29, retries=35, started=450123 ms ago, cancelled=false, msg=row 'metricssystem.MetricsSystem.NumActiveSinks^@compute-10-2.local^@^@^@^AMi���datanode' on table 'METRIC_RECORD' at region=METRIC_RECORD,metricssystem.MetricsSystem.NumActiveSinks\x00compute-10-2.local\x00\x00\x00\x01Mi\xDD\xE3\xF7datanode,1432015934895.363cbca58c745853100106053690db95., hostname=compute-10-1.local,61320,1449162924698, seqNum=34149729
2015-12-03 18:26:26,276 FATAL [pool-1-thread-1] TimelineMetricStoreWatcher:79 - Error getting metrics from TimelineMetricStore. Shutting down by TimelineMetricStoreWatcher.
2015-12-03 18:26:26,279 INFO [pool-1-thread-1] ExitUtil:124 - Exiting with status -1
2015-12-03 18:26:26,281 INFO [Thread-3] ConnectionManager$HConnectionImplementation:2068 - Closing master protocol: MasterService
2015-12-03 18:26:26,426 INFO [hconnection-0x33bc72d1-shared--pool2-t227] RpcRetryingCaller:132 - Call exception, tries=29, retries=35, started=450464 ms ago, cancelled=false, msg=row '' on table 'METRIC_RECORD' at region=METRIC_RECORD,,1432015934895.0f0a9816ffb93fe65176292b6ad378d1., hostname=compute-10-1.local,61320,1449162924698, seqNum=24131209
2015-12-03 18:26:26,442 INFO [Thread-1] log:67 - Stopped [email protected]:6188
2015-12-03 18:26:26,451 WARN [1705435578@qtp-1802896480-9] GenericExceptionHandler:98 - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: java.sql.SQLException: Sub plan [0] execution interrupted.
	at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices.getTimelineMetrics(TimelineWebServices.java:387)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	...

--
Eirik Thorsnes
