[ https://issues.apache.org/jira/browse/YARN-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Lu updated YARN-5432: ------------------------ Attachment: YARN-5432-trunk.001.patch Upload a patch to fix this issue. It fixes the problems on two sides: 1. using a creation timestamp in the leveldb cache store's db name, therefore caches pointing to the same group id will have unique names even if the old cache has been evicted. 2. Perform precondition check on the cache size to avoid the ATS v1.5 store runs with 0 caches. > Lock already held by another process while LevelDB cache store creation for > dag > ------------------------------------------------------------------------------- > > Key: YARN-5432 > URL: https://issues.apache.org/jira/browse/YARN-5432 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver > Affects Versions: 2.8.0, 2.7.3 > Reporter: Karam Singh > Assignee: Li Lu > Attachments: YARN-5432-trunk.001.patch > > > While running ATS stress tests, 15 concurrent ATS reads (python thread > which gives ws/v1/time/TEZ_DAG_ID, > ws/v1/time/TEZ_VERTEX_DI?primaryFilter=TEZ_DAG_ID:<dag_id> etc) calls. > Note: Summary store for ATSv1.5 is RLD, but as we for each dag/application > ATS also creates leveldb cache when vertex/task/taskattempts information is > queried from ATS. > > Getting following type of excpetion very frequently in ATS logs :- > 2016-07-23 00:01:56,089 [1517798697@qtp-1198158701-850] INFO > org.apache.hadoop.service.AbstractService: Service > LeveldbCache.timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832 > failed in state INITED; cause: > org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock > /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: > already held by process > org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock > /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK: > already held by process > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.serviceInit(LevelDBCacheTimelineStore.java:108) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:113) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1021) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:936) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:989) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1041) > at > org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168) > at > org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138) > at > org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117) > at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) > at > com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at > com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org