[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383285#comment-14383285 ]
Nikhil Mulley commented on YARN-3403:
-------------------------------------

Here is the fuller stack trace. The problem is reproducible.

---
2015-03-26 20:04:43,690 FATAL org.apache.hadoop.conf.Configuration: error parsing conf mapred-site.xml
org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type "property" must be terminated by the matching end-tag "</property>".
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2171)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2242)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2195)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:858)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:877)
        at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1278)
        at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:65)
        at org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressorType(ZlibFactory.java:82)
        at org.apache.hadoop.io.compress.DefaultCodec.getCompressorType(DefaultCodec.java:74)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
        at org.apache.hadoop.io.file.tfile.Compression$Algorithm.getCompressor(Compression.java:274)
        at org.apache.hadoop.io.file.tfile.BCFile$Writer$WBlockState.<init>(BCFile.java:129)
        at org.apache.hadoop.io.file.tfile.BCFile$Writer.prepareDataBlock(BCFile.java:430)
        at org.apache.hadoop.io.file.tfile.TFile$Writer.initDataBlock(TFile.java:642)
        at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:533)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.writeVersion(AggregatedLogFormat.java:276)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.<init>(AggregatedLogFormat.java:272)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:108)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:166)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:140)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:354)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-03-26 20:04:43,691 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Aggregation did not complete for application application_1426202183036_103251
2015-03-26 20:04:43,691 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LogAggregationService #2,5,main] threw an Throwable, but we are shutting down, so ignoring this
java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type "property" must be terminated by the matching end-tag "</property>".
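
The trace above suggests that mapred-site.xml is parsed lazily (Configuration.get -> getProps -> loadResources) from the log aggregation thread, which may be why an on-disk edit can affect a nodemanager that is already running. Until that is handled inside the daemon, the edit itself can be checked for well-formedness before the file is put in place. Below is a minimal sketch of such a check using only the JDK's XML parser. The class name ConfWellFormedCheck and its methods are hypothetical and not part of Hadoop; the check only verifies that the XML is well-formed, not that the properties are valid.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical pre-deployment check; not part of Hadoop. */
public class ConfWellFormedCheck {

    /** Returns null if the input is well-formed XML, else a description of the first error. */
    static String firstError(InputSource source) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints to stderr, with one that
            // rethrows so the first problem ends up in the catch block below.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(source);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException("could not run the check", e);
        }
    }

    /** Convenience wrapper for XML held in memory. */
    static String firstErrorInString(String xml) {
        return firstError(new InputSource(new StringReader(xml)));
    }

    /** Convenience wrapper for a file on disk. */
    static String firstErrorInFile(File file) {
        return firstError(new InputSource(file.toURI().toString()));
    }

    /** Checks every path given on the command line; exits 1 if any file is malformed. */
    public static void main(String[] args) {
        int failures = 0;
        for (String path : args) {
            String error = firstErrorInFile(new File(path));
            if (error != null) {
                System.err.println(path + ": " + error);
                failures++;
            }
        }
        System.exit(failures == 0 ? 0 : 1);
    }
}
```

Run against the edited copy of the file (for example `java ConfWellFormedCheck mapred-site.xml`) before it replaces the one under /etc/hadoop/conf; a non-zero exit status means the file should not be deployed. `xmllint --noout` does the same job where it is installed.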
--

> Nodemanager dies after a small typo in mapred-site.xml is induced
> -----------------------------------------------------------------
>
>                 Key: YARN-3403
>                 URL: https://issues.apache.org/jira/browse/YARN-3403
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Nikhil Mulley
>            Priority: Critical
>
> Hi,
> We have noticed that a small typo in the XML config (mapred-site.xml) can
> cause the nodemanager to go down completely, without anyone stopping or
> restarting it externally.
> I find it a little odd that editing the config files on the filesystem can
> cause the running slave daemon (the YARN nodemanager) to shut down.
> In this case, I had missed the '/' of an end tag in a property, and that
> caused the nodemanager to go down in a cluster.
> Why would the nodemanager reload the configs while it is running? Are they
> not picked up when it is started? Even if new configs are meant to be picked
> up dynamically, I think an xmllint/config checker should run before the
> nodemanager is asked to reload/restart.
>
> ---
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId:
> file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The
> element type "value" must be terminated by the matching end-tag "</value>".
>         at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
> ---
> Please shed light on this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)