[ https://issues.apache.org/jira/browse/AMBARI-22644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hurley updated AMBARI-22644:
-------------------------------------
    Reporter: Vivek Sharma  (was: Jonathan Hurley)

> Node Managers fail to start after Spark2 is patched due to CNF YarnShuffleService
> ---------------------------------------------------------------------------------
>
>                 Key: AMBARI-22644
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22644
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Vivek Sharma
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.6.2
>
>
> *STR*
> # Deploy HDP-2.6.4.0 cluster with Ambari-2.6.1.0-114
> # Apply HBase patch Upgrade on the cluster (this step is optional)
> # Then apply Spark2 patch Upgrade on the cluster
> # Restart Node Managers
> *Result*
> NM restart fails with the error below:
> {code}
> 2017-12-10 07:17:02,559 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NodeManager metrics system shutdown complete.
> 2017-12-10 07:17:02,559 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
> org.apache.hadoop.service.ServiceStateException: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
>       at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>       at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:291)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>       at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:197)
>       at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:165)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:131)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>       ... 8 more
> 2017-12-10 07:17:02,562 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
> {code}
> The Spark properties are being written out correctly as per AMBARI-22525.
> Initially, we had defined Spark properties for ATS like this:
> {code}
> <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
> <value>{{stack_root}}/${hdp.version}/spark/aux/*</value>
> {code}
> When YARN upgrades without Spark, we run into AMBARI-22525. It seems that the
> shuffle classes are installed as part of the RPM dependencies, but the
> SparkATSPlugin is not.
> So:
> - If we use YARN's version for the Spark classes, then ATS can't find
> SparkATSPlugin since that is not part of YARN.
> - If we use Spark's version for the classes, then Spark can never upgrade
> without YARN since NodeManager can't find the new Spark classes.
> However, it seems that shuffle and ATS use different properties. We changed
> all 3 properties in AMBARI-22525:
> {code}
> yarn.nodemanager.aux-services.spark2_shuffle.classpath
> yarn.nodemanager.aux-services.spark_shuffle.classpath
> yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath
> {code}
> It seems that what we need to do is change the two Spark shuffle classpath
> properties back to hdp.version, but leave ATS using the new version, since
> we're guaranteed to have Spark installed on the ATS machine.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
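
For illustration only, the proposal in the description (shuffle classpaths back on hdp.version, ATS left alone) might look roughly like the fragment below. This is a sketch under assumptions, not the committed fix: the spark_shuffle value is the one quoted in the description, while the spark2 path is assumed by analogy and is not confirmed anywhere in this issue. The ATS plugin classpath is omitted because its value is not given here.

{code}
<!-- Sketch only: shuffle classpaths resolved against hdp.version (YARN's version) -->
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
  <value>{{stack_root}}/${hdp.version}/spark/aux/*</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
  <!-- ASSUMPTION: spark2 path mirrors the spark one; not stated in the issue -->
  <value>{{stack_root}}/${hdp.version}/spark2/aux/*</value>
</property>
<!-- yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath is
     intentionally not shown: per the description it would keep the value
     introduced in AMBARI-22525 -->
{code}

With the shuffle classpaths tied to hdp.version, the NodeManager would load YarnShuffleService from the shuffle classes that the description says are installed as RPM dependencies of YARN, so a Spark2-only patch upgrade would no longer change where the NodeManager looks.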