Forwarding to user list.

---------- Forwarded message ----------
From: Milinda Pathirage <mpath...@umail.iu.edu>
Date: Tue, Oct 15, 2013 at 3:23 PM
Subject: Some questions related to Giraph Pure YARN implementation
To: d...@giraph.apache.org
Hi Eli,

I tried the scripts in the bin directory (giraph, giraph-env) to run the Giraph example from the quick start guide, but I ran into some issues and had to do some patching to get them into a working state (job submission works, but execution fails). Here is what I noticed:

1. The giraph script in the 'bin' directory uses the -libjars option, but this doesn't work with GiraphYarnClient. It should use -yj instead.

2. Because of the way YarnUtils.getLocalFiles is implemented, we have to add $GIRAPH_HOME and $VERTEX_IMPL_JAR_DIR (the directory containing the vertex implementation jar) to the CLASSPATH manually. Basically, the parent directories of the YARN jars must be on the classpath. I am not sure which is the correct solution:
   * fixing getLocalFiles, or
   * the CLASSPATH-based approach.

3. The YarnUtils.populateJars method uses fileNames.contains(f.getName()) to decide whether to add a jar to the local resource map. But when we use the giraph script, fileNames contains the absolute paths of the 'Yarn Lib Jars', so the names never match. I got this working by using absolute paths instead of getName.

4. After the above changes, we can successfully launch a job on a YARN cluster using the giraph script, but the job then fails because of a file path issue. When submitting a job, we serialize the Giraph configuration to giraph-conf.xml. However, the "giraph.yarn.libjars" property holds the list of files with absolute paths from the client machine that was used to submit the job. For example, in my scenario the giraph jar is "/Users/mpathira/giraph-bin/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar". GiraphApplicationMaster tries to access these files and fails, because no file with that name exists in HDFS.

   If we used only jar names instead of paths for the 'yarnjars' option, we should be able to fix issue 4, but I am not sure whether that is the correct approach. Maybe we need to change how we serialize giraph-conf.xml into HDFS, and use HDFS paths instead of paths from the client machine.

@Eli I would really appreciate your comments on the above.
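For issue 3, an alternative to switching the comparison to absolute paths would be to normalize both sides to bare file names, so the check works whether fileNames holds names or absolute paths. The sketch below is only an illustration of that idea, not the actual YarnUtils code: JarMatchSketch, toBaseNames and selectJars are hypothetical names, and the jar paths are made up. It uses only java.io.File and List.of (Java 9 or later).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class JarMatchSketch {
    // Hypothetical helper: reduce every configured entry to its bare file name,
    // so "/abs/path/giraph-core.jar" and "giraph-core.jar" compare equal.
    static Set<String> toBaseNames(List<String> configured) {
        Set<String> names = new LinkedHashSet<>();
        for (String entry : configured) {
            names.add(new File(entry).getName());
        }
        return names;
    }

    // Hypothetical stand-in for the check inside populateJars: keep a candidate
    // jar if its file name appears in the (normalized) configured list.
    static List<File> selectJars(List<File> candidates, List<String> configured) {
        Set<String> wanted = toBaseNames(configured);
        List<File> selected = new ArrayList<>();
        for (File f : candidates) {
            if (wanted.contains(f.getName())) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> configured = List.of(
            "/Users/someone/giraph-bin/giraph-core.jar", // absolute path, as the giraph script passes it
            "my-vertex-impl.jar");                       // bare name
        List<File> candidates = List.of(
            new File("/opt/giraph/lib/giraph-core.jar"),
            new File("/opt/giraph/lib/unrelated.jar"),
            new File("/home/me/jobs/my-vertex-impl.jar"));
        for (File f : selectJars(candidates, configured)) {
            System.out.println(f.getName());
        }
    }
}
```

One trade-off of matching on names only: two different jars with the same file name in different directories would be treated as the same jar.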
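The "jar names only" idea from issue 4 could look roughly like the sketch below: before the value of "giraph.yarn.libjars" is written into giraph-conf.xml, each client-side path is reduced to its file name. This assumes the property is a comma-separated list of paths, which is an assumption on my part. LibJarsSketch and toJarNames are hypothetical names, and my-vertex-impl.jar is a made-up example; only the giraph jar path comes from my setup above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibJarsSketch {
    // Hypothetical helper: turn a comma-separated list of client-side jar paths
    // into a comma-separated list of bare jar names, so the value stored under
    // "giraph.yarn.libjars" no longer carries paths from the submitting machine.
    static String toJarNames(String commaSeparatedPaths) {
        List<String> names = new ArrayList<>();
        for (String raw : commaSeparatedPaths.split(",")) {
            String path = raw.trim();
            if (!path.isEmpty()) {            // skip empty entries such as ",,"
                names.add(new File(path).getName());
            }
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        String clientValue =
            "/Users/mpathira/giraph-bin/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha/"
            + "giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar,"
            + "/home/me/jobs/my-vertex-impl.jar";
        System.out.println(toJarNames(clientValue));
        // prints: giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar,my-vertex-impl.jar
    }
}
```

GiraphApplicationMaster would then still have to resolve each name against whatever HDFS directory the client uploaded the jars to; that lookup is not shown here, and it is where the alternative of storing HDFS paths directly would differ.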
I can create a JIRA ticket if needed.

Thanks,
Milinda

--
Milinda Pathirage

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org