The problem remains even when we submit the job with: hive --service jar giraph-hive-1.0.0-jar-with-dependencies.jar org.apache.giraph.hive.HiveGiraphRunner
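A possible lead we are looking at (this is only our reading of the hadoop-1.0.4 source, not a confirmed diagnosis): TextInputFormat.isSplitable dereferences a CompressionCodecFactory field that is only initialized in configure(JobConf), so if HiveApiInputFormat instantiates the underlying input format without ever calling configure(), that field stays null and getSplits NPEs exactly where our stack trace below points. A self-contained mock of that pattern (the class and method names mirror Hadoop's real ones, but this is a standalone sketch, not Hadoop code):

```scala
// Mock of the Hadoop 1.x TextInputFormat pattern we suspect is biting us.
// Names mirror Hadoop's classes; the code itself is a standalone sketch.
class CodecFactory(conf: Map[String, String]) {
  // Pretend only ".gz" files have a compression codec.
  def getCodec(file: String): Option[String] =
    if (file.endsWith(".gz")) Some("gzip") else None
}

class MockTextInputFormat {
  // Like TextInputFormat.compressionCodecs: stays null until configure() runs.
  private var compressionCodecs: CodecFactory = null

  def configure(conf: Map[String, String]): Unit =
    compressionCodecs = new CodecFactory(conf)

  // Mirrors isSplitable(FileSystem, Path): NPEs when configure() was skipped.
  def isSplitable(file: String): Boolean =
    compressionCodecs.getCodec(file).isEmpty
}

object IsSplitableNpeDemo {
  def run(): String = {
    val fmt = new MockTextInputFormat
    try {
      fmt.isSplitable("part-00000") // configure() never called -> NPE
      "no NPE"
    } catch {
      case _: NullPointerException =>
        fmt.configure(Map.empty)    // calling configure() first avoids it
        "NPE without configure(); after configure(): splitable=" +
          fmt.isSplitable("part-00000")
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

If this really is the mechanism, it would point at the hive-io/Giraph input path rather than at our own job configuration.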
We are using hadoop-1.0.4, hive-0.11, and giraph-1.0.0. Are they incompatible? Do we need to configure any environment variables for Giraph?

On Sat, Nov 16, 2013 at 10:44 PM, Andy Ho <csz...@comp.polyu.edu.hk> wrote:
> Hi folks,
>
> Currently, we are going to use Giraph to replace some of the graph
> processing in our Hive workflow.
> We didn't use HiveGiraphRunner to submit the job directly; instead, we
> customize and submit it from our own program.
>
> However, after the job is submitted to Hadoop, an NPE is encountered while
> HiveApiInputFormat is computing the InputSplits.
>
> Below is the Scala code snippet for the job configuration:
>
> ==================================================
>
> val hive_config_copy = new HiveConf(hive_config)
> val workers = 1
> val dbName = "default"
> val edgeInputTableStr = "transitMatrix"
> val vertexInputTableStr = "initialRank"
> val vertexOutputTableStr = "twitterRank"
>
> HIVE_TO_VERTEX_CLASS.set(hive_config_copy, classOf[InitialRankToVertex])
> HIVE_TO_EDGE_CLASS.set(hive_config_copy, classOf[TransitMatrixToEdge])
> hive_config_copy.setClass(HiveVertexWriter.VERTEX_TO_HIVE_KEY,
>   classOf[TRVertexToHive],
>   classOf[VertexToHive[Text, DoubleWritable, Writable]])
>
> val job = new GiraphJob(hive_config_copy, getClass().getName())
> val giraphConf = job.getConfiguration()
> giraphConf.setVertexClass(classOf[TwitterRankVertex])
>
> val hiveVertexInputDescription = new HiveInputDescription()
> val hiveEdgeInputDescription = new HiveInputDescription()
> val hiveOutputDescription = new HiveOutputDescription()
>
> /**
>  * Initialize the Hive input db and tables
>  */
>
> hiveVertexInputDescription.setDbName(dbName)
> hiveEdgeInputDescription.setDbName(dbName)
> hiveOutputDescription.setDbName(dbName)
> hiveEdgeInputDescription.setTableName(edgeInputTableStr)
> hiveVertexInputDescription.setTableName(vertexInputTableStr)
> hiveOutputDescription.setTableName(vertexOutputTableStr)
>
> /**
>  * Initialize the Hive input settings
>  */
>
> hiveVertexInputDescription.setNumSplits(HIVE_VERTEX_SPLITS.get(giraphConf))
> HiveApiInputFormat.setProfileInputDesc(giraphConf,
>   hiveVertexInputDescription, VERTEX_INPUT_PROFILE_ID)
> giraphConf.setVertexInputFormatClass(
>   classOf[HiveVertexInputFormat[Text, DoubleWritable, Writable]])
> HiveTableSchemas.put(giraphConf,
>   VERTEX_INPUT_PROFILE_ID, hiveVertexInputDescription.hiveTableName())
>
> hiveEdgeInputDescription.setNumSplits(HIVE_EDGE_SPLITS.get(giraphConf))
> HiveApiInputFormat.setProfileInputDesc(giraphConf,
>   hiveEdgeInputDescription, EDGE_INPUT_PROFILE_ID)
> giraphConf.setEdgeInputFormatClass(
>   classOf[HiveEdgeInputFormat[Text, DoubleWritable]])
> HiveTableSchemas.put(giraphConf,
>   EDGE_INPUT_PROFILE_ID, hiveEdgeInputDescription.hiveTableName())
>
> /**
>  * Initialize the Hive output settings
>  */
>
> HiveApiOutputFormat.initProfile(giraphConf,
>   hiveOutputDescription, VERTEX_OUTPUT_PROFILE_ID)
> giraphConf.setVertexOutputFormatClass(
>   classOf[HiveVertexOutputFormat[Text, DoubleWritable, Writable]])
> HiveTableSchemas.put(giraphConf,
>   VERTEX_OUTPUT_PROFILE_ID, hiveOutputDescription.hiveTableName())
>
> /**
>  * Set the number of workers
>  */
>
> giraphConf.setWorkerConfiguration(workers, workers, 100.0f)
>
> /**
>  * Run the job
>  */
>
> job.run(true)
>
> =========================================
>
> Here are the task logs:
>
> 2013-11-16 12:19:19,032 INFO com.facebook.giraph.hive.input.HiveApiInputFormat: getSplits for profile vertex_input_profile
> 2013-11-16 12:19:19,034 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH
> 2013-11-16 12:19:19,161 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
> 2013-11-16 12:19:19,164 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with NullPointerException
> java.lang.NullPointerException
>         at org.apache.hadoop.mapred.TextInputFormat.isSplitable(TextInputFormat.java:42)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:232)
>         at com.facebook.giraph.hive.input.HiveApiInputFormat.computeSplits(HiveApiInputFormat.java:183)
>         at com.facebook.giraph.hive.input.HiveApiInputFormat.getSplits(HiveApiInputFormat.java:166)
>         at com.facebook.giraph.hive.input.HiveApiInputFormat.getSplits(HiveApiInputFormat.java:147)
>         at org.apache.giraph.hive.input.vertex.HiveVertexInputFormat.getSplits(HiveVertexInputFormat.java:60)
>         at org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:314)
>         at org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:626)
>         at org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:692)
>         at org.apache.giraph.master.MasterThread.run(MasterThread.java:100)
>
> Not sure if anything is missing in the job configuration.
> Can anybody help? Thanks in advance.
>
> Best Wishes,
> ~Andy

--
~Andy
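P.S. Could the "hive-site.xml not found on CLASSPATH" warning in the task log above be related? One thing we plan to try is putting the Hive conf directory on the Hadoop classpath before submitting (the path below is an example, not our actual layout):

```shell
# Example only -- adjust HIVE_HOME to the local install.
export HIVE_HOME=/usr/local/hive
# Make hive-site.xml visible to Hadoop-launched JVMs:
export HADOOP_CLASSPATH="$HIVE_HOME/conf:$HADOOP_CLASSPATH"
```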