The problem remains even when we submit the job with:
hive --service jar giraph-hive-1.0.0-jar-with-dependencies.jar
org.apache.giraph.hive.HiveGiraphRunner

We are using hadoop-1.0.4, hive-0.11, and giraph-1.0.0. Are these versions
incompatible with each other? Do we need to configure any environment
variables for Giraph?
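
The task logs in the quoted message below also warn that hive-site.xml is not
found on the CLASSPATH. Would it help to load it explicitly when building the
HiveConf? A minimal sketch of what we mean (the path is illustrative):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.conf.HiveConf

// Hypothetical: point the HiveConf at hive-site.xml explicitly instead of
// relying on it being picked up from the CLASSPATH.
val hiveConf = new HiveConf()
hiveConf.addResource(new Path("/usr/local/hive-0.11.0/conf/hive-site.xml"))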


On Sat, Nov 16, 2013 at 10:44 PM, Andy Ho <csz...@comp.polyu.edu.hk> wrote:

> Hi folks,
>
> Currently, we are going to use Giraph to replace some of the graph
> processing in our Hive workflow.
> We didn't use HiveGiraphRunner to submit the job directly; instead, we
> customize and submit it from our own program.
>
> However, after the job is submitted to Hadoop, an NPE is encountered
> while HiveApiInputFormat is computing the InputSplits.
>
> Below is a Scala snippet showing the job configuration:
>
> ==================================================
>
> val hive_config_copy = new HiveConf(hive_config)
> val workers = 1
> val dbName = "default"
> val edgeInputTableStr = "transitMatrix"
> val vertexInputTableStr = "initialRank"
> val vertexOutputTableStr = "twitterRank"
>
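> // Register the classes that translate between Hive rows and Giraph
> // vertices and edges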
> HIVE_TO_VERTEX_CLASS.set(hive_config_copy, classOf[InitialRankToVertex])
> HIVE_TO_EDGE_CLASS.set(hive_config_copy, classOf[TransitMatrixToEdge])
> hive_config_copy.setClass(HiveVertexWriter.VERTEX_TO_HIVE_KEY,
>   classOf[TRVertexToHive],
>   classOf[VertexToHive[Text, DoubleWritable, Writable]])
>
> val job = new GiraphJob(hive_config_copy, getClass().getName())
> val giraphConf = job.getConfiguration()
> giraphConf.setVertexClass(classOf[TwitterRankVertex])
>
> val hiveVertexInputDescription = new HiveInputDescription()
> val hiveEdgeInputDescription = new HiveInputDescription()
> val hiveOutputDescription = new HiveOutputDescription()
>
> /**
> * Initialize hive input db and tables
> */
>
> hiveVertexInputDescription.setDbName(dbName)
> hiveEdgeInputDescription.setDbName(dbName)
> hiveOutputDescription.setDbName(dbName)
> hiveEdgeInputDescription.setTableName(edgeInputTableStr)
> hiveVertexInputDescription.setTableName(vertexInputTableStr)
> hiveOutputDescription.setTableName(vertexOutputTableStr)
>
>
> /**
> * Initialize the hive input settings
> */
>
> hiveVertexInputDescription.setNumSplits(HIVE_VERTEX_SPLITS.get(giraphConf))
> HiveApiInputFormat.setProfileInputDesc(giraphConf,
>   hiveVertexInputDescription, VERTEX_INPUT_PROFILE_ID)
> giraphConf.setVertexInputFormatClass(
>   classOf[HiveVertexInputFormat[Text, DoubleWritable, Writable]])
> HiveTableSchemas.put(giraphConf, VERTEX_INPUT_PROFILE_ID,
>   hiveVertexInputDescription.hiveTableName())
>
> hiveEdgeInputDescription.setNumSplits(HIVE_EDGE_SPLITS.get(giraphConf))
> HiveApiInputFormat.setProfileInputDesc(giraphConf,
>   hiveEdgeInputDescription, EDGE_INPUT_PROFILE_ID)
> giraphConf.setEdgeInputFormatClass(
>   classOf[HiveEdgeInputFormat[Text, DoubleWritable]])
> HiveTableSchemas.put(giraphConf, EDGE_INPUT_PROFILE_ID,
>   hiveEdgeInputDescription.hiveTableName())
>
> /**
> * Initialize the hive output settings
> */
>
> HiveApiOutputFormat.initProfile(giraphConf,
>   hiveOutputDescription, VERTEX_OUTPUT_PROFILE_ID)
> giraphConf.setVertexOutputFormatClass(
>   classOf[HiveVertexOutputFormat[Text, DoubleWritable, Writable]])
> HiveTableSchemas.put(giraphConf, VERTEX_OUTPUT_PROFILE_ID,
>   hiveOutputDescription.hiveTableName())
>
> /**
> * Set number of workers
> */
>
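> // args: minWorkers, maxWorkers, minPercentResponded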
> giraphConf.setWorkerConfiguration(workers, workers, 100.0f)
>
> /**
> * Run the job
> */
>
> job.run(true)
>
>
> =========================================
>
> Here are the task logs:
>
> 2013-11-16 12:19:19,032 INFO com.facebook.giraph.hive.input.HiveApiInputFormat: getSplits for profile vertex_input_profile
> 2013-11-16 12:19:19,034 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH
> 2013-11-16 12:19:19,161 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
> 2013-11-16 12:19:19,164 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with NullPointerException
> java.lang.NullPointerException
>   at org.apache.hadoop.mapred.TextInputFormat.isSplitable(TextInputFormat.java:42)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:232)
>   at com.facebook.giraph.hive.input.HiveApiInputFormat.computeSplits(HiveApiInputFormat.java:183)
>   at com.facebook.giraph.hive.input.HiveApiInputFormat.getSplits(HiveApiInputFormat.java:166)
>   at com.facebook.giraph.hive.input.HiveApiInputFormat.getSplits(HiveApiInputFormat.java:147)
>   at org.apache.giraph.hive.input.vertex.HiveVertexInputFormat.getSplits(HiveVertexInputFormat.java:60)
>   at org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:314)
>   at org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:626)
>   at org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:692)
>
>
>  at org.apache.giraph.master.MasterThread.run(MasterThread.java:100)
>
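> Our reading of the trace, based on the Hadoop 1.0.4 source (please correct
> us if this is wrong): TextInputFormat.isSplitable dereferences a
> CompressionCodecFactory field that is only initialized in configure(JobConf).
> If the underlying TextInputFormat is instantiated without configure() ever
> being called on it, that field stays null and getSplits fails exactly like
> this. A minimal sketch of the call order we suspect is missing:
>
> import org.apache.hadoop.mapred.{JobConf, TextInputFormat}
>
> val jobConf = new JobConf(hive_config_copy)
> val textInputFormat = new TextInputFormat()
> // textInputFormat.getSplits(jobConf, 1) // NPEs if configure() never ran
> textInputFormat.configure(jobConf) // initializes the codec factory
> val splits = textInputFormat.getSplits(jobConf, 1) // isSplitable now works
>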
> Not sure if anything is missing in the job configuration.
> Can anybody help? Thanks in advance.
>
> Best Wishes,
> ~Andy
>



-- 
~Andy
