Hi folks,

We are planning to use Giraph to replace some of the graph processing in
our Hive workflow.
We didn't use HiveGiraphRunner to submit the job directly; instead, we
configure and submit the job from our own program.

However, after the job is submitted to Hadoop, a NullPointerException is
thrown while HiveApiInputFormat is computing the InputSplits.

Below is the Scala code snippet for the job configuration:

==================================================

val hive_config_copy = new HiveConf(hive_config)
val workers = 1
val dbName = "default"
val edgeInputTableStr = "transitMatrix"
val vertexInputTableStr = "initialRank"
val vertexOutputTableStr = "twitterRank"

HIVE_TO_VERTEX_CLASS.set(hive_config_copy, classOf[InitialRankToVertex])
HIVE_TO_EDGE_CLASS.set(hive_config_copy, classOf[TransitMatrixToEdge])
hive_config_copy.setClass(HiveVertexWriter.VERTEX_TO_HIVE_KEY,
  classOf[TRVertexToHive],
  classOf[VertexToHive[Text, DoubleWritable, Writable]])

val job = new GiraphJob(hive_config_copy, getClass().getName())
val giraphConf = job.getConfiguration()
giraphConf.setVertexClass(classOf[TwitterRankVertex])

val hiveVertexInputDescription = new HiveInputDescription()
val hiveEdgeInputDescription = new HiveInputDescription()
val hiveOutputDescription = new HiveOutputDescription()

/**
* Set the Hive database and table names for the inputs and the output
*/

hiveVertexInputDescription.setDbName(dbName)
hiveEdgeInputDescription.setDbName(dbName)
hiveOutputDescription.setDbName(dbName)
hiveEdgeInputDescription.setTableName(edgeInputTableStr)
hiveVertexInputDescription.setTableName(vertexInputTableStr)
hiveOutputDescription.setTableName(vertexOutputTableStr)


/**
* Initialize the hive input settings
*/

hiveVertexInputDescription.setNumSplits(HIVE_VERTEX_SPLITS.get(giraphConf))
HiveApiInputFormat.setProfileInputDesc(giraphConf,
  hiveVertexInputDescription, VERTEX_INPUT_PROFILE_ID)
giraphConf.setVertexInputFormatClass(
  classOf[HiveVertexInputFormat[Text, DoubleWritable, Writable]])
HiveTableSchemas.put(giraphConf,
  VERTEX_INPUT_PROFILE_ID, hiveVertexInputDescription.hiveTableName())

hiveEdgeInputDescription.setNumSplits(HIVE_EDGE_SPLITS.get(giraphConf))
HiveApiInputFormat.setProfileInputDesc(giraphConf,
  hiveEdgeInputDescription, EDGE_INPUT_PROFILE_ID)
giraphConf.setEdgeInputFormatClass(
  classOf[HiveEdgeInputFormat[Text, DoubleWritable]])
HiveTableSchemas.put(giraphConf,
  EDGE_INPUT_PROFILE_ID, hiveEdgeInputDescription.hiveTableName())

/**
* Initialize the hive output settings
*/

HiveApiOutputFormat.initProfile(giraphConf,
  hiveOutputDescription, VERTEX_OUTPUT_PROFILE_ID)
giraphConf.setVertexOutputFormatClass(
  classOf[HiveVertexOutputFormat[Text, DoubleWritable, Writable]])
HiveTableSchemas.put(giraphConf,
  VERTEX_OUTPUT_PROFILE_ID, hiveOutputDescription.hiveTableName())

/**
* Set number of workers
*/

giraphConf.setWorkerConfiguration(workers, workers, 100.0f)

/**
* Run the job
*/

return job.run(true)


=========================================

Here are the task logs:

2013-11-16 12:19:19,032 INFO com.facebook.giraph.hive.input.HiveApiInputFormat: getSplits for profile vertex_input_profile
2013-11-16 12:19:19,034 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH
2013-11-16 12:19:19,161 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2013-11-16 12:19:19,164 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.mapred.TextInputFormat.isSplitable(TextInputFormat.java:42)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:232)
    at com.facebook.giraph.hive.input.HiveApiInputFormat.computeSplits(HiveApiInputFormat.java:183)
    at com.facebook.giraph.hive.input.HiveApiInputFormat.getSplits(HiveApiInputFormat.java:166)
    at com.facebook.giraph.hive.input.HiveApiInputFormat.getSplits(HiveApiInputFormat.java:147)
    at org.apache.giraph.hive.input.vertex.HiveVertexInputFormat.getSplits(HiveVertexInputFormat.java:60)
    at org.apache.giraph.master.BspServiceMaster.generateInputSplits(BspServiceMaster.java:314)
    at org.apache.giraph.master.BspServiceMaster.createInputSplits(BspServiceMaster.java:626)
    at org.apache.giraph.master.BspServiceMaster.createVertexInputSplits(BspServiceMaster.java:692)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:100)

I'm not sure whether something is missing from the job configuration.
Can anybody help? Thanks in advance.

Best Wishes,
~Andy
