HiveGiraphRunner - SanityCheck fails due to missing partition of output table?

RainShine79 Thu, 16 Oct 2014 04:59:20 -0700

Hello fellow coders,

I am currently trying to start a standard Giraph job (PageRankVertex) on a set 
of vertices and edges drawn from Hive tables. For this I use the class 
HiveGiraphRunner, as recommended in several tutorials.


As required, I started a Thrift server and implemented the abstract classes 
that convert table data to vertices or edges and vice versa. I also created the 
tables test1 (containing all vertices), test2 (containing all edges) and test3 
(an empty two-column table: a String column to store the vertex IDs and a 
Double column to store the PageRank values).

The final command I used to start the PageRank job looks like this (except for 
the newlines, which I added for readability):

hadoop jar giraph-hive-1.0.0.jar org.apache.giraph.hive.HiveGiraphRunner
 -libjars ~/giraph/giraph-examples-1.0.0.jar 
 -vertexClass org.apache.giraph.examples.PageRankVertex 
 -hiveToVertexClass org.apache.giraph.hive.input.vertex.MyHiveToVertexImpl 
 -hiveToEdgeClass org.apache.giraph.hive.input.edge.MyHiveToEdgeImpl 
 -vertexToHiveClass org.apache.giraph.hive.output.MyVertexToHiveImpl 
 -w 5 -vi test1 -ei test2 -o test3 
 -hiveconf hive.metastore.uris=thrift://localhost:10000 
 -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=metastore_db;create=true

The result I get looks like this:

14/10/16 08:22:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:10000
14/10/16 08:22:49 INFO hive.metastore: Waiting 1 seconds before next connection attempt.
14/10/16 08:22:50 INFO hive.metastore: Connected to metastore.
Exception in thread "main" java.lang.NullPointerException
        at com.facebook.giraph.hive.output.HiveOutputDescription.numPartitionValues(HiveOutputDescription.java:106)
        at com.facebook.giraph.hive.output.HiveApiOutputFormat.sanityCheck(HiveApiOutputFormat.java:185)
        at com.facebook.giraph.hive.output.HiveApiOutputFormat.initProfile(HiveApiOutputFormat.java:142)
        at org.apache.giraph.hive.HiveGiraphRunner.setupHiveOutput(HiveGiraphRunner.java:282)
        at org.apache.giraph.hive.HiveGiraphRunner.run(HiveGiraphRunner.java:236)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.giraph.hive.HiveGiraphRunner.main(HiveGiraphRunner.java:212)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

The exception suggests that the output table test3 needs to be partitioned, but 
I do not understand why or in what way, since a partition only makes inserting 
data more complex by requiring additional information.
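
To make my suspicion concrete: I have not checked the giraph-hive sources, so 
this is only a guess based on the stack trace. I assume the partition values of 
the output description stay null when no partition is given, and that 
numPartitionValues() then dereferences them. The stand-alone Java sketch below 
only illustrates that failure mode. OutputDescription, its field and the 
guarded method are hypothetical stand-ins, not the real HiveOutputDescription.

```java
import java.util.Collections;
import java.util.Map;

public class PartitionSketch {

    /** Hypothetical stand-in; NOT the real HiveOutputDescription. */
    static final class OutputDescription {
        // Assumption: remains null when no partition is specified.
        private final Map<String, String> partitionValues;

        OutputDescription(Map<String, String> partitionValues) {
            this.partitionValues = partitionValues;
        }

        // Unguarded access: throws NullPointerException if the map is null.
        int numPartitionValues() {
            return partitionValues.size();
        }

        // Guarded variant: treats "no partition given" as zero partition values.
        int numPartitionValuesSafe() {
            return partitionValues == null ? 0 : partitionValues.size();
        }
    }

    // Returns true if the unguarded call fails with a NullPointerException.
    static boolean failsWithNpe(OutputDescription description) {
        try {
            description.numPartitionValues();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        OutputDescription unpartitioned = new OutputDescription(null);
        OutputDescription emptyPartition =
                new OutputDescription(Collections.<String, String>emptyMap());

        System.out.println(failsWithNpe(unpartitioned));            // prints "true"
        System.out.println(failsWithNpe(emptyPartition));           // prints "false"
        System.out.println(unpartitioned.numPartitionValuesSafe()); // prints "0"
    }
}
```

If that guess is right, an unpartitioned output table would hit the exception 
simply because nothing was ever set, not because a partition is truly required.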

Does the output table need a particular kind of partition? And if so, how does 
Giraph choose the partition to insert the output data into?

I have also posted this question on Stack Overflow, where the text is better 
formatted:

http://stackoverflow.com/questions/26401418/hivegiraphrunner-sanitycheck-fails-due-to-missing-partition-of-output-table





Thanks for your help in advance!

R.





