Hello fellow coders,

I am currently trying to start a standard Giraph job (PageRankVertex) on a set of vertices and edges drawn from Hive tables. For this I use the class HiveGiraphRunner, as recommended in several tutorials.

As required, I started a Thrift server and implemented the abstract classes that convert table data to vertices or edges and vice versa. I also created the tables test1 (containing all vertices), test2 (containing all edges) and test3 (an empty two-column table: a String column to store the vertex IDs and a Double column to store the page rank values).

The final command I used to start the page rank job looks like this (except for the newlines, which I added for better readability):

    hadoop jar giraph-hive-1.0.0.jar org.apache.giraph.hive.HiveGiraphRunner
      -libjars ~/giraph/giraph-examples-1.0.0.jar
      -vertexClass org.apache.giraph.examples.PageRankVertex
      -hiveToVertexClass org.apache.giraph.hive.input.vertex.MyHiveToVertexImpl
      -hiveToEdgeClass org.apache.giraph.hive.input.edge.MyHiveToEdgeImpl
      -vertexToHiveClass org.apache.giraph.hive.output.MyVertexToHiveImpl
      -w 5
      -vi test1
      -ei test2
      -o test3
      -hiveconf hive.metastore.uris=thrift://localhost:10000
      -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=metastore_db;create=true

The result I get looks like this:

    14/10/16 08:22:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:10000
    14/10/16 08:22:49 INFO hive.metastore: Waiting 1 seconds before next connection attempt.
    14/10/16 08:22:50 INFO hive.metastore: Connected to metastore.
    Exception in thread "main" java.lang.NullPointerException
        at com.facebook.giraph.hive.output.HiveOutputDescription.numPartitionValues(HiveOutputDescription.java:106)
        at com.facebook.giraph.hive.output.HiveApiOutputFormat.sanityCheck(HiveApiOutputFormat.java:185)
        at com.facebook.giraph.hive.output.HiveApiOutputFormat.initProfile(HiveApiOutputFormat.java:142)
        at org.apache.giraph.hive.HiveGiraphRunner.setupHiveOutput(HiveGiraphRunner.java:282)
        at org.apache.giraph.hive.HiveGiraphRunner.run(HiveGiraphRunner.java:236)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.giraph.hive.HiveGiraphRunner.main(HiveGiraphRunner.java:212)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

The exception suggests that the output table test3 needs to be partitioned, but I do not understand why or in what way, since a partition only makes inserting data more complex because it requires additional information. Does the output table need a special kind of partition? And how would Giraph choose which partition to insert the output data into?

I also posted this question on Stack Overflow, where the text is better formatted. Here is the link:
http://stackoverflow.com/questions/26401418/hivegiraphrunner-sanitycheck-fails-due-to-missing-partition-of-output-table

Thanks in advance for your help!

R.