Hey,
if you read the wiki page closely:
http://wiki.apache.org/hama/WriteHamaGraphFile#Google_Web_dataset_.28local_mode.2C_pseudo_distributed_cluser.29
you will find that there is a property called "hama.graph.repair":
// hama takes care that the graph is complete
pageJob.set("hama.graph.repair", "true");
With this enabled, Hama sends messages along the known edges and adds
a vertex wherever there isn't one on the "other side" of an edge.
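To illustrate the effect only (this is not Hama's implementation, which does it distributed via messages), here is a single-process Java sketch. The class and method names are made up, and the vertex ids other than 111067 are toy data:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: simulates the *effect* of "hama.graph.repair" in one
// process. Names are hypothetical; this is not Hama's code.
public class GraphRepairSketch {

    /** Returns a copy of the adjacency map in which every edge target exists as a vertex. */
    static Map<String, List<String>> repair(Map<String, List<String>> adjacency) {
        Map<String, List<String>> repaired = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : adjacency.entrySet()) {
            repaired.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        // "Send a message" along every known edge; a target with no entry
        // on the other side is created with an empty edge list.
        for (List<String> targets : adjacency.values()) {
            for (String target : targets) {
                repaired.putIfAbsent(target, new ArrayList<>());
            }
        }
        return repaired;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("0", List.of("11342", "111067"));
        graph.put("11342", List.of("0"));
        // prints {0=[11342, 111067], 11342=[0], 111067=[]}
        System.out.println(repair(graph));
    }
}
```

Vertex 111067 only appears as an edge target, so without the repair step there is no vertex to deliver messages to.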
If this doesn't scale well enough for you, a preprocessing MapReduce job
works fine: in the mapper, emit the vertex id as key with the complete
edge list as value, and also emit every edge target as a key with an
empty value.
In the reducer, a key receives complete lines, empty values, or both.
If a key receives only empty values, you know that this vertex
wasn't included in the dataset, and you can repair it by emitting it
in the reducer as a single line.
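A minimal in-memory sketch of that job in plain Java, with no Hadoop classes: map, shuffle and reduce are simulated, and a TreeMap stands in for the sorted shuffle. The class name and the tab-separated "vertexId<TAB>edge1<TAB>edge2..." input format are assumptions for illustration, so adapt the parsing to your actual input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the preprocessing job (hypothetical names).
// Assumed input format: "vertexId<TAB>edge1<TAB>edge2...".
public class RepairPreprocessSketch {

    // Map phase: emit (vertexId, whole line) and (edgeTarget, "") for each edge.
    static void map(String line, Map<String, List<String>> shuffle) {
        if (line.isEmpty()) return; // skip blank lines
        String[] parts = line.split("\t");
        emit(shuffle, parts[0], line);
        for (int i = 1; i < parts.length; i++) {
            emit(shuffle, parts[i], "");
        }
    }

    static void emit(Map<String, List<String>> shuffle, String key, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Reduce phase: pass complete lines through; if a key received only
    // empty values, the vertex was missing, so emit it as a single line.
    static List<String> reduce(String key, List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (!v.isEmpty()) out.add(v);
        }
        if (out.isEmpty()) out.add(key);
        return out;
    }

    static List<String> run(List<String> input) {
        Map<String, List<String>> shuffle = new TreeMap<>(); // keys sorted, like the shuffle
        for (String line : input) map(line, shuffle);
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            output.addAll(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("0\t11342\t111067", "11342\t0");
        // prints the line for 0, then "111067" (repaired), then the line for 11342
        run(input).forEach(System.out::println);
    }
}
```

In a real job, map() and reduce() would become the bodies of a Hadoop Mapper and Reducer; the logic stays the same.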
2012/9/19 Sandy Ding <[email protected]>:
> Hi, guys,
>
> The web-google dataset seems to miss some key sites, for example, there is
> no entry starting with 111067.
> This leads to weird NullPointerException. How do you fix this?
>
> Cheers,
> Sandy