Hi,
I believe the answer to your question is yes, though I've never done it. If
you use only the edge reader, only the vertices that have at least one edge
attached to them will be present in your graph. So, if you have vertices
that are entirely disconnected and that you want included,
Hi Ralph,
you can set a vertex or edge input format when running a Giraph job.
In the example, you used the vertex input format (vif)
-vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
Your wikitalk input format is an edge list and Giraph offers, e.g.,
I was able to increase the counter limit by setting
Counters.MAX_COUNTER_LIMIT = 2024 (works for hadoop_1 and hadoop 1.2.1).
However, whatever limit I set, it is always exceeded.
It turned out that for some reason IntOverwriteAggregator that
SccPhaseMasterCompute uses to propagate
Gang:
I am getting further with my attempt to get Giraph running on a YARN cluster,
but now I'm stuck at this error:
Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster
I've tried everything I can find in previous messages on this topic,
to no avail. My command line
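
For anyone hitting the same "Could not find or load main class" error: it
generally means the JVM did not find that class on the classpath it was given.
One quick thing to rule out is whether the class is packaged in the jar being
submitted at all. A minimal Python sketch, assuming the jar is a readable
local file; the function name jar_contains_class and the path in the usage
comment are made up for illustration. It says nothing about whether YARN
actually ships that jar to the application master's container.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar holds a .class entry for the fully qualified class name."""
    # A jar is a zip archive; a class a.b.C is stored as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Usage (the jar path is a placeholder, substitute your own):
# jar_contains_class("path/to/your-giraph-jar.jar",
#                    "org.apache.giraph.yarn.GiraphApplicationMaster")
```

If this returns False, the jar was probably built without the YARN classes. If
it returns True, the problem is more likely in how the classpath is set up for
the container.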
And finally, success is at hand! This is a bit quirky, but here's what
fixed it:
My command line originally looked like this:
$ hadoop jar
/home/prhodes/giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.5.2-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
OK, this was easy enough to fix, once I understood what
was actually happening. Since I'm running on EC2 nodes on
AWS, it is not the case that any given node can talk to any other
node on any port (at least not by default). I had tried to
cherry-pick which ports to whitelist in the security
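
To see which node-to-node ports are actually reachable, a small probe run from
one node against another can help. A minimal Python sketch using only the
standard library; the function name, and the host name and port in the usage
comment, are placeholders and not values from this thread. Note that a False
result does not distinguish between a port blocked by the security group and a
port on which nothing is listening.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (host and port are placeholders, substitute your own):
# port_is_reachable("worker-node.example.internal", 12345)
```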