There are already a couple of partitioners in the codebase, check those out.
Also, keep in mind that using fewer workers reduces network
communication but also decreases parallelism.
On Fri, Jul 20, 2012 at 8:52 PM, Jonathan Bishop jbishop@gmail.com wrote:
Avery,
Is there an
Giraph partitions the vertices using a hashing function that's basically
the equivalent of (hash(vertexID) mod #ofComputeNodes).
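The hashing scheme above can be sketched roughly as follows. This is an illustrative sketch, not Giraph's actual partitioner class; the class and method names (`HashPartitionSketch`, `partitionFor`) are hypothetical, and the real implementation lives in the partitioners already in the codebase.

```java
// Illustrative sketch of hash partitioning: hash(vertexID) mod #ofComputeNodes.
// Names here are hypothetical, not Giraph's actual API.
public class HashPartitionSketch {

    // Map a vertex ID to one of numWorkers partitions.
    static int partitionFor(long vertexId, int numWorkers) {
        // Math.abs guards against negative hash values.
        return (int) (Math.abs(Long.hashCode(vertexId)) % numWorkers);
    }

    public static void main(String[] args) {
        // The same vertex ID always maps to the same worker,
        // so no lookup table is needed to route messages.
        System.out.println(partitionFor(42L, 4));
        System.out.println(partitionFor(42L, 4)); // same value both times
    }
}
```

The appeal of this scheme is that any worker can compute the owner of any vertex locally; the drawback is that it ignores graph locality, which is what a custom partitioner can improve on.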
You can mitigate memory issues by starting the job with a minimal set of
vertices in your input file and then adding them dynamically as your job
progresses (assuming that your job
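The dynamic-addition idea above can be sketched in plain Java as follows. This is a hypothetical illustration of the pattern (seed a minimal vertex set, create vertices lazily as they are first referenced), not Giraph's actual mutation API; all names here are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: start with a minimal seed set of vertices and
// add vertices on demand as the computation discovers them, so initial
// memory is proportional to the seed set rather than the full graph.
public class DynamicVertexSketch {
    private final Map<Long, Double> vertices = new HashMap<>();

    // Seed the graph with only the vertices present in the input file.
    void loadSeed(long... ids) {
        for (long id : ids) {
            vertices.put(id, 0.0);
        }
    }

    // A message to an unknown vertex creates it lazily.
    void deliverMessage(long targetId, double value) {
        vertices.merge(targetId, value, Double::sum);
    }

    int size() {
        return vertices.size();
    }

    public static void main(String[] args) {
        DynamicVertexSketch g = new DynamicVertexSketch();
        g.loadSeed(1L, 2L);
        g.deliverMessage(3L, 1.0); // vertex 3 created on demand
        System.out.println(g.size());
    }
}
```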