Now that the loading and computation complete successfully, I am running into problems when saving the edges back to disk. During the saving step, the workers get through roughly 1-2 partitions before the cluster freezes up entirely (as in, I can't even SSH into the machines or reach the Hadoop web console).
As mentioned in my previous message, I have about 1.3 billion edges in total (600 million undirected, converted using the reverser) on a cluster of 19 machines, each with 8 cores and 60 GB of RAM. I am also using a custom linked-list-based OutEdges class, because the computation mutates edge values very heavily (the byte array / big data byte array implementations were not efficient for this use case). The computation runs three supersteps (0, 1, 2); during supersteps 1 and 2 RAM usage is extremely high (~97%), but those steps do complete. During saving, that high RAM usage holds steady and does not increase significantly until the cluster freezes.

When saving the edges (I am using a custom edge output format as well, essentially a CSV), are they flushed to disk immediately/in batches, or is the entire output file held in memory before being flushed? If the latter, that could produce exactly the behavior I am seeing. And if that is the case, is there a way to change it? If this doesn't seem like the issue, does anyone have ideas about what might be causing the lockup?

Thanks in advance!

-- Andrew
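P.S. To clarify what I mean by the linked-list OutEdges: a simplified, self-contained sketch of the idea is below (illustrative names only, not my actual class or the Giraph OutEdges API). The point is that edges live as mutable objects, so changing an edge value is a single field write, whereas a serialized byte-array store has to rewrite its backing buffer on every mutation.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Simplified analogue of a linked-list backed out-edge store.
// Edges are live objects: mutating a value is an in-place field
// write; removal is an iterator remove with no buffer compaction.
class LinkedListEdgesSketch {
    // One out-edge: target vertex id plus a mutable value.
    static final class Edge {
        final long target;
        double value;
        Edge(long target, double value) { this.target = target; this.value = value; }
    }

    private final LinkedList<Edge> edges = new LinkedList<>();

    void add(long target, double value) { edges.add(new Edge(target, value)); }

    // O(n) scan to find the edge, but the mutation itself is one field write.
    boolean setValue(long target, double value) {
        for (Edge e : edges) {
            if (e.target == target) { e.value = value; return true; }
        }
        return false;
    }

    // O(n) removal via the iterator; no reserialization of remaining edges.
    boolean remove(long target) {
        for (Iterator<Edge> it = edges.iterator(); it.hasNext(); ) {
            if (it.next().target == target) { it.remove(); return true; }
        }
        return false;
    }

    int size() { return edges.size(); }
}
```

The trade-off, of course, is per-object overhead (headers and pointers), which is part of why my RAM usage is so high compared to the byte-array representations.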
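For reference, my mental model of the output side (which may be wrong, hence the question): I would expect records to stream through a fixed-size buffer to DFS, flushed whenever the buffer fills, rather than the whole file accumulating in memory. A minimal analogue of what my CSV edge writer does (names are illustrative, not the Giraph EdgeOutputFormat API):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch of a CSV edge writer: every writeEdge() call goes
// through a fixed-size buffer, so memory stays bounded no matter how
// many edges are written.
class CsvEdgeWriter implements AutoCloseable {
    private final BufferedWriter out;

    CsvEdgeWriter(Writer sink) {
        // 64 KB buffer: contents are pushed downstream whenever it fills,
        // not held until the end of the write.
        this.out = new BufferedWriter(sink, 64 * 1024);
    }

    void writeEdge(long source, long target, double value) throws IOException {
        out.write(source + "," + target + "," + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        out.close(); // flushes any remaining buffered records
    }
}
```

If the real write path instead buffers per-partition output in memory before flushing, that would line up with the freeze I'm seeing at ~97% RAM.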