Now that the loading and the computation complete successfully, I am
having issues when saving the edges back to disk. During the saving
step, the machines get through only ~1-2 partitions before the
cluster freezes up entirely (as in, I can't even SSH into the
machines or view the Hadoop web console).



As mentioned in my previous message, I have about 1.3 billion edges
in total (600 million undirected, converted using the reverser) and a
cluster of 19 machines, each with 8 cores and 60 GB of RAM.



I am also using a custom linked-list-based OutEdges class, because
the computation performs a very large number of mutations of edge
values (the byte array / big data byte array implementations were not
efficient for this use case). A simplified sketch of the class is
below.
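For reference, the class is essentially a LinkedList wrapper
implementing the OutEdges interface, roughly along these lines
(simplified, with LongWritable ids and DoubleWritable edge values
standing in for my actual types):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.LinkedList;

    import org.apache.giraph.edge.Edge;
    import org.apache.giraph.edge.EdgeFactory;
    import org.apache.giraph.edge.OutEdges;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;

    /** Linked-list backed OutEdges: cheap add/remove for mutation-heavy computations. */
    public class LinkedListEdges implements OutEdges<LongWritable, DoubleWritable> {

      private LinkedList<Edge<LongWritable, DoubleWritable>> edgeList;

      @Override
      public void initialize(Iterable<Edge<LongWritable, DoubleWritable>> edges) {
        initialize();
        for (Edge<LongWritable, DoubleWritable> edge : edges) {
          add(edge);
        }
      }

      @Override
      public void initialize(int capacity) {
        // Capacity hint is irrelevant for a linked list.
        initialize();
      }

      @Override
      public void initialize() {
        edgeList = new LinkedList<>();
      }

      @Override
      public void add(Edge<LongWritable, DoubleWritable> edge) {
        // Store the edge object directly; edge-value mutations are visible in place.
        edgeList.add(edge);
      }

      @Override
      public void remove(LongWritable targetVertexId) {
        // Remove all edges pointing at the given target.
        Iterator<Edge<LongWritable, DoubleWritable>> it = edgeList.iterator();
        while (it.hasNext()) {
          if (it.next().getTargetVertexId().equals(targetVertexId)) {
            it.remove();
          }
        }
      }

      @Override
      public int size() {
        return edgeList.size();
      }

      @Override
      public Iterator<Edge<LongWritable, DoubleWritable>> iterator() {
        return edgeList.iterator();
      }

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeInt(edgeList.size());
        for (Edge<LongWritable, DoubleWritable> edge : edgeList) {
          edge.getTargetVertexId().write(out);
          edge.getValue().write(out);
        }
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        int numEdges = in.readInt();
        edgeList = new LinkedList<>();
        for (int i = 0; i < numEdges; i++) {
          LongWritable targetId = new LongWritable();
          targetId.readFields(in);
          DoubleWritable value = new DoubleWritable();
          value.readFields(in);
          edgeList.add(EdgeFactory.create(targetId, value));
        }
      }
    }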



The specific computation I am running has three supersteps (0, 1, 2).
During supersteps 1 and 2 there is extremely high RAM usage (~97%),
but those supersteps do complete. During saving, this high RAM usage
persists but does not increase significantly before the cluster
freezes up.



When saving the edges (I am using a custom edge output format as
well, which is basically a CSV writer; a simplified sketch is below),
are they flushed to disk immediately/in batches, or is the entire
output file held in memory before being flushed? If the latter, that
seems like it could cause exactly the behavior I am seeing. Also, if
that is the case, is there a way it can be changed?



If this doesn't seem like the issue, does anyone have any ideas
what may be causing the lockup?



Thanks in advance!



--
Andrew
