Thanks, Matthew, for your replies! They are quite helpful. Regarding question 4, I see a commit of PartitionContext by Maja here: http://mail-archives.apache.org/mod_mbox/giraph-commits/201302.mbox/%3c20130209001122.ddad73a...@tyr.zones.apache.org%3E, but it seems to have been removed from the current version?

On Wed, Feb 25, 2015 at 3:30 AM, Matthew Saltz <sal...@gmail.com> wrote:

> Hi,
>
> 1) The partitions are processed in parallel based on the number of threads
> you specify. The vertices within a partition are processed sequentially.
> You may want to use more partitions than threads, that way if one partition
> takes a particularly long time to be processed, the other threads can
> continue processing the remaining partitions. If you have four machines
> with 12 threads each for example, with one worker per machine, the default
> number of partitions will be 4^2 = 16 partitions, whereas you actually have
> 48 threads available, so you'd probably want to specify the number of
> partitions manually to a larger number to take advantage of parallelism.
> 2) Yes
> 3) If you are only doing single threading, there's no reason to do
> multiple partitions per worker
> 3 (the second one)) I'm not familiar with the out-of-core functionality
> 4) I'm not sure
>
> I'm basing this on the version of Giraph from this summer, not the most
> recent release, but I don't think this part has changed. May want to verify
> by looking at the code.
>
> Best,
> Matthew
>
> On Wed, Feb 25, 2015 at 3:25 AM, Arjun Sharma <as469...@gmail.com> wrote:
>
>> Hi,
>>
>> I understand that by default, the number of partitions = number of
>> workers ^ 2. So, if we have N workers, each worker will process N
>> partitions. I have a number of questions:
>>
>> 1- By default, does Giraph process the N partitions within a single
>> worker sequentially? If yes, when setting the parameter
>> giraph.numComputeThreads, will partitions within each thread be computed
>> sequentially?
>>
>> 2- By default, does Giraph keep all partitions in memory?
>>
>> 3- If the answers to 1 and 2 are yes and yes, is there any advantage from
>> using multiple partitions versus a single partition in the case of single
>> threading per worker?
>>
>> 3- How does the out-of-core partitions affect out-of-core messages? Are
>> they completely independent? For example, if the number of partitions to be
>> kept in memory is set to a number less than N, and at the same time all
>> messages are set to be kept in memory, will ALL messages be kept in memory,
>> even those from out-of-core partitions? If the situation is reversed, where
>> all partitions are kept in memory, and out-of-core messaging is set, will
>> messages from memory-based partitions be saved on disk?
>>
>> 4- Is there a class like a PartitionContext, where you can access
>> preSuperstep and postSuperstep *per partition*, along the lines of
>> WorkerContext?
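
For reference, the arithmetic in Matthew's point 1 can be checked with a small sketch. This is plain Python, not Giraph code: the function names are made up for illustration, and it assumes (the thread does not say this) that partitions are spread evenly across workers and that each compute thread handles one partition at a time.

```python
import math


def default_partition_count(num_workers: int) -> int:
    """Default described in the thread: partitions = workers ^ 2."""
    return num_workers ** 2


def total_threads(num_workers: int, threads_per_worker: int) -> int:
    """Compute threads available across the whole job."""
    return num_workers * threads_per_worker


def idle_threads_per_worker(num_workers, threads_per_worker, num_partitions=None):
    """Threads on one worker that have no partition to process.

    Assumption (not from the thread): partitions are distributed evenly
    across workers and a thread works on one partition at a time.
    """
    if num_partitions is None:
        num_partitions = default_partition_count(num_workers)
    partitions_per_worker = math.ceil(num_partitions / num_workers)
    busy = min(threads_per_worker, partitions_per_worker)
    return threads_per_worker - busy


if __name__ == "__main__":
    workers, threads = 4, 12  # the example from the thread
    print(default_partition_count(workers))            # 16
    print(total_threads(workers, threads))             # 48
    print(idle_threads_per_worker(workers, threads))   # 8
    # Raising the partition count to match the thread count leaves none idle.
    print(idle_threads_per_worker(workers, threads, num_partitions=48))  # 0
```

Under these assumptions, the default of 16 partitions gives each worker 4 partitions for its 12 threads, so 8 threads per worker sit idle. That is the case where setting the partition count manually to a larger number, as suggested in point 1, would pay off.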