Re: How to specify parameters in order to run giraph job in parallel

2013-10-19 Thread Claudio Martella
How many mapper tasks do you have set for each node? How many workers are you using for Giraph? On Fri, Oct 18, 2013 at 7:12 PM, YAN Da ya...@ust.hk wrote: Dear Claudio Martella, I don't quite get what you mean. Our cluster has 15 servers, each with 24 cores, so ideally there can be 15*24 …
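For reference, the per-node map task limit being asked about is normally a tasktracker setting. The snippet below is a minimal sketch assuming a Hadoop 1.x (MRv1) cluster, which the tasktracker-based discussion later in this thread implies; the value of 1 mirrors the advice given further down and is not a universal recommendation.

```xml
<!-- mapred-site.xml on each tasktracker node (Hadoop 1.x / MRv1) -->
<!-- Upper bound on map tasks one tasktracker runs concurrently.      -->
<!-- Giraph workers run as long-lived map tasks, so this effectively  -->
<!-- caps the number of Giraph workers per node.                      -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```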

Re: How to specify parameters in order to run giraph job in parallel

2013-10-18 Thread YAN Da
Dear Claudio Martella, According to https://reviews.apache.org/r/7990/diff/?page=2, Giraph currently organizes vertices as byte streams, probably pages. The URL says: "This also significantly reduces GC time, as there are less objects to GC." Why is "also" there? I mean, is reducing GC time the …

Re: How to specify parameters in order to run giraph job in parallel

2013-10-18 Thread Sebastian Schelter
Da, Holding objects in serialized form as bytes in byte arrays consumes much less memory than holding them as Java objects (which have a huge overhead); I think that is the other main reason for serialization. --sebastian On 18.10.2013 19:28, YAN Da wrote: Dear Claudio Martella, According …
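To make the point concrete, here is an illustrative sketch (plain Java, not Giraph's actual internal classes): holding vertex data as one serialized byte array gives the garbage collector a single large object to track instead of millions of small ones, each of which also carries per-object header overhead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SerializedVertexStoreSketch {

    // Object-per-vertex representation: N separate objects, each with
    // JVM header overhead and each a node the GC must trace.
    static final class VertexObject {
        final long id;
        final double value;
        VertexObject(long id, double value) { this.id = id; this.value = value; }
    }

    public static void main(String[] args) throws IOException {
        int n = 1_000_000;

        // 1) One Java object per vertex.
        VertexObject[] objects = new VertexObject[n];
        for (int i = 0; i < n; i++) {
            objects[i] = new VertexObject(i, i * 0.5);
        }

        // 2) The same data serialized into a single byte array:
        //    one large object instead of a million small ones.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (int i = 0; i < n; i++) {
            out.writeLong(i);         // vertex id
            out.writeDouble(i * 0.5); // vertex value
        }
        out.flush();
        byte[] bytes = buffer.toByteArray();

        // Vertices are deserialized on demand when they are needed.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        long firstId = in.readLong();
        double firstValue = in.readDouble();

        System.out.println("objects held: " + objects.length);
        System.out.println("serialized size: " + bytes.length + " bytes, first vertex: "
                + firstId + " -> " + firstValue);
    }
}
```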

Re: How to specify parameters in order to run giraph job in parallel

2013-10-17 Thread Claudio Martella
It actually depends on the setup of your cluster. Ideally, with 15 nodes (tasktrackers) you'd want 1 mapper slot per node dedicated to running Giraph, so that you would have 14 workers, one per compute node, plus one for master+ZooKeeper. Once that is reached, you would have a number of compute …
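As a concrete illustration of that sizing, a hedged sketch of launching a job with 14 workers via GiraphRunner follows; the jar name, computation class, formats, and HDFS paths are placeholders, and option names may vary slightly between Giraph versions, but `-w` is the worker-count option.

```sh
# Hypothetical invocation: 14 workers, one per compute node, leaving one
# map slot for the master/ZooKeeper task.
hadoop jar giraph-examples-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/me/input/graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/me/output/shortestpaths \
  -w 14
```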