Hi List,

I am using the hama-0.6.0 release to run graph jobs on various input graphs in 
an EC2-based cluster of 12 machines. However, as I see in the logs, not every 
machine in the cluster contributes to the job (some have no tasklog/job<ID> dir 
and are idle). Theoretically, distributing 1 million graph vertices across 12 
buckets should hit every machine at least once, so I suspect a configuration 
problem. So far I have experimented with these settings:

   <name>bsp.max.tasks.per.job</name>
   <name>bsp.local.tasks.maximum</name>
   <name>bsp.tasks.maximum</name>
   <name>bsp.child.java.opts</name>

Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12 did not 
have the desired effect. I also split the input into 12 files (to work around 
an issue in 0.5 that was fixed in 0.6). 
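For reference, this is roughly what I have in hama-site.xml right now (the values are just what I tried, not a verified configuration; my understanding is that bsp.tasks.maximum is the per-groom-server limit and bsp.max.tasks.per.job caps the whole job):

```xml
<!-- hama-site.xml fragment: values are illustrative, not a verified fix -->
<property>
  <name>bsp.tasks.maximum</name>
  <value>1</value>   <!-- max concurrent tasks on each groom server (assumed meaning) -->
</property>
<property>
  <name>bsp.max.tasks.per.job</name>
  <value>12</value>  <!-- one task per machine in the 12-machine cluster -->
</property>
```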

Could you recommend some settings, or walk me through the system's partitioning 
decision? My current understanding of the pipeline is:

Input -> input splits (based on the input and the max* conf values) -> number of tasks;
HashPartitioner.class then distributes vertex IDs across that number of tasks.
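To make sure I understand the last step, here is a minimal sketch (not Hama's actual source, and the class name and masking detail are my assumptions) of the hashCode-modulo scheme I believe a hash partitioner uses to map a vertex ID to a task index:

```java
// Sketch of hashCode-modulo partitioning, as I assume HashPartitioner works.
public class HashPartitionSketch {

    // Mask off the sign bit so the result stays in [0, numTasks),
    // even for IDs whose hashCode() is Integer.MIN_VALUE.
    static int partition(String vertexId, int numTasks) {
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        int numTasks = 12; // assumed: one task per machine in the cluster
        int[] counts = new int[numTasks];
        // With many distinct IDs, every task index should receive some vertices.
        for (int id = 0; id < 1000; id++) {
            counts[partition("vertex-" + id, numTasks)]++;
        }
        for (int t = 0; t < numTasks; t++) {
            System.out.println("task " + t + " -> " + counts[t] + " vertices");
        }
    }
}
```

If that picture is right, an idle machine would mean the number of tasks ended up smaller than 12, not that the partitioner skipped it.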

Thanks,

Benedikt
