Thank you, I will try that. However, if I set bsp.local.tasks.maximum to 1, why doesn't it distribute one task to each machine?
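For concreteness, here is roughly how I am setting things up on the client side (a minimal sketch, not my exact job code; JobSetupSketch is a placeholder class, and I am assuming setNumBspTask on the job is the relevant knob in 0.6):

    import org.apache.hama.HamaConfiguration;
    import org.apache.hama.graph.GraphJob;

    public class JobSetupSketch {
      public static void main(String[] args) throws Exception {
        HamaConfiguration conf = new HamaConfiguration();
        // My intent: at most one task per groom server.
        conf.setInt("bsp.local.tasks.maximum", 1);
        // Cap the job at one task per machine in the 12-node cluster.
        conf.setInt("bsp.max.tasks.per.job", 12);

        GraphJob job = new GraphJob(conf, JobSetupSketch.class);
        job.setNumBspTask(12); // request 12 tasks, one per machine
        job.waitForCompletion(true);
      }
    }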
On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:

> So it will spawn 12 tasks. If this doesn't satisfy the load on your
> machines, try to use smaller block sizes.
>
> 2012/12/5 Benedikt Elser <[email protected]>
>
>> Hi,
>>
>> thanks for your reply!
>>
>> Total size: 49078776 B
>> Total dirs: 1
>> Total files: 12
>> Total blocks (validated): 12 (avg. block size 4089898 B)
>>
>> Benedikt
>>
>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>>
>>> So how many blocks does your data have in HDFS?
>>>
>>> 2012/12/5 Benedikt Elser <[email protected]>
>>>
>>>> Hi List,
>>>>
>>>> I am using the hama-0.6.0 release to run graph jobs on various input
>>>> graphs in an EC2-based cluster of size 12. However, as I see in the
>>>> logs, not every node in the cluster contributes to the job (they have
>>>> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
>>>> 1 million nodes across 12 buckets should hit every node at least once.
>>>> Therefore I think it's a configuration problem. So far I have
>>>> experimented with these settings:
>>>>
>>>> <name>bsp.max.tasks.per.job</name>
>>>> <name>bsp.local.tasks.maximum</name>
>>>> <name>bsp.tasks.maximum</name>
>>>> <name>bsp.child.java.opts</name>
>>>>
>>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
>>>> did not have the desired effect. I also split the input into 12 files
>>>> (because of something in 0.5 that was fixed in 0.6).
>>>>
>>>> Could you recommend some settings or guide me through the system's
>>>> partition decision? I thought it would be:
>>>>
>>>> Input -> input split based on input and max* conf values -> number of
>>>> tasks; HashPartition.class distributes IDs across that number of tasks.
>>>>
>>>> Thanks,
>>>>
>>>> Benedikt
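P.S. To make sure I understand the partition decision I described above: my assumption (not checked against the 0.6 source) is that the input splits (here, the 12 HDFS blocks) fix the number of tasks, and the hash partitioner then maps each vertex ID onto a task roughly like this:

    // Sketch of the hash partition step as I understand it; the method
    // name partitionFor is mine, not Hama's API.
    public class HashPartitionSketch {
      static int partitionFor(Object vertexId, int numTasks) {
        // Modulo first, then abs, so even Integer.MIN_VALUE hash codes
        // stay in range.
        return Math.abs(vertexId.hashCode() % numTasks);
      }

      public static void main(String[] args) {
        int numTasks = 12; // one task per HDFS block in my case
        for (long id : new long[] {0L, 1L, 999999L}) {
          System.out.println(id + " -> task " + partitionFor(id, numTasks));
        }
      }
    }

If that picture is right, then Thomas's smaller-block-size suggestion raises the task count directly, since more blocks mean more input splits.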
