Thank you, I will try that. However, if I set bsp.local.tasks.maximum to 1, why doesn't it distribute one task to each machine?
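For concreteness, here is roughly how I am setting things up on the client side (a minimal sketch, not my exact job code; JobSetupSketch is a placeholder class, and I am assuming setNumBspTask on the job is the relevant knob in 0.6):

    import org.apache.hama.HamaConfiguration;
    import org.apache.hama.graph.GraphJob;

    public class JobSetupSketch {
      public static void main(String[] args) throws Exception {
        HamaConfiguration conf = new HamaConfiguration();
        // My intent: at most one task per groom server.
        conf.setInt("bsp.local.tasks.maximum", 1);
        // Cap the job at one task per machine in the 12-node cluster.
        conf.setInt("bsp.max.tasks.per.job", 12);

        GraphJob job = new GraphJob(conf, JobSetupSketch.class);
        job.setNumBspTask(12); // request 12 tasks, one per machine
        job.waitForCompletion(true);
      }
    }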
On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:

> So it will spawn 12 tasks. If this doesn't satisfy the load on your
> machines, try to use smaller block sizes.
>
> 2012/12/5 Benedikt Elser <[email protected]>
>
>> Hi,
>>
>> thanks for your reply!
>>
>> Total size: 49078776 B
>> Total dirs: 1
>> Total files: 12
>> Total blocks (validated): 12 (avg. block size 4089898 B)
>>
>> Benedikt
>>
>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>>
>>> So how many blocks does your data have in HDFS?
>>>
>>> 2012/12/5 Benedikt Elser <[email protected]>
>>>
>>>> Hi List,
>>>>
>>>> I am using the hama-0.6.0 release to run graph jobs on various input
>>>> graphs in an EC2-based cluster of size 12. However, as I see in the
>>>> logs, not every node in the cluster contributes to the job (they have
>>>> no tasklog/job<ID> dir and are idle). Theoretically, a distribution of
>>>> 1 million nodes across 12 buckets should hit every node at least once.
>>>> Therefore I think it's a configuration problem. So far I have
>>>> experimented with these settings:
>>>>
>>>> <name>bsp.max.tasks.per.job</name>
>>>> <name>bsp.local.tasks.maximum</name>
>>>> <name>bsp.tasks.maximum</name>
>>>> <name>bsp.child.java.opts</name>
>>>>
>>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
>>>> did not have the desired effect. I also split the input into 12 files
>>>> (because of something in 0.5 that was fixed in 0.6).
>>>>
>>>> Could you recommend some settings or guide me through the system's
>>>> partition decision? I thought it would be:
>>>>
>>>> Input -> input split based on input and max* conf values -> number of
>>>> tasks; HashPartition.class distributes IDs across that number of tasks.
>>>>
>>>> Thanks,
>>>>
>>>> Benedikt
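P.S. To make sure I understand the partition decision I described above: my assumption (not checked against the 0.6 source) is that the input splits (here, the 12 HDFS blocks) fix the number of tasks, and the hash partitioner then maps each vertex ID onto a task roughly like this:

    // Sketch of the hash partition step as I understand it; the method
    // name partitionFor is mine, not Hama's API.
    public class HashPartitionSketch {
      static int partitionFor(Object vertexId, int numTasks) {
        // Modulo first, then abs, so even Integer.MIN_VALUE hash codes
        // stay in range.
        return Math.abs(vertexId.hashCode() % numTasks);
      }

      public static void main(String[] args) {
        int numTasks = 12; // one task per HDFS block in my case
        for (long id : new long[] {0L, 1L, 999999L}) {
          System.out.println(id + " -> task " + partitionFor(id, numTasks));
        }
      }
    }

If that picture is right, then Thomas's smaller-block-size suggestion raises the task count directly, since more blocks mean more input splits.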
