So, how many blocks does your data occupy in HDFS?

2012/12/5 Benedikt Elser <[email protected]>
> Hi List,
>
> I am using the hama-0.6.0 release to run graph jobs on various input
> graphs in an EC2-based cluster of size 12. However, as I see in the
> logs, not every node in the cluster contributes to the job (they have
> no tasklog/job<ID> dir and are idle). Theoretically, distributing
> 1 million graph nodes across 12 buckets should hit every cluster node
> at least once. Therefore I think it's a configuration problem. So far
> I have experimented with these settings:
>
> <name>bsp.max.tasks.per.job</name>
> <name>bsp.local.tasks.maximum</name>
> <name>bsp.tasks.maximum</name>
> <name>bsp.child.java.opts</name>
>
> Setting bsp.local.tasks.maximum to 1 and bsp.tasks.maximum.per.job to
> 12 did not have the desired effect. I also split the input into 12
> files (because of something in 0.5 that was fixed in 0.6).
>
> Could you recommend some settings or guide me through the system's
> partition decision? I thought it would be:
>
> Input -> Input Split based on input, max* conf values -> number of tasks
> HashPartition.class distributes Ids across that number of tasks.
>
> Thanks,
>
> Benedikt
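
The reason I ask: if the number of tasks is derived from the input
splits (as the chain you sketched suggests), then having fewer HDFS
blocks than grooms means some machines simply get no task. Running
hadoop fsck on the input path with the -files and -blocks options will
print the block count. Also, as far as I know bsp.tasks.maximum is read
by each groom server from its own hama-site.xml, so setting it only on
the submitting client will not change the slot count on the grooms.

Independent of that, here is a minimal driver sketch for pinning the
task count and the partitioner explicitly from the job. I am assuming
the 0.6.0 API names here (GraphJob, setNumBspTask, setPartitioner) and
the bsp.max.tasks.per.job property, so please double-check them against
your build:

import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.HashPartitioner;
import org.apache.hama.graph.GraphJob;

public class TwelveTaskJobSketch {
  public static void main(String[] args) throws Exception {
    HamaConfiguration conf = new HamaConfiguration();
    // Cap what a single job may request; the per-groom slot count
    // ("bsp.tasks.maximum") still has to be set on every groom server.
    conf.setInt("bsp.max.tasks.per.job", 12);

    GraphJob job = new GraphJob(conf, TwelveTaskJobSketch.class);
    job.setJobName("graph job spread over 12 grooms");

    // Ask for 12 BSP tasks explicitly instead of relying on the number
    // of input splits / HDFS blocks.
    job.setNumBspTask(12);

    // Spread vertex IDs over those tasks by hash.
    job.setPartitioner(HashPartitioner.class);

    // ... setVertexClass, input/output paths and formats, etc. as in
    // your existing job ...

    job.waitForCompletion(true);
  }
}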

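On the partition decision itself: your picture matches mine. Purely as
an illustration (the modulo formula below is my assumption about what a
hash partitioner does, not lifted from the Hama source), this is how
1 million vertex IDs would spread over 12 tasks; each task ends up with
roughly 83,000 vertices, so once 12 tasks are actually launched, no
groom should stay idle:

import org.apache.hadoop.io.LongWritable;

public class PartitionSpreadSketch {
  // Assumed scheme: absolute hash code modulo the number of tasks.
  static int partitionFor(LongWritable vertexId, int numTasks) {
    return Math.abs(vertexId.hashCode() % numTasks);
  }

  public static void main(String[] args) {
    int numTasks = 12;
    int[] perTask = new int[numTasks];
    for (long id = 0; id < 1000000L; id++) {
      perTask[partitionFor(new LongWritable(id), numTasks)]++;
    }
    for (int task = 0; task < numTasks; task++) {
      System.out.printf("task %2d gets %d vertex ids%n", task, perTask[task]);
    }
  }
}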