Hi Edward,
Thank you for the reply.
But I want the opposite: I want to create more tasks than blocks, not fewer tasks than blocks. That is, I want to be able to send less than one block to each task (for example, only 10000 bytes). Sending less data to each task will speed up execution and will require less memory at each node. Hadoop MapReduce, Spark, and Flink allow you to use a split size smaller than a block. Also, I used to be able to do this with Hama 0.5.0 but not with Hama 0.6.4. Did you remove this capability because it is a bad idea or because it is very hard to implement?
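
To make "less than one block per task" concrete, here is a minimal sketch of cutting a file into fixed-size byte ranges that ignore block boundaries. The class `SubBlockSplits` and its `split` method are hypothetical illustrations, not part of Hama's API; a real input format must also handle records that straddle a range boundary (Hadoop's `LineRecordReader`, for example, skips the partial first line of a split and reads past the end of the split to finish its last line).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hama): cut a file into byte ranges of at most splitSize bytes. */
public class SubBlockSplits {

    /** One byte range [offset, offset + length) of the input file, to be read by one task. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Range> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last range is shorter when fileLength is not a multiple of splitSize.
            ranges.add(new Range(offset, Math.min(splitSize, fileLength - offset)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 25,000 bytes with a 10,000-byte split size: three ranges, independent of block layout.
        for (Range r : split(25_000L, 10_000L)) {
            System.out.println("offset=" + r.offset + " length=" + r.length);
        }
        // prints offset=0 length=10000, offset=10000 length=10000, offset=20000 length=5000
    }
}
```

Each range would then be handed to its own BSP task, so the number of tasks follows the split size rather than the number of HDFS blocks.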

Based on your instructions, I tried the following:

    job.setNumBspTask(10);
    job.setBoolean("bsp.input.runtime.partitioning",false);
    job.setPartitioner(org.apache.hama.bsp.HashPartitioner.class);

I get the following error:

java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hama.bsp.BSPJobClient.writeSplits(BSPJobClient.java:556)
    at org.apache.hama.bsp.BSPJobClient.submitJobInternal(BSPJobClient.java:354)
    at org.apache.hama.bsp.BSPJobClient.submitJob(BSPJobClient.java:296)
    at org.apache.hama.bsp.BSPJob.submit(BSPJob.java:219)
    at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:226)

Thanks.
Leonidas


On 10/20/2014 10:06 AM, Edward J. Yoon wrote:
Hi Leonidas,

The bsp.min.split.size property is used to prevent creating too many tasks, as in Hadoop MR (NOTE: if bsp.min.split.size is less than the block size, then one block is sent to each task).
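
The difference between the two behaviours in this thread can be worked out with a little arithmetic. The sketch below compares two simplified split-size rules; neither is copied from Hama's source. `blockBoundedSplitSize` models the rule described above (a minimum below the block size has no effect), while `goalSizeSplitSize` follows the style of Hadoop's old `mapred` `FileInputFormat`, which aims for `fileLength / numTasks` and uses the minimum only as a floor (Hadoop's 10% "slop" on the last split is ignored here). The 64 MiB block and 100 MiB file are assumed values for "a small file with 2 blocks".

```java
/**
 * Two simplified split-size rules (illustrative only, not Hama's implementation).
 * Sizes are in bytes.
 */
public class SplitSizeRules {

    /** Rule as described in the thread: a minimum smaller than the block size has no effect. */
    static long blockBoundedSplitSize(long blockSize, long minSplitSize) {
        return Math.max(minSplitSize, blockSize);
    }

    /** Goal-size rule: aim for fileLength / numTasks, capped at the block size, floored at the minimum. */
    static long goalSizeSplitSize(long fileLength, int numTasks, long blockSize, long minSplitSize) {
        long goalSize = fileLength / Math.max(1, numTasks);
        return Math.max(minSplitSize, Math.min(goalSize, blockSize));
    }

    /** Number of splits needed to cover the file (ceiling division). */
    static long numSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // assumed 64 MiB HDFS block
        long fileLength = 100L * 1024 * 1024; // assumed 100 MiB file, i.e. 2 blocks
        long minSplitSize = 10_000L;          // bsp.min.split.size from the thread
        int numTasks = 10;                    // setNumBspTask(10) from the thread

        long a = blockBoundedSplitSize(blockSize, minSplitSize);
        long b = goalSizeSplitSize(fileLength, numTasks, blockSize, minSplitSize);
        System.out.println("block-bounded rule: " + numSplits(fileLength, a) + " splits"); // 2
        System.out.println("goal-size rule:     " + numSplits(fileLength, b) + " splits"); // 10
    }
}
```

Under the first rule the job gets one task per block (2 here), which matches the behaviour reported for Hama 0.6.4; under the second, the requested 10 tasks each read about 10 MiB.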

I guess this will work fine. BTW, if you set an input partitioner, it creates new partitions according to the number of tasks you specify with the setNumBspTask() method (by default, a graph job pre-processes the input with hash partitioning).
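
As a rough picture of what input partitioning does, here is a generic hash-partitioning sketch: every record key is sent to partition `hash(key) mod numTasks`, so the number of partitions equals the requested task count no matter how many blocks the file has, at the cost of an extra pass over the input. The class `HashPartitionSketch` is hypothetical and is not Hama's `HashPartitioner` source; the exact hash-to-partition formula Hama uses may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hash partitioning: each key goes to partition hash(key) mod numTasks. */
public class HashPartitionSketch {

    static int partitionFor(Object key, int numTasks) {
        // Clear the sign bit so negative hash codes still map into [0, numTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    static List<List<String>> partition(List<String> keys, int numTasks) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partitionFor(key, numTasks)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Five vertex IDs spread over three tasks, regardless of the input's block layout.
        List<List<String>> parts = partition(List.of("v1", "v2", "v3", "v4", "v5"), 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("task " + i + ": " + parts.get(i));
        }
    }
}
```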

Thanks.

--
Best Regards, Edward J. Yoon
Chief Executive Officer
DataSayer Co., Ltd.

On 10/20/2014 10:51 PM, Leonidas Fegaras <[email protected]> wrote:

Dear Hama developers,
I still have a problem setting the split size of an HDFS input file using Hama 0.6.4. For example, when I use:

    BSPJob job = new BSPJob(conf, BSPop.class);
    job.setNumBspTask(10);
    job.setLong("bsp.min.split.size", 10000L);   // 10000 bytes

For a small file with 2 blocks, this will use only 2 BSP tasks (one for each block), instead of 10.
This used to work in Hama 0.5.0.
Any suggestions?
Thanks.
Leonidas Fegaras

On 01/04/2013 05:45 PM, Edward J. Yoon wrote:
Hello,

> But if you have more nodes in your cluster than data blocks, you may get faster execution if you allow splits smaller than a block.

You're right. So, we're working on partitioning issues now.

> Is there any way to use splits smaller than a block in Hama 0.6.0?

Yes. But Hama 0.6.1 will support it.

On Sat, Jan 5, 2013 at 4:59 AM, Leonidas Fegaras <[email protected]> wrote:
Dear Hama developers,
It seems that the splits generated by the FileInputFormat in Hama 0.6.0
cannot be smaller than a block. In Hama 0.5.0, I could set any split size
using job.set("bsp.min.split.size",...) and set the number of tasks using
job.setNumBspTask(...). Hama 0.6.0 ignores these settings when the split would be smaller than a block. But if you have more nodes in your cluster than data blocks, you may get faster execution if you allow splits smaller than a block. Is
there any way to use splits smaller than a block in Hama 0.6.0?
Thanks for your help,
Leonidas