If you pack your images into SequenceFiles, as the value items, the cluster
will automatically do a decent job of ensuring that the input splits made
from the sequence files are local to the map tasks.
We did this in production at a previous job and it worked very well for us.
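As a minimal sketch of that approach, the snippet below packs a directory of image files into a single SequenceFile, with the file name as the key and the raw bytes as the value. The class name, argument layout, and paths are illustrative assumptions, not the exact code we ran; it uses the plain Hadoop `SequenceFile.Writer` API and needs a Hadoop installation (or at least the Hadoop jars and a filesystem config) to run.

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical packer: bundles local image files into one SequenceFile on
// HDFS. Once the images live inside a SequenceFile, HDFS block placement
// (and hence map-task locality) applies to them like any other large file.
public class ImagePacker {
    public static void main(String[] args) throws Exception {
        // args[0] = output SequenceFile path, e.g. /user/me/images.seq
        // args[1] = local directory containing the images
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[0]);

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (File img : new File(args[1]).listFiles()) {
                byte[] bytes = Files.readAllBytes(img.toPath());
                // key = image file name, value = raw image bytes
                writer.append(new Text(img.getName()),
                              new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}
```

A job would then read the file with SequenceFileInputFormat, so each mapper receives a stream of (name, bytes) records from splits the scheduler tries to place on the nodes holding those blocks.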
Might as well turn
Sameer Tilak wrote:
>
> Hi everyone,
> I would like to use Hadoop for analyzing tens of thousands of images.
> Ideally each mapper gets a few hundred images to process, and I'll have a
> few hundred mappers. However, I want the mapper function to run on the
> machine where its images are stored. How can I achieve that? With text data