I have a solution that integrates Spring beans and Spring Batch directly
into Hadoop core. It is far more advanced than the Spring Data Hadoop
support with the POJO patch.
In my solution every component of MapReduce can be a Hadoop bean. You
get Spring Batch integrated directly into the mapper, which means that
you can run multiple steps in one mapper pass, and because of the async
writes done by Spring Batch you get about 3x higher write performance.
I have also rewritten HDFS so that it has much faster writes. A Spring
Batch component replaces the standard Hadoop job manager (the one with
the web GUI), and Spring Integration is used for advanced features such
as multi-resource scheduling. You can write a simple Java bean for every
new resource you want to add to the system, and another bean with the
logic for assigning jobs based on that particular resource.
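
To give a rough idea of the "one bean per resource, one bean for the
assignment logic" split, here is a minimal sketch in plain Java. It is
not the actual code from my Hadoop build and it uses no Spring or Hadoop
APIs; the names NodeResource and MostFreeAssigner, the "gpu" resource
and the node names are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class ResourceSchedulingSketch {

    /** Hypothetical resource bean: tracks free units of one resource per node. */
    public static class NodeResource {
        private final String name;
        private final Map<String, Integer> freeByNode = new LinkedHashMap<>();

        public NodeResource(String name) { this.name = name; }

        public String getName() { return name; }

        public void setFree(String node, int units) { freeByNode.put(node, units); }

        public int getFree(String node) { return freeByNode.getOrDefault(node, 0); }

        public Iterable<String> nodes() { return freeByNode.keySet(); }
    }

    /**
     * Hypothetical assignment bean: picks the node with the most free units
     * that can still satisfy the request, then reserves those units.
     */
    public static class MostFreeAssigner {
        private final NodeResource resource;

        public MostFreeAssigner(NodeResource resource) { this.resource = resource; }

        public Optional<String> assign(int requiredUnits) {
            String best = null;
            int bestFree = -1;
            for (String node : resource.nodes()) {
                int free = resource.getFree(node);
                if (free >= requiredUnits && free > bestFree) {
                    best = node;
                    bestFree = free;
                }
            }
            if (best != null) {
                // Reserve the units so the next job sees the reduced capacity.
                resource.setFree(best, bestFree - requiredUnits);
            }
            return Optional.ofNullable(best);
        }
    }

    public static void main(String[] args) {
        NodeResource gpu = new NodeResource("gpu");
        gpu.setFree("node-a", 1);
        gpu.setFree("node-b", 4);

        MostFreeAssigner assigner = new MostFreeAssigner(gpu);
        System.out.println(assigner.assign(2)); // prints Optional[node-b]
        System.out.println(assigner.assign(3)); // prints Optional.empty
    }
}
```

In a Spring setup both classes would presumably be declared as beans and
wired together by the container; adding a new resource then means adding
another pair like this without touching the rest of the scheduler.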
I submitted a few patches to Hadoop, but they were not interesting
enough to get into core. If you want to buy my Hadoop with integrated
Spring, let me know.