I have a solution that integrates Spring beans and Spring Batch directly into Hadoop core. It is far more advanced than the Spring Data Hadoop support with the POJO patch.

In my solution every component of MapReduce can be a Hadoop bean. Spring Batch is integrated directly into the mapper, which means you can run multiple steps in one mapper pass, and because Spring Batch writes asynchronously you get about 3x higher write performance. I have also rewritten HDFS so that writes are much faster. A Spring Batch component replaces the standard Hadoop job manager (the one with the web GUI), and Spring Integration is used for advanced features such as multi-resource scheduling. For every new resource you want to add to the system, you write one simple Java bean, plus another bean with the logic for assigning jobs based on that particular resource.
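To make the "one bean per resource, one bean for assignment logic" idea concrete, here is a rough sketch in plain Java. It is not the actual code from this solution: the names `Resource`, `AssignmentPolicy`, `FixedCapacityResource`, and `MostFreePolicy` are hypothetical, no Hadoop or Spring API is used, and in a real setup these classes would presumably be declared as Spring beans and wired by the container.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types come from Hadoop, Spring, or the poster's code.
public class ResourceSchedulingSketch {

    /** One bean per schedulable resource (for example map slots or disk bandwidth). */
    interface Resource {
        String name();
        int available(String node);
    }

    /** A second bean holding the logic that assigns a job based on that resource. */
    interface AssignmentPolicy {
        Optional<String> pickNode(List<String> nodes, int required);
    }

    /** Example resource backed by a fixed table of free units per node. */
    static final class FixedCapacityResource implements Resource {
        private final String name;
        private final Map<String, Integer> freeUnits;

        FixedCapacityResource(String name, Map<String, Integer> freeUnits) {
            this.name = name;
            this.freeUnits = new HashMap<>(freeUnits);
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public int available(String node) {
            // Unknown nodes are treated as having nothing free.
            return freeUnits.getOrDefault(node, 0);
        }
    }

    /** Example policy: pick the node with the most free units that still fits the job. */
    static final class MostFreePolicy implements AssignmentPolicy {
        private final Resource resource;

        MostFreePolicy(Resource resource) {
            this.resource = resource;
        }

        @Override
        public Optional<String> pickNode(List<String> nodes, int required) {
            String best = null;
            for (String node : nodes) {
                int free = resource.available(node);
                if (free >= required && (best == null || free > resource.available(best))) {
                    best = node;
                }
            }
            return Optional.ofNullable(best);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> slots = new HashMap<>();
        slots.put("node-a", 2);
        slots.put("node-b", 5);

        Resource mapSlots = new FixedCapacityResource("map-slots", slots);
        AssignmentPolicy policy = new MostFreePolicy(mapSlots);

        // A job needing 3 units fits only on node-b; a job needing 6 fits nowhere.
        System.out.println(policy.pickNode(Arrays.asList("node-a", "node-b"), 3));
        System.out.println(policy.pickNode(Arrays.asList("node-a", "node-b"), 6));
    }
}
```

Keeping the resource and the assignment logic in separate beans means a new resource type can be added without touching existing policies, which seems to be the point of the design described above.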

I submitted a few patches to Hadoop, but they were not considered interesting enough to get into core. If you want to buy my Hadoop with integrated Spring, let me know.
