Hello. Current design: I have a Java object, MyObjectA. MyObjectA goes through three processors (jars) that run in sequence and do a lot of processing to enrich A with tons of additional data (think ETL); the final result is MyObjectD. (Note: MyObjectD is really just A with more fields added to it, but I want to clarify that they end up being very different objects.) When MyObjectD is ready, it is saved to my non-relational database (Accumulo). Currently all of this is driven by Quartz Scheduler: a List<MyObjectA> is submitted for processing every N minutes. Everything is written in Java, and there is a lot of back-and-forth with Accumulo (to access tables that help convert A to D).
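To make the shape of the current setup concrete, here is a minimal sketch of the sequential three-processor chain described above. All class and method names are illustrative stand-ins, not from the actual codebase, and the "processing" in each stage is just a placeholder field append.

```java
// Illustrative sketch of a three-stage sequential enrichment pipeline.
// Names (MyObjectA, MyObjectD, Processor, Pipeline) are hypothetical stand-ins.
import java.util.ArrayList;
import java.util.List;

class MyObjectA {
    String id;
    List<String> fields = new ArrayList<>();
    MyObjectA(String id) { this.id = id; }
}

// MyObjectD is modeled as MyObjectA plus the fields accumulated by the stages.
class MyObjectD extends MyObjectA {
    MyObjectD(MyObjectA a) { super(a.id); this.fields.addAll(a.fields); }
}

interface Processor {
    MyObjectA process(MyObjectA in);
}

public class Pipeline {
    // The three processors run in sequence; each one enriches the record.
    static final List<Processor> PROCESSORS = List.of(
        in -> { in.fields.add("stage1"); return in; },
        in -> { in.fields.add("stage2"); return in; },
        in -> { in.fields.add("stage3"); return in; }
    );

    // Chain A through all three stages and wrap the result as D.
    static MyObjectD run(MyObjectA a) {
        for (Processor p : PROCESSORS) {
            a = p.process(a);
        }
        return new MyObjectD(a);
    }

    public static void main(String[] args) {
        MyObjectD d = run(new MyObjectA("rec-1"));
        System.out.println(d.id + ":" + d.fields); // rec-1:[stage1, stage2, stage3]
    }
}
```

Because the three stages compose into a single A-to-D function like this, merging them into one processor (as mentioned below) is mostly a packaging change rather than a logic change.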
We split the processing into three processors simply because it was more convenient and we didn't want everything rolled up in one processor. That said, I can definitely merge the three into ONE processor. My question is: generally speaking, what are all the things I need to be concerned about or look into to make this a MapReduce job? I am asking for pointers on where to even start. Let's say all my processing is done in mappers, so my input will be MyObjectA and my output will be MyObjectD from each mapper, and then my reducer simply writes all MyObjectD objects to Accumulo. Is achieving this as easy as just submitting the jar to Hadoop? Overall, I want to know how one goes about taking an existing .jar (Java app) and turning it into a MapReduce job. We are going this route because multi-threading won't solve our problem: we have to process objects in batches, and they are coming in every minute. Thank you in advance for any and all help.
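One thing worth noting about the mapper/reducer split proposed above: if the reducer does nothing but write records out, you may not need a reduce phase at all. Hadoop supports map-only jobs (zero reducers), and Accumulo ships an AccumuloOutputFormat for MapReduce output, so each mapper's output can go straight to the table. The snippet below is a self-contained simulation of that map-only shape, with a fake batch writer standing in for Accumulo's real BatchWriter; everything here is an illustrative assumption, not working Hadoop code.

```java
// Self-contained simulation of a map-only job shape. A real job would extend
// org.apache.hadoop.mapreduce.Mapper and emit to Accumulo via an OutputFormat;
// all names below (ObjectA, ObjectD, FakeBatchWriter) are hypothetical stand-ins.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class MapOnlySketch {
    // Stand-ins for the record before and after enrichment.
    record ObjectA(String id) {}
    record ObjectD(String id, String enriched) {}

    // Stand-in for Accumulo's BatchWriter: collects mutations instead of writing.
    static class FakeBatchWriter {
        final List<String> mutations = new ArrayList<>();
        void addMutation(ObjectD d) { mutations.add(d.id() + "=" + d.enriched()); }
    }

    // The body of map(): one input record in, one enriched record written out.
    // With zero reducers, mapper output goes directly to the output format.
    static void map(ObjectA a, Function<ObjectA, ObjectD> pipeline, FakeBatchWriter writer) {
        writer.addMutation(pipeline.apply(a));
    }

    public static void main(String[] args) {
        FakeBatchWriter writer = new FakeBatchWriter();
        Function<ObjectA, ObjectD> pipeline = a -> new ObjectD(a.id(), "done");
        // Simulates the framework calling map() once per input record.
        for (ObjectA a : List.of(new ObjectA("r1"), new ObjectA("r2"))) {
            map(a, pipeline, writer);
        }
        System.out.println(writer.mutations); // [r1=done, r2=done]
    }
}
```

The practical consequence is that the list of things to look into shrinks: making the record types serializable for Hadoop (e.g. Writable), choosing an InputFormat for the incoming batches, and configuring the Accumulo output, rather than designing a shuffle/reduce step the job doesn't actually need.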