Hello, I am trying to write MapReduce jobs that read data from JSON files and load it into HBase tables. Could you suggest an efficient way to do this? I am trying to use the Spring Data HBase template to make the writes thread safe and to enable table locking.
I use the Map method to read and parse the JSON files, and the Reduce method to call the HBase template and store the data in the HBase tables. My questions:

1. Is this the right approach, or should I do all of the above in the Map method?
2. How can I pass the Java object I create from the JSON data to the Reduce method, which needs to save it to the HBase table? I can only pass built-in data types from my mapper to the reducer.
3. I thought of using the distributed cache for the above problem: store the object in the cache and pass only its key to the Reduce method. But how do I generate a unique key for every object I store in the distributed cache?

Please help me with the above, and tell me if I am missing or overlooking some important detail.

Thank you,
--
Regards,
Ouch Whisper
010101010101
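Regarding question 2: Hadoop does not limit mapper-to-reducer values to built-in types. Any class implementing `org.apache.hadoop.io.Writable` (methods `write(DataOutput)` and `readFields(DataInput)`, plus a no-argument constructor) can be emitted as a map output value. The sketch below is a minimal illustration under stated assumptions: the class name `JsonRecord` and its fields `rowKey` and `payload` are hypothetical, and the class is kept free of Hadoop imports so it compiles on its own. In a real job it would declare `implements Writable`. The `toBytes`/`fromBytes` helpers only simulate the serialize/deserialize round trip the framework performs during the shuffle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical value type for one parsed JSON record.
// In a real job: public class JsonRecord implements org.apache.hadoop.io.Writable
// (in its own file, with the Hadoop import).
class JsonRecord {
    private String rowKey;  // hypothetical: HBase row key derived from the JSON
    private String payload; // hypothetical: remaining JSON content, kept as text

    // Hadoop creates Writables reflectively, so a no-arg constructor is required.
    public JsonRecord() {}

    public JsonRecord(String rowKey, String payload) {
        this.rowKey = rowKey;
        this.payload = payload;
    }

    public String getRowKey() { return rowKey; }
    public String getPayload() { return payload; }

    // Same signature as Writable.write: serialize fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        writeString(out, rowKey);
        writeString(out, payload);
    }

    // Same signature as Writable.readFields: read fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        rowKey = readString(in);
        payload = readString(in);
    }

    // Length-prefixed UTF-8, which avoids DataOutput.writeUTF's 64 KB limit.
    private static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private static String readString(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Test helper: serialize to bytes, as the framework does between map and reduce.
    public byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Test helper: rebuild a record from bytes produced by toBytes().
    public static JsonRecord fromBytes(byte[] bytes) {
        JsonRecord record = new JsonRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

With such a class, the job would register it via `job.setMapOutputValueClass(JsonRecord.class)`, and the reducer would receive `Iterable<JsonRecord>` directly, with no need for the distributed cache. A class used as a map output *key* would additionally need `WritableComparable`.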