You might find these links helpful:

http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026
http://stackoverflow.com/questions/13877077/how-do-i-set-an-object-as-the-value-for-map-output-in-hadoop-mapreduce/13877688#13877688
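
For the custom data type asked about in the quoted thread, here is a minimal sketch, not a definitive implementation. The class name JsonRecord and its three fields are hypothetical, since the thread does not say which JSON values are extracted. To keep the sketch runnable without Hadoop on the classpath it only implements java.lang.Comparable; in a real job you would make the class public and declare "implements org.apache.hadoop.io.WritableComparable<JsonRecord>", whose write(DataOutput) and readFields(DataInput) methods have the same signatures as the ones below. The toBytes/fromBytes helpers are only there to demonstrate the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical record holding values parsed from one JSON document.
// Assumption: in a Hadoop job this class would be public and declared as
// "implements org.apache.hadoop.io.WritableComparable<JsonRecord>".
class JsonRecord implements Comparable<JsonRecord> {
    private String id = "";   // hypothetical field
    private long timestamp;   // hypothetical field
    private double value;     // hypothetical field

    // Hadoop instantiates Writables reflectively, so keep a no-arg constructor.
    JsonRecord() {}

    JsonRecord(String id, long timestamp, double value) {
        this.id = id;
        this.timestamp = timestamp;
        this.value = value;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(id);          // writeUTF is limited to 65535 encoded bytes
        out.writeLong(timestamp);
        out.writeDouble(value);
    }

    // Deserialize in exactly the same order as write().
    public void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        timestamp = in.readLong();
        value = in.readDouble();
    }

    // Sort order: by id, then timestamp, then value (only matters for keys).
    @Override
    public int compareTo(JsonRecord other) {
        int c = id.compareTo(other.id);
        if (c != 0) return c;
        c = Long.compare(timestamp, other.timestamp);
        if (c != 0) return c;
        return Double.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof JsonRecord)) return false;
        return compareTo((JsonRecord) o) == 0;
    }

    // The default hash partitioner relies on hashCode() when this type is a key.
    @Override
    public int hashCode() {
        return Objects.hash(id, timestamp, value);
    }

    // Demo helper: serialize a record to a byte array via write().
    static byte[] toBytes(JsonRecord r) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            r.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: rebuild a record from bytes via readFields().
    static JsonRecord fromBytes(byte[] bytes) {
        try {
            JsonRecord r = new JsonRecord();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the type is only used as the map output value, implementing Writable alone is enough; the Comparable part is needed when it is used as a key. In the driver the type would then be registered with job.setMapOutputValueClass(JsonRecord.class).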
HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 5:05 PM, Panshul Whisper <[email protected]> wrote:

> Hello,
>
> Thank you for the reply.
> 1. I cannot serialize the JSON and store it as a whole. I need to extract
> individual values and store them, as later I need to query the stored values
> in various aggregation algorithms.
> 2. Can you please point me in a direction where I can find out how to write a
> data type to be Writable+Comparable? I will look into Avro, but I prefer to
> write my own data type.
> 3. I will look into MR counters.
>
> Regards,
>
>
> On Thu, Feb 7, 2013 at 12:28 PM, Mohammad Tariq <[email protected]> wrote:
>
>> Hello Panshul,
>>
>> My answers:
>> 1- You can serialize the entire JSON into a byte[] and store it in a
>> cell. (Is it important for you to extract individual values from your JSON and
>> then put them into the table?)
>> 2- You can write your own datatype to pass your object to the reducer.
>> But it must be a Writable+Comparable. Alternatively, you can use Avro.
>> 3- For generating unique keys, you can use MR counters.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am trying to write MapReduce jobs to read data from JSON files and
>>> load it into HBase tables.
>>> Please suggest an efficient way to do it. I am trying to do it using
>>> Spring Data HBase Template to make it thread safe and enable table locking.
>>>
>>> I use the Map methods to read and parse the JSON files. I use the Reduce
>>> methods to call the HBase Template and store the data into the HBase tables.
>>>
>>> My questions:
>>> 1. Is this the right approach, or should I do all of the above in the Map
>>> method?
>>> 2. How can I pass the Java object I create, holding the data read from
>>> the JSON file, to the Reduce method, which needs to save it to the HBase
>>> table? I can only pass the built-in data types to the reduce method from my
>>> mapper.
>>> 3. I thought of using the distributed cache for the above problem, to
>>> store the object in the cache and pass only the key to the reduce method.
>>> But how do I generate a unique key for each object I store in the
>>> distributed cache?
>>>
>>> Please help me with the above. Please tell me if I am missing or
>>> overlooking some important detail.
>>>
>>> Thanking You,
>>>
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>>
>>
>>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>
