Hi all, I have a JavaPairRDD<Long, String> where each Long key has 4 String values associated with it. For each String value in the RDD I want to fire an HBase lookup query. This lookup returns around 7K integers, so for each key I will have about 7K values. My input RDD is already more than a GB, and after adding these results it will grow to around 50 GB, which I want to avoid.
My problem: the RDD looks like this:
<1, Test1> <1, test2> <1, test3> <1, test4> ...
Now I query HBase for Test1, test2, test3 and test4 in parallel. Each query returns around 2K integers, so about 8K integers in total. That means each record turns into 1 × 8,000 entries in my RDD, and if I have 1 million records the RDD grows to 1 million × 8,000 = 8 billion entries, which is huge to process even with groupBy.
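To make the blow-up concrete, here is a minimal plain-Java sketch (no Spark or HBase dependencies) of the expansion described above. The `lookup` method is a hypothetical stub standing in for the real HBase query and simply returns a fixed number of integers; `expand` mimics roughly what `flatMapValues` on the `JavaPairRDD` would produce, one (key, integer) pair per lookup result. The figures (4 values per key, ~2K results per lookup, 1 million records) are the ones from the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ExpansionSketch {

    // Hypothetical stand-in for the HBase lookup: returns `resultsPerLookup`
    // integers for any string. The real code would query HBase here.
    static List<Integer> lookup(String value, int resultsPerLookup) {
        return IntStream.range(0, resultsPerLookup).boxed().collect(Collectors.toList());
    }

    // Flattens every (key, string) pair into one (key, integer) pair per
    // lookup result, similar to what flatMapValues would do on the RDD.
    static List<Map.Entry<Long, Integer>> expand(List<Map.Entry<Long, String>> input,
                                                 int resultsPerLookup) {
        List<Map.Entry<Long, Integer>> out = new ArrayList<>();
        for (Map.Entry<Long, String> pair : input) {
            for (Integer hit : lookup(pair.getValue(), resultsPerLookup)) {
                out.add(new SimpleEntry<>(pair.getKey(), hit));
            }
        }
        return out;
    }

    // Number of entries after expansion, computed without materializing them.
    static long expandedCount(long records, long valuesPerRecord, long resultsPerLookup) {
        return records * valuesPerRecord * resultsPerLookup;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        input.add(new SimpleEntry<Long, String>(1L, "Test1"));
        input.add(new SimpleEntry<Long, String>(1L, "test2"));
        input.add(new SimpleEntry<Long, String>(1L, "test3"));
        input.add(new SimpleEntry<Long, String>(1L, "test4"));

        // One key with 4 values and ~2K results per lookup -> 8,000 entries.
        System.out.println(expand(input, 2000).size());           // 8000
        // Scaled up to 1 million keys -> 8 billion entries.
        System.out.println(expandedCount(1_000_000L, 4, 2000));   // 8000000000
    }
}
```

The point of the sketch is only the multiplication: every extra lookup result becomes its own record, so the output size is the product of records, values per record and results per lookup, not their sum.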