Hi all, I am running the code below and it runs for a very long time. The input to flatMapToPair is an RDD of 50K records, and I call HBase once per record, so 50K times in total. Each call is just a range scan query, so it should not take long. Can anybody guide me on what is wrong here?
JavaPairRDD<VendorRecord, Iterable<VendorRecord>> pairvendorData =
    matchRdd.flatMapToPair(
        new PairFlatMapFunction<VendorRecord, VendorRecord, VendorRecord>() {
            @Override
            public Iterable<Tuple2<VendorRecord, VendorRecord>> call(VendorRecord t) throws Exception {
                List<Tuple2<VendorRecord, VendorRecord>> pairs =
                    new LinkedList<Tuple2<VendorRecord, VendorRecord>>();
                // Build the blocking keys for this input record.
                MatcherKeys matchkeys = CompanyMatcherHelper.getBlockinkeys(t);
                // One HBase range scan per input record (50K calls in total).
                List<VendorRecord> Matchedrecords = ckdao.getMatchingRecordsWithscan(matchkeys);
                // Pair the input record with every matching record.
                for (int i = 0; i < Matchedrecords.size(); i++) {
                    pairs.add(new Tuple2<VendorRecord, VendorRecord>(t, Matchedrecords.get(i)));
                }
                return pairs;
            }
        }
    ).groupByKey(200).persist(StorageLevel.DISK_ONLY_2());