Hi, I have a Storm topology that reads records from Kafka, extracts the timestamp present in each record, looks up the corresponding row in an HBase table, applies some business logic, and then updates the HBase table with the latest values from the current record.
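To make the per-record check concrete, here is a minimal, hypothetical sketch of that lookup-compare-update step. It is not the actual bolt code. A `HashMap` stands in for the HBase table, the class and method names are invented for illustration, and it assumes that a record is treated as "already processed" when its hour is not later than the hour stored for that user. It shows that the same two records are both applied when they arrive in order, while the older one is silently skipped when they arrive out of order:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the custom HBase bolt's per-record logic.
// The HashMap plays the role of the HBase table: userId -> latest processed hour.
public class LatestTimestampCheck {

    private final Map<String, Integer> table = new HashMap<>();

    /**
     * Returns true if the record was applied, false if it was skipped because
     * a record with the same or a later hour was already stored for this user.
     */
    public boolean process(String userId, int hour) {
        Integer stored = table.get(userId);   // "lookup on the HBase table"
        if (stored != null && hour <= stored) {
            return false;                     // assumed "already processed" rule
        }
        // ... business logic would run here ...
        table.put(userId, hour);              // "update the HBase table"
        return true;
    }

    public static void main(String[] args) {
        // In order: both records are applied.
        LatestTimestampCheck inOrder = new LatestTimestampCheck();
        System.out.println(inOrder.process("user1", 10)); // true
        System.out.println(inOrder.process("user1", 11)); // true

        // Out of order: the hour-10 record is skipped.
        LatestTimestampCheck outOfOrder = new LatestTimestampCheck();
        System.out.println(outOfOrder.process("user1", 11)); // true
        System.out.println(outOfOrder.process("user1", 10)); // false
    }
}
```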
I have written a custom HBase bolt extending BaseRichBolt. It looks up the HBase table, applies the business logic to the message read from Kafka, and then updates the HBase table with the latest data. The problem I am seeing is that sometimes the bolt receives/processes the records in a jumbled order. Because of this, my application thinks a particular record has already been processed and ignores it, so a significant number of records are not being processed at all.

For example, suppose two records are read from Kafka: one belongs to the 10th hour and the second belongs to the 11th hour. My custom HBase bolt processes the 11th-hour record first and the 10th-hour record later. Because the 11th-hour record was processed first, the application assumes the 10th-hour record has already been processed and skips it.

Can someone please help me understand why my custom HBase bolt is not processing the records in the order it receives them? Do I have to set any additional properties to ensure the bolt processes records in the order it receives them? What are the possible alternatives I can try to fix this?

FYI, I am using fields grouping for the HBase bolt, through which I want to ensure that all the records of a particular user go to the same task. Suspecting that fields grouping might be causing the issue, I also reduced the number of tasks for my custom HBase bolt to 1, and I still see the same issue. I am wondering why the HBase bolt is not reading/processing records in the order it receives them.

Please share your thoughts. Thanks a lot.

Regards,
Raja
