Hi

I have a Storm topology that reads records from Kafka, extracts the timestamp
present in each record, does a lookup on an HBase table, applies business logic,
and then updates the HBase table with the latest values from the current record.

I have written a custom HBase bolt extending BaseRichBolt. The bolt does a
lookup on the HBase table, applies some business logic to the message read from
Kafka, and then updates the HBase table with the latest data.

The problem I am seeing is that the bolt sometimes receives/processes the
records out of order. As a result, my application thinks that a particular
record has already been processed and ignores it. A significant number of
records are not being processed because of this.

For example:

Suppose two records are read from Kafka: one belongs to the 10th hour and the
second belongs to the 11th hour.

My custom HBase bolt processes the 11th-hour record first and then
reads/processes the 10th-hour record later. Because the 11th-hour record was
processed first, the application assumes the 10th-hour record has already been
processed and skips it.
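
To make the example concrete, here is a minimal sketch of the kind of check I
mean. It is not my actual bolt code: the class name LatestHourCheck, the method
process, and the in-memory map that stands in for the HBase table are all
made up for illustration, and I am assuming the check is "skip the record if
its hour is not newer than the stored one".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the check inside the bolt.
// The map replaces the HBase table purely for illustration.
class LatestHourCheck {
    // user id -> latest hour already written (stands in for the HBase row)
    private final Map<String, Integer> latestHourByUser = new HashMap<>();

    // Returns true if the record is processed, false if it is skipped
    // because it looks like it was already processed.
    public boolean process(String userId, int recordHour) {
        Integer stored = latestHourByUser.get(userId);
        if (stored != null && recordHour <= stored) {
            return false; // not newer than the stored hour, so it is ignored
        }
        latestHourByUser.put(userId, recordHour); // update with the latest value
        return true;
    }

    public static void main(String[] args) {
        // In order: both records are processed.
        LatestHourCheck inOrder = new LatestHourCheck();
        System.out.println(inOrder.process("user1", 10)); // true
        System.out.println(inOrder.process("user1", 11)); // true

        // Out of order: the 10th-hour record is dropped.
        LatestHourCheck jumbled = new LatestHourCheck();
        System.out.println(jumbled.process("user1", 11)); // true
        System.out.println(jumbled.process("user1", 10)); // false
    }
}
```

With a check like this, any record that arrives after a newer one for the same
user is silently lost, which matches what I am seeing.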

Can someone please help me understand why my custom HBase bolt is not
processing the records in the order it receives them?

Do I have to set any additional properties to ensure that the bolt processes
the records in the order it receives them? What alternatives can I try to fix
this?

FYI, I am using fields grouping for the HBase bolt, so that all the records of
a particular user go to the same task. Needless to say, thinking that fields
grouping might be causing the issue, I reduced the number of tasks for my
custom HBase bolt to 1, but I still see the same issue.
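
For reference, this is how I understand fields grouping to behave, as a rough
sketch only. It is not Storm's actual implementation: the class name
FieldsGroupingSketch and the helper chooseTask are made up, and I am assuming
the task is picked by hashing the grouping field modulo the number of tasks.

```java
// Hypothetical illustration of hash-based routing; not a Storm API.
class FieldsGroupingSketch {
    // Maps a user id to a task index in the range [0, numTasks).
    public static int chooseTask(String userId, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(userId.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        // The same user always maps to the same task index.
        int first = chooseTask("user1", 4);
        int second = chooseTask("user1", 4);
        System.out.println(first == second); // true

        // With a single task, every user maps to index 0.
        System.out.println(chooseTask("user2", 1)); // 0
    }
}
```

If that understanding is right, the grouping only decides which task a record
goes to, so with a single task every record should land in the same place
anyway.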

I am wondering why the HBase bolt is not reading/processing records in the
order it receives them. Could someone please help me with your thoughts?

Thanks a lot.


Regards,
Raja.
