There are really two issues here that you need to tackle. The first is how to track the number of retries for a message. The second is the best way to move the failed messages to a database.

For the first one, how hard is the X retry requirement? For example, if the spout crashes we could end up doing up to 2X retries; would that be acceptable? If it is a very hard requirement, then you will need to track the count in a highly consistent database. If it is not a hard requirement, I would just track it in memory in the spout. I have actually done this several times with spouts I have written, where I put the retry number in the messageId the spout hands to the output collector. https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/spout/ISpoutOutputCollector.java#L28 This makes it simple because in fail() I can just look at the count right there and decide what to do with the message.

For the second part, do you ever see the error processing changing beyond the current requirement to put it in a database? If you think there may be complex processing in the future, or writing to the DB might be expensive, I would suggest that you publish the failed messages to Kafka and then let a connector or something similar publish the messages to the DB. Otherwise, just write them to the DB directly from the spout. I would suggest adding some metrics to track how long the spout spends writing failed messages to the DB, though.
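A minimal sketch of the retry-count-in-messageId idea described above, kept free of Storm dependencies so it runs on its own. The class names (`RetryMessageId`, `RetryPolicy`, `RetryIdDemo`) and the key format are hypothetical and not part of Storm. In a real spout the id object would be passed as the messageId argument to `emit`, and the decision logic would be called from `fail(Object msgId)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical message id: a source key plus the number of retries so far. */
final class RetryMessageId {
    final String key;      // e.g. "topic-partition-offset"; this format is an assumption
    final int retryCount;  // 0 for the first emit

    RetryMessageId(String key, int retryCount) {
        this.key = key;
        this.retryCount = retryCount;
    }

    /** Id to use when the same message is emitted again. */
    RetryMessageId nextAttempt() {
        return new RetryMessageId(key, retryCount + 1);
    }
}

/** Decision logic that a spout's fail(Object msgId) could delegate to. */
final class RetryPolicy {
    enum Action { RETRY, DEAD_LETTER }

    private final int maxRetries;

    RetryPolicy(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    Action onFail(RetryMessageId id) {
        // The count travels with the id, so no shared map or fieldsGrouping is needed.
        return id.retryCount < maxRetries ? Action.RETRY : Action.DEAD_LETTER;
    }
}

final class RetryIdDemo {
    /** Simulates a message that fails every time and records each decision. */
    static List<String> simulateAlwaysFailing(int maxRetries) {
        RetryPolicy policy = new RetryPolicy(maxRetries);
        RetryMessageId id = new RetryMessageId("orders-0-42", 0);
        List<String> decisions = new ArrayList<>();
        while (true) {
            RetryPolicy.Action action = policy.onFail(id);
            decisions.add(action + "@" + id.retryCount);
            if (action == RetryPolicy.Action.DEAD_LETTER) {
                // In a spout: write the message to the DB or a Kafka topic here.
                return decisions;
            }
            id = id.nextAttempt(); // In a spout: re-emit the tuple with this id.
        }
    }

    public static void main(String[] args) {
        System.out.println(simulateAlwaysFailing(2));
    }
}
```

Because the count lives only in the in-flight id, a spout restart resets it to zero. That is the source of the "up to 2X retries" caveat above.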
- Bobby

On Monday, March 13, 2017, 6:03:33 AM CDT, Janardhanan, Kailas (Contractor) <kailas.janardhan...@nordstrom.com> wrote:

Hello All, We are working on a storm cluster which is 
listening to a Kafka queue. Our requirement is to retry bolt execution (on exception) a specified number of times. After these retries, the messages should be moved to a database. In the previous release we used fieldsGrouping and maintained a hash map to track the retry count, but this had a major performance impact. We will have to use shuffleGrouping for maximum throughput. What is the best way to implement this? The following options are being considered:

1) Maintain the retry count in a database (might be a performance issue).
2) Maintain the retry count in an external cache (like ElastiCache in Amazon AWS); however, it might be expensive.
3) Maintain the retry count in the message itself and repost the message to the queue, ignoring the exception that occurred in the bolt so that the tuple is treated as a success.

We are not considering exponential retry inside the bolt itself, as it might also cause performance issues. Please advise which option is best and, if there is a better way, please detail it.

Thanks and Regards
Kailas J C RCL | Infosys Trivandrum Office : +91-4713024230 Mob : +91-9497891540