For the record, here is how I 'fixed' this:

1. Stop Storm; it's crashing constantly anyway. Stop sending messages to your Metron installation.
2. Export the messages from the Kafka topic that's crashing Storm so that they're not lost. In my case that's the indexing topic. I have no idea yet how to re-ingest them. (A possible export command is sketched below, after the questions.)
3. Set the 'retention.ms' Kafka configuration setting to a small value, then wait a minute. The command for this is:
   /usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --alter --add-config retention.ms=1000 --entity-name indexing
4. Make sure that the 'retention.ms' value is set:
   /usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --describe --entity-name indexing
5. Wait a couple of minutes; the Kafka log files should then be empty. You can check this with "ls -altr /tmp/kafka-logs/indexing/" or "du -h /tmp/kafka-logs/indexing/". Replace "/tmp/kafka-logs/" with the correct path to your Kafka logs directory. In my case, there was approx. 11GB of data in the indexing topic.
6. Restore the default retention time:
   /usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --alter --delete-config retention.ms --entity-name indexing
(7. Try to re-index the lost data. I have not found a way to do this yet; a possible approach is sketched below.)

At this point, start Storm again. It shouldn't crash anymore as there's no data to index.

Does this seem like a sound way to 'fix' these kinds of problems? I suspect that I received a big burst of logs (the data in Kibana seems to support this) that Storm couldn't handle. Is there a way to better handle big bursts? Or a rate control mechanism of some sort?

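For step 2, something along these lines should dump the topic to a file; this is only a sketch, and the broker address (localhost:6667 assumes the default HDP broker port on the local node) and the output path are placeholders for your environment. Depending on the Kafka version you may need --zookeeper instead of --bootstrap-server. Stop it with Ctrl-C once it stops pulling new messages:

   # dump all messages currently in the indexing topic to a backup file
   /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic indexing --from-beginning > /tmp/indexing-backup.json
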
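For step 7, one possible (untested) approach would be to replay that exported file back into the topic once the topology is healthy again, e.g. with the console producer (again, broker address and file path are placeholders):

   # push the backed-up messages back into the indexing topic
   /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list localhost:6667 --topic indexing < /tmp/indexing-backup.json

Since that would send everything through the indexing topology again, it's probably best done slowly or in chunks.
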
On 13-Sep-18 11:39, Vets, Laurens wrote:
> 1. worker.childopts: -Xmx2048m
>
> 2. As in individual messages? Just small(-ish) JSON messages. A few KBytes?
>
> On 13-Sep-18 11:21, Casey Stella wrote:
>> Two questions:
>> 1. How much memory are you giving the workers for the indexing topology?
>> 2. how large are the messages you're sending through?
>>
>> On Thu, Sep 13, 2018 at 2:00 PM Vets, Laurens <laur...@daemon.be
>> <mailto:laur...@daemon.be>> wrote:
>>
>> Hello list,
>>
>> I've installed OS updates on my Metron 0.4.2 yesterday, restarted all
>> nodes and now my indexing topology keeps crashing.
>>
>> This is what I see in the Storm UI for the indexing topology:
>>
>> Topology stats:
>> 10m 0s        1304380   1953520   12499.833   1320
>> 3h 0m 0s      1304380   1953520   12499.833   1320
>> 1d 0h 0m 0s   1304380   1953520   12499.833   1320
>> All time      1304380   1953520   12499.833   1320
>>
>> Spouts:
>> kafkaSpout  1  1  1299940  1949080  12499.833  1320  0  metron3  6702
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> at java.lang.Long.valueOf(Long.java:840)
>> at org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff$RetryEntryTimeStampComparator.compar
>>
>> Bolts:
>> hdfsIndexingBolt  1  1  1800  1800  0.278  7.022  1820  38.633  1800  0  metron3  6702
>> java.lang.NullPointerException
>> at org.apache.metron.writer.hdfs.SourceHandler.handle(SourceHandler.java:80)
>> at org.apache.metron.writer.hdfs.HdfsWriter.write(HdfsWriter.java:113)
>> at org.apache.metr Thur, 13 Sep 2018 07:35:02
>> indexingBolt  1  1  1320  1320  0.217  7.662  1300  47.815  1300  0  metron3  6702
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> at java.util.Arrays.copyOfRange(Arrays.java:3664)
>> at java.lang.String.<init>(String.java:207)
>> at org.json.simple.parser.Yylex.yytext(Yylex.jav Thur, 13 Sep 2018 07:37:33
>>
>> When I check the Kafka topic, I can see that there's at least 3 million
>> messages in the kafka indexing topic... I _suspect_ that the indexing
>> topology tries to write those but fails, restarts, tries to write,
>> fails, etc... Metron is currently not ingesting any additional messages,
>> but also can't seem to index the current ones...
>>
>> Any idea on how to proceed?
>>