Hey Ryan, 

Thank you for that information. The problem is that we don't know how
many logs we'll be ingesting, so the plan was to start small and scale out
once we ran into capacity problems... 

What would the minimum requirements be for a simple cluster ingesting
roughly 10,000-25,000 messages a day?

On 2017-08-14 15:33, Ryan Merriman wrote:

> Laurens, 
> 
> 2 nodes with 32G of RAM is really small considering all the different 
> components included with Metron.  I'm assuming this is a demo or POC cluster; 
> otherwise it's WAY under-sized.  This isn't specific to Metron, by the way; it 
> applies to any HDP (or similar distribution) cluster.  You will likely need 
> to tune everything down in terms of memory, similar to what we do with full 
> dev.  You can reference those settings here:  
> https://github.com/apache/metron/blob/master/metron-deployment/roles/ambari_config/vars/single_node_vm.yml.
>   You would also need to lower the HDFS replication. 
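> 
> As a rough sketch (the property name and path below are the usual defaults, 
> so treat them as assumptions and adjust for your environment), lowering 
> replication on a 2-node cluster means setting dfs.replication in hdfs-site 
> (Ambari: HDFS -> Configs) to 1 or 2, and re-replicating anything already 
> written with the old factor, e.g.: 
> 
>     # re-replicate existing Metron data down to a replication factor of 1
>     hdfs dfs -setrep -w 1 -R /apps/metron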
> 
> How are your services distributed across the 2 nodes?  Many services (Storm 
> and Kafka for example) are not under YARN control and are not aware of other 
> running services and the resources they are consuming.  Since you say one 
> node is overloaded and one is barely utilized, I would first look at 
> redistributing your services so that the load is more balanced.  You would 
> almost certainly want ES and Storm on different nodes. 
> 
> Ryan 
> 
> On Mon, Aug 14, 2017 at 5:08 PM, Laurens Vets <laur...@daemon.be> wrote:
> 
> I don't think those are the problem, but I've raised them anyway: 
> 
> Before:
> [storm@metron1 ~]$ ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 128354
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 257597
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> [storm@metron1 ~]$
> 
> After:
> [storm@metron1 ~]$ ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 128354
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1048576
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1048576
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> [storm@metron1 ~]$ 
> 
> I had an issue setting them to unlimited, so I changed them to 1048576 instead. 
> Currently, for the storm account I see: 
> 
> [root@metron1 flux]# lsof -u storm | wc -l
> 1655
> [root@metron1 flux]# 
> 
> Any idea why memory might still be an issue?
> 
> On 2017-08-14 09:57, zeo...@gmail.com wrote: 
> 
> Try increasing nofile and nproc for your storm service account. 
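> 
> For example (assuming the service account is literally named "storm" and you 
> manage limits through /etc/security/limits.d), something like the following, 
> followed by restarting the Storm services so the new limits take effect: 
> 
>     # /etc/security/limits.d/storm.conf
>     storm    -    nofile    1048576
>     storm    -    nproc     1048576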
> 
> Jon 
> 
> On Mon, Aug 14, 2017, 12:46 Laurens Vets <laur...@daemon.be> wrote: 
> 
> Hi List,
> 
> I'm seeing the following errors in our indexing topology:
> 
> kafkaSpout:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.kafka.common.utils.Utils.toArray(Utils.java:272)
>     at org.apache.kafka.common.utils.Utils.toArray(Utils.java:265)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:626)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:548)
>     at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:354)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1000)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
>     at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:286)
>     at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:224)
>     at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
> 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>     at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>     at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
>     at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
>     at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:201)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
>     at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:286)
>     at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:224)
>     at org.apache.storm.daemon.executor$fn__6505$fn__6520$fn__6551.invoke(executor.clj:651)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
> 
> hdfsIndexingBolt:
> java.lang.Exception: WARNING: Default and (likely) unoptimized writer config used for hdfs writer and sensor cloudtrail
>     at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:115)
>     at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734)
>     at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466)
>     at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430)
>     at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>     at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
> 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.Arrays.copyOf(Arrays.java:3236)
>     at sun.misc.Resource.getBytes(Resource.java:117)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.metron.common.error.MetronError.addStacktrace(MetronError.java:120)
>     at org.apache.metron.common.error.MetronError.getJSONObject(MetronError.java:99)
>     at org.apache.metron.common.utils.ErrorUtils.handleError(ErrorUtils.java:94)
>     at org.apache.metron.writer.BulkWriterComponent.error(BulkWriterComponent.java:81)
>     at org.apache.metron.writer.BulkWriterComponent.write(BulkWriterComponent.java:152)
>     at org.apache.metron.writer.bolt.BulkMessageWriterBolt.execute(BulkMessageWriterBolt.java:117)
>     at org.apache.storm.daemon.executor$fn__6573$tuple_action_fn__6575.invoke(executor.clj:734)
>     at org.apache.storm.daemon.executor$mk_task_receiver$fn__6494.invoke(executor.clj:466)
>     at org.apache.storm.disruptor$clojure_handler$reify__6007.onEvent(disruptor.clj:40)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451)
>     at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430)
>     at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>     at org.apache.storm.daemon.executor$fn__6573$fn__6586$fn__6639.invoke(executor.clj:853)
>     at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>     at clojure.lang.AFn.run(AFn.java:22)
>     at java.lang.Thread.run(Thread.java:745)
> 
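> A side note on the "Default and (likely) unoptimized writer config" warning 
> above: it generally means no sensor-specific indexing configuration has been 
> pushed for cloudtrail, so the writer falls back to its defaults. A minimal 
> sketch of such a config (the batch sizes are placeholders to tune, not 
> recommendations) would be a cloudtrail.json under 
> $METRON_HOME/config/zookeeper/indexing/, pushed to ZooKeeper: 
> 
>     {
>       "hdfs": { "index": "cloudtrail", "batchSize": 5, "enabled": true },
>       "elasticsearch": { "index": "cloudtrail", "batchSize": 5, "enabled": true }
>     }
> 
>     $METRON_HOME/bin/zk_load_configs.sh -m PUSH \
>       -i $METRON_HOME/config/zookeeper -z $ZOOKEEPER
> 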
> Some background information:
> We're currently using Metron on 2 EC2 nodes (32GB RAM, 8 cores) and only
> changed the following default options:
> worker.childopts: -Xmx4096m.
> topology.acker.executors: from "null" to 1.
> logviewer.childopts: from "-Xmx128m" to "-Xmx1024m"
> topology.transfer.buffer.size: from 1024 to 32
> elasticsearch heap_size: 8192m
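> 
> (For reference, the Storm values above would look roughly like the sketch 
> below if set directly in storm.yaml instead of through Ambari; the 
> Elasticsearch heap is configured separately in the ES JVM options.) 
> 
>     worker.childopts: "-Xmx4096m"
>     topology.acker.executors: 1
>     logviewer.childopts: "-Xmx1024m"
>     topology.transfer.buffer.size: 32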
> 
> One node is at 100% load and memory while the other is doing almost nothing...
> 
> The messages we're ingesting are only about 1 KB of JSON each, and we're
> limiting ingestion to 1,200 messages/minute via NiFi. Initially,
> everything seemed to be going fine, but then Storm started throwing
> memory errors in various places.
> 
> Any idea what might be going on and how I can further troubleshoot this? 
> -- 
> 
> Jon
