Is your timeout value big enough? I do not see any differences in the stacktraces, do I miss something?
> Am 06.04.2018 um 14:15 schrieb Steffen Brüntjen <steffen.bruent...@macmon.eu>: > > Hi Frank > > Mmh, I don't think it's because of uncaught exceptions (at least not outside > of SNMP4J). SNMP packets are continued to be sent out normally, the responses > are received by kernel but not "recognized" by SNMP4J (see tcpdump output). I > get normal timeouts as if the devices wouldn't respond any more (but they > do). I reproduced the problem once again with TRACE logging enabled. The > SNMP4J logs appear to be very normal, except that there are no more "Received > message from ... with length ...: ..." messages any more > (DefaultUdpTransportMapping.java:409). Here are the different types of log > messages after problem occurs (in random order): > > Sending message to .../161 with length 43: ... | > (DefaultUdpTransportMapping.java:112) > Removed pending request with handle: PduHandle[...] | (Snmp.java:995) > Request timed out: ... | (Snmp.java:1900) > Running pending sync request with handle PduHandle[...] and retry count left > 1 | (Snmp.java:1808) > > > The stacktraces look the same when everything is running fine: > > "snmp.3" #39 prio=5 os_prio=0 tid=0x00007f682a81a000 nid=0x2a71 in > Object.wait() [0x00007f67932f9000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x0000000088861810> (a org.snmp4j.util.ThreadPool$TaskManager) > > "snmp.2" #38 prio=5 os_prio=0 tid=0x00007f682a818000 nid=0x2a70 in > Object.wait() [0x00007f67933fa000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x0000000088861a20> (a org.snmp4j.util.ThreadPool$TaskManager) > > "snmp.1" #37 prio=5 os_prio=0 tid=0x00007f6829986800 nid=0x2a6f in > Object.wait() [0x00007f67934fb000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x0000000088861c30> (a org.snmp4j.util.ThreadPool$TaskManager) > > "snmp.0" #36 prio=5 os_prio=0 tid=0x00007f6829988800 nid=0x2a6e in > Object.wait() [0x00007f67935fc000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x0000000088861e40> (a org.snmp4j.util.ThreadPool$TaskManager) > > > Here are the stacktraces after problem starts: > > "snmp.3" #39 prio=5 os_prio=0 tid=0x00007f682a81a000 nid=0x2a71 in > Object.wait() [0x00007f67932f9000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x00000000888185a0> (a org.snmp4j.util.ThreadPool$TaskManager) > > "snmp.2" #38 prio=5 os_prio=0 tid=0x00007f682a818000 nid=0x2a70 in > Object.wait() [0x00007f67933fa000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x00000000888187b0> (a org.snmp4j.util.ThreadPool$TaskManager) > > "snmp.1" #37 prio=5 os_prio=0 tid=0x00007f6829986800 nid=0x2a6f in > Object.wait() [0x00007f67934fb000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x00000000888189c0> (a org.snmp4j.util.ThreadPool$TaskManager) > > "snmp.0" #36 prio=5 os_prio=0 tid=0x00007f6829988800 nid=0x2a6e in > Object.wait() [0x00007f67935fc000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) > - locked <0x0000000088818bd0> (a org.snmp4j.util.ThreadPool$TaskManager) > > > > Best regards > Steffen Brüntjen > > > -----Original Message----- > From: Frank Fock [mailto:f...@agentpp.com] > Sent: Mittwoch, 4. April 2018 00:54 > To: Steffen Brüntjen <steffen.bruent...@macmon.eu> > Cc: snmp4j@agentpp.org > Subject: Re: [SNMP4J] Timeouts in version 2.5.7+ > > Hi Steffen, > > From the stack trace it seems that there is no idle TaskManager in your > ThreadPool for the MultiThreadMessageDispatcher left. > Maybe all TaskManagers died because of exceptions? > > You can simply use the DefaultMessageDispatcher in your case. I do not expect > advantages from the multi-threaded variant. Maybe you can find the root cause > easier with the simple message dispatcher. > > Best regards, > Frank > > >> On 3. Apr 2018, at 13:26, Steffen Brüntjen <steffen.bruent...@macmon.eu> >> wrote: >> >> Hi Frank >> >> >> Thanks for your reply. I don't reuse PDUs. In case of SNMPv2c, the effective >> statements would be: >> >> PDU pdu = new PDU(); >> pdu.setType(PDU.GET); >> pdu.add(new VariableBinding(...)); >> ResponseEvent responseEvent = snmp.send(pdu, target); // timeout >> >> I believe the PDU request IDs will be coming from this code in >> MessageDispatcherImpl: >> >> protected PduHandle createPduHandle() { >> return new PduHandle(getNextRequestID()); >> } >> >> >> There's only one static snmp instance for all targets, I read somewhere that >> this is the preferred implementation. >> >> >> I'am also able to remote debug the program, but I wouldn't know where to >> look at. Also, it's a bit of extra work and it will take a while, but it's >> possible. >> >> >> Best regards, >> Steffen >> >> >> -----Original Message----- >> From: Frank Fock [mailto:f...@agentpp.com] >> Sent: Sonntag, 1. April 2018 08:23 >> To: Steffen Brüntjen <steffen.bruent...@macmon.eu> >> Cc: snmp4j@agentpp.org >> Subject: Re: [SNMP4J] Timeouts in version 2.5.7+ >> >> Hi Steffen, >> >> There was no change on the DefaultUdpTransportMapping between 2.5.6 and >> 2.5.7, thus I assume that behaviour change is a side effect of some other >> changes. SNMP4J is no a bit faster. >> Have you checked your code for race conditions? >> When you use Snmp.send, how do you create the request IDs of the PDUs? >> >> Also reusing the PDU objects before the request is finished might cause this >> issue. >> >> Best regards, >> Frank >> >>> On 16. Mar 2018, at 12:14, Steffen Brüntjen <steffen.bruent...@macmon.eu> >>> wrote: >>> >>> Hi! >>> >>> I'm using SNMP4J for monitoring around 200 devices. It has been running >>> stably for months. Now I found a problem in the latest releases: When my >>> process runs for a couple of hours (sometimes days), after some time SNMP4J >>> returns nothing but timeouts. My program initially uses Snmp.send() and >>> TableUtils (I think, this doesn't matter). After the problem shows up once, >>> all my program effectively does is querying the SysObjectIds (because this >>> already results in timeouts): >>> >>> org.snmp4j.Snmp.send(get(SnmpV2Mib.sysObjectID), target) >>> >>> >>> Some of the targets are community targets, some are user targets, but I get >>> timeouts for all devices. I repeated testing with only community targets >>> resulting in the same behavior. I tracked down the problem to release >>> 2.5.7. The previous version 2.5.6 works without problems, all releases >>> starting with 2.5.7 show the problem. I also analyzed the SNMP packages >>> using tcpdump. The devices send their answers quickly, the packets don't >>> look any special. It seems that SNMP4J doesn't receive, recognize or >>> process the UDP packets. >>> >>> Can you reproduce the problem? Does anyone see problems in source code >>> changes from 2.5.6 to 2.5.7? >>> >>> Best regards >>> Steffen Brüntjen >>> >>> >>> >>> Here are some stacktraces at the time the problem exists: >>> >>> "DefaultUDPTransportMapping_0.0.0.0/162" #72 daemon prio=5 os_prio=0 >>> tid=0x00007fa0ac00a000 nid=0x4433 runnable [0x00007fa08b4f3000] >>> java.lang.Thread.State: RUNNABLE >>> at java.net.PlainDatagramSocketImpl.receive0(Native Method) >>> - locked <0x0000000089977c28> (a java.net.PlainDatagramSocketImpl) >>> at >>> java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143) >>> - locked <0x0000000089977c28> (a java.net.PlainDatagramSocketImpl) >>> at java.net.DatagramSocket.receive(DatagramSocket.java:812) >>> - locked <0x0000000089b2b3a8> (a java.net.DatagramPacket) >>> - locked <0x0000000089977c78> (a java.net.DatagramSocket) >>> at >>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread.run(DefaultUdpTransportMapping.java:397) >>> at java.lang.Thread.run(Thread.java:745) >>> >>> "DefaultUDPTransportMapping_0.0.0.0/0" #33 daemon prio=5 os_prio=0 >>> tid=0x00007fa126c9d800 nid=0x440c in Object.wait() [0x00007fa0b60f1000] >>> java.lang.Thread.State: WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at java.lang.Object.wait(Object.java:502) >>> at org.snmp4j.util.ThreadPool.execute(ThreadPool.java:103) >>> - locked <0x00000000883cde50> (a org.snmp4j.util.ThreadPool) >>> at >>> org.snmp4j.util.MultiThreadedMessageDispatcher.processMessage(MultiThreadedMessageDispatcher.java:162) >>> at >>> org.snmp4j.transport.AbstractTransportMapping.fireProcessMessage(AbstractTransportMapping.java:76) >>> at >>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread.run(DefaultUdpTransportMapping.java:430) >>> at java.lang.Thread.run(Thread.java:745) >>> >>> "snmp.3" #37 prio=5 os_prio=0 tid=0x00007fa125a38000 nid=0x4410 in >>> Object.wait() [0x00007fa0b5ced000] >>> java.lang.Thread.State: WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at java.lang.Object.wait(Object.java:502) >>> at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) >>> - locked <0x00000000883ddeb8> (a org.snmp4j.util.ThreadPool$TaskManager) >>> >>> "snmp.2" #36 prio=5 os_prio=0 tid=0x00007fa126ca1800 nid=0x440f in >>> Object.wait() [0x00007fa0b5dee000] >>> java.lang.Thread.State: WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at java.lang.Object.wait(Object.java:502) >>> at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) >>> - locked <0x00000000883de0c8> (a org.snmp4j.util.ThreadPool$TaskManager) >>> >>> "snmp.1" #35 prio=5 os_prio=0 tid=0x00007fa126ca0800 nid=0x440e in >>> Object.wait() [0x00007fa0b5eef000] >>> java.lang.Thread.State: WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at java.lang.Object.wait(Object.java:502) >>> at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) >>> - locked <0x00000000883de2d8> (a org.snmp4j.util.ThreadPool$TaskManager) >>> >>> "snmp.0" #34 prio=5 os_prio=0 tid=0x00007fa126ca0000 nid=0x440d in >>> Object.wait() [0x00007fa0b5ff0000] >>> java.lang.Thread.State: WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at java.lang.Object.wait(Object.java:502) >>> at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) >>> - locked <0x00000000883de4e8> (a org.snmp4j.util.ThreadPool$TaskManager) >>> >>> "trap.0" #71 prio=5 os_prio=0 tid=0x00007fa0ac004000 nid=0x4432 in >>> Object.wait() [0x00007fa08b5f4000] >>> java.lang.Thread.State: WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at java.lang.Object.wait(Object.java:502) >>> at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275) >>> - locked <0x0000000089979080> (a org.snmp4j.util.ThreadPool$TaskManager) >>> >>> >>> And then there are some monitoring threads that send SNMP queries: >>> >>> "mon-001" #43 prio=5 os_prio=0 tid=0x00007fa0c000c000 nid=0x4416 in >>> Object.wait() [0x00007fa0b52e7000] >>> java.lang.Thread.State: TIMED_WAITING (on object monitor) >>> at java.lang.Object.wait(Native Method) >>> at org.snmp4j.Snmp.send(Snmp.java:991) >>> - locked <0x00000000fdc0c6e0> (a org.snmp4j.Snmp$SyncResponseListener) >>> at org.snmp4j.Snmp.send(Snmp.java:963) >>> at org.snmp4j.Snmp.send(Snmp.java:928) >>> at [...] >>> >>> >>> >>> And here's some non-binary tcpdump output: >>> >>> 12:05:49.820810 IP 192.168.101.211.57522 > 10.100.7.204.snmp: >>> GetRequest(28) system.sysObjectID.0 >>> E..G..@.@.@...e. >>> d.......38.0).....public..../.........0.0...+......... >>> 12:05:49.821321 IP 10.100.7.204.snmp > 192.168.101.211.57522: >>> GetResponse(37) system.sysObjectID.0=E:cisco.1.614 >>> E..P..@.@.SW >>> d....e......<..02.....public.%../.........0.0...+........ +.... ..f >>> ^C >>> _______________________________________________ >>> SNMP4J mailing list >>> SNMP4J@agentpp.org >>> https://oosnmp.net/mailman/listinfo/snmp4j >> > _______________________________________________ SNMP4J mailing list SNMP4J@agentpp.org https://oosnmp.net/mailman/listinfo/snmp4j