Is your timeout value big enough?
I do not see any differences in the stacktraces, do I miss something?


> Am 06.04.2018 um 14:15 schrieb Steffen Brüntjen <steffen.bruent...@macmon.eu>:
> 
> Hi Frank
> 
> Mmh, I don't think it's because of uncaught exceptions (at least not outside 
> of SNMP4J). SNMP packets are continued to be sent out normally, the responses 
> are received by kernel but not "recognized" by SNMP4J (see tcpdump output). I 
> get normal timeouts as if the devices wouldn't respond any more (but they 
> do). I reproduced the problem once again with TRACE logging enabled. The 
> SNMP4J logs appear to be very normal, except that there are no more "Received 
> message from ... with length ...: ..." messages any more 
> (DefaultUdpTransportMapping.java:409). Here are the different types of log 
> messages after problem occurs (in random order):
> 
> Sending message to .../161 with length 43: ... | 
> (DefaultUdpTransportMapping.java:112)
> Removed pending request with handle: PduHandle[...] | (Snmp.java:995)
> Request timed out: ... | (Snmp.java:1900)
> Running pending sync request with handle PduHandle[...] and retry count left 
> 1 | (Snmp.java:1808)
> 
> 
> The stacktraces look the same when everything is running fine:
> 
> "snmp.3" #39 prio=5 os_prio=0 tid=0x00007f682a81a000 nid=0x2a71 in 
> Object.wait() [0x00007f67932f9000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861810> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.2" #38 prio=5 os_prio=0 tid=0x00007f682a818000 nid=0x2a70 in 
> Object.wait() [0x00007f67933fa000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861a20> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.1" #37 prio=5 os_prio=0 tid=0x00007f6829986800 nid=0x2a6f in 
> Object.wait() [0x00007f67934fb000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861c30> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.0" #36 prio=5 os_prio=0 tid=0x00007f6829988800 nid=0x2a6e in 
> Object.wait() [0x00007f67935fc000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861e40> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> 
> Here are the stacktraces after problem starts:
> 
> "snmp.3" #39 prio=5 os_prio=0 tid=0x00007f682a81a000 nid=0x2a71 in 
> Object.wait() [0x00007f67932f9000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x00000000888185a0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.2" #38 prio=5 os_prio=0 tid=0x00007f682a818000 nid=0x2a70 in 
> Object.wait() [0x00007f67933fa000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x00000000888187b0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.1" #37 prio=5 os_prio=0 tid=0x00007f6829986800 nid=0x2a6f in 
> Object.wait() [0x00007f67934fb000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x00000000888189c0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.0" #36 prio=5 os_prio=0 tid=0x00007f6829988800 nid=0x2a6e in 
> Object.wait() [0x00007f67935fc000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088818bd0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> 
> 
> Best regards
> Steffen Brüntjen
> 
> 
> -----Original Message-----
> From: Frank Fock [mailto:f...@agentpp.com] 
> Sent: Mittwoch, 4. April 2018 00:54
> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
> Cc: snmp4j@agentpp.org
> Subject: Re: [SNMP4J] Timeouts in version 2.5.7+
> 
> Hi Steffen,
> 
> From the stack trace it seems that there is no idle TaskManager in your 
> ThreadPool for the MultiThreadMessageDispatcher left.
> Maybe all TaskManagers died because of exceptions?
> 
> You can simply use the DefaultMessageDispatcher in your case. I do not expect 
> advantages from the multi-threaded variant. Maybe you can find the root cause 
> easier with the simple message dispatcher.
> 
> Best regards,
> Frank 
> 
> 
>> On 3. Apr 2018, at 13:26, Steffen Brüntjen <steffen.bruent...@macmon.eu> 
>> wrote:
>> 
>> Hi Frank
>> 
>> 
>> Thanks for your reply. I don't reuse PDUs. In case of SNMPv2c, the effective 
>> statements would be:
>> 
>> PDU pdu = new PDU();
>> pdu.setType(PDU.GET);
>> pdu.add(new VariableBinding(...));
>> ResponseEvent responseEvent = snmp.send(pdu, target); // timeout
>> 
>> I believe the PDU request IDs will be coming from this code in 
>> MessageDispatcherImpl:
>> 
>> protected PduHandle createPduHandle() {
>>   return new PduHandle(getNextRequestID());
>> }
>> 
>> 
>> There's only one static snmp instance for all targets, I read somewhere that 
>> this is the preferred implementation.
>> 
>> 
>> I'am also able to remote debug the program, but I wouldn't know where to 
>> look at. Also, it's a bit of extra work and it will take a while, but it's 
>> possible.
>> 
>> 
>> Best regards,
>> Steffen
>> 
>> 
>> -----Original Message-----
>> From: Frank Fock [mailto:f...@agentpp.com] 
>> Sent: Sonntag, 1. April 2018 08:23
>> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
>> Cc: snmp4j@agentpp.org
>> Subject: Re: [SNMP4J] Timeouts in version 2.5.7+
>> 
>> Hi Steffen,
>> 
>> There was no change on the DefaultUdpTransportMapping between 2.5.6 and 
>> 2.5.7, thus I assume that behaviour change is a side effect of some other 
>> changes. SNMP4J is no a bit faster.
>> Have you checked your code for race conditions?
>> When you use Snmp.send, how do you create the request IDs of the PDUs? 
>> 
>> Also reusing the PDU objects before the request is finished might cause this 
>> issue. 
>> 
>> Best regards,
>> Frank  
>> 
>>> On 16. Mar 2018, at 12:14, Steffen Brüntjen <steffen.bruent...@macmon.eu> 
>>> wrote:
>>> 
>>> Hi!
>>> 
>>> I'm using SNMP4J for monitoring around 200 devices. It has been running 
>>> stably for months. Now I found a problem in the latest releases: When my 
>>> process runs for a couple of hours (sometimes days), after some time SNMP4J 
>>> returns nothing but timeouts. My program initially uses Snmp.send() and 
>>> TableUtils (I think, this doesn't matter). After the problem shows up once, 
>>> all my program effectively does is querying the SysObjectIds (because this 
>>> already results in timeouts):
>>> 
>>>  org.snmp4j.Snmp.send(get(SnmpV2Mib.sysObjectID), target)
>>> 
>>> 
>>> Some of the targets are community targets, some are user targets, but I get 
>>> timeouts for all devices. I repeated testing with only community targets 
>>> resulting in the same behavior. I tracked down the problem to release 
>>> 2.5.7. The previous version 2.5.6 works without problems, all releases 
>>> starting with 2.5.7 show the problem. I also analyzed the SNMP packages 
>>> using tcpdump. The devices send their answers quickly, the packets don't 
>>> look any special. It seems that SNMP4J doesn't receive, recognize or 
>>> process the UDP packets.
>>> 
>>> Can you reproduce the problem? Does anyone see problems in source code 
>>> changes from 2.5.6 to 2.5.7?
>>> 
>>> Best regards
>>> Steffen Brüntjen
>>> 
>>> 
>>> 
>>> Here are some stacktraces at the time the problem exists:
>>> 
>>> "DefaultUDPTransportMapping_0.0.0.0/162" #72 daemon prio=5 os_prio=0 
>>> tid=0x00007fa0ac00a000 nid=0x4433 runnable [0x00007fa08b4f3000]
>>> java.lang.Thread.State: RUNNABLE
>>>    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>>>    - locked <0x0000000089977c28> (a java.net.PlainDatagramSocketImpl)
>>>    at 
>>> java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
>>>    - locked <0x0000000089977c28> (a java.net.PlainDatagramSocketImpl)
>>>    at java.net.DatagramSocket.receive(DatagramSocket.java:812)
>>>    - locked <0x0000000089b2b3a8> (a java.net.DatagramPacket)
>>>    - locked <0x0000000089977c78> (a java.net.DatagramSocket)
>>>    at 
>>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread.run(DefaultUdpTransportMapping.java:397)
>>>    at java.lang.Thread.run(Thread.java:745)
>>> 
>>> "DefaultUDPTransportMapping_0.0.0.0/0" #33 daemon prio=5 os_prio=0 
>>> tid=0x00007fa126c9d800 nid=0x440c in Object.wait() [0x00007fa0b60f1000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool.execute(ThreadPool.java:103)
>>>    - locked <0x00000000883cde50> (a org.snmp4j.util.ThreadPool)
>>>    at 
>>> org.snmp4j.util.MultiThreadedMessageDispatcher.processMessage(MultiThreadedMessageDispatcher.java:162)
>>>    at 
>>> org.snmp4j.transport.AbstractTransportMapping.fireProcessMessage(AbstractTransportMapping.java:76)
>>>    at 
>>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread.run(DefaultUdpTransportMapping.java:430)
>>>    at java.lang.Thread.run(Thread.java:745)
>>> 
>>> "snmp.3" #37 prio=5 os_prio=0 tid=0x00007fa125a38000 nid=0x4410 in 
>>> Object.wait() [0x00007fa0b5ced000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883ddeb8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "snmp.2" #36 prio=5 os_prio=0 tid=0x00007fa126ca1800 nid=0x440f in 
>>> Object.wait() [0x00007fa0b5dee000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883de0c8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "snmp.1" #35 prio=5 os_prio=0 tid=0x00007fa126ca0800 nid=0x440e in 
>>> Object.wait() [0x00007fa0b5eef000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883de2d8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "snmp.0" #34 prio=5 os_prio=0 tid=0x00007fa126ca0000 nid=0x440d in 
>>> Object.wait() [0x00007fa0b5ff0000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883de4e8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "trap.0" #71 prio=5 os_prio=0 tid=0x00007fa0ac004000 nid=0x4432 in 
>>> Object.wait() [0x00007fa08b5f4000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x0000000089979080> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> 
>>> And then there are some monitoring threads that send SNMP queries:
>>> 
>>> "mon-001" #43 prio=5 os_prio=0 tid=0x00007fa0c000c000 nid=0x4416 in 
>>> Object.wait() [0x00007fa0b52e7000]
>>> java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at org.snmp4j.Snmp.send(Snmp.java:991)
>>>    - locked <0x00000000fdc0c6e0> (a org.snmp4j.Snmp$SyncResponseListener)
>>>    at org.snmp4j.Snmp.send(Snmp.java:963)
>>>    at org.snmp4j.Snmp.send(Snmp.java:928)
>>>   at [...]
>>> 
>>> 
>>> 
>>> And here's some non-binary tcpdump output:
>>> 
>>> 12:05:49.820810 IP 192.168.101.211.57522 > 10.100.7.204.snmp:  
>>> GetRequest(28)  system.sysObjectID.0
>>> E..G..@.@.@...e.
>>> d.......38.0).....public..../.........0.0...+.........
>>> 12:05:49.821321 IP 10.100.7.204.snmp > 192.168.101.211.57522:  
>>> GetResponse(37)  system.sysObjectID.0=E:cisco.1.614
>>> E..P..@.@.SW
>>> d....e......<..02.....public.%../.........0.0...+........       +....   ..f
>>> ^C
>>> _______________________________________________
>>> SNMP4J mailing list
>>> SNMP4J@agentpp.org
>>> https://oosnmp.net/mailman/listinfo/snmp4j
>> 
> 

_______________________________________________
SNMP4J mailing list
SNMP4J@agentpp.org
https://oosnmp.net/mailman/listinfo/snmp4j

Reply via email to