Hi,

During implementation and testing of my SNMP management application I ran into some issues with regard to timeliness management.

1. The snmpEngineTime calculation uses "System.currentTimeMillis()" as its reference. "System.currentTimeMillis()" is based on the system time, which may jump forwards or backwards at any time (user setting the time, NTP activation, daylight saving, ...).

I made a simple test sequence:

   1) snmp.get
   2) modify system time
   3) snmp.get

snmp.get is configured with retries=1 and timeout=6s.

1a. The effect of setting the system time backwards in step 2 is that the snmpEngineTime used in messages is increased by the amount the system time was set back. This results in a usmStatsNotInTimeWindows report from the agent, and the manager then recovers fine from having used the wrong snmpEngineTime.

1b. The effect of setting the system time forwards in step 2 is that the snmpEngineTime used in messages is decreased by the amount the system time was set forward. This also results in a usmStatsNotInTimeWindows report from the agent, but in this case the manager "ignores" the report and retransmits the message, which ends in a timeout at the API level. All future communication with the SNMP agent then fails with timeouts.

The attached Wireshark capture shows this communication; the system time is increased by 160 seconds between lines 6 and 7 of the capture.

<<systemtimechange_plus160s.pcap>>

A possible solution would be to base the calculation on "System.nanoTime()" instead of "System.currentTimeMillis()". (I tried this with good results, but I haven't considered all aspects of such a change.)

2. A system time change may just as well happen on the agent side, so I decided to run similar tests for that case. The agent I tested is based on JDMK. It turned out that this agent implementation is also affected by changes to system time.
(The manager application is still based on snmp4j, as above.)

The test sequence is similar to before:

   1) snmp.get
   2) modify time in agent
   3) snmp.get

2a. Agent time is adjusted forwards by some value larger than 150 seconds. A usmStatsNotInTimeWindows report is sent from the agent to the manager in step 3, and the manager then recovers fine.

2b. Agent time is adjusted backwards by some value larger than 150 seconds. The behaviour is similar to 1b, and the end result is also that all future communication fails with a timeout. Currently I am not able to detect that this has happened, since the communication simply fails with a timeout.

The attached Wireshark capture shows this communication; the agent's system time is decreased by ~5 minutes between lines 6 and 7 of the capture (and two snmp.gets with retries=1, timeout=6s are made after that).

<<systemtimechange_plus160s.pcap>>

Solutions?

* Make sure the agents aren't affected by changes to system time. This is hard to do, as I have to deal with an installed base of these agents.
* Extend snmp4j so that an error status is returned in such a case (JDMK has such a detector and gives an error message). When the error message is received, the manager application would have to reset snmpEngineTime / snmpEngineBoots for that agent.
* Whenever a timeout occurs, I could simply reset snmpEngineTime / snmpEngineBoots for that agent. This is an easy workaround, but I'd rather have the error message so I wouldn't have to wait for timeouts.

Kind Regards
Tjip Pasma
System Engineer
Ericsson
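
P.S. To make the "System.nanoTime()" idea from point 1 concrete, here is a minimal sketch of how an estimate of a remote snmpEngineTime could be derived from a monotonic clock. This is not the actual SNMP4J code: the class names EngineTimeEstimator and EngineTimeDemo and the methods sync() and currentEngineTime() are hypothetical and only illustrate the principle. The clock is injected so the behaviour can be checked without waiting in real time.

```java
/**
 * Sketch only: estimates a remote snmpEngineTime from a monotonic clock.
 * All names here are hypothetical; this is not SNMP4J API.
 */
final class EngineTimeEstimator {
    // Monotonic nanosecond source; System.nanoTime() is unaffected by
    // wall-clock changes such as a user or NTP setting the system time.
    private final java.util.function.LongSupplier nanoClock;
    private long engineTimeAtSync; // remote snmpEngineTime (seconds) at last sync
    private long nanosAtSync;      // local monotonic reading at last sync

    EngineTimeEstimator(java.util.function.LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    EngineTimeEstimator() {
        this(System::nanoTime);
    }

    /** Record the engine time taken from a received (authenticated) message. */
    synchronized void sync(long remoteEngineTimeSeconds) {
        engineTimeAtSync = remoteEngineTimeSeconds;
        nanosAtSync = nanoClock.getAsLong();
    }

    /** Estimated remote engine time: last synced value plus elapsed local time. */
    synchronized long currentEngineTime() {
        // Subtracting two nanoTime readings is safe even if the counter wraps.
        long elapsedNanos = nanoClock.getAsLong() - nanosAtSync;
        return engineTimeAtSync
                + java.util.concurrent.TimeUnit.NANOSECONDS.toSeconds(elapsedNanos);
    }
}

final class EngineTimeDemo {
    public static void main(String[] args) {
        long[] fakeNanos = {0L}; // fake monotonic clock for the demo
        EngineTimeEstimator est = new EngineTimeEstimator(() -> fakeNanos[0]);
        est.sync(1000);                      // agent reported engine time 1000 s
        fakeNanos[0] += 5_000_000_000L;      // 5 seconds pass on the monotonic clock
        System.out.println(est.currentEngineTime()); // prints 1005
    }
}
```

Because the estimate depends only on differences between monotonic readings, setting the system time forwards or backwards between two requests does not change it. Whether this covers every place where the timeliness code touches the clock is something I have not checked.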
_______________________________________________ SNMP4J mailing list [email protected] http://lists.agentpp.org/mailman/listinfo/snmp4j
