Greetings, all.
Just started monitoring three Windows and one Linux server with Zenoss 2.0.3.
Overall, very impressed.
My network topology is probably somewhat unusual. My Windows servers are at a
third-party hosting provider and sitting behind a firewall. So I installed
Hamachi VPN services on all my boxes. If you're not familiar, Hamachi is a
pretty cool free VPN/Virtual LAN tool (www.hamachi.cc). It allows me to use
SNMP and WMI monitoring securely and without poking holes in firewalls or
dealing with overly complex VPN solutions.
As I've seen one or two other people report, I've been having a recurring issue
with zenwin and zenwinmodeler. When polling my Windows boxes, I'll
occasionally get the following error:
Code:
ERROR_SEM_TIMEOUT: The semaphore timeout period has expired. (121)
This is a Windows error, I believe. It's probably due to network slowness or
maybe some vagary of Hamachi, I don't know. The main issue is that it seems
to cause zenwin and zenwinmodeler to up and die. Then I receive heartbeat
failure emails until they restart themselves.
Here's an excerpt from my zenwinmodeler.log right when it happens:
Code:
2007-07-13 17:51:28 ERROR zen.zenwinmodeler: ERROR_SEM_TIMEOUT: The semaphore timeout period has expired. (121)
Traceback (most recent call last):
  File "/usr/local/zenoss/Products/ZenWin/zenwinmodeler.py", line 63, in processLoop
    svcs = self.getServices(name, user, passwd)
  File "/usr/local/zenoss/Products/ZenWin/zenwinmodeler.py", line 97, in getServices
    dev.connect()
  File "/usr/local/zenoss/Products/ZenWin/wmiclient.py", line 51, in connect
    self.flags,self.valueset)
  File "usr/local/zenoss/lib/python/win32com/client.py", line 33, in ConnectServer
    services = pywmi.WBEM_ConnectServer(name, namespace, user, passwd, locale, flags, authority, valueset)
com_error: com_error(121): DOS code 0x00000079
2007-07-13 17:51:28 INFO zen.zenwinmodeler: collecting from my1.server.com using user .\Administrator
2007-07-13 17:53:28 ERROR zen.zenwinmodeler: ERROR_SEM_TIMEOUT: The semaphore timeout period has expired. (121)
Traceback (most recent call last):
  File "/usr/local/zenoss/Products/ZenWin/zenwinmodeler.py", line 63, in processLoop
    svcs = self.getServices(name, user, passwd)
  File "/usr/local/zenoss/Products/ZenWin/zenwinmodeler.py", line 97, in getServices
    dev.connect()
  File "/usr/local/zenoss/Products/ZenWin/wmiclient.py", line 51, in connect
    self.flags,self.valueset)
  File "usr/local/zenoss/lib/python/win32com/client.py", line 33, in ConnectServer
    services = pywmi.WBEM_ConnectServer(name, namespace, user, passwd, locale, flags, authority, valueset)
com_error: com_error(121): DOS code 0x00000079
Also sprinkled throughout the logs are:
Code:
2007-07-13 17:54:28 WARNING zen.zenwinmodeler: skipping my1.server.com has bad wmi state
2007-07-13 17:55:28 WARNING zen.zenwinmodeler: skipping my1.server.com has bad wmi state
2007-07-13 17:56:28 WARNING zen.zenwinmodeler: skipping my1.server.com has bad wmi state
I do seem to be tracking performance metrics on these machines, so I'm not sure
what the "bad wmi state" thing is about.
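To get a feel for how often these messages occur and which hosts get skipped, a quick tally of the log might help. This is only a rough sketch: it assumes each log entry occupies a single line in the file and follows the "date time LEVEL zen.zenwinmodeler: message" layout seen in the excerpts. The `tally` function and both regular expressions are made up for illustration and are not part of Zenoss.

```python
import re
from collections import Counter

# Assumed layout: "YYYY-MM-DD HH:MM:SS LEVEL zen.zenwinmodeler: message"
LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (\w+) zen\.zenwinmodeler: (.*)$"
)
# Matches the "skipping <host> has bad wmi state" warning text.
SKIP = re.compile(r"^skipping (\S+) has bad wmi state$")


def tally(lines):
    """Count log entries per level and 'bad wmi state' skips per host."""
    levels = Counter()
    skipped = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # traceback lines and anything else without a timestamp
        level, message = match.groups()
        levels[level] += 1
        skip = SKIP.match(message)
        if skip:
            skipped[skip.group(1)] += 1
    return levels, skipped
```

Calling `tally(open(path))` on the log file (with `path` pointing at wherever zenwinmodeler.log lives on your install) returns two counters: one keyed by log level, one keyed by skipped host name.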
Anyway, I don't know what the solution is. Maybe someone does. Perhaps the
Win32 Python libraries could be more forgiving with respect to their semaphore
timeouts? Is that configurable somewhere?
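To make the "more forgiving" idea concrete, here is a rough sketch of the kind of retry wrapper that could sit around a connect call, so that a transient timeout is retried a few times before the error is allowed to propagate. None of this comes from the Zenoss source: `connect_with_retry`, `TransientError`, and the default attempt count and delays are all hypothetical, and `TransientError` merely stands in for whatever exception the real call raises on a timeout such as com_error(121).

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a transient failure such as a timeout."""


def connect_with_retry(connect, attempts=3, delay=1.0, backoff=2.0,
                       sleep=time.sleep):
    """Call connect(); on TransientError wait and try again.

    The wait starts at `delay` seconds and is multiplied by `backoff`
    after each failure. The last failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: let the caller decide what to do
            sleep(delay)
            delay *= backoff
```

The `sleep` parameter is only there so the waiting can be swapped out when trying the function without real delays. Whether retrying is actually the right fix depends on why the timeout happens in the first place, which I can't tell from the logs alone.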
Also, are there some settings that could possibly be tweaked in the registry on
the servers to help in this situation?
Thanks
------------------------
Max Edison