I have been asked to investigate a strange issue we are encountering at a
customer site in Mexico. I am a contractor for a company which supplied
surveillance and monitoring software based on the ICS component set. The
software runs fine on other sites, with no problems encountered for over 8
months, but on the site in Mexico the software (and/or server) crashes
after a matter of hours or days.

The servers are all identical HP Blade servers running Windows Server 2003
vanilla installs. This is true of sites that are functioning and the ones in
Mexico that are not.


If the software runs fine on several identical systems and fails on a single one, I would concentrate on what makes the failing site different, because it has to be different. First check the service pack level. Then verify that no malware is intercepting winsock calls, which malware does to capture traffic. Next, check whether any suspect LSP (Layered Service Provider) is installed on the system. Also check whether a security product is interfering with winsock: such products frequently intercept winsock calls to block certain kinds of traffic, and they can be buggy.

My analysis of the problem to date suggests that an OnClientConnect is
firing but the passed Client object is incomplete or invalid. The code for
the OnClientConnect event does not check the ErrorCode and accepts the
connection but traffic appears not to flow correctly between client and
server.

I suggest checking the error code and writing it to the log file for analysis.
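
As a rough illustration of that check, here is a minimal sketch in Python rather than ICS/Delphi code. The handler name `on_client_connect`, the `peer` attribute and the logger name are all hypothetical; the only point is the pattern: test the error code first, log it, and release the socket instead of accepting the connection blindly.

```python
import logging

# Hypothetical logger name; configure handlers as the application requires.
log = logging.getLogger("surveillance.server")

def on_client_connect(client, error_code):
    """Hypothetical connect handler: returns True if the client was accepted."""
    peer = getattr(client, "peer", "unknown")
    if error_code != 0:
        # Record the code so the failing site can be analysed afterwards.
        log.error("client connect failed: error=%d peer=%s", error_code, peer)
        if client is not None:
            client.close()  # release the socket instead of keeping a bad client
        return False
    log.info("client connected: peer=%s", peer)
    return True
```

Logging the numeric code on every failed connect should at least show whether the Mexico site reports errors that the healthy sites never see.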

If I run NetStat on the server, it appears a windows socket object is left
in FIN-WAIT-1 or FIN-WAIT-2 state. Eventually the system fails as all
windows socket objects are expended and there is a catastrophic failure of
the software and/or server.

What are the steps that should be taken when an error does occur to ensure
that the windows sockets are correctly 'cleaned up' and released back to the
operating system?

FIN-WAIT-1 and FIN-WAIT-2 mean that the orderly shutdown sequence has started but the remote side is not answering (have a look here: http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm). An orderly shutdown is a multi-step exchange between client and server: the closing side sends a FIN and waits in FIN-WAIT-1 for the ACK, then waits in FIN-WAIT-2 for the peer's own FIN. What is strange here is that FIN-WAIT-1 and FIN-WAIT-2 are states of the side that initiates the close, normally the client side, not the server side. So it is possible that the sockets you see in that state are NOT the ones failing. Maybe something else is failing (possibly in the same software), leaving those sockets in those states until they consume all available sockets. That in turn causes trouble when the software accepts a new connection, because accepting a connection means creating a new socket.
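
To make the sequence concrete, here is a minimal sketch in Python (not ICS code) of an orderly close that does not wait forever for a silent peer. The function name `close_gracefully` and the 5 second default are assumptions chosen for illustration.

```python
import socket

def close_gracefully(sock, timeout=5.0):
    """Half-close, wait up to `timeout` for the peer to close, then release.

    Returns True if the peer completed the shutdown, False otherwise.
    """
    try:
        sock.shutdown(socket.SHUT_WR)  # sends our FIN; we stop writing
        sock.settimeout(timeout)
        while sock.recv(4096):         # drain data until the peer closes (b"")
            pass
        return True
    except OSError:                    # timeout, reset, or not connected
        return False
    finally:
        sock.close()                   # always give the handle back to the OS
```

The `finally` clause is the important part: whatever the peer does, the application releases its handle. How long the operating system then keeps the half-closed connection around is a TCP stack setting, not something this sketch controls.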

So I see the possibility that some other software, or another part of your software, has an issue closing /client/ connections. This results in a lot of sockets in the FIN-WAIT-1 or FIN-WAIT-2 state, which consume all available sockets and make the acceptance of new connections fail.
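
One way to watch this build-up is to count sockets per TCP state at regular intervals. The following is a small Python sketch that parses text in the layout printed by Windows `netstat -an` (protocol, local address, foreign address, state). The sample text is invented for illustration, and the underscore spelling of the state names (`FIN_WAIT_2`) is assumed from that layout.

```python
from collections import Counter

def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -an` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP  <local>  <foreign>  <STATE>
        if len(fields) >= 4 and fields[0] == "TCP":
            counts[fields[3]] += 1
    return counts

# Invented sample, not captured from the failing server.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:8000          10.0.0.9:1034          ESTABLISHED
  TCP    10.0.0.5:1201          10.0.0.7:9000          FIN_WAIT_2
  TCP    10.0.0.5:1202          10.0.0.7:9000          FIN_WAIT_2
  TCP    10.0.0.5:1203          10.0.0.7:9000          FIN_WAIT_1
  UDP    0.0.0.0:500            *:*
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

Looking at the local and foreign ports of the rows stuck in the FIN-WAIT states should also show whether they are connections the server accepted or connections something on the machine opened as a client.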

Why would those client connections have problems with their server not answering? This could be caused by malware sending forged IP packets to break existing connections, by a misconfigured security product (firewall) dropping packets, or simply by an overloaded network segment dropping packets because traffic is too high. An overloaded layer 2 switch may simply drop packets when it is not able to switch them fast enough.

--
francois.pie...@overbyte.be
The author of the freeware multi-tier middleware MidWare
The author of the freeware Internet Component Suite (ICS)
http://www.overbyte.be