Just a follow-up. We upgraded to 2.6.27-openvz-kuindzhi.1 on Gentoo and have had no crashes since.

Full details for the archives, in case someone else runs into the same problem:

System Information
        Manufacturer: Dell Inc.
        Product Name: PowerEdge 2950

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)


-------- Original Message --------
Subject: Need help with hanging servers
Date: Tue, 06 Jul 2010 10:33:36 -0500
From: Brian Moon <[email protected]>
To: [email protected] <[email protected]>

Hi,

Been lurking on the list for a bit before I posted. We are relatively
new and light OpenVZ users. We have three physical boxes that use
OpenVZ. One is the server that is home to our developers' environment.
Each developer has his own container. Occasionally a container stops
responding because it has used too many resources, but the rest of the
server is fine. That is almost always the devs' fault.

The other two installs we have are in production. They are sort of
catch-all boxes: things like cacti, nagios, and misc web apps (webmail,
etc.), plus containers for custom outgoing SMTP servers and a dedicated
container running Gearman workers written in PHP.

The management of OpenVZ is great. We love it. We just have one problem.
On no regular schedule, the two production servers will hang. And it is
a weird hang. They still respond to ping, and TCP connections are
accepted (the connect succeeds) but nothing ever answers. So our
monitoring hangs for a while waiting on a response. Likewise, our load
balancers don't see them as down until well after they have stopped
responding. It is just weird. I am hoping that is a clue for someone.
There is nothing in syslog on the host server or in any of the
containers, and nothing on the console. It sounds like a resource issue.
We have tried moving containers around, leaving some off for a while,
and other things to find the offending container, but nothing has
worked. One server or the other locks up every 5-6 days. The timing is
not regular enough to point to a particular cron job.

I am sure it is something we have done. Most likely we have allocated
something wrong and just need to be slapped one good time and told NO!
But I don't know where to look. I will jump into the IRC channel too, in
case someone is willing to help and wants some real-time data.

Thanks in advance for any help.

System information below. If there is more information that may help
solve this problem, let me know what to look for.

# uname -a
Linux atl-vz1 2.6.18-028stab056 #1 SMP Tue Jun 30 07:50:32 EDT 2009
x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux

*  sys-kernel/openvz-sources
      Latest version installed: 2.6.27.5.3

System Information
        Manufacturer: Dell Inc.
        Product Name: PowerEdge 2950

# free
             total       used       free     shared    buffers     cached
Mem:      32872312   26336688    6535624          0         12   20952484
-/+ buffers/cache:    5384192   27488120
Swap:      8388656          0    8388656

# vzlist -o ctid,kmemsize,kmemsize.l -s kmemsize
      CTID   KMEMSIZE KMEMSIZE.L
       119    2025130  115710537
       116    2649072  231421075
       118    3145806   28927633
       111    3518587  115710537
       112    8613133   57855268
       121    8779664   57855268
       120   10341711  115710537
       122   10931070  231421075
       117   11024345  231421075
       113   22290970  231421075


--

Brian.
--------
http://brian.moonspot.net/
_______________________________________________
Users mailing list
[email protected]
https://openvz.org/mailman/listinfo/users
