Hello,

we are hosting a large ZEO-based Zope site and ran into a problem with the
limited number of file descriptors (FDs) available to the ZEO server.

Scenario

  36 ZEO clients (each 2 threads)
  1  ZEO server  (serving 10 storages)

It was not possible to connect all ZEO clients to the ZEO server.
After a short time, the following events appeared in the event.log
of the ZEO clients

[snip]

2010-02-08T14:03:25 PROBLEM(100) zrpc:21615 CW: error connecting to
('zeo-server.dummy.de', 10000): ECONNREFUSED

[snip]

and at the same time the ZEO server hung and the whole site went down.
Unfortunately, there was no hint in the ZEO server logs. After 'Googling'
we found the following hint

  http://comments.gmane.org/gmane.comp.web.zope.plone.user/101892

that each ZEO client connection consumes three file descriptors on the ZEO
server side. With this information we could calculate the theoretical number of
FDs required

   75 (base) + 36 (zeo-clients) x 10 (storages) x 3 (FDs each) = 1155
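
The estimate above can be reproduced with a few lines of Python. This is only
a sketch: the function and parameter names are made up for illustration, and
the base of 75 FDs and the three FDs per connection are simply the figures
quoted above, not values taken from ZEO itself.

```python
def estimate_zeo_server_fds(clients, storages, fds_per_connection=3, base=75):
    """Estimate the FDs a ZEO server needs if every client opens every storage."""
    # One connection per (client, storage) pair.
    connections = clients * storages
    return base + connections * fds_per_connection


# Our scenario: 36 ZEO clients, 10 storages.
print(estimate_zeo_server_fds(clients=36, storages=10))
```

For our scenario this gives 1155, which is already above 1024.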

We tried to open as many connections as possible to the ZEO server with a simple
script (see attachment) and counted the number of open FDs of the ZEO server
using "lsof".
The result was that the ZEO server hung at 1025 open FDs. We therefore assumed
that the OS (here Linux) limits the number of available FDs to 1024 by
configuration. Using "ulimit" (hard/soft) we increased the number of allowed
open FDs to 2048. However, it was still not possible to open more than 329
(instead of 360) connections (=1025 FDs) to the ZEO server :(
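
For anyone who wants to repeat these checks without "lsof", here is a minimal
Python sketch, assuming Linux with a mounted /proc. The helper names are
hypothetical. Note that fd_limits() reports the limits of the process that
calls it, so it only reflects the ZEO server's limits if it is run in the same
environment the server was started from.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) limit on open FDs for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid):
    """Count the open FDs of a process via /proc (Linux only).

    Reading another user's process usually requires root.
    """
    return len(os.listdir('/proc/%d/fd' % pid))


soft, hard = fd_limits()
print('soft limit: %s, hard limit: %s' % (soft, hard))
print('open FDs of this process: %d' % count_open_fds(os.getpid()))
```

To inspect the ZEO server, pass its PID to count_open_fds() instead of
os.getpid().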

Looking at the sources, we saw that the ZEO server uses the asyncore library to
manage the incoming connections. After *intensive* 'Googling' we learned that
Python's asyncore library, through the select() call it uses by default, has a
hard compiled-in limit on the number of open FDs (namely 1024). The limit is
defined as the macro __FD_SETSIZE in the following header file of the libc6
library
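
The compiled-in limit can be seen from Python without opening a thousand
sockets, because select.select() rejects FD numbers outside the fd_set size
before it even calls the OS. The following is a small sketch for Python 3 (the
attached script is Python 2); the helper name and the probe value 100000 are
arbitrary choices for illustration.

```python
import select


def select_accepts_fd(fd):
    """Return True if select.select() accepts this FD number at all."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Raised by Python itself when fd lies outside the compiled-in
        # fd_set size (FD_SETSIZE).
        return False
    except OSError:
        # fd is within range but is not an open descriptor; the range
        # check itself passed.
        return True
    return True


print(select_accepts_fd(100000))
```

On a stock Linux build, where FD_SETSIZE is 1024, this should print False. An
interpreter re-compiled with a larger FD_SETSIZE can be probed the same way
with an FD number just below the new limit.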

    /usr/include/bits/typesizes.h

Therefore, it was unfortunately necessary to change the limit in the header file
to

  #define __FD_SETSIZE 2048

and to re-compile Python's sources to overcome the problem. Our ZEO scenario
now works with the re-compiled Python interpreter :)

I hope you will find this information useful.
Kind regards
Andreas


-- 
Dr. Andreas Gabriel, Hochschulrechenzentrum, http://www.uni-marburg.de/hrz
Hans-Meerwein-Str., 35032 Marburg,  fon +49 (0)6421 28-23560  fax 28-26994
-------------------- Philipps-Universitaet Marburg -----------------------

#!/usr/bin/python2.3

"""Connect to a ZEO server and check for maximal connections.

Usage: zeo-check-max-conections.py [options]

Options:

    -p port -- port to connect to

    -h host -- host to connect to (default is current host)

    -U path -- Unix-domain socket to connect to

    -S name -- comma separated list of storage names (default is '1')

    -c connections -- simultaneous connections


You must specify either -p and -h or -U.

"""

import getopt
import socket
import sys
import time

from ZEO.ClientStorage import ClientStorage


def multiConnect(addr, storages, connections):

    cs = {}

    for s in storages:
        for i in range(0, connections):
            key = '%s-%s' % (s, i)
            print 'connecting storage %s' % key
            cs[key] = ClientStorage(addr, storage=s, wait=1, read_only=0)
            print '%s. connection established' % (len(cs))

    # release connections after 10 seconds
    time.sleep(10)
    for s in cs.keys():
        cs[s].close()

def usage(exit=1):
    print __doc__
    print " ".join(sys.argv)
    sys.exit(exit)

def main():
    host = None
    port = None
    unix = None
    storages = ['1']
    connections = 1

    try:
        opts, args = getopt.getopt(sys.argv[1:], 'p:h:U:S:c:')
        for o, a in opts:
            if o == '-p':
                port = int(a)
            elif o == '-h':
                host = a
            elif o == '-U':
                unix = a
            elif o == '-S':
                storages = a.split(',')
            elif o == '-c':
                connections = int(a)

    except Exception, err:
        print err
        usage()

    if unix is not None:
        addr = unix
    else:
        if host is None:
            host = socket.gethostname()
        if port is None:
            usage()
        addr = host, port
	
    
    multiConnect(addr, storages, connections)


if __name__ == "__main__":
    try:
        main()
    except Exception, err:
        print err
        sys.exit(1)

_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev
