I'm upgrading the Apache HTTP server for our SVN repos on Red Hat Enterprise
Linux 8. With Subversion 1.14.1, svn checkout of even a small, simple repo
with about 150 files hangs about 90% of the time, crashes about 5% of the
time, and succeeds about 5% of the time. Given enough time, the hangs
eventually time out after checking out much of the repo. A debugger shows the
following stack during the hang:
#0 epoll_wait /usr/lib64/libc.so.6
#1 impl_pollset_poll /usr/lib64/libapr-1.so.0
#2 serf_context_run /usr/lib64/libserf-1.so.0
#3 svn_ra_serf__context_run /usr/lib64/libsvn_ra_serf-1.so.0
#4 finish_report /usr/lib64/libsvn_ra_serf-1.so.0
#5 svn_wc_crawl_revisions5 /usr/lib64/libsvn_wc-1.so.0
#6 update_internal.isra /usr/lib64/libsvn_client-1.so.0
#7 svn_client__update_internal /usr/lib64/libsvn_client-1.so.0
#8 svn_client__checkout_internal /usr/lib64/libsvn_client-1.so.0
#9 svn_client_checkout3 /usr/lib64/libsvn_client-1.so.0
#10 svn_cl__checkout
#11 sub_main
#12 main
strace shows repeated calls to epoll_wait about one second apart.
When the checkout crashes, it's a SIGSEGV with this stack:
#0 apr_pool_create_ex (libapr-1.so.0)
#1 svn_pool_create_ex (libsvn_subr-1.so.0)
#2 update_opened (libsvn_ra_serf-1.so.0)
#3 expat_start (libsvn_ra_serf-1.so.0)
#4 expat_start_handler (libsvn_subr-1.so.0)
#5 doContent (libexpat.so.1)
#6 contentProcessor (libexpat.so.1)
#7 XML_ParseBuffer (libexpat.so.1)
#8 svn_xml_parse (libsvn_subr-1.so.0)
#9 expat_response_handler (libsvn_ra_serf-1.so.0)
#10 process_buffer.isra.9 (libsvn_ra_serf-1.so.0)
#11 finish_report (libsvn_ra_serf-1.so.0)
#12 svn_wc_crawl_revisions5 (libsvn_wc-1.so.0)
#13 update_internal.isra.0 (libsvn_client-1.so.0)
#14 svn_client__update_internal (libsvn_client-1.so.0)
#15 svn_client__checkout_internal (libsvn_client-1.so.0)
#16 svn_client_checkout3 (libsvn_client-1.so.0)
#17 svn_cl__checkout (svn)
#18 sub_main (svn)
#19 main (svn)
#20 __libc_start_main (libc.so.6)
#21 _start (svn)
or this one:
#0 apr_allocator_alloc (libapr-1.so.0)
#1 serf_bucket_mem_alloc (libserf-1.so.0)
#2 serf_bucket_response_create (libserf-1.so.0)
#3 serf__process_connection (libserf-1.so.0)
#4 serf_event_trigger (libserf-1.so.0)
#5 serf_context_run (libserf-1.so.0)
#6 svn_ra_serf__context_run (libsvn_ra_serf-1.so.0)
#7 finish_report (libsvn_ra_serf-1.so.0)
#8 svn_wc_crawl_revisions5 (libsvn_wc-1.so.0)
#9 update_internal.isra (libsvn_client-1.so.0)
#10 svn_client__update_internal (libsvn_client-1.so.0)
#11 svn_client__checkout_internal (libsvn_client-1.so.0)
#12 svn_client_checkout3 (libsvn_client-1.so.0)
#13 svn_cl__checkout (svn)
#14 sub_main (svn)
#15 main (svn)
After a failure, I'm left with a half-checked-out working copy with many locks.
I can complete it with svn cleanup and another svn checkout, but that's not
realistic for our CI/CD or general use. Server logs show no indication of a
problem; the server appears healthy.
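
In case it helps anyone reproduce or compare numbers, here is a minimal Python
sketch of how the hang/crash/success split could be tallied. It is not the
exact procedure behind the percentages above. The names classify_run and
tally_checkouts, the 300-second timeout, and the example URL are illustrative
assumptions; only the svn checkout command and its --non-interactive option
come from Subversion itself.

```python
import collections
import shutil
import subprocess
import tempfile


def classify_run(cmd, timeout_s):
    """Run cmd once and label the outcome: success, crash, hang, or error."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return "hang"
    if proc.returncode == 0:
        return "success"
    if proc.returncode < 0:
        # On POSIX a negative return code means the process died from a
        # signal (for example -11 for SIGSEGV).
        return "crash"
    return "error"


def tally_checkouts(url, runs=20, timeout_s=300):
    """Check out url into a fresh temp directory `runs` times; count outcomes."""
    counts = collections.Counter()
    for _ in range(runs):
        wc = tempfile.mkdtemp(prefix="svn-co-")
        try:
            cmd = ["svn", "checkout", "--non-interactive", url, wc]
            counts[classify_run(cmd, timeout_s)] += 1
        finally:
            # Discard the (possibly locked, partial) working copy each time.
            shutil.rmtree(wc, ignore_errors=True)
    return counts


# Hypothetical usage (the URL is a placeholder, not our real server):
#   print(tally_checkouts("https://svn.example.com/repos/small", runs=20))
```

Each run uses a fresh working copy so that leftover locks from a failed
checkout cannot influence the next attempt.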
I tried a million things and read half a million posts and search results
before submitting this bug report, but I haven't been able to get past this.
I'd sure appreciate any ideas you have on the way forward. Here's a bit more
about my system:
* svn, version 1.14.1 (r1886195)
    * ra_svn : Module for accessing a repository using the svn network
      protocol.
      - with Cyrus SASL authentication
      - handles 'svn' scheme
    * ra_local : Module for accessing a repository on local disk.
      - handles 'file' scheme
    * ra_serf : Module for accessing a repository via WebDAV protocol using
      serf.
      - using serf 1.3.9 (compiled with 1.3.9)
      - handles 'http' scheme
      - handles 'https' scheme
  The following authentication credential caches are available:
    * Plaintext cache in /home/me/.subversion
    * Gnome Keyring
    * GPG-Agent
* svn 1.10.2 was failing the same way before we upgraded to 1.14.1 as a
possible fix.
* Checking out to a local disk succeeds more often, but still hangs and
crashes. Checking out to an NFS drive just makes it worse.
And here's more about our Apache.
* Server version: Apache/2.4.37 (Red Hat Enterprise Linux)
  Server built: Aug 30 2023 11:01:53
  Server's Module Magic Number: 20120211:83
  Server loaded: APR 1.6.3, APR-UTIL 1.6.1
  Compiled using: APR 1.6.3, APR-UTIL 1.6.1
  Architecture: 64-bit
  Server MPM: worker
    threaded: yes (fixed thread count)
    forked: yes (variable process count)
  Server compiled with....
    -D APR_HAS_SENDFILE
    -D APR_HAS_MMAP
    -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
    -D APR_USE_SYSVSEM_SERIALIZE
    -D APR_USE_PTHREAD_SERIALIZE
    -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
    -D APR_HAS_OTHER_CHILD
    -D AP_HAVE_RELIABLE_PIPED_LOGS
    -D DYNAMIC_MODULE_LIMIT=256
    -D HTTPD_ROOT="/etc/httpd"
    -D SUEXEC_BIN="/usr/sbin/suexec"
    -D DEFAULT_PIDLOG="run/httpd.pid"
    -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
    -D DEFAULT_ERRORLOG="logs/error_log"
    -D AP_TYPES_CONFIG_FILE="conf/mime.types"
    -D SERVER_CONFIG_FILE="conf/httpd.conf"
* Browsing the SVN and ViewVC pages on this server from a browser seems to
  work.
* Authentication is over Kerberos with mod_auth_gssapi.
* Authorization uses AuthzSVNAccessFile and an access file.
* SSL is used with SSLCryptoDevice set to builtin, based on this thread:
  <https://lists.apache.org/thread/zysfq4cb0jkz59p0wkhfm49xwr8lj5to>.
* Based on another post, I've tried all three MPMs (prefork, worker, and
  event) with no change.
* We've had Apache running on Red Hat 6 with these repos for many years.
I'd be glad to provide additional details or run more tests. Thanks for any
ideas you have, and for supporting this software.
Jim