We are investigating a curious problem with radosgw:

We see intermittent timeouts and http connections breaking when streaming video 
files through the rados gateway.

On server 1 we have Ubuntu 13.10 (saucy) with the stock Apache 2.4 and 
associated fastcgi (and a mon)

On server 2 we also have Ubuntu 13.10 but installed the ceph patched Apache 2.2 
with fastcgi (and a mon)

We can now do the following:

wget a 700 MB file from Server 2 ( wget --no-check-certificate 
https://staging.server2.example.ch/2f/e4491dbfa00c328828bbbc2c8d128a/test2.mp4 
-O /dev/null )

Stream the same file trough Chrome from server 1: -> the wget process is 
interrupted:

2013-12-05 16:32:18 (364 KB/s) - Connection closed at byte 172487549. Retrying.

the apache error log shows this:

[Thu Dec 05 16:32:50 2013] [error] [client 2001xxxx:6502] FastCGI: comm with 
server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Thu Dec 05 16:32:50 2013] [error] [client 2001:xxx:6502] FastCGI: incomplete 
headers (0 bytes) received from server "/var/www/s3gw.fcgi"
[Thu Dec 05 16:55:54 2013] [error] [client 123.123.123.123] FastCGI: comm with 
server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Thu Dec 05 16:55:54 2013] [error] [client 123.123.123.123] Handler for 
fastcgi-script returned invalid result code 1
[Thu Dec 05 16:56:25 2013] [error] [client 123.123.123.123] FastCGI: comm with 
server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Thu Dec 05 16:56:25 2013] [error] [client 123.123.123.123] FastCGI: incomplete 
headers (0 bytes) received from server "/var/www/s3gw.fcgi"


radosgw log shows this (from another test, that's why timestamps don't match)

2013-12-05 17:14:02.675911 7f2b9cfeb700  0 ERROR: s->cio->print() returned 
err=-1
2013-12-05 17:14:02.675942 7f2b9cfeb700  0 ERROR: s->cio->print() returned 
err=-1
2013-12-05 17:14:02.675946 7f2b9cfeb700  0 ERROR: s->cio->print() returned 
err=-1
2013-12-05 17:14:02.675948 7f2b9cfeb700  0 ERROR: s->cio->print() returned 
err=-1
2013-12-05 17:14:02.675973 7f2b9cfeb700  2 req 5:8.176355:s3:GET 
/2f/e4491dbfa00c328828bbbc2c8d128a/test2.mp4:get_obj:http status=403
2013-12-05 17:14:02.675971 7f2b597fa700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_5 obj-ofs=17301504 
read_ofs=0 len=4194304
2013-12-05 17:14:02.676068 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:02.676299 7f2b9cfeb700  1 ====== req done req=0x1d60d40 
http_status=403 ======
2013-12-05 17:14:08.909528 7f2bbcbc0700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-12-05 17:14:30.909803 7f2bbcbc0700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-12-05 17:14:32.321141 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=13107200 len=4194304
2013-12-05 17:14:32.321176 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=8912896 len=4194304
2013-12-05 17:14:32.321185 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=4718592 len=4194304
2013-12-05 17:14:32.321188 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=8912896 len=4194304
2013-12-05 17:14:32.321194 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=138936320 len=4194304
2013-12-05 17:14:32.321248 7f2b597fa700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_6 obj-ofs=21495808 
read_ofs=0 len=4194304
2013-12-05 17:14:32.321281 7f2b22fad700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_38 
obj-ofs=155713536 read_ofs=0 len=4194304
2013-12-05 17:14:32.321383 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:32.321436 7f2b597fa700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_7 obj-ofs=25690112 
read_ofs=0 len=4194304
2013-12-05 17:14:32.321530 7f2b22fad700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:32.321672 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:32.321690 7f2b597fa700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_8 obj-ofs=29884416 
read_ofs=0 len=4194304
2013-12-05 17:14:32.321779 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:35.635836 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=147324928 len=4194304
2013-12-05 17:14:35.635881 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=143130624 len=4194304
2013-12-05 17:14:35.635886 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=17301504 len=4194304
2013-12-05 17:14:35.635961 7f2b22fad700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_39 
obj-ofs=159907840 read_ofs=0 len=4194304
2013-12-05 17:14:35.636133 7f2b22fad700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:35.636157 7f2b22fad700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_40 
obj-ofs=164102144 read_ofs=0 len=4194304
2013-12-05 17:14:35.636226 7f2b22fad700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:35.636551 7f2bbd3c1700 20 get_obj_aio_completion_cb: io 
completion ofs=17301504 len=4194304
2013-12-05 17:14:35.636645 7f2b597fa700 20 rados->get_obj_iterate_cb 
oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_9 obj-ofs=34078720 
read_ofs=0 len=4194304
2013-12-05 17:14:35.636764 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0
2013-12-05 17:14:52.909931 7f2bbcbc0700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-12-05 17:15:06.803464 7f2bc96237c0 20 enqueued request req=0x1d62fe0
2013-12-05 17:15:06.803499 7f2bc96237c0 20 RGWWQ:
2013-12-05 17:15:06.803503 7f2bc96237c0 20 req: 0x1d62fe0
2013-12-05 17:15:06.803511 7f2bc96237c0 10 allocated request req=0x1d64bc0
2013-12-05 17:15:06.803559 7f2b5affd700 20 dequeued request req=0x1d62fe0
2013-12-05 17:15:06.803574 7f2b5affd700 20 RGWWQ: empty
2013-12-05 17:15:06.803692 7f2b5affd700  2 req 7:0.000110::GET 
/2f/e4491dbfa00c328828bbbc2c8d128a/test2.mp4::initializing
2013-12-05 17:15:06.803702 7f2b5affd700 10 host=xxxxxx rgw_dns_name=xxxxxx




Ceph Versions:
root@server2:/var/log/apache2# radosgw --version
ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)


root@server1:/etc# ceph -s
  cluster 6b3bd327-2f97-44f6-a8fc-2af8534a2e67
   health HEALTH_OK
   monmap e1: 3 mons at 
{h0s=[2001:]:6789/0,h1s=[2001:]:6789/0,hxs=[2001:]:6789/0}, election epoch 
1270, quorum 0,1,2 server2, server3, server1
   osdmap e6645: 24 osds: 24 up, 24 in
    pgmap v2541337: 7368 pgs: 7368 active+clean; 2602 GB data, 5213 GB used, 
61822 GB / 67035 GB avail; 31013KB/s rd, 151KB/s wr, 34op/s
   mdsmap e1: 0/0/1 up

root@server1:/etc# ceph --version
ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)


        
more log files available upon request….


any ideas?

cheers
Jens-Christian

-- 
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/socialmedia

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to