Hi,

I have a question about RGW multisite. The sync traffic generates a lot of 
requests per second (around 1500), which seems high, especially compared 
to the actual volume of user/client requests.

We have a rather simple multisite setup with:
- two Ceph clusters (16.2.6), 1 realm, 1 zonegroup, and one zone on each side, 
one of which is the master zone.
- latency between the clusters of around 0.3 ms.
- 3 RGW/beast daemons running on each cluster.
- a handful of buckets (around 20), and a check script which creates one bucket 
per second (and deletes it after validating that the bucket was created 
successfully).
- one bucket with a few million (smaller) objects; the others are (more 
or less) empty.
- on the client side, just a few requests per second (mostly PUTs of 
objects into the one larger bucket), writing a few kilobytes per second.
- roughly 5 GB of total disk space consumed currently, with the idea of growing 
the total consumption to a few TB over time.

Both clusters are in sync (after the initial full sync, they now do incremental 
sync). Although new objects are synced from cluster A (the master, to which 
the clients connect) to B, we see a lot of "internal" sync requests in our 
monitoring: each RGW daemon on cluster B sends about 500 requests per second to 
an RGW daemon on cluster A, in particular to "/admin/log?…". That adds up to 
1500 requests per second just for the sync, and it results in almost 60% CPU 
usage for the rgw/beast processes.
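To put numbers on the split between sync and client traffic, here is a minimal Python sketch that tallies requests per category from RGW access-log lines over a sampled time window. It assumes each log line contains the quoted HTTP request line (e.g. "GET /admin/log?type=data HTTP/1.1"). The category names and the helper functions `classify` and `tally` are hypothetical illustrations, not part of any Ceph tooling.

```python
import re
from collections import Counter

# Assumption: the access log contains the quoted request line,
# e.g. "GET /admin/log?type=data&id=3 HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<uri>\S+) HTTP/[\d.]+"')


def classify(uri: str) -> str:
    """Map a request URI to a coarse (hypothetical) traffic category."""
    path = uri.split("?", 1)[0]  # drop the query string
    if path.startswith("/admin/log"):
        return "sync:/admin/log"
    if path.startswith("/admin/"):
        return "admin:other"
    return "client"


def tally(lines, window_seconds: float) -> dict:
    """Return requests per second for each category seen in the sample."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # lines without a request line are skipped
            counts[classify(match.group("uri"))] += 1
    return {category: n / window_seconds for category, n in counts.items()}
```

Feeding it the lines one daemon logged during, say, a 60-second window gives per-daemon rates. Summing the "sync:/admin/log" rate across the three daemons should reproduce the observed 3 × 500 = 1500 requests per second, which can then be compared directly with the "client" rate.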

When we stop and restart the RGW instances on cluster B, they first catch 
up with the delta, and as soon as they finish, they start requesting 
"/admin/log…" again in this endless loop.

Is this amount of internal, sync-related requests normal and expected?

Thanks for any ideas on how to debug / introspect this.

Best
Stefan

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io