** Description changed:

  [Impact]
  Cancelling large S3/Swift object puts may result in garbage collection
  entries with zero-length chains. Rados gateway garbage collection does
  not efficiently process and clean up these zero-length chains.

  A large number of zero-length chains will result in rgw processes
  quickly spinning through the garbage collection lists doing very little
  work. This can result in abnormally high cpu utilization and op
  workloads.

  [Test Case]
- Disable garbage collection:
- `juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "false"}}'`
+ Modify garbage collection parameters by editing ceph.conf on the target rgw:
+ ```
+ [client.rgw.juju-29f238-sf00242079-4]
+ rgw enable gc threads = false
+ rgw gc obj min wait = 60
+ rgw gc processor period = 60
+ ```

- Repeatedly kill 256MB object put requests for randomized object names.
- `for i in {0.. 1000}; do f=$(mktemp); fallocate -l 256M $f; s3cmd put $f s3://test_bucket &; pid=$!; sleep $((RANDOM % 3)); kill $pid; rm $f; done`
+ Restart the ceph-radosgw service to apply the new configuration:
+ `sudo systemctl restart ceph-rado...@rgw.juju-29f238-sf00242079-4`

- Capture omap detail. Verify zero-length chains were created:
- `for i in $(seq 0 ${RGW_GC_MAX_OBJS:-32}); do rados -p default.rgw.log --namespace gc listomapvals gc.$i; done`
+ Repeatedly interrupt 512MB object put requests for randomized object names:
+ ```
+ for i in {0..1000}; do
+   f=$(mktemp); fallocate -l 512M $f
+   s3cmd put $f s3://test_bucket.juju-29f238-sf00242079-4 --disable-multipart &
+   pid=$!
+   sleep $((RANDOM % 7 + 3)); kill $pid
+   rm $f
+ done
+ ```

- Raise radosgw debug levels, and enable garbage collection:
- `juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "false"}}' loglevel=20`
+ Delete all objects in the bucket index:
+ ```
+ for f in $(s3cmd ls s3://test_bucket.juju-29f238-sf00242079-4 | awk '{print $4}'); do
+   s3cmd del $f
+ done
+ ```

- Verify zero-lenth chains are processed correctly by inspecting radosgw
- logs.
+ By default rgw_max_gc_objs splits the garbage collection list into 32 shards.
+ Capture omap detail and verify zero-length chains were left over:
+ ```
+ for i in {0..31}; do
+   sudo rados -p default.rgw.log --namespace gc listomapvals gc.$i
+ done
+ ```
+
+ Confirm the garbage collection list contains expired objects by listing expiration timestamps:
+ `sudo radosgw-admin gc list | grep time; date`
+
+ Raise the debug level and process the garbage collection list:
+ `CEPH_ARGS="--debug-rgw=20 --err-to-stderr" sudo -E radosgw-admin gc process`
+
+ Use the logs to verify the garbage collection process iterates through all remaining omap entry tags. Then confirm all rados objects have been cleaned up:
+ `sudo rados -p default.rgw.buckets.data ls`
+
  [Regression Potential]
  Backport has been accepted into the Luminous release stable branch
  upstream.

  [Other Information]
  This issue has been reported upstream [0] and was fixed in Nautilus
  alongside a number of other garbage collection issues/enhancements in
  pr#26601 [1]:

  * adds additional logging to make future debugging easier.
  * resolves bug where the truncated flag was not always set correctly in gc_iterate_entries
  * resolves bug where marker in RGWGC::process was not advanced
  * resolves bug in which gc entries with a zero-length chain were not trimmed
  * resolves bug where same gc entry tag was added to list for deletion multiple times

  These fixes were slated for back-port into Luminous and Mimic, but the
  Luminous work was not completed because of a required dependency: AIO GC
  [2]. This dependency has been resolved upstream, and is pending SRU
  verification in Ubuntu packages [3].

  [0] https://tracker.ceph.com/issues/38454
  [1] https://github.com/ceph/ceph/pull/26601
  [2] https://tracker.ceph.com/issues/23223
  [3] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1838858
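
For the omap inspection step in the test case, it may help to reduce each gc shard to a single number so that a run before `radosgw-admin gc process` can be compared with a run after it. The following is only a sketch under assumptions: the function name `count_gc_omap_keys` and the `RADOS_CMD` override are hypothetical and exist purely for illustration (the override allows a dry run without a cluster). The pool, namespace and shard count are the ones used in the test case. An entry may be stored under more than one omap key, so the counts are best read as a relative before/after signal rather than an exact number of gc entries.

```shell
# Sketch: print the number of omap keys held by each gc shard object.
# Hypothetical names: count_gc_omap_keys, RADOS_CMD (dry-run override).
count_gc_omap_keys() (
  pool="${1:-default.rgw.log}"   # pool holding the gc shard objects
  shards="${2:-32}"              # shard count assumed from the test case
  total=0
  i=0
  while [ "$i" -lt "$shards" ]; do
    # listomapkeys prints one key per line; wc -l turns that into a count.
    n=$(${RADOS_CMD:-sudo rados} -p "$pool" --namespace gc listomapkeys "gc.$i" | wc -l)
    n=$((n + 0))                 # normalise any padding emitted by wc
    echo "gc.$i $n"
    total=$((total + n))
    i=$((i + 1))
  done
  echo "total $total"
)
```

One possible use is `count_gc_omap_keys > before.txt`, then the gc process step, then `count_gc_omap_keys > after.txt` and a `diff` of the two files.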
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1843085

Title:
  Backport of zero-length gc chain fixes to Luminous