** Description changed:

  [Impact]
  Cancelling large S3/Swift object puts may result in garbage collection
  entries with zero-length chains. Rados gateway garbage collection does not
  efficiently process and clean up these zero-length chains.
  
  A large number of zero-length chains will result in rgw processes
  quickly spinning through the garbage collection lists doing very little
  work. This can result in abnormally high cpu utilization and op
  workloads.
  
  [Test Case]
- Disable garbage collection:
- `juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "false"}}'`
+ Modify garbage collection parameters by editing ceph.conf on the target rgw:
+ ```
+ [client.rgw.juju-29f238-sf00242079-4]
+ rgw enable gc threads = false
+ rgw gc obj min wait = 60
+ rgw gc processor period = 60
+ ```
  
- Repeatedly kill 256MB object put requests for randomized object names.
- `for i in {0.. 1000}; do f=$(mktemp); fallocate -l 256M $f; s3cmd put $f 
s3://test_bucket &; pid=$!; sleep $((RANDOM % 3)); kill $pid; rm $f; done`
+ Restart the ceph-radosgw service to apply the new configuration:
+ `sudo systemctl restart ceph-rado...@rgw.juju-29f238-sf00242079-4`
  
- Capture omap detail. Verify zero-length chains were created:
- `for i in $(seq 0 ${RGW_GC_MAX_OBJS:-32}); do rados -p default.rgw.log --namespace gc listomapvals gc.$i; done`
+ Repeatedly interrupt 512MB object put requests for randomized object names:
+ ```
+ for i in {0..1000}; do 
+   f=$(mktemp); fallocate -l 512M $f
+   s3cmd put $f s3://test_bucket.juju-29f238-sf00242079-4 --disable-multipart &
+   pid=$!
+   sleep $((RANDOM % 7 + 3)); kill $pid
+   rm $f
+ done
+ ```
  
- Raise radosgw debug levels, and enable garbage collection:
- `juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "false"}}' loglevel=20`
+ Delete all objects in the bucket index:
+ ```
+ for f in $(s3cmd ls s3://test_bucket.juju-29f238-sf00242079-4 | awk '{print $4}'); do
+   s3cmd del $f
+ done
+ ```
  
- Verify zero-length chains are processed correctly by inspecting radosgw
- logs.
+ By default, rgw_gc_max_objs splits the garbage collection list into 32 shards
+ (gc.0 through gc.31). Capture omap detail and verify zero-length chains were
+ left over:
+ ```
+ for i in {0..31}; do 
+   sudo rados -p default.rgw.log --namespace gc listomapvals gc.$i
+ done
+ ```
+ 
+ Confirm the garbage collection list contains expired objects by listing
+ expiration timestamps:
+ `sudo radosgw-admin gc list | grep time; date`
+ 
+ Raise the debug level and process the garbage collection list:
+ `CEPH_ARGS="--debug-rgw=20 --err-to-stderr" sudo -E radosgw-admin gc process`
+ 
+ Use the logs to verify the garbage collection process iterates through all
+ remaining omap entry tags. Then confirm all rados objects have been cleaned up:
+ `sudo rados -p default.rgw.buckets.data ls`
+ 
  
  [Regression Potential]
  Backport has been accepted into the Luminous release stable branch upstream.
  
  [Other Information]
  This issue has been reported upstream [0] and was fixed in Nautilus alongside
  a number of other garbage collection issues/enhancements in pr#26601 [1]:
  * adds additional logging to make future debugging easier
  * resolves bug where the truncated flag was not always set correctly in
    gc_iterate_entries
  * resolves bug where marker in RGWGC::process was not advanced
  * resolves bug in which gc entries with a zero-length chain were not trimmed
  * resolves bug where same gc entry tag was added to list for deletion
    multiple times
  
  These fixes were slated for back-port into Luminous and Mimic, but the
  Luminous work was not completed because of a required dependency: AIO GC
  [2]. This dependency has been resolved upstream, and is pending SRU
  verification in Ubuntu packages [3].
  
  [0] https://tracker.ceph.com/issues/38454
  [1] https://github.com/ceph/ceph/pull/26601
  [2] https://tracker.ceph.com/issues/23223
  [3] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1838858
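
The omap capture step in the test case dumps full key/value data, which is noisy when only the amount of leftover data per shard matters. The following is a minimal sketch that prints a key count per shard instead. It assumes the default 32 shards and the `default.rgw.log` pool and `gc` namespace used in the test case. `count_gc_keys` is a hypothetical helper name, not part of Ceph, and it calls `rados` directly, so it needs a shell with access to the admin keyring (for example a root shell).

```shell
# Hypothetical helper: print "<shard> <number of omap keys>" for each gc shard.
# Assumes shards are named gc.0 .. gc.(N-1) in the "gc" namespace of
# the default.rgw.log pool, as in the test case above.
count_gc_keys() {
  shards="${1:-32}"
  i=0
  while [ "$i" -lt "$shards" ]; do
    # listomapkeys prints one key per line; count the lines
    n=$(rados -p default.rgw.log --namespace gc listomapkeys "gc.$i" | wc -l | tr -d ' ')
    echo "gc.$i $n"
    i=$((i + 1))
  done
}
```

Running `count_gc_keys 32` before and after `radosgw-admin gc process` gives a quick before/after comparison. A shard may hold more than one omap key per gc entry, so the numbers are key counts rather than entry counts.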

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
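
Comparing the `gc list` timestamps against `date` by eye gets tedious when the list is long. The following rough sketch counts entries whose timestamp is earlier than a reference time. It assumes each entry's timestamp is printed on its own line in the form `"time": "YYYY-MM-DD HH:MM:SS..."`, which may differ between Ceph releases. `count_expired_gc_entries` is a hypothetical helper, not a Ceph command. The reference time is passed in explicitly so the caller can match whatever timezone the tool prints.

```shell
# Hypothetical helper: count gc entries whose "time" field is earlier than $1.
# Reads `radosgw-admin gc list` output on stdin.
# Assumes lines of the form:  "time": "YYYY-MM-DD HH:MM:SS...",
count_expired_gc_entries() {
  now="$1"
  awk -F'"' -v now="$now" '
    $2 == "time" {
      # Both sides use the same fixed-width layout, so string order is time order
      if (substr($4, 1, 19) < now) expired++
    }
    END { print expired + 0 }
  '
}
```

For example: `sudo radosgw-admin gc list | count_expired_gc_entries "$(date '+%Y-%m-%d %H:%M:%S')"`. A non-zero result suggests there are entries that `gc process` should be picking up.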

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1843085

Title:
  Backport of zero-length gc chain fixes to Luminous
