[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-23 Thread Denis Krienbühl
Thanks Frédéric, we’ve done that in the meantime to work around issue #47866. The error has been reproduced and there’s a PR associated with the issue: https://tracker.ceph.com/issues/47866 Cheers, Denis > On 23 Nov 2020, at 11:56, Frédéric Nass >

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-23 Thread Frédéric Nass
Hi Denis, You might want to look at rgw_gc_obj_min_wait from [1] and try increasing the default value of 7200s (2 hours) to whatever suits your needs < 2^64. Just keep in mind that at some point you'll have to get these objects processed by the gc, or do it manually through the API [2]. One thing that
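
To make the arithmetic concrete, here is a minimal sketch that converts a wait time in hours into a value for rgw_gc_obj_min_wait and checks it against the 2^64 bound mentioned above. The helper name is hypothetical, and where the resulting line belongs (which ceph.conf section, or the config database) depends on the deployment and is not covered here.

```python
# Default quoted in the message above: 7200 seconds (2 hours).
DEFAULT_GC_OBJ_MIN_WAIT = 7200


def gc_obj_min_wait_setting(hours: float) -> str:
    """Return an 'option = value' line for rgw_gc_obj_min_wait.

    Hypothetical helper: it only does the unit conversion and the
    range check (value must stay below 2^64); it does not apply anything.
    """
    seconds = int(hours * 3600)
    if not 0 <= seconds < 2**64:
        raise ValueError("rgw_gc_obj_min_wait must be in [0, 2^64)")
    return f"rgw_gc_obj_min_wait = {seconds}"
```

For example, gc_obj_min_wait_setting(2) reproduces the default of 7200 seconds, and a one-week delay would be gc_obj_min_wait_setting(24 * 7).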

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-19 Thread Janek Bevendorff
We are doing that as well. But we need to be able to check specific buckets additionally. For that we use this second approach. Since we double-check all output from our script anyway (to see if NoSuchKey actually happens), we can rule out false positives. So far all the files detected this

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-19 Thread Janek Bevendorff
I would recommend you get a dump with rados ls -p poolname (can be several GB, mine is 61GB) and grep (or ack, which is faster) for the names there to get an overview of what is there and what isn't. Looking up the names directly can easily give you the wrong picture, because it is kinda
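
A rough sketch of the dump-and-search approach, assuming the output of rados ls -p poolname has already been redirected to a text file with one RADOS object name per line. The function name and the substring-matching choice are assumptions for illustration; grep or ack on the dump does the same job.

```python
from typing import Dict, Iterable, List


def find_in_dump(dump_lines: Iterable[str], needles: Iterable[str]) -> Dict[str, List[str]]:
    """For each search string, collect the dump lines that contain it.

    dump_lines: lines of a saved `rados ls -p poolname` dump.
    needles:    S3 object names (or fragments of them) to look for.
    The dump is streamed once, so a multi-GB file never has to fit in memory.
    """
    needles = list(needles)
    hits: Dict[str, List[str]] = {needle: [] for needle in needles}
    for line in dump_lines:
        name = line.rstrip("\n")
        for needle in needles:
            if needle in name:
                hits[needle].append(name)
    return hits
```

Passing an open file object as dump_lines works directly, for example find_in_dump(open("dump.txt", errors="replace"), ["OBJECTNAME"]), where dump.txt is a placeholder for the saved dump. An empty list for a name means no head, multipart, or shadow object with that name fragment was found in the pool.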

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-19 Thread Denis Krienbühl
Thanks, we are currently scanning our object storage. It looks like we can detect the missing objects that return “No Such Key” looking at all “__multipart_” objects returned by radosgw-admin bucket radoslist, and checking if they exist using rados stat. We are currently not looking at shadow
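
The scan described above could look roughly like the following sketch. It assumes the output of radosgw-admin bucket radoslist for one bucket has been saved with one RADOS object name per line, and that rados stat exits non-zero when an object does not exist. The function names, the pool argument, and the plain-name handling (no namespaces or locators) are assumptions, not the script actually used.

```python
import subprocess
from typing import Callable, Iterable, List


def rados_object_exists(pool: str, name: str) -> bool:
    """Return True if `rados -p POOL stat NAME` exits successfully.

    Assumption: a non-zero exit status means the object is missing.
    """
    result = subprocess.run(
        ["rados", "-p", pool, "stat", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_multipart_objects(
    radoslist_lines: Iterable[str],
    exists: Callable[[str], bool],
) -> List[str]:
    """Return the "__multipart_" objects from a radoslist that do not exist.

    exists is injected so the filter can be tested without a cluster;
    in real use it would wrap rados_object_exists for the data pool.
    """
    missing = []
    for line in radoslist_lines:
        name = line.strip()
        if "__multipart_" in name and not exists(name):
            missing.append(name)
    return missing
```

In real use, exists would be something like lambda name: rados_object_exists("POOLNAME", name), with POOLNAME standing in for the bucket's data pool. As in the message, this sketch does not look at shadow objects.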

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-19 Thread Janek Bevendorff
- The head object had a size of 0.
- There was an object with a ‘shadow’ in its name, belonging to that path.

That is normal. What is not normal is if there are NO shadow objects.

On 18/11/2020 10:06, Denis Krienbühl wrote: It looks like a single-part object. But we did replace that object

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-18 Thread Denis Krienbühl
By the way, since there’s some probability that this is a GC refcount issue, would it be possible and sane to somehow slow the GC down or disable it altogether? Is that something we could implement on our end as a stop-gap measure to prevent data loss? > On 18 Nov 2020, at 10:46, Denis

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-18 Thread Denis Krienbühl
I can now confirm that last night’s missing object was a multi-part file.

> On 18 Nov 2020, at 10:01, Janek Bevendorff wrote:
>
> Sorry, it's radosgw-admin object stat --bucket=BUCKETNAME --object=OBJECTNAME
> (forgot the "object" there)
>
> On 18/11/2020 09:58, Janek Bevendorff wrote:
>>>

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-18 Thread Denis Krienbühl
It looks like a single-part object. But we did replace that object last night from backup, so I can’t know for sure if the lost one was like that. Another engineer that looked at the Rados objects last night did notice two things:
- The head object had a size of 0.
- There was an object with a

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-18 Thread Janek Bevendorff
Sorry, it's radosgw-admin object stat --bucket=BUCKETNAME --object=OBJECTNAME (forgot the "object" there) On 18/11/2020 09:58, Janek Bevendorff wrote: The object, a Docker layer, that went missing has not been touched in 2 months. It worked for a while, but then suddenly went missing. Was

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-18 Thread Janek Bevendorff
FYI: I have had radosgw-admin gc list --include-all running every three minutes for a day, but the list has stayed empty. I haven't seen any further data loss, either, though. I will keep it running until the next time I see an object vanish. On 17/11/2020 09:22, Janek Bevendorff wrote: I have

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-17 Thread Janek Bevendorff
I have run radosgw-admin gc list (without --include-all) a few times already, but the list was always empty. I will create a cron job running it every few minutes and writing out the results. On 17/11/2020 02:22, Eric Ivancich wrote: I’m wondering if anyone experiencing this bug would mind

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-16 Thread Eric Ivancich
I’m wondering if anyone experiencing this bug would mind running `radosgw-admin gc list --include-all` on a schedule and saving the results. I’d like to know whether these tail objects are getting removed by the gc process. If we find that that’s the case then there’s the issue of how they got
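
One way to collect these snapshots is sketched below: each call runs radosgw-admin gc list --include-all and writes the output to a timestamped file, so it can be driven from cron or a systemd timer. The file naming, the output directory, and the function names are assumptions for illustration only.

```python
import datetime
import pathlib
import subprocess
from typing import Callable, Optional


def run_gc_list() -> str:
    """Run `radosgw-admin gc list --include-all` and return its stdout."""
    result = subprocess.run(
        ["radosgw-admin", "gc", "list", "--include-all"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def save_gc_snapshot(
    out_dir: str,
    fetch: Callable[[], str] = run_gc_list,
    now: Optional[datetime.datetime] = None,
) -> pathlib.Path:
    """Write one gc list snapshot to OUT_DIR/gc-list-<UTC timestamp>.json.

    fetch and now are injectable so the file handling can be tested
    without a cluster; the defaults query the cluster and the clock.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    directory = pathlib.Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"gc-list-{now:%Y%m%dT%H%M%SZ}.json"
    path.write_text(fetch())
    return path
```

Keeping every snapshot, including empty ones, makes it possible to check later whether the tail objects of a vanished S3 object ever appeared in the gc list before they disappeared.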

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-16 Thread Janek Bevendorff
As noted in the bug report, the issue has affected only multipart objects at this time. I have added some more remarks there. And yes, multipart objects tend to have 0 byte head objects in general. The affected objects are simply missing all shadow objects, leaving us with nothing but the

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-13 Thread Eric Ivancich
Thank you for the answers to those questions, Janek. And in case anyone hasn’t seen it, we do have a tracker for this issue: https://tracker.ceph.com/issues/47866 We may want to move most of the conversation to the comments there, so everything’s together. I do want to follow up on

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-13 Thread Janek Bevendorff
1. It seems like those reporting this issue are seeing it strictly after upgrading to Octopus. From what version did each of these sites upgrade to Octopus? From Nautilus? Mimic? Luminous?

I upgraded from the latest Luminous release.

2. Does anyone have any lifecycle rules on a bucket

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-13 Thread Eric Ivancich
I have some questions for those who’ve experienced this issue.

1. It seems like those reporting this issue are seeing it strictly after upgrading to Octopus. From what version did each of these sites upgrade to Octopus? From Nautilus? Mimic? Luminous?
2. Does anyone have any lifecycle rules on

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread huxia...@horebdata.cn
This looks like a very dangerous bug for data safety. I hope the bug will be quickly identified and fixed. Best regards, Samuel huxia...@horebdata.cn From: Janek Bevendorff Date: 2020-11-12 18:17 To: huxia...@horebdata.cn; EDH - Manuel Rios; Rafael Lopez CC: Robin H. Johnson; ceph-users

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread EDH - Manuel Rios
This same error caused us to wipe a full 300TB cluster... It is probably related to some RADOS index/database bug, not to S3. As Janek pointed out, this is a major issue, because the error happens silently and you can only detect it through S3, when you go to delete/purge an S3 bucket. It then returns NoSuchKey.

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread Janek Bevendorff
I have never seen this on Luminous. I recently upgraded to Octopus and the issue started occurring only a few weeks later. On 12/11/2020 16:37, huxia...@horebdata.cn wrote: Which Ceph versions are affected by this RGW bug/issue? Luminous, Mimic, Octopus, or the latest? Any idea? Samuel

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread huxia...@horebdata.cn
Which Ceph versions are affected by this RGW bug/issue? Luminous, Mimic, Octopus, or the latest? Any idea? Samuel huxia...@horebdata.cn From: EDH - Manuel Rios Date: 2020-11-12 14:27 To: Janek Bevendorff; Rafael Lopez CC: Robin H. Johnson; ceph-users Subject: [ceph-users] Re: NoSuchKey on

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-12 Thread Janek Bevendorff
Here is a bug report concerning (probably) this exact issue: https://tracker.ceph.com/issues/47866 I left a comment describing the situation and my (limited) experiences with it. On 11/11/2020 10:04, Janek Bevendorff wrote: Yeah, that seems to be it. There are 239 objects prefixed

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-11 Thread Janek Bevendorff
Yeah, that seems to be it. There are 239 objects prefixed .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none of the multiparts from the other file to be found and the head object is 0 bytes. I checked another multipart object with an end pointer of 11. Surprisingly, it had

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Rafael Lopez
Hi Janek, What you said sounds right - an S3 single part obj won't have an S3 multipart string as part of the prefix. An S3 multipart string looks like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme". From memory, single part S3 objects that don't fit in a single rados object are assigned a random prefix that

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
We are having the exact same problem (also Octopus). The object is listed by s3cmd, but trying to download it results in a 404 error. radosgw-admin object stat shows that the object still exists. Any further ideas how I can restore access to this object?

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
Here's something else I noticed: when I stat objects that work via radosgw-admin, the stat info contains a "begin_iter" JSON object with RADOS key info like this "key": { "name":

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
I found some of the data in the rados ls dump. We host some WARCs from the Internet Archive and one affected WARC still has its warc.os.cdx.gz file intact, while the actual warc.gz is gone. A rados stat revealed WIDE-20110903143858-01166.warc.os.cdx.gz mtime 2019-07-14T17:48:39.00+0200,

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-10 Thread Janek Bevendorff
Thanks for the reply. This issue seems to be VERY serious. New objects are disappearing every day. This is a silent, creeping data loss. I couldn't find the object with rados stat, but I am now listing all the objects and will grep the dump to see if there is anything left. Janek On

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-09 Thread Rafael Lopez
Hi Mariusz, all. We have seen this issue as well, on Red Hat Ceph 4 (I have an unresolved case open). In our case, `radosgw-admin object stat` is not a sufficient check to guarantee that there are rados objects. You have to do a `rados stat` to know that. In your case, the object is ~48M in size, appears

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-09 Thread Janek Bevendorff
We are having the exact same problem (also Octopus). The object is listed by s3cmd, but trying to download it results in a 404 error. radosgw-admin object stat shows that the object still exists. Any further ideas how I can restore access to this object? (Sorry if this is a duplicate, but it

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-07-28 Thread Mariusz Gronczewski
On 2020-07-27, at 21:31:33, "Robin H. Johnson" wrote:
> On Mon, Jul 27, 2020 at 08:02:23PM +0200, Mariusz Gronczewski wrote:
> > Hi,
> >
> > I've got a problem on Octopus (15.2.3, debian packages) install,
> > bucket S3 index shows a file:
> >
> > s3cmd ls s3://upvid/255/38355

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-07-27 Thread Robin H. Johnson
On Mon, Jul 27, 2020 at 08:02:23PM +0200, Mariusz Gronczewski wrote:
> Hi,
>
> I've got a problem on Octopus (15.2.3, debian packages) install, bucket
> S3 index shows a file:
>
> s3cmd ls s3://upvid/255/38355 --recursive
> 2020-07-27 17:48 50584342
>
>