Re: [ceph-users] scrub errors on rgw data pool

2019-11-29 Thread M Ranga Swami Reddy
Primary OSD crashes with the below assert: 12.2.11/src/osd/ReplicatedBackend.cc:1445 assert(peer_missing.count(fromshard)). Here I have 2 OSDs with a bluestore backend and 1 OSD with a filestore backend. On Mon, Nov 25, 2019 at 3:34 PM M Ranga Swami Reddy wrote: > Hello - We are using the ceph 12.2.1
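For reference, a quick way to confirm which backend each OSD in the affected acting set is running (a sketch; the OSD ids below are placeholders, and ceph osd metadata is available on Luminous):

# for id in 1 2 3; do ceph osd metadata $id | grep osd_objectstore; done   # prints "filestore" or "bluestore" per OSD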

Re: [ceph-users] scrub errors on rgw data pool

2019-11-25 Thread M Ranga Swami Reddy
graded to nautilus (clean > bluestore installation) > > - Original Message - > > From: "M Ranga Swami Reddy" > > To: "ceph-users" , "ceph-devel" < > ceph-de...@vger.kernel.org> > > Sent: Monday, 25 November, 2019 12:04:46 &

Re: [ceph-users] scrub errors on rgw data pool

2019-11-25 Thread Fyodor Ustinov
Hi! I had similar errors in pools on SSD until I upgraded to nautilus (clean bluestore installation) - Original Message - > From: "M Ranga Swami Reddy" > To: "ceph-users" , "ceph-devel" > > Sent: Monday, 25 November, 2019 12:04:46 > Subjec

[ceph-users] scrub errors on rgw data pool

2019-11-25 Thread M Ranga Swami Reddy
Hello - We are using the ceph 12.2.11 version (upgraded from Jewel 10.2.12 to 12.2.11). In this cluster, we have a mix of filestore and bluestore OSD backends. Recently we have been seeing scrub errors on the rgw buckets.data pool every day, after the scrub operation performed by Ceph. If we run the PG r
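A sketch of how the affected PGs in the data pool can be found and inspected before repairing (the pool name default.rgw.buckets.data is the usual default and may differ here):

# rados list-inconsistent-pg default.rgw.buckets.data        # JSON list of inconsistent PG ids
# rados list-inconsistent-obj <pgid> --format=json-pretty    # per-object shard errors for one of them
# ceph pg repair <pgid>                                      # only once the errors are understood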

[ceph-users] scrub errors because of missing shards on luminous

2019-09-19 Thread Mattia Belluco
Dear ml, we are currently trying to wrap our heads around a HEALTH_ERR problem on our Luminous 12.2.12 cluster (upgraded from Jewel a couple of weeks ago). Before attempting a 'ceph pg repair' we would like to have a better understanding of what has happened. ceph -s reports: cluster: id:
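One way to get that understanding before repairing is to dump the per-object scrub findings (a sketch; the PG id is a placeholder taken from ceph health detail):

# ceph health detail | grep inconsistent
# rados list-inconsistent-obj <pgid> --format=json-pretty

The "errors" and "union_shard_errors" fields show whether a shard is missing, has a bad checksum, or has mismatching metadata.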

Re: [ceph-users] scrub errors

2019-03-28 Thread Brad Hubbard
On Fri, Mar 29, 2019 at 7:54 AM solarflow99 wrote: > > ok, I tried doing ceph osd out on each of the 4 OSDs 1 by 1. I got it out of > backfill mode but still not sure if it'll fix anything. pg 10.2a still shows > state active+clean+inconsistent. Peer 8 is now > remapped+inconsistent+peering
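Worth noting: marking OSDs out only moves data around; the inconsistent flag is cleared by a repair or by a later scrub that finds no errors. A sketch using the PG id from this thread:

# ceph pg 10.2a query | grep '"state"'     # still active+clean+inconsistent after the rebalance
# ceph pg repair 10.2a                     # the repair is what clears the inconsistent flag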

Re: [ceph-users] scrub errors

2019-03-28 Thread solarflow99
ok, I tried doing ceph osd out on each of the 4 OSDs 1 by 1. I got it out of backfill mode but still not sure if it'll fix anything. pg 10.2a still shows state active+clean+inconsistent. Peer 8 is now remapped+inconsistent+peering, and the other peer is active+clean+inconsistent On Wed, Mar 2

Re: [ceph-users] scrub errors

2019-03-27 Thread Brad Hubbard
On Thu, Mar 28, 2019 at 8:33 AM solarflow99 wrote: > > yes, but nothing seems to happen. I don't understand why it lists OSD 7 in > the "recovery_state" when I'm only using 3 replicas and it seems to use > 41,38,8 Well, osd 8's state is listed as "active+undersized+degraded+remapped+wait_bac

Re: [ceph-users] scrub errors

2019-03-27 Thread solarflow99
yes, but nothing seems to happen. I don't understand why it lists OSD 7 in the "recovery_state" when I'm only using 3 replicas and it seems to use 41,38,8. # ceph health detail HEALTH_ERR 1 pgs inconsistent; 47 scrub errors pg 10.2a is active+clean+inconsistent, acting [41,38,8] 47 scrub errors
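For reference, the current up/acting sets and the peering history can be pulled straight out of the query (a sketch; jq is assumed to be installed). OSDs from past intervals can show up in recovery_state even though they are not in the current acting set:

# ceph pg 10.2a query | jq '.up, .acting'
# ceph pg 10.2a query | jq '.recovery_state'    # peering history; past-interval OSDs can appear here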

Re: [ceph-users] scrub errors

2019-03-26 Thread Brad Hubbard
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/ Did you try repairing the pg? On Tue, Mar 26, 2019 at 9:08 AM solarflow99 wrote: > > yes, I know its old. I intend to have it replaced but thats a few months > away and was hoping to get past this. the other OSDs appe
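A minimal repair pass, assuming the PG id from this thread and an otherwise healthy cluster (repair overwrites the bad copy with the authoritative one, so it is worth reading the linked troubleshooting page first):

# ceph pg repair 10.2a
# ceph -w | grep 10.2a      # watch for the repair start/result messages from the primary
# ceph health detail        # the scrub errors should drop once the repair completes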

Re: [ceph-users] scrub errors

2019-03-25 Thread solarflow99
yes, I know it's old. I intend to have it replaced but that's a few months away and I was hoping to get past this. The other OSDs appear to be OK, I see them up and in; why do you see something wrong? On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard wrote: > Hammer is no longer supported. > > What's t

Re: [ceph-users] scrub errors

2019-03-25 Thread Brad Hubbard
Hammer is no longer supported. What's the status of osds 7 and 17? On Tue, Mar 26, 2019 at 8:56 AM solarflow99 wrote: > > hi, thanks. It's still using Hammer. Here's the output from the pg query; > the last command you gave doesn't work at all, it must be too old. > > > # ceph pg 10.2a query > { >
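Hypothetical commands to check that (osd ids taken from the thread):

# ceph osd tree       # shows whether osd.7 and osd.17 are up and in, and their weights
# ceph osd find 7     # host / location of osd.7
# ceph osd find 17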

Re: [ceph-users] scrub errors

2019-03-25 Thread solarflow99
hi, thanks. It's still using Hammer. Here's the output from the pg query; the last command you gave doesn't work at all, it must be too old. # ceph pg 10.2a query { "state": "active+clean+inconsistent", "snap_trimq": "[]", "epoch": 23265, "up": [ 41, 38, 8

Re: [ceph-users] scrub errors

2019-03-25 Thread Brad Hubbard
It would help to know what version you are running but, to begin with, could you post the output of the following? $ sudo ceph pg 10.2a query $ sudo rados list-inconsistent-obj 10.2a --format=json-pretty Also, have a read of http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg

[ceph-users] scrub errors

2019-03-25 Thread solarflow99
I noticed my cluster has scrub errors but the deep-scrub command doesn't show any errors. Is there any way to know what it takes to fix it? # ceph health detail HEALTH_ERR 1 pgs inconsistent; 47 scrub errors pg 10.2a is active+clean+inconsistent, acting [41,38,8] 47 scrub errors # zgrep 10.2a
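One way to dig further is to force a fresh deep scrub of the flagged PG and then pull the errors from the primary's log (a sketch; osd.41 is the primary of the acting set and the log path assumes the default /var/log/ceph layout):

# ceph pg deep-scrub 10.2a
# zgrep 'ERR' /var/log/ceph/ceph-osd.41.log* | grep 10.2a    # run on the node hosting osd.41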

Re: [ceph-users] scrub errors

2018-10-23 Thread Sergey Malinin
There is an osd_scrub_auto_repair setting which defaults to 'false'. > On 23.10.2018, at 12:12, Dominque Roux wrote: > > Hi all, > > We lately faced several scrub errors. > All of them were more or less easily fixed with the ceph pg repair X.Y > command. > > We're using ceph version 12.2.7 an
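On 12.2.x the option can be injected at runtime or set persistently in ceph.conf (a sketch; note it only auto-repairs scrubs with at most osd_scrub_auto_repair_num_errors errors, 5 by default):

# ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'    # runtime only, lost on restart

and persistently:

[osd]
osd scrub auto repair = true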

[ceph-users] scrub errors

2018-10-23 Thread Dominque Roux
Hi all, We lately faced several scrub errors. All of them were more or less easily fixed with the ceph pg repair X.Y command. We're using ceph version 12.2.7 and have SSD and HDD pools. Is there a way to prevent our datastore from these kinds of errors, or is there a way to automate the fix (It w

Re: [ceph-users] Scrub Errors

2016-05-06 Thread Blade
Oliver Dzombic writes: > > Hi Blade, > > you can try to set the min_size to 1, to get it back online, and if/when > the error vanishes (maybe after another repair command) you can set the > min_size again to 2. > > you can try to simply out/down/?remove? the osd it is on. > Hi Oliver

Re: [ceph-users] Scrub Errors

2016-05-04 Thread Oliver Dzombic
Hi Blade, you can try to set the min_size to 1, to get it back online, and if/when the error vanishes (maybe after another repair command) you can set the min_size again to 2. You can also try to simply out/down/?remove? the osd it is on. -- Mit freundlichen Gruessen / Best regards Oliver Dz
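A sketch of that, with a placeholder pool name (min_size 1 should only be kept briefly, since it accepts writes with a single surviving copy):

# ceph osd pool get <pool> min_size
# ceph osd pool set <pool> min_size 1
# ... wait for the PG to go active and run the repair ...
# ceph osd pool set <pool> min_size 2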

Re: [ceph-users] Scrub Errors

2016-05-04 Thread Blade Doyle
When I issue the "ceph pg repair 1.32" command I *do* see it reported in the "ceph -w" output but I *do not* see any new messages about pg 1.32 in the log of osd.6 - even if I turn debug messages way up. # ceph pg repair 1.32 instructing pg 1.32 on osd.6 to repair (ceph -w shows) 2016-05-04 11:
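For reference, the debug level on just that OSD can be raised at runtime before re-issuing the repair, and turned back down afterwards (a sketch; the restore values assume the defaults):

# ceph tell osd.6 injectargs '--debug_osd 20 --debug_filestore 20'
# ceph pg repair 1.32
# ceph tell osd.6 injectargs '--debug_osd 0/5 --debug_filestore 1/5'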

Re: [ceph-users] Scrub Errors

2016-05-03 Thread Oliver Dzombic
Hi Blade, if you don't see anything in the logs, then you should raise the debug level/frequency. You must at least see that the repair command has been issued (started). Also I am wondering about the [6] from your output. That means that there is only 1 copy of it (on osd.6). What is yo

Re: [ceph-users] Scrub Errors

2016-05-03 Thread Blade Doyle
Hi Oliver, Thanks for your reply. The problem could have been caused by crashing/flapping OSDs. The cluster is stable now, but lots of pg problems remain. $ ceph health HEALTH_ERR 4 pgs degraded; 158 pgs inconsistent; 4 pgs stuck degraded; 1 pgs stuck inactive; 10 pgs stuck unclean; 4 pgs stuck

Re: [ceph-users] Scrub Errors

2016-04-30 Thread Oliver Dzombic
Hi, please check with ceph health which PGs cause trouble. Please try: ceph pg repair 4.97 And see if it can be resolved. If not, please paste the corresponding log. That repair can take some time... -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...

[ceph-users] Scrub Errors

2016-04-30 Thread Blade Doyle
Hi Ceph-Users, Help with how to resolve these would be appreciated. 2016-04-30 09:25:58.399634 9b809350 0 log_channel(cluster) log [INF] : 4.97 deep-scrub starts 2016-04-30 09:26:00.041962 93009350 0 -- 192.168.2.52:6800/6640 >> 192.168.2.32:0/3983425916 pipe(0x27406000 sd=111 :6800 s=0 pgs=0 c

Re: [ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Gregory Farnum
It's just because the PG hadn't been scrubbed since the error occurred; then you upgraded, it scrubbed, and the error was found. You can deep-scrub all your PGs to check them if you like, but as I've said elsewhere this issue -- while scary! -- shouldn't actually damage any of your user data, so ju
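A sketch of deep-scrubbing everything (this generates a lot of disk and network I/O, so it may be better to go per OSD or to spread it out over time):

# for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '/^[0-9]+\./ {print $1}'); do ceph pg deep-scrub $pg; done
# ceph osd deep-scrub 0      # alternative: deep-scrub all PGs whose primary is osd.0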

Re: [ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Randy Smith
Greg, This error occurred AFTER the upgrade. I upgraded to 0.80.4 last night and this error cropped up this afternoon. I ran `ceph pg repair 3.7f` (after I copied the pgs) which returned the cluster to health. However, I'm concerned that this showed up again so soon after I upgraded to 0.80.4. Is

Re: [ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Gregory Farnum
The config option change in the upgrade will prevent *new* scrub errors from occurring, but it won't resolve existing ones. You'll need to run a scrub repair to fix those up. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Jul 18, 2014 at 2:59 PM, Randy Smith wrote: >

[ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Randy Smith
Greetings, I upgraded to 0.80.4 last night to resolve the inconsistent pg scrub errors I was seeing. Unfortunately, they are continuing. $ ceph health detail HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 3.7f is active+clean+inconsistent, acting [0,4] And here are the relevant log entries. 201