Primary OSD crashes with the below assert:
12.2.11/src/osd/ReplicatedBackend.cc:1445 assert(peer_missing.count(fromshard))
==
Here I have 2 OSDs with a bluestore backend and 1 OSD with a filestore backend.
On Mon, Nov 25, 2019 at 3:34 PM M Ranga Swami Reddy
wrote:
> Hello - We are using the ceph 12.2.11
> [...] upgraded to nautilus (clean
> bluestore installation)
> - Original Message -
> > From: "M Ranga Swami Reddy"
> > To: "ceph-users" , "ceph-devel" <ceph-de...@vger.kernel.org>
> > Sent: Monday, 25 November, 2019 12:04:46
Hi!
I had similar errors in pools on SSD until I upgraded to nautilus (clean
bluestore installation)
- Original Message -
> From: "M Ranga Swami Reddy"
> To: "ceph-users" , "ceph-devel"
>
> Sent: Monday, 25 November, 2019 12:04:46
> Subject:
Hello - We are using the ceph 12.2.11 version (upgraded from Jewel 10.2.12
to 12.2.11). In this cluster, we have a mix of filestore and bluestore
OSD backends.
Recently we have been seeing scrub errors on the rgw buckets.data pool every
day, after the scrub operation is performed by Ceph. If we run the PG repair [...]
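A typical way to inspect these on 12.2.x, sketched here with placeholder pool
and PG names rather than our actual ones, is:
# ceph health detail
# rados list-inconsistent-pg <pool>
# rados list-inconsistent-obj <pg-id> --format=json-pretty
list-inconsistent-obj reports, per object, which shard had a read error or
digest mismatch, which helps judge whether "ceph pg repair" is safe to run.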
Dear ml,
we are currently trying to wrap our heads around a HEALTH_ERR problem on
our Luminous 12.2.12 cluster (upgraded from Jewel a couple of weeks
ago). Before attempting a 'ceph pg repair' we would like to have a
better understanding of what has happened.
ceph -s reports:
cluster:
id:
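For understanding the nature of the inconsistency before any repair, the usual
tool is the inconsistency listing - a sketch, with the PG id taken from the
"ceph health detail" output rather than from our cluster:
$ rados list-inconsistent-obj <pg-id> --format=json-pretty
It distinguishes read errors from checksum/digest mismatches, which is exactly
the information needed before deciding on "ceph pg repair".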
On Fri, Mar 29, 2019 at 7:54 AM solarflow99 wrote:
>
> ok, I tried doing ceph osd out on each of the 4 OSDs 1 by 1. I got it out of
> backfill mode but still not sure if it'll fix anything. pg 10.2a still shows
> state active+clean+inconsistent. Peer 8 is now
> remapped+inconsistent+peering
ok, I tried doing ceph osd out on each of the 4 OSDs 1 by 1. I got it out
of backfill mode but still not sure if it'll fix anything. pg 10.2a still
shows state active+clean+inconsistent. Peer 8 is now
remapped+inconsistent+peering, and the other peer is
active+clean+inconsistent
On Wed, Mar 2
On Thu, Mar 28, 2019 at 8:33 AM solarflow99 wrote:
>
> yes, but nothing seems to happen. I don't understand why it lists OSDs 7 in
> the "recovery_state": when i'm only using 3 replicas and it seems to use
> 41,38,8
Well, osd 8's state is listed as
"active+undersized+degraded+remapped+wait_backfill" [...]
yes, but nothing seems to happen. I don't understand why it lists OSDs 7
in the "recovery_state": when i'm only using 3 replicas and it seems to
use 41,38,8
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
pg 10.2a is active+clean+inconsistent, acting [41,38,8]
47 scrub errors
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
Did you try repairing the pg?
On Tue, Mar 26, 2019 at 9:08 AM solarflow99 wrote:
>
> yes, I know it's old. I intend to have it replaced but that's a few months
> away and was hoping to get past this. the other OSDs appe
yes, I know it's old. I intend to have it replaced but that's a few months
away and was hoping to get past this. The other OSDs appear to be OK, I
see them up and in, why do you see something wrong?
On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard wrote:
> Hammer is no longer supported.
>
> What's the status of osds 7 and 17?
Hammer is no longer supported.
What's the status of osds 7 and 17?
On Tue, Mar 26, 2019 at 8:56 AM solarflow99 wrote:
>
> hi, thanks. It's still using Hammer. Here's the output from the pg query;
> the last command you gave doesn't work at all, probably because it's too old.
>
>
> # ceph pg 10.2a query
> {
>
hi, thanks. It's still using Hammer. Here's the output from the pg query;
the last command you gave doesn't work at all, probably because it's too old.
# ceph pg 10.2a query
{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"epoch": 23265,
"up": [
41,
38,
8
It would help to know what version you are running but, to begin with,
could you post the output of the following?
$ sudo ceph pg 10.2a query
$ sudo rados list-inconsistent-obj 10.2a --format=json-pretty
Also, have a read of
http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg
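If the cluster is newer than Hammer (these commands were added in Jewel), you
can also list every inconsistent PG in a pool as a quick first check - a
sketch, with <pool> as a placeholder:
$ sudo rados list-inconsistent-pg <pool>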
I noticed my cluster has scrub errors but the deep-scrub command doesn't
show any errors. Is there any way to know what it takes to fix it?
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
pg 10.2a is active+clean+inconsistent, acting [41,38,8]
47 scrub errors
# zgrep 10.2a
There is an osd_scrub_auto_repair setting which defaults to 'false'.
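A sketch of enabling it, at runtime via injectargs and persistently via
ceph.conf (whether auto-repair is advisable depends on the release):
# ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'
and in ceph.conf:
[osd]
osd scrub auto repair = true
Note it only auto-repairs PGs with at most osd_scrub_auto_repair_num_errors
scrub errors (default 5); anything larger still needs a manual "ceph pg repair".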
> On 23.10.2018, at 12:12, Dominque Roux wrote:
>
> Hi all,
>
> Lately we have faced several scrub errors.
> All of them were more or less easily fixed with the ceph pg repair X.Y
> command.
>
> We're using ceph version 12.2.7 an
Hi all,
Lately we have faced several scrub errors.
All of them were more or less easily fixed with the ceph pg repair X.Y
command.
We're using ceph version 12.2.7 and have SSD and HDD pools.
Is there a way to prevent our datastore from these kinds of errors, or is
there a way to automate the fix (It w
Oliver Dzombic writes:
>
> Hi Blade,
>
> you can try to set the min_size to 1, to get it back online, and if/when
> the error vanishes ( maybe after another repair command ) you can set the
> min_size back to 2.
>
> you can try to simply out/down/?remove? the osd it is on.
>
Hi Oliver
Hi Blade,
you can try to set the min_size to 1, to get it back online, and if/when
the error vanishes ( maybe after another repair command ) you can set the
min_size back to 2.
you can try to simply out/down/?remove? the osd it is on.
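A minimal sketch of that sequence, with "data" as a placeholder pool name:
# ceph osd pool set data min_size 1
... wait for the PG to become active and the repair to settle ...
# ceph osd pool set data min_size 2
Be aware that with min_size 1 the pool keeps accepting writes with only a
single copy, so losing that last OSD while it is set means data loss.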
--
Mit freundlichen Gruessen / Best regards
Oliver Dz
When I issue the "ceph pg repair 1.32" command I *do* see it reported in
the "ceph -w" output but I *do not* see any new messages about pg 1.32 in
the log of osd.6 - even if I turn debug messages way up.
# ceph pg repair 1.32
instructing pg 1.32 on osd.6 to repair
(ceph -w shows)
2016-05-04 11:
Hi Blade,
if you don't see anything in the logs, then you should raise the debug
level/frequency.
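A sketch for the one OSD involved here (osd.6), raising the log verbosity and
restoring it afterwards:
# ceph tell osd.6 injectargs '--debug-osd 20 --debug-ms 1'
... re-run the repair and watch /var/log/ceph/ceph-osd.6.log ...
# ceph tell osd.6 injectargs '--debug-osd 1 --debug-ms 0'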
You must at least see that the repair command has been issued ( started ).
Also I am wondering about the [6] in your output.
That means that there is only 1 copy of it ( on osd.6 ).
What is yo
Hi Oliver,
Thanks for your reply.
The problem could have been caused by crashing/flapping OSDs. The cluster
is stable now, but lots of pg problems remain.
$ ceph health
HEALTH_ERR 4 pgs degraded; 158 pgs inconsistent; 4 pgs stuck degraded; 1
pgs stuck inactive; 10 pgs stuck unclean; 4 pgs stuck
Hi,
please check with
ceph health
which pgs are causing trouble.
Please try:
ceph pg repair 4.97
and see if it can be resolved.
If not, please paste the corresponding log.
That repair can take some time...
--
Mit freundlichen Gruessen / Best regards
Oliver Dzombic
IP-Interactive
mailto:i...
Hi Ceph-Users,
Help with how to resolve these would be appreciated.
2016-04-30 09:25:58.399634 9b809350 0 log_channel(cluster) log [INF] :
4.97 deep-scrub starts
2016-04-30 09:26:00.041962 93009350 0 -- 192.168.2.52:6800/6640 >>
192.168.2.32:0/3983425916 pipe(0x27406000 sd=111 :6800 s=0 pgs=0 c
It's just because the PG hadn't been scrubbed since the error occurred;
then you upgraded, it scrubbed, and the error was found. You can deep-scrub
all your PGs to check them if you like, but as I've said elsewhere this
issue -- while scary! -- shouldn't actually damage any of your user data,
so ju
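A sketch of kicking that off for the whole cluster, assuming the extra I/O
load is acceptable (each command tells one OSD to deep-scrub every PG it
hosts):
$ for osd in $(ceph osd ls); do ceph osd deep-scrub $osd; done
Single PGs can also be checked with "ceph pg deep-scrub <pgid>".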
Greg,
This error occurred AFTER the upgrade. I upgraded to 0.80.4 last night and
this error cropped up this afternoon. I ran `ceph pg repair 3.7f` (after I
copied the pgs) which returned the cluster to health. However, I'm
concerned that this showed up again so soon after I upgraded to 0.80.4.
Is
The config option change in the upgrade will prevent *new* scrub
errors from occurring, but it won't resolve existing ones. You'll need
to run a scrub repair to fix those up.
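A sketch of sweeping up everything currently flagged, assuming the "ceph
health detail" output format shown elsewhere in this thread (PG id in the
second field):
$ ceph health detail | awk '/active\+clean\+inconsistent/ {print $2}' | \
    while read pg; do ceph pg repair "$pg"; done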
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Fri, Jul 18, 2014 at 2:59 PM, Randy Smith wrote:
>
Greetings,
I upgraded to 0.80.4 last night to resolve the inconsistent pg scrub errors
I was seeing. Unfortunately, they are continuing.
$ ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 3.7f is active+clean+inconsistent, acting [0,4]
And here's the relevant log entries.
201