On Thu, Jan 9, 2020 at 5:48 AM Peter Eisch
wrote:
> Hi,
>
> This morning one of my three monitor hosts got booted from the Nautilus
> 14.2.4 cluster and it won’t regain. There haven’t been any changes, or
> events at this site at all. The conf file is the [unchanged] and the same
> as the other t
I'd suggest you open a tracker under the Bluestore component so
someone can take a look. I'd also suggest you include a log with
'debug_bluestore=20' added to the COT command line.
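As a rough sketch (the data path and op here are placeholders; the key part is adding the debug option to whatever ceph-objectstore-tool command you were running):
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN --op list \
      --debug_bluestore=20 --log-to-stderr=true 2> cot-debug.log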
On Thu, Nov 7, 2019 at 6:56 PM Eugene de Beste wrote:
>
> Hi, does anyone have any feedback for me regarding this?
>
> ceph osd map rbd rbd_data.0c16b76b8b4567.0001426e
> osdmap e194356 pool 'rbd' (2) object
> 'rbd_data.0c16b76b8b4567.0001426e' -> pg 2.181de9d9 (2.1d9) -> up
> ([27,30,38], p27) acting ([30,25], p30)
>
> I also checked the logs of all OSDs
On Tue, Oct 29, 2019 at 9:09 PM Jérémy Gardais
wrote:
>
> Thus spake Brad Hubbard (bhubb...@redhat.com) on mardi 29 octobre 2019 à
> 08:20:31:
> > Yes, try and get the pgs healthy, then you can just re-provision the down
> > OSDs.
> >
> > Run a scrub on e
Yes, try and get the pgs healthy, then you can just re-provision the down OSDs.
Run a scrub on each of these pgs and then use the commands on the
following page to find out more information for each case.
https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/
Focus on the
> flags hashpspool stripe_width 0 application cephfs
This looked like something min_size 1 could cause, but I guess that's
not the cause here.
> so inconsistents is empty, which is weird, no?
Try scrubbing the pg just before running the command.
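For example (the pgid is a placeholder for the pg in question):
# ceph pg scrub <pgid>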
>
> Thanks again!
>
> K
>
>
>
Does pool 6 have min_size = 1 set?
https://tracker.ceph.com/issues/24994#note-5 would possibly be helpful
here, depending on what the output of the following command looks
like.
# rados list-inconsistent-obj [pgid] --format=json-pretty
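To check that (pool name or id is a placeholder):
# ceph osd pool get <poolname> min_size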
On Thu, Oct 10, 2019 at 8:16 PM Kenneth Waegeman
wrote:
>
>
Awesome! Sorry it took so long.
On Thu, Oct 10, 2019 at 12:44 AM Marc Roos wrote:
>
>
> Brad, many thanks!!! My cluster finally has HEALTH_OK after 1.5 years or so!
> :)
>
>
> -Original Message-
> Subject: Re: Ceph pg repair clone_missing?
>
> On Fri, Oct 4, 2019 at 6:09 PM Marc Roos
> wrote
On Fri, Oct 4, 2019 at 6:09 PM Marc Roos wrote:
>
> >
> >Try something like the following on each OSD that holds a copy of
> >rbd_data.1f114174b0dc51.0974 and see what output you get.
> >Note that you can drop the bluestore flag if they are not bluestore
> >osds and you will need
On Thu, Oct 3, 2019 at 6:46 PM Marc Roos wrote:
>
> >
> >>
> >> I was following the thread where you adviced on this pg repair
> >>
> >> I ran these rados 'list-inconsistent-obj'/'rados
> >> list-inconsistent-snapset' and have output on the snapset. I tried
> to
> >> extrapolate your commen
On Wed, Oct 2, 2019 at 9:00 PM Marc Roos wrote:
>
>
>
> Hi Brad,
>
> I was following the thread where you adviced on this pg repair
>
> I ran these rados 'list-inconsistent-obj'/'rados
> list-inconsistent-snapset' and have output on the snapset. I tried to
> extrapolate your comment on the data/om
at that time. Any ideas?
>
> On Tue, Oct 1, 2019 at 8:03 AM Sasha Litvak
> wrote:
>>
>> It was hardware indeed. Dell server reported a disk being reset with power
>> on. Checking the usual suspects i.e. controller firmware, controller event
>> log (if I can get one
On Wed, Oct 2, 2019 at 1:15 AM Mattia Belluco wrote:
>
> Hi Jake,
>
> I am curious to see if your problem is similar to ours (despite the fact
> we are still on Luminous).
>
> Could you post the output of:
>
> rados list-inconsistent-obj
>
> and
>
> rados list-inconsistent-snapset
Make sure you
On Tue, Oct 1, 2019 at 10:43 PM Del Monaco, Andrea <
andrea.delmon...@atos.net> wrote:
> Hi list,
>
> After the nodes ran OOM and after reboot, we are not able to restart the
> ceph-osd@x services anymore. (Details about the setup at the end).
>
> I am trying to do this manually, so we can see the
Removed ceph-de...@vger.kernel.org and added d...@ceph.io
On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
>
> Hello everyone,
>
> Can you shed some light on the cause of the crash? Could a client
> request actually trigger it?
>
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:5
On Tue, Sep 24, 2019 at 10:51 PM M Ranga Swami Reddy
wrote:
>
> Interestingly, "rados list-inconsistent-obj ${PG} --format=json" is not
> showing any inconsistent objects.
> And "rados list-missing-obj ${PG} --format=json" is also not showing any
> missing or unfound objects.
Complete a sc
On Thu, Sep 12, 2019 at 1:52 AM Benjamin Tayehanpour
wrote:
>
> Greetings!
>
> I had an OSD down, so I ran ceph osd status and got this:
>
> [root@ceph1 ~]# ceph osd status
> Error EINVAL: Traceback (most recent call last):
> File "/usr/lib64/ceph/mgr/status/module.py", line 313, in handle_comma
On Wed, Sep 4, 2019 at 9:42 PM Andras Pataki
wrote:
>
> Dear ceph users,
>
> After upgrading our ceph-fuse clients to 14.2.2, we've been seeing sporadic
> segfaults with not super revealing stack traces:
>
> in thread 7fff5a7fc700 thread_name:ceph-fuse
>
> ceph version 14.2.2 (4f8fa0a0024755aae7
https://tracker.ceph.com/issues/38724
On Fri, Aug 23, 2019 at 10:18 PM Paul Emmerich wrote:
>
> I've seen that before (but never on Nautilus), there's already an
> issue at tracker.ceph.com but I don't recall the id or title.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph c
https://tracker.ceph.com/issues/41255 is probably reporting the same issue.
On Thu, Aug 22, 2019 at 6:31 PM Lars Täuber wrote:
>
> Hi there!
>
> We also experience this behaviour of our cluster while it is moving pgs.
>
> # ceph health detail
> HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced
On Thu, Aug 15, 2019 at 2:09 AM Troy Ablan wrote:
>
> Paul,
>
> Thanks for the reply. All of these seemed to fail except for pulling
> the osdmap from the live cluster.
>
> -Troy
>
> -[~:#]- ceph-objectstore-tool --op get-osdmap --data-path
> /var/lib/ceph/osd/ceph-45/ --file osdmap45
> terminate
On Thu, Aug 15, 2019 at 2:09 AM Troy Ablan wrote:
>
> Paul,
>
> Thanks for the reply. All of these seemed to fail except for pulling
> the osdmap from the live cluster.
>
> -Troy
>
> -[~:#]- ceph-objectstore-tool --op get-osdmap --data-path
> /var/lib/ceph/osd/ceph-45/ --file osdmap45
> terminate
Could you create a tracker for this?
Also, if you can reproduce this could you gather a log with
debug_osd=20 ? That should show us the superblock it was trying to
decode as well as additional details.
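For example, on the affected OSD (osd id is a placeholder), or set debug_osd = 20 under [osd] in ceph.conf and restart:
# ceph tell osd.N injectargs '--debug_osd 20'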
On Mon, Aug 12, 2019 at 6:29 AM huxia...@horebdata.cn
wrote:
>
> Dear folks,
>
> I had an OSD
-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map
clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed
out after 150
You hit a suicide timeout, that's fatal. On line 80 the process kills
the thread based on the assumption it's hung.
src/common/HeartbeatMap.cc:
66 boo
I'd suggest creating a tracker similar to
http://tracker.ceph.com/issues/40554 which was created for the issue
in the thread you mentioned.
On Wed, Jul 3, 2019 at 12:29 AM Vandeir Eduardo
wrote:
>
> Hi,
>
> on client machines, when I use the command rbd, for example, rbd ls
> poolname, this messa
urns an error) so the client
> application is responsible for any locking needed.
> -Greg
>
> On Tue, Jul 2, 2019 at 3:49 AM Brad Hubbard wrote:
> >
> > Yes, this should be possible using an object class which is also a
> > RADOS client (via the RADOS API). You'l
>>
>>> Thank you for your response, and we will check this video as well.
>>> Our requirement is: while writing an object into the cluster, if we can
>>> provide the number of copies to be made, the network consumption between
>>> client and cluster will be only for on
On Thu, Jun 27, 2019 at 8:58 PM nokia ceph wrote:
>
> Hi Team,
>
> We have a requirement to create multiple copies of an object; currently we
> are handling it on the client side by writing separate objects, and this causes
> huge network traffic between client and cluster.
> Is there possibility
n't see anything relating to the clearing in mon, mgr, or osd logs.
> >
> > So, not entirely sure what fixed it, but it is resolved on its own.
> >
> > Thanks,
> >
> > Reed
> >
> > On Apr 30, 2019, at 8:01 PM, Brad Hubbard wrote:
> >
> >
On Wed, May 1, 2019 at 10:54 AM Brad Hubbard wrote:
>
> Which size is correct?
Sorry, accidental discharge =D
If the object info size is *incorrect* try forcing a write to the OI
with something like the following.
1. rados -p [name_of_pool_17] setomapval 10008536718.
tempora
Which size is correct?
On Tue, Apr 30, 2019 at 1:06 AM Reed Dier wrote:
>
> Hi list,
>
> Woke up this morning to two PG's reporting scrub errors, in a way that I
> haven't seen before.
>
> $ ceph versions
> {
> "mon": {
> "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988
>
> Best,
> Can Zhang
>
>
> On Fri, Apr 19, 2019 at 6:28 PM Brad Hubbard wrote:
> >
> > OK. So this works for me with master commit
> > bdaac2d619d603f53a16c07f9d7bd47751137c4c on Centos 7.5.1804.
> >
> > I cloned the repo and ran './install-deps
I suspect somehow you have
mis-matched libraries and, if that's the case, it's probably not worth
pursuing. If you can give me specific steps so I can reproduce this
from a freshly cloned tree I'd be happy to look further into it.
Good luck.
On Thu, Apr 18, 2019 at 7:00 PM Brad Hubbard wr
4a89e000)
> ```
>
> Notice the "U" and "V" from nm results.
>
>
>
>
> Best,
> Can Zhang
>
> On Thu, Apr 18, 2019 at 9:36 AM Brad Hubbard wrote:
> >
> > Does it define _ZTIN13PriorityCache8PriCacheE ? If it does, and all is
>
17 11:15 libceph-common.so ->
> libceph-common.so.0
> -rwxr-xr-x. 1 root root 211853400 Apr 17 11:15 libceph-common.so.0
>
>
>
>
> Best,
> Can Zhang
>
> On Thu, Apr 18, 2019 at 7:00 AM Brad Hubbard wrote:
> >
> > On Wed, Apr 17, 2019 at 1:37 PM Can Zhang
On Wed, Apr 17, 2019 at 1:37 PM Can Zhang wrote:
>
> Thanks for your suggestions.
>
> I tried to build libfio_ceph_objectstore.so, but it fails to load:
>
> ```
> $ LD_LIBRARY_PATH=./lib ./bin/fio --enghelp=libfio_ceph_objectstore.so
>
> fio: engine libfio_ceph_objectstore.so not loadable
> IO eng
> I'm still puzzled why it doesn't show any change when I run this no matter
> what I set it to:
>
> # ceph -n osd.1 --show-config | grep osd_recovery_max_active
> osd_recovery_max_active = 3
>
> in fact it doesn't matter if I use an OSD number that doesn't
On Tue, Apr 16, 2019 at 6:03 PM Paul Emmerich wrote:
>
> This works, it just says that it *might* require a restart, but this
> particular option takes effect without a restart.
We've already looked at changing the wording once to make it more palatable.
http://tracker.ceph.com/issues/18424
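As an aside, the value a running daemon is actually using can be read from its admin socket on the OSD host (osd id is a placeholder); "ceph -n osd.1 --show-config" only reports configured/default values, not the live ones:
# ceph daemon osd.1 config get osd_recovery_max_active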
>
>
On Tue, Apr 16, 2019 at 7:38 AM solarflow99 wrote:
>
> Then why doesn't this work?
>
> # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> osd.0: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> osd.1: osd_recovery_max_active = '4' (not observed, change may
If you want to do containers at the same time, or transition some/all
to containers at some point in future maybe something based on
kubevirt [1] would be more futureproof?
[1] http://kubevirt.io/
CNV is an example,
https://www.redhat.com/en/resources/container-native-virtualization
On Sat, Apr
emapped+inconsistent+peering, and the other peer is active+clean+inconsistent
Per the document I linked previously if a pg remains remapped you
likely have a problem with your configuration. Take a good look at
your crushmap, pg distribution, pool configuration, etc.
>
>
> On Wed, Mar 27, 2019 a
: [
> {
> "osd": "7",
> "status": "not queried"
> },
> {
> "osd": "8",
> "status": "already probed"
> },
>
https://bugzilla.redhat.com/show_bug.cgi?id=1662496
On Wed, Mar 27, 2019 at 5:00 AM Andrew J. Hutton
wrote:
>
> More or less followed the install instructions with modifications as
> needed; but I'm suspecting that either a dependency was missed in the
> F29 package or something else is up. I don
ther OSDs appear to be ok, I see
> them up and in, why do you see something wrong?
>
> On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard wrote:
>>
>> Hammer is no longer supported.
>>
>> What's the status of osds 7 and 17?
>>
>> On Tue, Mar 26, 2019 at 8
ot;: "21395'11840466",
> "ondisk_log_start": "21395'11840466",
> "created": 8200,
> "last_epoch_clean": 20840,
> "parent": "0.0",
>
It would help to know what version you are running but, to begin with,
could you post the output of the following?
$ sudo ceph pg 10.2a query
$ sudo rados list-inconsistent-obj 10.2a --format=json-pretty
Also, have a read of
http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg
Do a "ps auwwx" to see how a running monitor was started and use the
equivalent command to try to start the MON that won't start. "ceph-mon
--help" will show you what you need. Most important is to get the ID
portion right and to add "-d" to get it to run in the foreground and
log to stdout. HTH an
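For example (the mon id and cluster name are placeholders; copy the options you see in the ps output of a working monitor):
# ps auwwx | grep ceph-mon
# ceph-mon -i <mon-id> -d --cluster ceph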
"2019-03-21 16:51:56.862447",
> "age": 376.527241,
> "duration": 1.331278,
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Brad Hubbard
> Sent: Thursday, 21 March 2019 1:43 PM
> To: Glen Baar
Actually, the lag is between "sub_op_committed" and "commit_sent". Is
there any pattern to these slow requests? Do they involve the same
osd, or set of osds?
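If it helps to narrow that down, each suspect osd's recent slow ops and their flag points can be dumped from the admin socket (osd id is a placeholder):
# ceph daemon osd.N dump_historic_ops
# ceph daemon osd.N dump_ops_in_flight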
On Thu, Mar 21, 2019 at 3:37 PM Brad Hubbard wrote:
>
> On Thu, Mar 21, 2019 at 3:20 PM Glen Baars
> wrote:
>
> Does anyone know what that section is waiting for?
Hi Glen,
These are documented, to some extent, here.
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
It looks like it may be taking a long time to communicate the commit
message back to the client? Are these sl
On Thu, Mar 21, 2019 at 12:11 AM Glen Baars wrote:
>
> Hello Ceph Users,
>
>
>
> Does anyone know what the flag point ‘Started’ is? Is that ceph osd daemon
> waiting on the disk subsystem?
This is set by "mark_started()" and is roughly set when the pg starts
processing the op. Might want to capt
On Tue, Mar 19, 2019 at 7:54 PM Zhenshi Zhou wrote:
>
> Hi,
>
> I mount cephfs on my client servers. Some of the servers mount without any
> error whereas others don't.
>
> The error:
> # ceph-fuse -n client.kvm -m ceph.somedomain.com:6789 /mnt/kvm -r /kvm -d
> 2019-03-19 17:03:29.136 7f8c80eddc80
On Fri, Mar 8, 2019 at 4:46 AM Samuel Taylor Liston wrote:
>
> Hello All,
> I have recently had 32 large map objects appear in my default.rgw.log
> pool. Running luminous 12.2.8.
>
> Not sure what to think about these. I’ve done a lot of reading
> about how when these normall
you could try reading the data from this object and write it again
using rados get then rados put.
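For example (pool and object name are placeholders for the object in question):
# rados -p <pool> get <object-name> /tmp/obj
# rados -p <pool> put <object-name> /tmp/obj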
On Fri, Mar 8, 2019 at 3:32 AM Herbert Alexander Faleiros
wrote:
>
> On Thu, Mar 07, 2019 at 01:37:55PM -0300, Herbert Alexander Faleiros wrote:
> > Hi,
> >
> > # ceph health detail
> > HEALTH_ERR 3
+Jos Collin
On Thu, Mar 7, 2019 at 9:41 AM Milanov, Radoslav Nikiforov
wrote:
> Can someone elaborate on
>
>
>
> From http://tracker.ceph.com/issues/38122
>
>
>
> Which exactly package is missing?
>
> And why is this happening ? In Mimic all dependencies are resolved by yum?
>
> - Rado
>
>
> __
A single OSD should be expendable and you should be able to just "zap"
it and recreate it. Was this not true in your case?
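For example, on Luminous something like the following (device is a placeholder, and this destroys the OSD's data, so only once the cluster is otherwise healthy):
# ceph-volume lvm zap /dev/sdX --destroy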
On Wed, Feb 13, 2019 at 1:27 AM Ruben Rodriguez wrote:
>
>
>
> On 2/9/19 5:40 PM, Brad Hubbard wrote:
> > On Sun, Feb 10, 2019 at 1:
rong/misconfigured with the new switch: we
> would try to replicate the problem, possibly without a ceph deployment ...
>
> Thanks again for your help !
>
> Cheers, Massimo
>
> On Sun, Feb 10, 2019 at 12:07 AM Brad Hubbard wrote:
>>
>> The log ends at
>>
>>
st only arrives at 07.35 (and it
> promptly replies):
>
> 2019-02-09 07:35:14.627462 7f99972cc700 1 -- 192.168.222.204:6804/4159520
> <== osd.5 192.168.222.202:6816/157436 2527
> osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735) v2
> 1050+0+123635 (1225076790 0
On Sun, Feb 10, 2019 at 1:56 AM Ruben Rodriguez wrote:
>
> Hi there,
>
> Running 12.2.11-1xenial on a machine with 6 SSD OSD with bluestore.
>
> Today we had two disks fail out of the controller, and after a reboot
> they both seemed to come back fine but ceph-osd was only able to start
> in one o
Try capturing another log with debug_ms turned up. 1 or 5 should be Ok
to start with.
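For example, injected at runtime across the osds (or set "debug ms = 1" in ceph.conf and restart):
# ceph tell 'osd.*' injectargs '--debug_ms 1'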
On Fri, Feb 8, 2019 at 8:37 PM Massimo Sgaravatto
wrote:
>
> Our Luminous ceph cluster has been working without problems for a while, but
> in recent days we have been suffering from continuous slow requests.
Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.
On Sat, Feb 2, 2019 at 11:52 AM Fyodor Ustinov wrote:
>
> Hi!
>
> Right now, after adding OSD:
>
> # ceph health detail
> HEALTH_ERR 74197563/199392333 ob
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
should still be current enough and makes good reading on the subject.
On Mon, Jan 21, 2019 at 8:46 PM Stijn De Weirdt wrote:
>
> hi marc,
>
> > - how to prevent the D state process to accumulate so much load?
> you can't. in lin
On Fri, Jan 11, 2019 at 8:58 PM Rom Freiman wrote:
>
> Same kernel :)
Not exactly the point I had in mind, but sure ;)
>
>
> On Fri, Jan 11, 2019, 12:49 Brad Hubbard wrote:
>>
>> Haha, in the email thread he says CentOS but the bug is opened against RHEL
>> :P
Haha, in the email thread he says CentOS but the bug is opened against RHEL :P
Is it worth recommending a fix in skb_can_coalesce() upstream so other
modules don't hit this?
On Fri, Jan 11, 2019 at 7:39 PM Ilya Dryomov wrote:
>
> On Fri, Jan 11, 2019 at 1:38 AM Brad Hubbard wrote:
the same setup, you might be hitting the same
> bug.
Thanks for that Jason, I wasn't aware of that bug. I'm interested to
see the details.
>
> On Thu, Jan 10, 2019 at 6:46 PM Brad Hubbard wrote:
> >
> > On Fri, Jan 11, 2019 at 12:20 AM Rom Freiman wrote:
>
On Fri, Jan 11, 2019 at 12:20 AM Rom Freiman wrote:
>
> Hey,
> After upgrading to centos7.6, I started encountering the following kernel
> panic
>
> [17845.147263] XFS (rbd4): Unmounting Filesystem
> [17846.860221] rbd: rbd4: capacity 3221225472 features 0x1
> [17847.109887] XFS (rbd4): Mounting
Nautilus will make this easier.
https://github.com/ceph/ceph/pull/18096
On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous
> (12.2.8), we had a problem where OSDs would frequently get restarted while
> deep-scrubbing.
Can you provide the complete OOM message from the dmesg log?
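Something like the following should pull it out of the kernel log (the match string can vary a little between kernel versions):
# dmesg -T | grep -i -B5 -A30 'out of memory'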
On Sat, Dec 22, 2018 at 7:53 AM Pardhiv Karri wrote:
>
>
> Thank You for the quick response Dyweni!
>
> We are using FileStore as this cluster is upgraded from
> Hammer-->Jewel-->Luminous 12.2.8. 16x2TB HDD per node for all nodes. R730
On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor wrote:
>
> Hi All
>
> I have a ceph cluster which has been working without issues for about 2
> years now; it was upgraded about 6 months ago to 10.2.11
>
> root@blade3:/var/lib/ceph/mon# ceph status
> 2018-12-18 10:42:39.242217 7ff770471700 0 -- 10.1
https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
On Thu, Dec 6, 2018 at 8:11 PM Leon Robinson wrote:
>
> The most important thing to remember about CRUSH is that the H stands for
> hashing.
>
> If you hash the same object you're going to get the same result.
>
> e.g. cat /etc/fstab
}
> ]
> }
>
> Clearly, on osd.67, the “attrs” array is empty. The question is,
> how do I fix this?
>
> Many thanks in advance,
>
> -kc
>
> K.C. Wong
> kcw...@verseon.com
> M: +1 (408) 769-8235
>
> -
>
>> K.C. Wong
>> kcw...@verseon.com
>> M: +1 (408) 769-8235
>>
>> -
>> Confidentiality Notice:
>> This message contains confidential information. If you are not the
>> intended recipient and received this mes
What does "rados list-inconsistent-obj " say?
Note that you may have to do a deep scrub to populate the output.
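For example (pgid is a placeholder):
# ceph pg deep-scrub <pgid>
# rados list-inconsistent-obj <pgid> --format=json-pretty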
On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong wrote:
>
> Hi folks,
>
> I would appreciate any pointer as to how I can resolve a
> PG stuck in “active+clean+inconsistent” state. This has
> r
What do you get if you send "help" (without quotes) to
majord...@vger.kernel.org?
On Sun, Nov 11, 2018 at 10:15 AM Cranage, Steve <
scran...@deepspacestorage.com> wrote:
> Can anyone tell me the secret? A colleague tried and failed many times so
> I tried and got this:
>
>
>
>
>
> Steve Cranage
On Tue, Sep 25, 2018 at 11:31 PM Josh Haft wrote:
>
> Hi cephers,
>
> I have a cluster of 7 storage nodes with 12 drives each and the OSD
> processes are regularly crashing. All 84 have crashed at least once in
> the past two days. Cluster is Luminous 12.2.2 on CentOS 7.4.1708,
> kernel version 3.
On Tue, Sep 25, 2018 at 7:50 PM Sergey Malinin wrote:
>
> # rados list-inconsistent-obj 1.92
> {"epoch":519,"inconsistents":[]}
It's likely the epoch has changed since the last scrub and you'll need
to run another scrub to repopulate this data.
>
&
Are you using filestore or bluestore on the OSDs? If filestore what is
the underlying filesystem?
You could try setting debug_osd and debug_filestore to 20 and see if
that gives some more info?
On Wed, Sep 19, 2018 at 12:36 PM fatkun chan wrote:
>
>
> ceph version 12.2.5 (cad919881333ac9227417158
On Tue, Aug 21, 2018 at 2:37 AM, Satish Patel wrote:
> Folks,
>
> Today I found "ceph -s" is really slow, hanging for a minute or two before
> giving output; the same with "ceph osd tree", the command just hangs a long
> time before giving output.
>
> This is what i am seeing output, one OS
Jewel is almost EOL.
It looks similar to several related issues, one of which is
http://tracker.ceph.com/issues/21826
On Mon, Aug 13, 2018 at 9:19 PM, Alexandru Cucu wrote:
> Hi,
>
> Already tried zapping the disk. Unfortunately the same segfaults keep
> me from adding the OSD back to the clust
'34485 mlcod 13572'34485
> active+clean] publish_stats_to_osd 13593:2966970
> 2018-08-08 10:45:33.022697 7effb95a4700 1 -- 10.12.125.1:6803/1319081 <==
> osd.13 10.12.125.3:0/735946 22 osd_ping(ping e13589 stamp 2018-08-08
> 10:45:33.021217) v4 2004+0+0 (3639738084
Do you see "internal heartbeat not healthy" messages in the log of the
osd that suicides?
On Wed, Aug 8, 2018 at 5:45 PM, Brad Hubbard wrote:
> What is the load like on the osd host at the time and what does the
> disk utilization look like?
>
> Also, what does the transact
ealthy
> 'OSD::peering_tp thread 0x7fe03f52f700' had suicide timed out after 150
> 0> 2018-08-08 09:14:00.970742 7fe03f52f700 -1 *** Caught signal
> (Aborted) **
>
>
> Could it be that the suiciding OSDs are rejecting the ping somehow? I'm
> quite confused
Try to work out why the other osds are saying this one is down. Is it
because this osd is too busy to respond or something else.
debug_ms = 1 will show you some message debugging which may help.
On Tue, Aug 7, 2018 at 10:34 PM, Josef Zelenka
wrote:
> To follow up, I did some further digging with
Looks like https://tracker.ceph.com/issues/21826 which is a dup of
https://tracker.ceph.com/issues/20557
On Wed, Aug 8, 2018 at 1:49 AM, Thomas White wrote:
> Hi all,
>
> We have recently begun switching over to Bluestore on our Ceph cluster,
> currently on 12.2.7. We first began encountering se
If you don't already know why, you should investigate why your cluster
could not recover after the loss of a single osd.
Your solution seems valid given your description.
On Thu, Aug 2, 2018 at 12:15 PM, J David wrote:
> On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard wrote:
>&g
What is the status of the cluster with this osd down and out?
On Thu, Aug 2, 2018 at 5:42 AM, J David wrote:
> Hello all,
>
> On Luminous 12.2.7, during the course of recovering from a failed OSD,
> one of the other OSDs started repeatedly crashing every few seconds
> with an assertion failure:
>
On Wed, Aug 1, 2018 at 10:38 PM, Marc Roos wrote:
>
>
> Today we pulled the wrong disk from a ceph node. And that made the whole
> node go down/be unresponsive. Even to a simple ping. I cannot find too
> much about this in the log files. But I expect that the
> /usr/bin/ceph-osd process caused a ke
> "swift_versioning": "false",
> "swift_ver_location": "",
> "index_type": 0,
> "mdsearch_config": [],
> "reshard_status": 0,
> "new_b
Search the cluster log for 'Large omap object found' for more details.
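For example, on a monitor host (this path assumes the default cluster name "ceph"):
# grep 'Large omap object found' /var/log/ceph/ceph.log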
On Wed, Aug 1, 2018 at 3:50 AM, Brent Kennedy wrote:
> Upgraded from 12.2.5 to 12.2.6, got a “1 large omap objects” warning
> message, then upgraded to 12.2.7 and the message went away. I just added
> four OSDs to balance out
Ceph doesn't shut down systems as in kill or reboot the box if that's
what you're saying?
On Mon, Jul 23, 2018 at 5:04 PM, Nicolas Huillard wrote:
> Le lundi 23 juillet 2018 à 11:07 +0700, Konstantin Shalygin a écrit :
>> > I even have no fancy kernel or device, just real standard Debian.
>> > Th
I've updated the tracker.
On Thu, Jul 19, 2018 at 7:51 PM, Robert Sander
wrote:
> On 19.07.2018 11:15, Ronny Aasen wrote:
>
>> Did you upgrade from 12.2.5 or 12.2.6 ?
>
> Yes.
>
>> sounds like you hit the reason for the 12.2.7 release
>>
>> read : https://ceph.com/releases/12-2-7-luminous-release
Search the cluster log for 'Large omap object found' for more details.
On Fri, Jul 20, 2018 at 5:13 AM, Brent Kennedy wrote:
> I just upgraded our cluster to 12.2.6 and now I see this warning about 1
> large omap object. I looked and it seems this warning was just added in
> 12.2.6. I found a f
On Thu, Jul 19, 2018 at 12:47 PM, Troy Ablan wrote:
>
>
> On 07/18/2018 06:37 PM, Brad Hubbard wrote:
>> On Thu, Jul 19, 2018 at 2:48 AM, Troy Ablan wrote:
>>>
>>>
>>> On 07/17/2018 11:14 PM, Brad Hubbard wrote:
>>>>
>>>> On Wed
On Thu, Jul 19, 2018 at 2:48 AM, Troy Ablan wrote:
>
>
> On 07/17/2018 11:14 PM, Brad Hubbard wrote:
>>
>> On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote:
>>>
>>> I was on 12.2.5 for a couple weeks and started randomly seeing
>>> corruption, m
On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote:
> I was on 12.2.5 for a couple weeks and started randomly seeing
> corruption, moved to 12.2.6 via yum update on Sunday, and all hell broke
> loose. I panicked and moved to Mimic, and when that didn't solve the
> problem, only then did I start to
Your issue is different since not only do the omap digests of all
replicas not match the omap digest from the auth object info but they
are all different to each other.
What is min_size of pool 67 and what can you tell us about the events
leading up to this?
On Mon, Jul 16, 2018 at 7:06 PM, Matth
ernel
exhibiting the problem.
>
> kind regards
>
> Ben
>
>> Brad Hubbard hat am 5. Juli 2018 um 01:16 geschrieben:
>>
>>
>> On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber
>> wrote:
>> > Hi @all,
>> >
>> > im currently in testin
On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber wrote:
> Hi @all,
>
> I'm currently testing a setup for a production environment based on the
> following OSD Nodes:
>
> CEPH Version: luminous 12.2.5
>
> 5x OSD Nodes with following specs:
>
> - 8 Core Intel Xeon 2,0 GHZ
>
> - 96GB Ram
>
> - 10x 1,
you
provide from the time leading up to when the issue was first seen?
>
> Cheers
>
> Andrei
> - Original Message -
>> From: "Brad Hubbard"
>> To: "Andrei Mikhailovsky"
>> Cc: "ceph-users"
>> Sent: Thursday, 28 June, 201
"key" : "",
>"oid" : ".dir.default.80018061.2",
>"namespace" : "",
>"snapid" : -2,
>"max" : 0
> },
> "truncate_size" : 0,
> &qu