Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Alex Litvak
I was planning to upgrade 14.2.1 to 14.2.2 next week. Since there are a few reports of crashes, does anyone know whether the upgrade somehow triggers the issue? If not, what does? Since some have reported this before upgrading, I'm just wondering whether upgrading to 14.2.2 makes the problem worse.

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
Good to know. I tried reset-failed and restart several times; it didn't work on any of them. I also rebooted one of the hosts, which didn't help. Thankfully it seems they failed far enough apart that our nearly-empty cluster rebuilt in time. But it's rather worrying. On Fri, Jul 19, 2019 at 10:09 PM

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nigel Williams
On Sat, 20 Jul 2019 at 04:28, Nathan Fish wrote: > On further investigation, it seems to be this bug: > http://tracker.ceph.com/issues/38724 We just upgraded to 14.2.2 and had a dozen OSDs at 14.2.2 go down with this bug; recovered with: systemctl reset-failed ceph-osd@160 systemctl start
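For reference, the recovery commands above are cut off; a minimal sketch of the per-OSD sequence implied there (osd.160 is the id from the message, substitute your own):

    systemctl reset-failed ceph-osd@160      # clear systemd's failure/start-limit state for the unit
    systemctl start ceph-osd@160             # start the OSD again
    ceph osd tree | grep -w osd.160          # confirm it is back up/in
    ceph -s                                  # watch overall health while PGs recover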

Re: [ceph-users] Ceph OSD daemon possibly causes network card issues

2019-07-19 Thread Konstantin Shalygin
On 7/19/19 5:59 PM, Geoffrey Rhodes wrote: Holding thumbs this helps; however, I still don't understand why the issue only occurs on ceph-osd nodes. ceph-mon and ceph-mds nodes, and even a ceph client with the same adapters, do not have these issues. Because osd hosts actually do data storage

Re: [ceph-users] Multiple OSD crashes

2019-07-19 Thread Alex Litvak
The issue should have been resolved by the backport https://tracker.ceph.com/issues/40424 in Nautilus; was it merged into 14.2.2? Also, do you think it is safe to upgrade from 14.2.1 to 14.2.2? On 7/19/2019 1:05 PM, Paul Emmerich wrote: I've also encountered a crash just like that after

Re: [ceph-users] Future of Filestore?

2019-07-19 Thread Stuart Longland
On 19/7/19 8:43 pm, Marc Roos wrote: > > Maybe a bit off topic, but just curious: what speeds did you get previously? > Depending on how you test, your native 5400rpm drive's performance > could be similar. 4k random read on my 7200rpm/5400rpm drives results in > ~60iops at 260kB/s. Well, to be

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-19 Thread Mike Christie
On 07/19/2019 02:42 AM, Marc Schöchlin wrote: > Hello Jason, > > Am 18.07.19 um 20:10 schrieb Jason Dillaman: >> On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: >>> Hello cephers, >>> >>> rbd-nbd crashes in a reproducible way here. >> I don't see a crash report in the log below. Is it

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
On further investigation, it seems to be this bug: http://tracker.ceph.com/issues/38724 On Fri, Jul 19, 2019 at 1:38 PM Nathan Fish wrote: > > I came in this morning and started to upgrade to 14.2.2, only to > notice that 3 OSDs had crashed overnight - exactly 1 on each of 3 > hosts. Apparently

Re: [ceph-users] Multiple OSD crashes

2019-07-19 Thread Paul Emmerich
I've also encountered a crash just like that after upgrading to 14.2.2 Looks like this issue: http://tracker.ceph.com/issues/37282 -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89

Re: [ceph-users] Future of Filestore?

2019-07-19 Thread Janne Johansson
On Fri, 19 Jul 2019 at 12:43, Marc Roos wrote: > > Maybe a bit off topic, but just curious: what speeds did you get previously? > Depending on how you test, your native 5400rpm drive's performance > could be similar. 4k random read on my 7200rpm/5400rpm drives results in > ~60iops at 260kB/s. > I also

[ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
I came in this morning and started to upgrade to 14.2.2, only to notice that 3 OSDs had crashed overnight - exactly 1 on each of 3 hosts. Apparently there was no data loss, which implies they crashed at different times, far enough apart to rebuild? Still digging through logs to find exactly when
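For reference, on Nautilus the crash times can usually be pulled from the mgr crash module rather than grepping logs; a hedged sketch (assuming the crash module is enabled and the OSDs log to journald; the crash id and OSD id are placeholders):

    ceph crash ls                                   # recorded daemon crashes with timestamps
    ceph crash info <crash-id>                      # backtrace and metadata for one entry
    journalctl -u ceph-osd@<id> | grep -iE 'abort|assert|segfault'   # fall back to the unit log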

Re: [ceph-users] Nautilus 14.2.2 release announcement

2019-07-19 Thread Sage Weil
On Fri, 19 Jul 2019, Alex Litvak wrote: > Dear Ceph developers, > > Please forgive me if this post offends anyone, but it would be nice if this > and all other releases would be announced before or shortly after they hit the > repos. Yep, my fault. Abhishek normally does this but he's out on

Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-19 Thread Ravi Patel
Thank you again for reaching out. Based on your feedback, we decided to try a few more benchmarks. We were originally doing single node testing using some internal applications and this tool: - s3bench https://github.com/igneous-systems/s3bench to generate our results. Looks like the poor

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-07-19 Thread David C
Thanks, Jeff. I'll give 14.2.2 a go when it's released. On Wed, 17 Jul 2019, 22:29 Jeff Layton, wrote: > Ahh, I just noticed you were running nautilus on the client side. This > patch went into v14.2.2, so once you update to that you should be good > to go. > > -- Jeff > > On Wed, 2019-07-17 at

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Stig Telfer
> On 19 Jul 2019, at 15:25, Sage Weil wrote: > > On Fri, 19 Jul 2019, Stig Telfer wrote: >>> On 19 Jul 2019, at 10:01, Konstantin Shalygin wrote: Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to Nautilus 14.2.2 on a cluster yesterday, and the update ran to

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-19 Thread Tarek Zegar
On the host with the osd run: ceph-volume lvm list From: "☣Adam" To: ceph-users@lists.ceph.com Date: 07/18/2019 03:25 PM Subject: [EXTERNAL] Re: [ceph-users] Need to replace OSD. How do I find physical disk Sent by: "ceph-users" The block device can
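For reference, a minimal sketch of the mapping step the quoted advice points at (osd.12 is a hypothetical id):

    ceph-volume lvm list                # on the OSD host: each OSD id with its LV and underlying /dev device
    ceph osd metadata 12                # alternative, from any admin node; look for "devices" in the JSON output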

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Sage Weil
On Fri, 19 Jul 2019, Stig Telfer wrote: > > On 19 Jul 2019, at 10:01, Konstantin Shalygin wrote: > >> Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to > >> Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion > >> successfully. > >> > >> However, in

Re: [ceph-users] cephfs snapshot scripting questions

2019-07-19 Thread Frank Schilder
This is a question I'm interested in as well. Right now I'm using cephfs-snap from the storage tools project and am quite happy with it. I made a small modification, but will probably not change it. It's a simple and robust tool. About where to take snapshots: there seems to be a bug in cephfs
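For context, CephFS snapshots are plain directories created under the hidden .snap directory, which is all that wrappers like cephfs-snap automate; a minimal sketch (the mount point and paths are examples):

    mkdir /mnt/cephfs/projects/.snap/daily-2019-07-19    # take a snapshot of /mnt/cephfs/projects
    ls /mnt/cephfs/projects/.snap                        # list existing snapshots
    rmdir /mnt/cephfs/projects/.snap/daily-2019-07-19    # remove it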

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Paul Emmerich
On Fri, Jul 19, 2019 at 1:47 PM Stig Telfer wrote: > > On 19 Jul 2019, at 10:01, Konstantin Shalygin wrote: > > Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to > Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion > successfully. > > However, in

Re: [ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-19 Thread Paul Emmerich
bluestore warn on legacy statfs = false -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, Jul 19, 2019 at 1:35 PM nokia ceph wrote: > Hi Team, > > After upgrading
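For reference, that option silences the HEALTH_WARN rather than updating the per-OSD stats; a hedged sketch of the two usual ways to set it (the second form assumes the Nautilus centralized config store is in use):

    # ceph.conf, [global] or [osd] section
    bluestore warn on legacy statfs = false

    # or at runtime via the monitors
    ceph config set global bluestore_warn_on_legacy_statfs false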

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Stig Telfer
> On 19 Jul 2019, at 10:01, Konstantin Shalygin wrote: >> Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to >> Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion >> successfully. >> >> However, in ceph status I see a warning of the form "Legacy

[ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-19 Thread nokia ceph
Hi Team, After upgrading our cluster from 14.2.1 to 14.2.2, the cluster moved to a warning state with the following error:
cn1.chn6m1c1ru1c1.cdn ~# ceph status
  cluster:
    id: e9afb5f3-4acf-421a-8ae6-caaf328ef888
    health: HEALTH_WARN
      Legacy BlueStore stats reporting detected on

[ceph-users] Ceph OSD daemon possibly causes network card issues

2019-07-19 Thread Geoffrey Rhodes
Hi Konstantin, Thanks, I've run the following on all four interfaces: sudo ethtool -K rx off tx off sg off tso off ufo off gso off gro off lro off rxvlan off txvlan off ntuple off rxhash off The following do not seem to be available to change: Cannot change udp-fragmentation-offload Cannot
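For reference, ethtool -K needs an interface name, which the command as quoted omits; a hedged sketch of the full form (enp3s0/enp4s0 are placeholder names, substitute the four interfaces in question):

    for dev in enp3s0 enp4s0; do
        sudo ethtool -K "$dev" rx off tx off sg off tso off ufo off gso off gro off lro off \
            rxvlan off txvlan off ntuple off rxhash off
        sudo ethtool -k "$dev"        # lower-case -k prints the resulting offload state
    done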

[ceph-users] Ceph OSD daemon possibly causes network card issues

2019-07-19 Thread Geoffrey Rhodes
Hi Paul, Thanks, I've run the following on all four interfaces: sudo ethtool -K rx off tx off sg off tso off ufo off gso off gro off lro off rxvlan off txvlan off ntuple off rxhash off The following do not seem to be available to change: Cannot change udp-fragmentation-offload Cannot change

Re: [ceph-users] Future of Filestore?

2019-07-19 Thread Marc Roos
Maybe a bit off topic, but just curious: what speeds did you get previously? Depending on how you test, your native 5400rpm drive's performance could be similar. 4k random read on my 7200rpm/5400rpm drives results in ~60iops at 260kB/s. I also wonder why Filestore could be that much faster; is this
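For reference, a hedged sketch of the kind of 4k random-read test those numbers suggest (fio with libaio; the device path, queue depth and runtime are assumptions, and reads are non-destructive). At 4 KiB per IO, ~60 IOPS works out to roughly 240-250 kB/s, in line with the figures quoted:

    fio --name=randread-4k --filename=/dev/sdX --rw=randread --bs=4k \
        --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 --time_based --group_reporting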

[ceph-users] Please help: change IP address of a cluster

2019-07-19 Thread ST Wong (ITSC)
Hi all, Our cluster has to change to a new IP range in the same VLAN: 10.0.7.0/24 -> 10.0.18.0/23, while the IP addresses on the private network for the OSDs remain unchanged. I wonder if we can do that in either one of the following ways: = 1. a. Define a static route for 10.0.18.0/23 on
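For reference, a minimal sketch of what option 1a could look like during the transition (eno1 is a placeholder interface name; note that the monitor addresses themselves would additionally require a monmap change or redeploying the mons one at a time):

    # on hosts still numbered from 10.0.7.0/24, make the new range reachable on the same VLAN
    ip route add 10.0.18.0/23 dev eno1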

[ceph-users] Future of Filestore?

2019-07-19 Thread Stuart Longland
Hi all, Earlier this year, I did a migration from Ceph 10 to 12. Previously, I was happily running Ceph v10 on Filestore with BTRFS, and getting reasonable performance. Moving to Ceph v12 necessitated a migration away from this set-up, and reading the documentation, Bluestore seemed to be "the

[ceph-users] Multiple OSD crashes

2019-07-19 Thread Daniel Aberger - Profihost AG
Hello, we are experiencing crashing OSDs in multiple independent Ceph clusters. Each OSD has very similar log entries regarding the crash as far as I can tell. Example log: https://pastebin.com/raw/vQ2AJ5ud I can provide you with more log files. They are too large for pastebin and I'm not

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Konstantin Shalygin
Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion successfully. However, in ceph status I see a warning of the form "Legacy BlueStore stats reporting detected” for all OSDs in the cluster. Can
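The reply above is truncated; the usual way to clear this warning for good (an assumption here, not necessarily what the reply goes on to say) is to rewrite each OSD's stats with ceph-bluestore-tool while that OSD is stopped (osd.7 is an example id):

    systemctl stop ceph-osd@7
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-7
    systemctl start ceph-osd@7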

[ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Stig Telfer
Hi all - Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion successfully. However, in ceph status I see a warning of the form "Legacy BlueStore stats reporting detected” for all OSDs in the

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-19 Thread Marc Schöchlin
Hello Jason, Am 18.07.19 um 20:10 schrieb Jason Dillaman: > On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: >> Hello cephers, >> >> rbd-nbd crashes in a reproducible way here. > I don't see a crash report in the log below. Is it really crashing or > is it shutting down? If it is crashing
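For reference, a hedged sketch of how one might tell a crash from a clean shutdown of rbd-nbd (assumes a systemd/journald host; nothing here is specific to this particular report):

    rbd-nbd list-mapped                         # mappings of a crashed daemon disappear from this list
    dmesg | grep -i nbd                         # kernel-side errors when the nbd connection drops
    journalctl --since today | grep -i rbd-nbd  # daemon log messages / exit status, if sent to the journal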