[ceph-users] backfill start all of sudden

2018-10-08 Thread Chen Allen
Hi there, Has anyone experienced the following? Two OSD servers were down; after bringing the two servers back up, I brought 52 OSDs in with a weight of just 0.05, but it caused a huge backfill load. I saw many blocked requests and a number of PGs stuck inactive, and some of the servers were impacted, so I stopped
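For reference, a minimal sketch of throttling the data movement while bringing OSDs back in (option names assume a Luminous/Mimic-era cluster; the values are only illustrative):

    # pause rebalancing/backfill while weights are adjusted
    ceph osd set norebalance
    ceph osd set nobackfill
    # keep per-OSD recovery load low once movement is re-enabled
    ceph tell osd.* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1'
    # re-enable movement when ready
    ceph osd unset nobackfill
    ceph osd unset norebalance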

Re: [ceph-users] MDSs still core dumping

2018-10-08 Thread Yan, Zheng
On Tue, Oct 9, 2018 at 5:39 AM Alfredo Daniel Rezinovsky wrote: > > It seems my purge_queue journal is damaged. Even if I reset it, it stays > damaged. > > What does inotablev mismatch mean? > > > 2018-10-08 16:40:03.144 7f05b6099700 -1 log_channel(cluster) log [ERR] : > journal replay inotablev

Re: [ceph-users] list admin issues

2018-10-08 Thread Alex Gorbachev
On Mon, Oct 8, 2018 at 7:48 AM Elias Abacioglu wrote: > > If it's attachments causing this, perhaps forbid attachments? Force people to > use pastebin / imgur type of services? > > /E > > On Mon, Oct 8, 2018 at 1:33 PM Martin Palma wrote: >> >> Same here also on Gmail with G Suite. >> On Mon,

Re: [ceph-users] MDSs still core dumping

2018-10-08 Thread Sergey Malinin
I was able to start MDS 13.2.1 after I had imported the journal, run recover_dentries, reset the journal, reset the session table, and done ceph fs reset. However, I got about 1000 errors in the log, like bad backtrace, loaded dup inode, etc., and it eventually failed on assert(stray_in->inode.nlink >= 1) right
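For reference, a hedged sketch of that recovery sequence (filesystem name "cephfs", rank 0, and the backup file name are assumptions; this mirrors the disaster-recovery tooling, not a recommendation to run it blindly):

    # back up the journal, then re-import it
    cephfs-journal-tool --rank=cephfs:0 journal export mds-journal.bin
    cephfs-journal-tool --rank=cephfs:0 journal import mds-journal.bin
    # flush recoverable entries into the metadata pool, then reset
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
    cephfs-journal-tool --rank=cephfs:0 journal reset
    cephfs-table-tool all reset session
    ceph fs reset cephfs --yes-i-really-mean-it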

Re: [ceph-users] daahboard

2018-10-08 Thread solarflow99
Ok, thanks for the clarification. I guess I had assumed ansible was supposed to take care of all that; now I've got it working. On Mon, Oct 8, 2018 at 3:07 PM Jonas Jelten wrote: > You need to add or generate a certificate, without it the dashboard > doesn't start. > The procedure is described in

[ceph-users] OSD fails to startup with bluestore "direct_read_unaligned (5) Input/output error"

2018-10-08 Thread Alexandre Gosset
Hi, We are experiencing a recurrent problem with some OSDs that fail to start up after a crash. Here it happened with an OSD during recovery, and this is a very annoying bug, because recovery takes time (small increments of crush weight) and the only way I found to fix this is to use ceph-volume lvm zap

Re: [ceph-users] MDSs still core dumping

2018-10-08 Thread Sergey Malinin
... and cephfs-table-tool reset session ? > On 9.10.2018, at 01:32, Sergey Malinin wrote: > > Have you tried to recover dentries and then reset the journal? > > >> On 8.10.2018, at 23:43, Alfredo Daniel Rezinovsky > > wrote: >> >> >> >> On 08/10/18 17:41,

Re: [ceph-users] MDSs still core dumping

2018-10-08 Thread Sergey Malinin
Have you tried to recover dentries and then reset the journal? > On 8.10.2018, at 23:43, Alfredo Daniel Rezinovsky > wrote: > > > > On 08/10/18 17:41, Sergey Malinin wrote: >> >>> On 8.10.2018, at 23:23, Alfredo Daniel Rezinovsky >> > wrote: >>> >>> I need

Re: [ceph-users] list admin issues

2018-10-08 Thread Gerhard W. Recher
@all and list admins: These problems with Gmail addresses, or domains operated under Google's control, are caused by Google refusing high-volume senders. I operated a security list and have been faced with exactly the same problems. Google is totally ignorant; no real answers to our complaints!

Re: [ceph-users] daahboard

2018-10-08 Thread Jonas Jelten
You need to add or generate a certificate; without it the dashboard doesn't start. The procedure is described in the documentation. -- JJ On 09/10/2018 00.05, solarflow99 wrote: > seems like it did, yet I don't see anything listening on the port it should > be for dashboard. > > # ceph mgr
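A minimal sketch of that procedure for the Mimic dashboard (assuming the module is already enabled by ansible):

    ceph mgr module enable dashboard
    ceph dashboard create-self-signed-cert
    # shows the URL/port the dashboard ends up listening on
    ceph mgr services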

Re: [ceph-users] daahboard

2018-10-08 Thread solarflow99
seems like it did, yet I don't see anything listening on the port it should be for the dashboard. # ceph mgr module ls { "enabled_modules": [ "dashboard", "status" ], # ceph status cluster: id: d36fd17c-174e-40d6-95b9-86bdd196b7d2 health: HEALTH_OK

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-08 Thread Patrick Donnelly
On Thu, Oct 4, 2018 at 3:58 PM Stefan Kooman wrote: > A couple of hours later we hit the same issue. We restarted with > debug_mds=20 and debug_journaler=20 on the standby-replay node. Eight > hours later (an hour ago) we hit the same issue. We captured ~ 4.7 GB of > logging I skipped to the

Re: [ceph-users] fixing another remapped+incomplete EC 4+2 pg

2018-10-08 Thread Graham Allan
I'm still trying to find a way to reactivate this one pg which is incomplete. There are a lot of periods in its history based on a combination of a peering storm a couple of weeks ago, with min_size being set too low for safety. At this point I think there is no chance of bringing back the

Re: [ceph-users] advised needed for different projects design

2018-10-08 Thread Paul Emmerich
One CephFS filesystem, one directory per project, quotas in CephFS, exported via NFS Ganesha. Only if you have lots of really small files might you want to consider RBD instead (where HA is more annoying to handle). Paul On Mon, 8 Oct 2018 at 22:45, Joshua Chen wrote: > > Hello all, > When
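A small sketch of the per-project quota approach (mount point, directory name and size are hypothetical):

    # one directory per project on a CephFS mount, quota set via xattr
    mkdir /mnt/cephfs/projectA
    setfattr -n ceph.quota.max_bytes -v 1099511627776 /mnt/cephfs/projectA   # 1 TiB
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/projectA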

Re: [ceph-users] list admin issues

2018-10-08 Thread Paul Emmerich
You don't get removed for sending to the mailing list; you get removed because the mailing list server fails to deliver mail to you. On Mon, 8 Oct 2018 at 23:22, Jeff Smith wrote: > > I just got dumped again. I have not sent any attachments/images. > On Mon, Oct 8, 2018 at 5:48 AM Elias

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Paul Emmerich
Yeah, it's usually hanging in some low-level LVM tool (lvs, usually). They unfortunately like to get stuck indefinitely on some hardware failures, but there isn't really anything that can be done. But we've found that it's far more reliable to just call lvs ourselves instead of relying on

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Alfredo Deza
On Mon, Oct 8, 2018 at 5:04 PM Paul Emmerich wrote: > > ceph-volume unfortunately doesn't handle completely hanging IOs too > well compared to ceph-disk. Not sure I follow, would you mind expanding on what you mean by "ceph-volume unfortunately doesn't handle completely hanging IOs" ?

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi Jakub, "ceph osd metadata X" is perfect! This also lists multipath devices, which is what I was looking for! Kevin On Mon, 8 Oct 2018 at 21:16, Jakub Jaszewski <jaszewski.ja...@gmail.com> wrote: > Hi Kevin, > Have you tried ceph osd metadata OSDid ? > > Jakub > > Mon, 8 Oct 2018,
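For reference, a quick way to pull the device out of that output (the OSD id is only an example; exact field names vary by release and OSD backend):

    ceph osd metadata 12 | grep -E '"devices"|"bluestore_bdev_dev_node"'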

Re: [ceph-users] can I define buckets in a multi-zone config that are exempted from replication?

2018-10-08 Thread Casey Bodley
On 10/08/2018 03:45 PM, Christian Rice wrote: Just getting started here, but I am setting up a three-zone realm, each with a pair of S3 object gateways, Luminous on Debian.  I’m wondering if there’s a straightforward way to exempt some buckets from replicating to other zones?  The idea
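If per-bucket sync control is available in your Luminous build, a hedged sketch of exempting a single bucket from replication (bucket name hypothetical):

    radosgw-admin bucket sync disable --bucket=mybucket
    radosgw-admin bucket sync status --bucket=mybucket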

Re: [ceph-users] list admin issues

2018-10-08 Thread Jeff Smith
I just got dumped again. I have not sent any attachments/images. On Mon, Oct 8, 2018 at 5:48 AM Elias Abacioglu wrote: > > If it's attachments causing this, perhaps forbid attachments? Force people to > use pastebin / imgur type of services? > > /E > > On Mon, Oct 8, 2018 at 1:33 PM Martin

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Paul Emmerich
ceph-volume unfortunately doesn't handle completely hanging IOs too well compared to ceph-disk. It needs to read actual data from each disk and it'll just hang completely if any of the disks doesn't respond. The low-level command to get the information from LVM is lvs -o lv_tags; this allows
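A sketch of that lookup (OSD id 7 is only an example; the ceph.osd_id tag is set by ceph-volume on its logical volumes):

    lvs -o lv_name,lv_path,lv_tags | grep 'ceph.osd_id=7'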

[ceph-users] advised needed for different projects design

2018-10-08 Thread Joshua Chen
Hello all, When planning for my institute's need, I would like to seek for design suggestions from you for my special situation: 1, I will support many projects, currently they are all nfs servers (and those nfs servers serve their clients respectively). For example nfsA (for clients belong to

Re: [ceph-users] MDSs still core dumping

2018-10-08 Thread Alfredo Daniel Rezinovsky
On 08/10/18 17:41, Sergey Malinin wrote: On 8.10.2018, at 23:23, Alfredo Daniel Rezinovsky mailto:alfrenov...@gmail.com>> wrote: I need the data, even if it's read only. After full data scan you should have been able to boot mds 13.2.2 and mount the fs. The problem started with the

Re: [ceph-users] MDSs still core dumping

2018-10-08 Thread Sergey Malinin
> On 8.10.2018, at 23:23, Alfredo Daniel Rezinovsky > wrote: > > I need the data, even if it's read only. After full data scan you should have been able to boot mds 13.2.2 and mount the fs.

[ceph-users] MDSs still core dumping

2018-10-08 Thread Alfredo Daniel Rezinovsky
It seems my purge_queue journal is damaged. Even if I reset it, it stays damaged. What does inotablev mismatch mean? 2018-10-08 16:40:03.144 7f05b6099700 -1 log_channel(cluster) log [ERR] : journal replay inotablev mismatch 1 -> 42160 /build/ceph-13.2.1/src/mds/journal.cc: In function 'void
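For reference, the purge_queue reset the poster refers to presumably looks like this (filesystem name and rank are assumptions):

    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset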

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-08 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl): > > From what you've described here, it's most likely that the MDS is trying to > > read something out of RADOS which is taking a long time, and which we > > didn't expect to cause a slow down. You can check via the admin socket to > > see if there are
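The admin-socket checks referred to in that quote are presumably along these lines (the daemon name is a placeholder):

    # outstanding RADOS operations issued by the MDS
    ceph daemon mds.<name> objecter_requests
    # client requests currently stuck in the MDS
    ceph daemon mds.<name> dump_ops_in_flight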

[ceph-users] can I define buckets in a multi-zone config that are exempted from replication?

2018-10-08 Thread Christian Rice
Just getting started here, but I am setting up a three-zone realm, each with a pair of S3 object gateways, Luminous on Debian. I’m wondering if there’s a straightforward way to exempt some buckets from replicating to other zones? The idea being there might be data that pertains to a specific

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Alfredo Daniel Rezinovsky
On 08/10/18 11:47, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 9:46 PM Alfredo Daniel Rezinovsky wrote: On 08/10/18 10:20, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 9:07 PM Alfredo Daniel Rezinovsky wrote: On 08/10/18 09:45, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Jakub Jaszewski
Hi Kevin, Have you tried ceph osd metadata OSDid ? Jakub Mon, 8 Oct 2018, 19:32, Alfredo Deza wrote: > On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich wrote: > > > > Hi! > > > > Yes, thank you. At least on one node this works; the other node just > freezes, but this might be caused

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Alfredo Deza
On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich wrote: > > Hi! > > Yes, thank you. At least on one node this works; the other node just freezes, > but this might be caused by a bad disk that I am trying to find. If it is freezing, you could maybe try running the command where it freezes? (ceph-volume

Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-08 Thread Patrick Donnelly
+ceph-announce On Sun, Oct 7, 2018 at 7:30 PM Yan, Zheng wrote: > There is a bug in v13.2.2 mds, which causes decoding purge queue to > fail. If mds is already in damaged state, please downgrade mds to > 13.2.1, then run 'ceph mds repaired fs_name:damaged_rank' . > > Sorry for all the trouble I

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread Jason Dillaman
On Mon, Oct 8, 2018 at 11:33 AM wrote: > > Thanks, changing rxw to rwx solved the problem. But again, it is > strange. I am issuing the rbd command against the ssdvolumes pool and > not ssdvolumes-13. And why does "allow *" on the mon solve the problem? > I am a bit lost :-) > > -- > This does

Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-08 Thread Alex Litvak
This would be a question I have had since Zheng posted the problem. I recently purged a brand new cluster because I needed to change the default WAL/DB settings on all OSDs in a collocated scenario. I decided to jump to 13.2.2 rather than upgrade from 13.2.1. Now I wonder if I am still in

Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-08 Thread Paul Emmerich
Does this only affect upgraded CephFS deployments? A fresh 13.2.2 should work fine if I'm interpreting this bug correctly? Paul On Mon, 8 Oct 2018 at 11:53, Daniel Carrasco wrote: > > > > On Mon, 8 Oct 2018 at 5:44, Yan, Zheng wrote: >> >> On Mon, Oct 8, 2018 at 11:34 AM Daniel

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread sinan
Thanks, changing rxw to rwx solved the problem. But again, it is strange: I am issuing the rbd command against the ssdvolumes pool and not ssdvolumes-13. And why does "allow *" on the mon solve the problem? I am a bit lost :-) -- This does work -- caps: [mon] allow * caps: [osd] allow * $
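A sketch of correcting the caps in place (the osd cap string is deliberately incomplete here and must match your full existing string, with rxw fixed to rwx):

    # note: the "..." is a placeholder for the rest of the existing osd cap string
    ceph auth caps client.openstack \
        mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=ssdvolumes, allow rwx pool=ssdvolumes-13, ...'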

Re: [ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-08 Thread Aleksei Zakharov
As I can see, all PGs are active+clean: ~# ceph -s cluster: id: d168189f-6105-4223-b244-f59842404076 health: HEALTH_WARN noout,nodeep-scrub flag(s) set mons 1,2,3,4,5 are using a lot of disk space services: mon: 5 daemons, quorum 1,2,3,4,5 mgr:

Re: [ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-08 Thread Wido den Hollander
On 10/08/2018 05:04 PM, Aleksei Zakharov wrote: > Hi all, > > We've upgraded our cluster from jewel to luminous and re-created monitors > using rocksdb. > Now we see, that mon's are using a lot of disk space and used space only > grows. It is about 17GB for now. It was ~13GB when we used

[ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-08 Thread Aleksei Zakharov
Hi all, We've upgraded our cluster from jewel to luminous and re-created the monitors using rocksdb. Now we see that the mons are using a lot of disk space, and used space only grows. It is about 17GB for now; it was ~13GB when we used leveldb and the jewel release. When we added new OSDs we saw that it
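Two checks that are often useful here (mon id 1 matches the naming shown above; the osdmap counters should appear in the full report):

    # range of old osdmaps the monitors are still keeping
    ceph report 2>/dev/null | grep -E 'osdmap_(first|last)_committed'
    # compact one monitor's store at a time
    ceph tell mon.1 compact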

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread Jason Dillaman
On Mon, Oct 8, 2018 at 10:20 AM wrote: > > On a Ceph Monitor: > # ceph auth get client.openstack | grep caps > exported keyring for client.openstack > caps mon = "allow r" > caps osd = "allow class-read object_prefix rbd_children, allow rwx > pool=ssdvolumes, allow rxw

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 9:46 PM Alfredo Daniel Rezinovsky wrote: > > > > On 08/10/18 10:20, Yan, Zheng wrote: > > On Mon, Oct 8, 2018 at 9:07 PM Alfredo Daniel Rezinovsky > > wrote: > >> > >> > >> On 08/10/18 09:45, Yan, Zheng wrote: > >>> On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel Rezinovsky

[ceph-users] rados gateway http compression

2018-10-08 Thread Jin Mao
I'd like to compare the performance of storing compressed data and decompressing at the client vs storing uncompressed data directly. However, a series of tests with and without an "Accept-Encoding: gzip" header using curl (hitting the same rgw server) do not seem to show any difference. The only compression
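For context, radosgw's built-in compression in this era is applied at rest per placement target rather than as on-the-fly response gzipping, so Accept-Encoding alone changes nothing. A hedged sketch of enabling at-rest compression (zone and placement names assume the defaults; the bucket name is hypothetical):

    radosgw-admin zone placement modify --rgw-zone=default \
        --placement-id=default-placement --compression=zlib
    # in a multisite setup, commit the change and restart the gateways
    radosgw-admin period update --commit
    # per-bucket stored vs. utilized sizes show the effect
    radosgw-admin bucket stats --bucket=mybucket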

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread sinan
On a Ceph Monitor: # ceph auth get client.openstack | grep caps exported keyring for client.openstack caps mon = "allow r" caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=ssdvolumes, allow rxw pool=ssdvolumes-13, allow rwx pool=sasvolumes-13, allow rwx

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread Jason Dillaman
Can you run "ceph auth get client.openstack | grep caps"? On Mon, Oct 8, 2018 at 10:03 AM wrote: > > The result of your command: > > $ rbd ls --debug-rbd=20 -p ssdvolumes --id openstack > 2018-10-08 13:42:17.386505 7f604933fd40 20 librbd: list 0x7fff5b25cc30 > rbd: list: (1) Operation not

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Alfredo Daniel Rezinovsky
On 08/10/18 10:20, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 9:07 PM Alfredo Daniel Rezinovsky wrote: On 08/10/18 09:45, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel Rezinovsky wrote: On 08/10/18 07:06, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 5:43 PM Sergey Malinin

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread sinan
The result of your command: $ rbd ls --debug-rbd=20 -p ssdvolumes --id openstack 2018-10-08 13:42:17.386505 7f604933fd40 20 librbd: list 0x7fff5b25cc30 rbd: list: (1) Operation not permitted $ Thanks! Sinan On 08-10-2018 15:37, Jason Dillaman wrote: On Mon, Oct 8, 2018 at 9:24 AM wrote:

Re: [ceph-users] rbd ls operation not permitted

2018-10-08 Thread Jason Dillaman
On Mon, Oct 8, 2018 at 9:24 AM wrote: > > Hi, > > I am running a Ceph cluster (Jewel, ceph version 10.2.10-17.el7cp). > > > I also have 2 OpenStack clusters (Ocata (v12) and Pike (v13)). > > When I perform a "rbd ls -p --id openstack" on the OpenStack > Ocata cluster it works fine, when I

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Alfredo Daniel Rezinovsky
On 08/10/18 10:32, Sergey Malinin wrote: On 8.10.2018, at 16:07, Alfredo Daniel Rezinovsky mailto:alfrenov...@gmail.com>> wrote: So I can stop cephfs-data-scan, run the import, downgrade, and then reset the purge queue? I suggest that you back up the metadata pool so that in case of

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Sergey Malinin
> On 8.10.2018, at 16:07, Alfredo Daniel Rezinovsky > wrote: > > So I can stop cephfs-data-scan, run the import, downgrade, and then reset > the purge queue? I suggest that you back up the metadata pool so that in case of failure you can continue with the data scan from where you stopped. I've

[ceph-users] rbd ls operation not permitted

2018-10-08 Thread sinan
Hi, I am running a Ceph cluster (Jewel, ceph version 10.2.10-17.el7cp). I also have 2 OpenStack clusters (Ocata (v12) and Pike (v13)). When I perform a "rbd ls -p --id openstack" on the OpenStack Ocata cluster it works fine; when I perform the same command on the OpenStack Pike cluster I

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 9:07 PM Alfredo Daniel Rezinovsky wrote: > > > > On 08/10/18 09:45, Yan, Zheng wrote: > > On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel Rezinovsky > > wrote: > >> On 08/10/18 07:06, Yan, Zheng wrote: > >>> On Mon, Oct 8, 2018 at 5:43 PM Sergey Malinin wrote: > >

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Alfredo Daniel Rezinovsky
On 08/10/18 09:45, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel Rezinovsky wrote: On 08/10/18 07:06, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 5:43 PM Sergey Malinin wrote: On 8.10.2018, at 12:37, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin

Re: [ceph-users] Ceph version upgrade with Juju

2018-10-08 Thread James Page
Hi Fabio On Thu, 4 Oct 2018 at 23:02 Fabio Abreu wrote: > Hi Cephers, > > I have a small doubt about migrating the Jewel version in a MAAS / > Juju deployment scenario. > > Does anyone have experience with this in a production environment? > > I am asking this because we are mapping all

Re: [ceph-users] cephfs poor performance

2018-10-08 Thread Tomasz Płaza
On 08.10.2018 at 10:29, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 3:38 PM Tomasz Płaza wrote: On 08.10.2018 at 09:21, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 1:54 PM Tomasz Płaza wrote: Hi, Can someone please help me, how do I improve performance on our CephFS cluster? System in use is:

Re: [ceph-users] list admin issues

2018-10-08 Thread Elias Abacioglu
If it's attachments causing this, perhaps forbid attachments? Force people to use pastebin / imgur type of services? /E On Mon, Oct 8, 2018 at 1:33 PM Martin Palma wrote: > Same here also on Gmail with G Suite. > On Mon, Oct 8, 2018 at 12:31 AM Paul Emmerich > wrote: > > > > I'm also seeing

Re: [ceph-users] list admin issues

2018-10-08 Thread Martin Palma
Same here, also on Gmail with G Suite. On Mon, Oct 8, 2018 at 12:31 AM Paul Emmerich wrote: > > I'm also seeing this once every few months or so on Gmail with G Suite. > > Paul > On Sun, 7 Oct 2018 at 08:18, Joshua Chen > wrote: > > > > I also got removed once, got another warning once

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi! Yes, thank you. At least on one node this works; the other node just freezes, but this might be caused by a bad disk that I am trying to find. Kevin On Mon, 8 Oct 2018 at 12:07, Wido den Hollander wrote: > Hi, > > $ ceph-volume lvm list > > Does that work for you? > > Wido > > On

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 5:43 PM Sergey Malinin wrote: > > > > > On 8.10.2018, at 12:37, Yan, Zheng wrote: > > > > On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin wrote: > >> > >> What additional steps need to be taken in order to (try to) regain access > >> to the fs providing that I backed up

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Wido den Hollander
Hi, $ ceph-volume lvm list Does that work for you? Wido On 10/08/2018 12:01 PM, Kevin Olbrich wrote: > Hi! > > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id? > Before I migrated from filestore with simple-mode to bluestore with lvm, > I was able to find the raw disk with

[ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi! Is there an easy way to find raw disks (e.g. sdd/sdd1) by OSD id? Before I migrated from filestore with simple-mode to bluestore with lvm, I was able to find the raw disk with "df". Now I need to go from LVM LV to PV to disk every time I need to check/smartctl a disk. Kevin

Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-08 Thread Daniel Carrasco
On Mon, 8 Oct 2018 at 5:44, Yan, Zheng wrote: > On Mon, Oct 8, 2018 at 11:34 AM Daniel Carrasco > wrote: > > > > I've got several problems on 12.2.8 too. All my standby MDSs use a lot > of memory (while the active uses normal memory), and I'm receiving a lot of > slow MDS messages (causing the

Re: [ceph-users] mds_cache_memory_limit value

2018-10-08 Thread John Spray
On Fri, Oct 5, 2018 at 9:33 AM Hervé Ballans wrote: > > Hi all, > > I have just configured a new value for 'mds_cache_memory_limit'. The output > message says "not observed, change may require restart". > So I'm not really sure: has the new value been taken into account directly, or > do I have
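For reference, one way to apply it at runtime without a restart (the daemon name is a placeholder; the value is in bytes, here 16 GiB as an example):

    ceph daemon mds.<name> config set mds_cache_memory_limit 17179869184
    ceph daemon mds.<name> config get mds_cache_memory_limit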

Re: [ceph-users] daahboard

2018-10-08 Thread John Spray
Assuming that ansible is correctly running "ceph mgr module enable dashboard", then the next place to look is in "ceph status" (any errors?) and "ceph mgr module ls" (any reports of the module unable to run?) John On Sat, Oct 6, 2018 at 1:53 AM solarflow99 wrote: > > I enabled the dashboard

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Sergey Malinin
> On 8.10.2018, at 12:37, Yan, Zheng wrote: > > On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin wrote: >> >> What additional steps need to be taken in order to (try to) regain access to >> the fs providing that I backed up metadata pool, created alternate metadata >> pool and ran

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin wrote: > > What additional steps need to be taken in order to (try to) regain access to > the fs providing that I backed up metadata pool, created alternate metadata > pool and ran scan_extents, scan_links, scan_inodes, and somewhat recursive >

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-08 Thread Kevin Olbrich
Hi Paul! I installed ceph-debuginfo and set these: debug bluestore = 20/20 debug osd = 20/20 debug bluefs = 20/20 debug bdev = 20/20 V: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable) *LOGS* *OSD 29:* 2018-10-08 10:29:06.001 7f810511a1c0 20 bluefs _read left

Re: [ceph-users] MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

2018-10-08 Thread Sergey Malinin
What additional steps need to be taken in order to (try to) regain access to the fs providing that I backed up metadata pool, created alternate metadata pool and ran scan_extents, scan_links, scan_inodes, and somewhat recursive scrub. After that I only mounted the fs read-only to backup the
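For reference, the basic single-worker form of that scan sequence (the data pool name is an assumption; the alternate-metadata-pool variant the poster used takes extra flags not shown here):

    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data
    cephfs-data-scan scan_links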

Re: [ceph-users] cephfs poor performance

2018-10-08 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 3:38 PM Tomasz Płaza wrote: > > > On 08.10.2018 at 09:21, Yan, Zheng wrote: > > On Mon, Oct 8, 2018 at 1:54 PM Tomasz Płaza wrote: > >> Hi, > >> > >> Can someone please help me, how do I improve performance on our CephFS > >> cluster? > >> > >> System in use is: Centos 7.5

Re: [ceph-users] cephfs poor performance

2018-10-08 Thread Tomasz Płaza
On 08.10.2018 at 09:21, Yan, Zheng wrote: On Mon, Oct 8, 2018 at 1:54 PM Tomasz Płaza wrote: Hi, Can someone please help me, how do I improve performance on our CephFS cluster? System in use is: Centos 7.5 with ceph 12.2.7. The hardware in use are as follows: 3xMON/MGR: 1xIntel(R) Xeon(R)

Re: [ceph-users] cephfs poor performance

2018-10-08 Thread Marc Roos
That is easy I think, so I will give it a try: faster CPUs, fast NVMe disks, everything 10Gbit or even better 100Gbit, plus a daily prayer. -Original Message- From: Tomasz Płaza [mailto:tomasz.pl...@grupawp.pl] Sent: Monday, 8 October 2018 7:46 To: ceph-users@lists.ceph.com

Re: [ceph-users] cephfs poor performance

2018-10-08 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 1:54 PM Tomasz Płaza wrote: > > Hi, > > Can someone please help me, how do I improve performance on our CephFS > cluster? > > System in use is: Centos 7.5 with ceph 12.2.7. > The hardware in use are as follows: > 3xMON/MGR: > 1xIntel(R) Xeon(R) Bronze 3106 > 16GB RAM >

[ceph-users] cephfs kernel client blocks when removing large files

2018-10-08 Thread Dylan McCulloch
Hi all, We have identified some unexpected blocking behaviour by the ceph-fs kernel client. When performing 'rm' on large files (100+GB), there appears to be a significant delay of 10 seconds or more, before a 'stat' operation can be performed on the same directory on the filesystem.