[ceph-users] v14.2.7 Nautilus released

2020-01-31 Thread David Galloway
This is the seventh update to the Ceph Nautilus release series. This is a hotfix release primarily fixing a couple of security issues. We recommend that all users upgrade to this release.

Notable Changes
---------------
* CVE-2020-1699: Fixed a path traversal flaw in Ceph dashboard that could
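
As a generic post-upgrade sanity check (a sketch, not part of the release notes themselves), the running daemon versions and cluster state can be confirmed with:

    # every daemon should report 14.2.7 once the rolling upgrade is complete
    ceph versions
    # overall cluster state
    ceph health detail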

[ceph-users] Re: CephFS - objects in default data pool

2020-01-31 Thread Frank Schilder
Update: the primary data pool (con-fs2-meta2) does store data:

    con-fs2-meta1   12   240 MiB   0.02   1.1 TiB    6437
    con-fs2-meta2   13       0 B      0   373 TiB   72167
    con-fs2-data    14   103 GiB   0.01   894 TiB
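
The columns above appear to be the pool name, pool id, USED, %USED, MAX AVAIL and OBJECTS fields of a "ceph df" listing. As a generic sketch (pool names taken from the thread, commands not quoted from it), the same figures and the objects held in the primary data pool can be inspected with:

    # per-pool usage, as in the listing above
    ceph df detail
    # sample the objects kept in the primary data pool; with a separate data
    # pool in use these are typically 0-byte backtrace objects
    rados -p con-fs2-meta2 ls | head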

[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

2020-01-31 Thread Ilya Dryomov
On Fri, Jan 31, 2020 at 4:57 PM Dan van der Ster wrote: > > Hi Ilya, > > On Fri, Jan 31, 2020 at 11:33 AM Ilya Dryomov wrote: > > > > On Fri, Jan 31, 2020 at 11:06 AM Dan van der Ster > > wrote: > > > > > > Hi all, > > > > > > We are quite regularly (a couple times per week) seeing: > > > > >

[ceph-users] Re: CephFS - objects in default data pool

2020-01-31 Thread Frank Schilder
Dear Gregory and Philip, I'm also experimenting with a replicated primary data pool and an erasure-coded secondary data pool. I make the same observation with regards to objects and activity as Philip. However, it does seem to make a difference. If I run a very aggressive fio test as in: fio
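
The fio job itself is cut off above; purely as an illustration of an aggressive small-block random-write test against a CephFS mount (hypothetical parameters, not the actual command from this message):

    fio --name=aggressive-randwrite --directory=/mnt/cephfs/fiotest \
        --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
        --numjobs=8 --iodepth=32 --size=1G --time_based --runtime=300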

[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

2020-01-31 Thread Dan van der Ster
Hi Ilya, On Fri, Jan 31, 2020 at 11:33 AM Ilya Dryomov wrote: > > On Fri, Jan 31, 2020 at 11:06 AM Dan van der Ster wrote: > > > > Hi all, > > > > We are quite regularly (a couple times per week) seeing: > > > > HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs > > report

[ceph-users] Re: Inactive pgs preventing osd from starting

2020-01-31 Thread Ragan, Tj (Dr.)
I tried that (and just tried again by setting it in /etc/ceph/ceph.conf). OSD still won’t start. Dr. T.J. Ragan Senior Research Computation Officer Leicester Institute of Structural and Chemical Biology University of Leicester, University Road, Leicester LE1 7RH, UK t: +44 (0)116 223 1287 e:

[ceph-users] Re: Upgrading mimic 13.2.2 to mimic 13.2.8

2020-01-31 Thread Frank Schilder
Was probably an over-paranoid question. The upgrade 13.2.2 -> 13.2.8 went smoothly. Only this one didn't do what was expected:

    # ceph osd set pglog_hardlimit
    Invalid command: pglog_hardlimit not in
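
A hedged sketch (not from the thread) for checking whether the flag is known and already applied; an "Invalid command" reply usually means the daemon answering does not yet recognize the flag value:

    # confirm which releases the mons/osds are actually running
    ceph versions
    # once the flag has been accepted it shows up in the osdmap flags line
    ceph osd dump | grep flags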

[ceph-users] Getting rid of trim_object Snap .... not in clones

2020-01-31 Thread Andreas John
Hello, in my cluster one OSD after the other dies, until I recognized that it was simply an "abort" in the daemon, probably caused by:

    2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44 not in clones

Close to this msg I get a stack trace:  ceph
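
A hedged sketch (not from the thread) for narrowing down which PG and object trigger the abort before attempting any repair; the full log line normally names both:

    # show the complete trim_object error lines, including PG id and object name
    grep 'trim_object' /var/log/ceph/ceph-osd.*.log
    # map the affected object back to its PG and acting OSDs
    ceph osd map <pool> <object-name>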

[ceph-users] Re: Inactive pgs preventing osd from starting

2020-01-31 Thread Paul Emmerich
If you don't care about the data: set osd_find_best_info_ignore_history_les = true on the affected OSDs temporarily. This means losing data. For anyone else reading this: don't ever use this option. It's evil and causes data loss (but gets your PG back and active, yay!) Paul -- Paul Emmerich
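
Purely for illustration (and with Paul's warning above in mind), the temporary setting would look roughly like this in ceph.conf, scoped to the affected OSD and reverted as soon as the PG has peered; the OSD id is a placeholder:

    [osd.12]
    # temporary, remove again once the PG is back active
    osd_find_best_info_ignore_history_les = true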

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread adamb
The use case is for KVM RBD volumes. Our environment will be 80% random reads/writes; probably 40/60 or 30/70 is a good estimate. All 4k-8k IO sizes. We currently run on a Nimble Hybrid array which runs in the 5k-15k IOPS range with some spikes up to 20-25k IOPS (Capable of 100k iops per
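
To sanity-check a design against that kind of profile, a rough benchmark sketch (hypothetical pool/image names, not a command from the thread) could be:

    # random 4k writes against a test RBD image, approximating the described workload
    rbd bench --io-type write --io-pattern rand --io-size 4096 \
        --io-threads 16 --io-total 10G rbdpool/benchimage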

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread David Byte
The RocksDB rings are 256MB, 2.5GB, 25GB, and 250GB. Unless you have a workload that uses a lot of metadata, taking care of the first 3 and providing room for compaction should be fine. To allow for compaction room, 60GB should be sufficient. Add 4GB to accommodate WAL and you're at a nice
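
Assuming that sizing, a minimal ceph-volume sketch (device names are placeholders, not from the thread) that puts block.db on a ~64 GB NVMe partition per OSD might look like:

    # one SSD as data device, one pre-created ~64G NVMe partition for block.db (and WAL)
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1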

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread adamb
vitalif@yourcmc.ru wrote: > I think 800 GB NVMe per 2 SSDs is an overkill. 1 OSD usually only > requires 30 GB block.db, so 400 GB per an OSD is a lot. On the other > hand, does 7300 have twice the iops of 5300? In fact, I'm not sure if a > 7300 + 5300 OSD will perform better than just a 5300

[ceph-users] Inactive pgs preventing osd from starting

2020-01-31 Thread Ragan, Tj (Dr.)
Hi All, Long story short, we’re doing disaster recovery on a cephfs cluster, and are at a point where we have 8 pgs stuck incomplete. Just before the disaster, I increased the pg_num on two of the pools, and they had not completed increasing the pgp_num yet. I’ve since forced pgp_num to
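
For context, a generic sketch of the commands involved here (placeholder names, not quoted from the thread):

    # bring pgp_num in line with pg_num on the affected pool
    ceph osd pool set <pool> pgp_num <value>
    # inspect why a PG remains incomplete
    ceph pg <pgid> query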

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread Adam Boyhan
Ok, so 100G seems to be the better choice. I will probably go with some of these: https://www.fs.com/products/75808.html

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread Martin Verges
Hello Adam, Can you describe what performance values you want to gain out of your cluster? What's the use case? EC or replica? In general, more disks are preferred over bigger ones. As Micron has not provided us with demo hardware, we can't say how fast these disks are in reality. Before I

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread vitalif
I think 800 GB NVMe per 2 SSDs is overkill. 1 OSD usually only requires 30 GB block.db, so 400 GB per OSD is a lot. On the other hand, does the 7300 have twice the iops of the 5300? In fact, I'm not sure if a 7300 + 5300 OSD will perform better than just a 5300 OSD at all. It would be

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread Paul Emmerich
On Fri, Jan 31, 2020 at 2:06 PM EDH - Manuel Rios wrote: > > Hmm, change 40Gbps to 100Gbps networking. > > 40Gbps technology is just a bond of 4x10 links with some latency due to link > aggregation. > 100Gbps and 25Gbps have less latency and good performance. In Ceph, 50% of > the latency comes

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread EDH - Manuel Rios
Please check that you support RDMA to improve access. A 40Gbps transceiver is internally 4x10 ports; that's why you can split 40Gbps switch ports into 4x10 multiports over the same link. 25G is a newer base technology with improvements over 10Gbps in latency. Regards Manuel From: Adam Boyhan

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread Adam Boyhan
Appreciate the input. Looking at those articles, they make me feel like the 40G they are talking about is 4x bonded 10G connections. I'm looking at 40Gbps without bonding for throughput. Is that still the same? https://www.fs.com/products/29126.html

[ceph-users] Re: Micron SSD/Basic Config

2020-01-31 Thread EDH - Manuel Rios
Hmm, change 40Gbps to 100Gbps networking. 40Gbps technology is just a bond of 4x10 links with some latency due to link aggregation. 100Gbps and 25Gbps have less latency and good performance. In Ceph, 50% of the latency comes from network commits and the other 50% from disk commits. A fast graph

[ceph-users] Micron SSD/Basic Config

2020-01-31 Thread Adam Boyhan
Looking to roll out an all-flash Ceph cluster. Wanted to see if anyone else was using Micron drives, and to get some basic input on my design so far?

Basic Config
Ceph OSD Nodes: 8x Supermicro A+ Server 2113S-WTRT
- AMD EPYC 7601 32 Core 2.2GHz
- 256G RAM
- AOC-S3008L-L8e HBA
- 10GB SFP+ for

[ceph-users] Re: Network performance checks

2020-01-31 Thread Massimo Sgaravatto
I am seeing very few such error messages in the mon logs (~a couple per day). If I issue the command "ceph daemon osd.$id dump_osd_network" on every OSD with the default 1000 ms threshold, I can't see any entries. I guess this is because that command only considers the last (15?) minutes. Am I
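
A hedged sketch of re-running the dump with a lower threshold, so that pings below one second are reported as well (the trailing argument is the threshold in milliseconds, assuming the build accepts the optional value; the osd id is a placeholder):

    # report all recorded ping times, not only those above the default 1000 ms
    ceph daemon osd.12 dump_osd_network 0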

[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

2020-01-31 Thread Ilya Dryomov
On Fri, Jan 31, 2020 at 11:06 AM Dan van der Ster wrote: > > Hi all, > > We are quite regularly (a couple times per week) seeing: > > HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs > report slow requests > MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability

[ceph-users] TR: Understand ceph df details

2020-01-31 Thread CUZA Frédéric
Turns out it is probably orphans. We are running Ceph Luminous 12.2.12, and the orphans find has been stuck in the "iterate_bucket_index" stage on shard "0" for 2 days now. Is anyone facing this issue? Regards, From: ceph-users <ceph-users-boun...@lists.ceph.com> Sent: 21 January 2020
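
For reference, the luminous-era commands around this look roughly as follows (pool and job id are placeholders; newer releases replaced this workflow with the rgw-orphan-list tool):

    # start or resume an orphan scan against the RGW data pool
    radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans1
    # list the scan jobs that exist
    radosgw-admin orphans list-jobs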

[ceph-users] Upgrading mimic 13.2.2 to mimic 13.2.8

2020-01-31 Thread Frank Schilder
Dear all, is it possible to upgrade from 13.2.2 directly to 13.2.8 after setting "ceph osd set pglog_hardlimit" (mimic 13.2.5 release notes), or do I need to follow this path: 13.2.2 -> 13.2.5 -> 13.2.6 -> 13.2.8? Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] kernel client osdc ops stuck and mds slow reqs

2020-01-31 Thread Dan van der Ster
Hi all, We are quite regularly (a couple times per week) seeing: HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release mdshpc-be143(mds.0): Client hpc-be028.cern.ch: failing to
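
When this warning fires, a hedged way to see which OSD requests the kernel client is stuck on and what the MDS is waiting for (requires debugfs on the client node; the mds name is a placeholder):

    # in-flight objecter requests of the kernel CephFS/RBD client
    cat /sys/kernel/debug/ceph/*/osdc
    # slow requests currently tracked by the MDS
    ceph daemon mds.<name> dump_ops_in_flight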

[ceph-users] Re: moving small production cluster to different datacenter

2020-01-31 Thread Burkhard Linke
Hi, On 1/31/20 12:09 AM, Nigel Williams wrote: Did you end up having all new IPs for your MONs? I've wondered how a large KVM deployment should be handled when the instance metadata has a hard-coded list of MON IPs for the cluster. How are they changed en masse with running VMs? Or do these
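
A hedged sketch of the usual approach (generic commands, not from the thread): keep quorum by first adding monitors on the new addresses, point clients' mon_host at the combined list, and only then retire the old monitors:

    # current monitor map with all advertised addresses
    ceph mon dump
    # after a mon has been brought up on its new address, remove the old one
    ceph mon remove <old-mon-id>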