I've upgraded 7 of our clusters to Nautilus (14.2.4) and noticed that on some
of the clusters (3 out of 7) the OSDs aren't using msgr2 at all. Here's the
output for osd.0 on 2 clusters of each type:
### Cluster 1 (v1 only):
# ceph osd find 0 | jq -r '.addrs'
{
    "addrvec": [
        {
            "type":
On 19/11/14 11:04AM, Gregory Farnum wrote:
> On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster wrote:
> >
> > Hi Joao,
> >
> > I might have found the reason why several of our clusters (and maybe
> > Bryan's too) are getting stuck not trimming osdmaps.
> > It seems that when an osd fails, the min_l
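For anyone hitting the same symptom, one way to watch whether the mons are trimming at all (a sketch; the field names are as I recall them from the Nautilus "ceph report" output):

$ ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

If the first number stays pinned while the second keeps climbing, the mons are accumulating old osdmaps instead of trimming them.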
In order to get optimal performance out of NVMe, you will want very
fast cores, and you will probably have to split each NVMe card into
2-4 OSD partitions in order to throw enough cores at it.
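If it helps, ceph-volume can do that splitting for you; something along these lines (a sketch, the device name is a placeholder):

$ ceph-volume lvm batch --report --osds-per-device 4 /dev/nvme0n1   # dry run, shows the planned layout
$ ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1            # create 4 OSDs on the one card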
On Fri, Nov 15, 2019 at 10:24 AM Yoann Moulin wrote:
>
> Hello,
>
> I'm going to deploy a new cluster so
Hello,
I'm going to deploy a new cluster soon based on 6.4TB NVMe PCIe cards; I will
have only 1 NVMe card per node and 38 nodes.
The use case is to offer CephFS volumes for a k8s platform; I plan to use an
EC pool 8+3 for the cephfs_data pool.
Do you have recommendations for the setup or mis
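For the EC part, a rough sketch of the pool setup (pool names, profile name, and PG counts are placeholders; CephFS needs allow_ec_overwrites on the EC pool, and a common pattern is to keep a small replicated default data pool and attach the EC pool as an extra data pool):

$ ceph osd erasure-code-profile set ec83 k=8 m=3 crush-failure-domain=host
$ ceph osd pool create cephfs_data_ec 1024 1024 erasure ec83
$ ceph osd pool set cephfs_data_ec allow_ec_overwrites true
$ ceph fs add_data_pool cephfs cephfs_data_ec

With 38 nodes, k+m=11 with a host failure domain fits comfortably.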
Hi Igor,
On 15/11/2019 14:22, Igor Fedotov wrote:
> Do you mean both standalone DB and(!!) standalone WAL devices/partitions
> by having SSD DB/WAL?
No, 1x combined DB/WAL partition on an SSD and 1x data partition on an
HDD per OSD. I.e. created like:
ceph-deploy osd create --data /dev/sda --b
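For context, the full form is roughly the following (the DB partition and hostname are placeholders; with no separate --block-wal, the WAL simply lives inside the DB partition, which is the layout described above):

$ ceph-deploy osd create --data /dev/sda --block-db /dev/nvme0n1p1 <hostname>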
On 11/15/19 1:29 PM, Thomas Schneider wrote:
> This cluster has a long unhealthy history, meaning this issue is not
> happening out of the blue.
>
> root@ld3955:~# ceph -s
>   cluster:
>     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>     health: HEALTH_WARN
>             1 MDSs report slow metad
Hi Simon,
Do you mean both standalone DB and(!!) standalone WAL devices/partitions
by having SSD DB/WAL?
If so, then BlueFS might eventually overwrite some data on your DB volume
with BlueFS log content, which most probably makes the OSD crash and leaves it
unable to restart one day. This is quite random an
Hi,
after stopping all MON, MGR, and OSD services on all nodes and starting all
services again after a few seconds, this issue is solved.
However, the cluster is now busy with peering etc.
THX
On 15.11.2019 14:28, Ilya Dryomov wrote:
> On Fri, Nov 15, 2019 at 11:39 AM Thomas Schneider <74cmo...@gmail.com>
On Fri, Nov 15, 2019 at 11:39 AM Thomas Schneider <74cmo...@gmail.com> wrote:
>
> Hi,
>
> when I execute this command
> rbd ls -l
> to list all RBDs I get spamming errors:
>
> 2019-11-15 11:29:19.428 7fd852678700 0 SIGN: MSG 1 Sender did not set
> CEPH_MSG_FOOTER_SIGNED.
> 2019-11-15 11:29:19.428
This cluster has a long unhealthy history, meaning this issue is not
happening out of the blue.
root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            noscrub,nodeep-scrub flag(s) set
Hello,
the client is using this version:
root@ld3955:~# ceph versions
{
    "mon": {
        "ceph version 14.2.4-1-gd592e56 (d592e56e74d94c6a05b9240fcb0031868acefbab) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.4-1-gd592e56 (d592e56e74d94c6a05b9240fcb0031868acefbab) nauti
On 11/15/19 11:22 AM, Thomas Schneider wrote:
> Hi,
> ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
>
> root@ld3955:~# ceph health detail
> HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
> flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down
On 11/15/19 11:38 AM, Thomas Schneider wrote:
> Hi,
>
> when I execute this command
> rbd ls -l
> to list all RBDs I get spamming errors:
>
Those errors are weird. Can you share the Ceph cluster version and the
clients?
$ ceph versions
And then also use rpm/dpkg to check which version of Ce
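Concretely, something like this on the client nodes (pick whichever matches the distro), alongside "ceph versions" on the cluster side:

$ dpkg -l | grep ceph    # Debian/Ubuntu clients
$ rpm -qa | grep ceph    # RHEL/CentOS clients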
Hi,
when I execute this command
rbd ls -l
to list all RBDs I get spamming errors:
2019-11-15 11:29:19.428 7fd852678700 0 SIGN: MSG 1 Sender did not set
CEPH_MSG_FOOTER_SIGNED.
2019-11-15 11:29:19.428 7fd852678700 0 SIGN: MSG 1 Message signature
does not match contents.
2019-11-15 11:29:19.428
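Those SIGN messages come out of the cephx message-signing path, so besides comparing versions it may be worth ruling out mismatched signing settings between the client and the cluster (a sketch; osd.0 and the ceph.conf path are just examples):

$ ceph config show osd.0 | grep cephx    # effective signing settings on the cluster side
$ grep cephx /etc/ceph/ceph.conf         # any overrides on the client side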
Hi,
ceph health is reporting: pg 59.1c is creating+down, acting [426,438]
root@ld3955:~# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; noscrub,nodeep-scrub
flag(s) set; Reduced data availability: 1 pg inactive, 1 pg down; 1
subtrees have overcommitted pool target_size_bytes; 1 su
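A couple of commands that usually show why a PG is stuck like this (a sketch; the PG id and OSD ids are taken from the health output above):

$ ceph pg map 59.1c     # confirm the up/acting sets
$ ceph pg 59.1c query   # the recovery_state section shows what the PG is waiting for
$ ceph osd tree down    # do 426 or 438 show up as down?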
Hi,
I have two new-ish 14.2.4 clusters that began life on 14.2.0, all with
HDD OSDs with SSD DB/WALs, but neither has experienced obvious problems yet.
What's the impact of this? Does possible data corruption mean possible
silent data corruption?
Or does the corruption cause the OSD failures
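A quick way to tell which layout an OSD actually has (a sketch; osd.0 is just an example id, and the field names are as Nautilus reports them in the OSD metadata):

$ ceph osd metadata 0 | grep bluefs

If bluefs_dedicated_wal comes back 0 (or absent) you have the combined DB+WAL layout rather than two standalone devices, which is the distinction Igor is drawing.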
Hi, cool guys,
Recently we encountered a problem: the journal of the MDS daemon couldn't be
trimmed, resulting in a large amount of space occupied by the metadata pool. So
what we could think of was using the admin socket command to flush the journal; you
know, it got worse, the admin thread of the MDS wa
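For reference, the commands involved (a sketch; the MDS name and filesystem name are placeholders): the flush is the admin-socket call mentioned above, and cephfs-journal-tool can inspect the journal offline:

$ ceph daemon mds.<name> flush journal
$ cephfs-journal-tool --rank=<fsname>:0 journal inspect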