[ceph-users] Re: Fwd: Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-16 Thread Andrew Walker-Brown
switches (40 Gb) everything is active/active, so multiple paths as Andrew describes in his config. Our config allows us to: bring down one of the switches for upgrades; bring down an iSCSI gateway for patching; all the while at least one path is up and servicing. Thanks Joe >>> Andrew Walker

[ceph-users] Re: Fwd: Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-16 Thread Andrew Walker-Brown
e 2021 09:29 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com> Cc: huxia...@horebdata.cn; Joe Comeau <joe.com...@hli.ubc.ca>; ceph-users <ceph-users@ceph.io> Subject: Re: [ceph-users] Re: Fwd: Re: Issues with Ceph

[ceph-users] Re: Ceph Managers dying?

2021-06-17 Thread Andrew Walker-Brown
Changing pg_num and pgp_num manually can be a useful tool. Just remember that they need to be a power of 2, and don’t increase or decrease by more than a couple of steps, e.g. 64 to 128 or 256… but not straight to 1024 etc. I had a situation where a couple of OSDs got quite full. I added more capacity but the
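As a minimal sketch of stepping up one power of 2 at a time (pool name is a placeholder):

  # check current values first
  ceph osd pool get <pool> pg_num
  ceph osd pool get <pool> pgp_num
  # one step, e.g. 64 -> 128, then let the cluster settle before the next step
  ceph osd pool set <pool> pg_num 128
  ceph osd pool set <pool> pgp_num 128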

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread Andrew Walker-Brown
With an unstable link/port you could see the issues you describe. Ping doesn’t have the packet rate for you to necessarily have a packet in transit at exactly the moment the port fails temporarily. iperf, on the other hand, could certainly show the issue: higher packet rate and more
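For example, something along these lines (addresses and duration are illustrative) will usually expose a flapping port where ping will not:

  # on the receiving host
  iperf3 -s
  # on the sending host: sustained traffic for 5 minutes, report every second
  iperf3 -c <receiver-ip> -t 300 -i 1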

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Andrew Walker-Brown
Hi Frank, I’m running the same SSDs (approx. 20) in Dell servers on HBA330’s. Haven’t had any issues and have suffered at least one power outage. Just checking the wcache setting and it shows as enabled. Running Octopus 15.1.9 and docker containers. Originally part of a Proxmox cluster but
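If it helps, this is roughly how I’d check/toggle the volatile write cache (device name is an example; SAS vs SATA tooling differs):

  # SATA: show the write-caching setting
  hdparm -W /dev/sdX
  # SAS via sdparm: query and, if wanted, clear the WCE bit persistently
  sdparm --get=WCE /dev/sdX
  sdparm --set=WCE=0 --save /dev/sdX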

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-13 Thread Andrew Walker-Brown
y 2021 10:15 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com>; ceph-users@ceph.io Subject: Re: OSD lost: firmware bug in Kingston SSDs? Hi Andrew, I did a few power-out tests by pulling the power cord of a server several times. This server contains a

[ceph-users] Re: Connect ceph to proxmox

2021-06-05 Thread Andrew Walker-Brown
Hi Yes you can, check on the Proxmox forums for details, but that’s how I run my setup. Andrew Sent from my iPhone On 5 Jun 2021, at 03:18, Szabo, Istvan (Agoda) wrote: Hi, Is there a way to connect from my nautilus ceph setup the pool that I created in ceph to proxmox? Or need a
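As a rough sketch (IDs, IPs and pool name are placeholders): add the external RBD pool either from the Proxmox GUI (Datacenter -> Storage -> Add -> RBD) or with an entry like this in /etc/pve/storage.cfg, with the client keyring copied to /etc/pve/priv/ceph/<storage-id>.keyring:

  rbd: ceph-external
      monhost 10.0.0.1 10.0.0.2 10.0.0.3
      pool proxmox-vms
      username admin
      content images,rootdir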

[ceph-users] Re: Nic bonding (lacp) settings for ceph

2021-06-28 Thread Andrew Walker-Brown
Hi, I think ad_select is only relevant in the scenario below, i.e. where you have more than one port-channel being presented to the Linux bond. So below, you have 2 port-channels, one from each switch, but at the Linux side all the ports involved are slaves in the same bond. In your scenario
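For reference, you can confirm what the bond is actually doing on the host (bond0 is an example name):

  # current ad_select policy (stable / bandwidth / count)
  cat /sys/class/net/bond0/bonding/ad_select
  # full bond state, including per-slave aggregator IDs
  cat /proc/net/bonding/bond0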

[ceph-users] Re: Upgrade and lost osds Operation not permitted

2021-04-04 Thread Andrew Walker-Brown
Are the file permissions correct, and are the UID/GID in passwd both 167? Sent from my iPhone On 4 Apr 2021, at 12:29, Lomayani S. Laizer wrote: Hello, +1 Am facing the same problem in Ubuntu after upgrade to Pacific 2021-04-03T10:36:07.698+0300 7f9b8d075f00 -1 bluestore(/var/lib/ceph/osd/
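A quick way to check (paths are the usual package-install locations, adjust as needed):

  # the ceph user/group should normally be 167:167
  grep ceph /etc/passwd /etc/group
  # numeric ownership of the OSD data directories
  ls -ln /var/lib/ceph/osd/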

[ceph-users] Re: Installation of Ceph on Ubuntu 18.04 LTS

2021-04-04 Thread Andrew Walker-Brown
Majid, Check out the install guide for Ubuntu using Cephadm here: https://docs.ceph.com/en/latest/cephadm/install/ Basically install cephadm using apt, then use the Cephadm bootstrap command to get the first mon up and running. For additional hosts, make sure you have them in the hosts file
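Roughly (IP and hostnames are placeholders):

  apt install cephadm
  cephadm bootstrap --mon-ip 192.168.1.10
  # then, for each additional host (hostname resolvable, SSH key distributed)
  ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-002
  ceph orch host add ceph-002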

[ceph-users] Re: Upgrade and lost osds Operation not permitted

2021-04-04 Thread Andrew Walker-Brown
storage service:/var/lib/ceph:/usr/sbin/nologin On Sun, Apr 4, 2021 at 6:47 PM Andrew Walker-Brown <andrew_jbr...@hotmail.com> wrote: UID and GID should both be 167, I believe. Make a note of the current values and change them to 167 using usermod and groupmod. I had just this issue
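A sketch of that fix for a package-based install, assuming nothing else on the host already uses 167 and the daemons are stopped first:

  systemctl stop ceph.target
  groupmod -g 167 ceph
  usermod -u 167 -g 167 ceph
  chown -R ceph:ceph /var/lib/ceph /var/log/ceph
  systemctl start ceph.target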

[ceph-users] Re: Upgrade and lost osds Operation not permitted

2021-04-04 Thread Andrew Walker-Brown
4 14:11 require_osd_release -rw------- 1 ceph ceph 10 Apr 4 14:11 type -rw------- 1 ceph ceph 3 Apr 4 14:11 whoami On Sun, Apr 4, 2021 at 3:07 PM Andrew Walker-Brown <andrew_jbr...@hotmail.com> wrote: Are the file permissions correct and UID/GID in passwd both 167? Sen

[ceph-users] 3 x OSD won't start after host reboot

2021-03-11 Thread Andrew Walker-Brown
Hi all, I’m just testing a new cluster and after shutting down one of the hosts, when I bring it back up none of the OSDs will restart. The OSD services fail to start, and systemctl status for each service states “failed with result ‘exit-code’”. Where to start looking for the root
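Useful first places to look (the OSD id is an example):

  systemctl status ceph-osd@3
  journalctl -u ceph-osd@3 --since "1 hour ago"
  # and confirm the OSD data volumes were actually found/activated
  ceph-volume lvm list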

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Andrew Walker-Brown
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/ Sent from Mail for Windows 10 From: Tony Liu Sent: 16 March 2021 16:16 To: Stefan Kooman; Dave

[ceph-users] Quick quota question

2021-03-17 Thread Andrew Walker-Brown
Hi all When setting a quota on a pool (or directory in Cephfs), is it the amount of client data written or the client data x number of replicas that counts toward the quota? Cheers A Sent from my iPhone ___ ceph-users mailing list --

[ceph-users] Re: Quick quota question

2021-03-17 Thread Andrew Walker-Brown
Ahh ok, good to know! Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10 From: Stefan Kooman <ste...@bit.nl> Sent: 17 March 2021 10:37 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com>; Magnus HAGDORN <magnus.hagd...@ed.ac.uk>

[ceph-users] Re: Quick quota question

2021-03-17 Thread Andrew Walker-Brown
8:26 +0000, Andrew Walker-Brown wrote: > When setting a quota on a pool (or directory in Cephfs), is it the > amount of client data written or the client data x number of replicas > that counts toward the quota? It's the amount of data stored so independent of replication level.
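Hedged examples of setting it (pool/path names are placeholders; the byte values count client data, not raw usage):

  # pool quota, 3 TiB
  ceph osd pool set-quota mypool max_bytes 3298534883328
  # CephFS directory quota, set from a client mount with the attr tools installed
  setfattr -n ceph.quota.max_bytes -v 3298534883328 /mnt/cephfs/somedir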

[ceph-users] Re: Quick quota question

2021-03-17 Thread Andrew Walker-Brown
quota question Hi, On 3/17/21 11:28 AM, Andrew Walker-Brown wrote: > Hi Magnus, > > Thanks for the reply. Just to be certain (I’m having a slow day today), it’s > the amount of data stored by the clients. As an example, a pool using 3 > replicas and a quota of 3TB: clients would be

[ceph-users] Re: howto:: emergency shutdown procedure and maintenance

2021-03-19 Thread Andrew Walker-Brown
Hi Adrian, For maintenance, this is the procedure I’d follow: https://ceph.io/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/ Difference between maintenance and emergency; I’d probably set all the flags as per maintenance but down the OSD’s at the same time followed by all the
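As a sketch, the flag set that guide works through:

  ceph osd set noout
  ceph osd set norecover
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set nodown
  ceph osd set pause
  # ...shut hosts down in order, do the work, power back up...
  # then unset the flags in reverse
  ceph osd unset pause
  ceph osd unset nodown
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset norecover
  ceph osd unset noout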

[ceph-users] Re: Networking Idea/Question

2021-03-15 Thread Andrew Walker-Brown
Dave, That’s the way our cluster is set up. It’s relatively small: 5 hosts, 12 OSDs. Each host has 2x10G with LACP to the switches. We’ve VLAN’d public/private networks. Making best use of the LACP lag largely comes down to choosing the best hashing policy. At the moment
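On the Linux side that usually means layer3+4 hashing so different TCP sessions can land on different slaves (exact config option names depend on whether you use ifupdown, netplan, etc.):

  # what the bond is using now
  cat /sys/class/net/bond0/bonding/xmit_hash_policy
  # typical ifupdown-style options on the bond interface:
  #   bond-mode 802.3ad
  #   bond-xmit-hash-policy layer3+4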

[ceph-users] Re: Best way to add OSDs - whole node or one by one?

2021-03-12 Thread Andrew Walker-Brown
Dave, Worth just looking at utilisation across your OSDs. I’ve had PGs get stuck in backfill_toofull when I’ve added new OSDs: Ceph was unable to move a PG onto a smaller-capacity OSD that was quite full. I had to increase the number of PGs (pg_num and pgp_num) for it to get sorted (and do
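For example, a quick view of per-OSD fill levels and PG counts:

  ceph osd df tree
  # anything stuck in backfill_toofull / nearfull shows up here too
  ceph health detail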

[ceph-users] Email alerts from Ceph

2021-03-17 Thread Andrew Walker-Brown
Hi all, How have folks implemented getting email or SNMP alerts out of Ceph? Getting things like OSD/pool nearly full or OSD/daemon failures etc. Kind regards Andrew Sent from my iPhone ___ ceph-users mailing list --
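One option I’m aware of is the built-in mgr “alerts” module; a sketch (SMTP hostnames/addresses are placeholders, option names per the module docs and may vary by release):

  ceph mgr module enable alerts
  ceph config set mgr mgr/alerts/smtp_host smtp.example.com
  ceph config set mgr mgr/alerts/smtp_destination ceph-alerts@example.com
  ceph config set mgr mgr/alerts/smtp_sender ceph@example.com
  ceph config set mgr mgr/alerts/interval 5m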

[ceph-users] Re: Same data for two buildings

2021-03-17 Thread Andrew Walker-Brown
Denis, I’m doing something similar to you with 5 nodes, 4 with OSDs and a 5th just as a mon. I have pools set with 4 replicas, minimum 2, crush map configured so 2 replicas go to each DC and then to host level. 5th mon is in a third location, but could be a VM with higher latency somewhere
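A sketch of the kind of CRUSH rule that gives the 2-per-DC placement (bucket names are placeholders; edited via ceph osd getcrushmap / crushtool / ceph osd setcrushmap):

  rule replicated_2dc {
      id 10
      type replicated
      min_size 2
      max_size 4
      step take default
      step choose firstn 2 type datacenter
      step chooseleaf firstn 2 type host
      step emit
  }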

[ceph-users] Re: ceph boostrap initialization :: nvme drives not empty after >12h

2021-03-12 Thread Andrew Walker-Brown
Hi Adrian, If you’re just using this for test/familiarity and performance isn’t an issue, then I’d create 3 x VMs on your host server and use them for Ceph. It’ll work fine, just don’t expect Gb/s transfer speeds :-) A Sent from Mail for

[ceph-users] Re: HEALTH_WARN - Recovery Stuck?

2021-04-12 Thread Andrew Walker-Brown
If you increase the number of pgs, effectively each one is smaller so the backfill process may be able to ‘squeeze’ them onto the nearly full osds while it sorts things out. I’ve had something similar before and this def helped. Sent from my iPhone On 12 Apr 2021, at 19:11, Marc wrote:

[ceph-users] Logging to Graylog

2021-04-19 Thread Andrew Walker-Brown
Hi All, I want to send Ceph logs out to an external Graylog server. I’ve configured the Graylog host IP using “ceph config set global log_graylog_host x.x.x.x” and enabled logging through the Ceph dashboard (I’m running Octopus 15.2.9 – container based). I’ve also setup a GELF UDP input on
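For reference, the full set of options I’d expect to be involved (a sketch; x.x.x.x is the Graylog host as above, option names per the Octopus docs):

  ceph config set global log_graylog_host x.x.x.x
  ceph config set global log_graylog_port 12201
  ceph config set global log_to_graylog true
  ceph config set global err_to_graylog true
  # cluster log (what `ceph -w` shows) is sent by the mons separately
  ceph config set global mon_cluster_log_to_graylog true
  ceph config set global mon_cluster_log_to_graylog_host x.x.x.x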

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-08-17 Thread Andrew Walker-Brown
Hi, I’m coming at this from the position of a newbie to Ceph. I had some experience of it as part of Proxmox, but not as a standalone solution. I really don’t care whether Ceph is containerised or not; I don’t have the depth of knowledge or experience to argue it either way. I can see that

[ceph-users] Re: All OSDs on one host down

2021-08-07 Thread Andrew Walker-Brown
From: David Caro <dc...@wikimedia.org> Sent: 06 August 2021 09:20 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com> Cc: Marc <m...@f1-outsourcing.eu>; ceph-users@ceph.io Subject: Re: [ceph-users] Re: All OSDs on one host down

[ceph-users] Re: All OSDs on one host down

2021-08-07 Thread Andrew Walker-Brown
t 2021 10:46 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com>; David Caro <dc...@wikimedia.org> Cc: Marc <m...@f1-outsourcing.eu>; ceph-users@ceph.io Subject: Re: [ceph-users] Re: All OSDs on one host down Hi Andrew, we ha

[ceph-users] All OSDs on one host down

2021-08-06 Thread Andrew Walker-Brown
Hi all, Bit of a panic. Woke this morning to find one of my dedicated mon hosts showing as down. I did a reboot on the host and it came back up fine (ceph-005). Then all the OSDs (5) on host ceph-004 went down. This host is also a mon and the mon daemon is showing as up. I’m running Octopus
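First checks in that situation on a cephadm/container deployment (the OSD id is an example):

  ceph -s
  ceph osd tree down
  ceph orch ps --daemon-type osd
  # on ceph-004 itself, pull the log of one of the down daemons
  cephadm logs --name osd.12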

[ceph-users] Re: All OSDs on one host down

2021-08-06 Thread Andrew Walker-Brown
ail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10 From: David Caro <dc...@wikimedia.org> Sent: 06 August 2021 09:20 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com> Cc: Marc <m...@f1-outsourcing.eu>; ceph-users@ceph.io

[ceph-users] Re: All OSDs on one host down

2021-08-06 Thread Andrew Walker-Brown
, A Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10 From: Marc <m...@f1-outsourcing.eu> Sent: 06 August 2021 08:54 To: Andrew Walker-Brown <andrew_jbr...@hotmail.com>; ceph-users@ceph.io Subject: RE: A

[ceph-users] Re: All OSDs on one host down

2021-08-06 Thread Andrew Walker-Brown
…. Need to do some more digging….. Any thoughts would be appreciated. A Sent from my iPhone On 6 Aug 2021, at 09:20, David Caro wrote: On 08/06 07:59, Andrew Walker-Brown wrote: > Hi Marc, > > Yes I’m probably doing just that. > > The ceph admin guides aren’t