[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-21 Thread Phil Regnauld
Michael Wodniok (wodniok) writes: > Hi all, > > digging around debugging why our (small: 10 Hosts/~60 OSDs) cluster is so > slow even while recovering, I found out one of our key issues is some SSDs > with SLC cache (in our case Samsung SSD 870 EVO) - which we just recycled > from other use ca
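Not from the thread, but a sketch of how one might confirm such a drive is the culprit (the device and file paths are placeholders, and the fio run writes real data, so point it at a scratch file, not a disk in use):

    # per-OSD commit/apply latency as Ceph sees it
    ceph osd perf

    # sustained sync writes, long enough to exhaust a small SLC cache
    fio --name=slc-test --filename=/mnt/scratch/testfile --size=20G \
        --rw=write --bs=4k --ioengine=libaio --iodepth=1 \
        --direct=1 --sync=1 --runtime=300 --time_based

A drive whose latency jumps by an order of magnitude partway through the run is exactly the kind of consumer SSD being warned about here.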

[ceph-users] Re: Running ceph cluster on different os

2021-01-26 Thread Phil Regnauld
Anthony D'Atri (anthony.datri) writes: > I have firsthand experience migrating multiple clusters from Ubuntu to RHEL, > preserving the OSDs along the way, with no loss or problems. > > It’s not like you’re talking about OpenVMS ;) :) We converted a cluster from Ubuntu 18.04 to
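Not spelled out in the thread, but the usual per-node sequence when swapping the OS underneath existing LVM/ceph-volume OSDs looks roughly like this (a sketch, assuming the new OS gets the same Ceph release plus the restored /etc/ceph conf and keyrings):

    ceph osd set noout                 # stop CRUSH from rebalancing while the node is down
    # ... reinstall the OS, install matching ceph packages, restore /etc/ceph ...
    ceph-volume lvm activate --all     # rediscover and start the node's existing OSDs
    ceph osd unset noout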

[ceph-users] Re: 10G stackabe lacp switches

2021-02-22 Thread Phil Regnauld
Fabien Sirjean (fsirjean) writes: > Hi, > > Yes, Cisco supports VPC (virtual port-channel) for LACP over multiple > switches. > > We're using 2x10G VPC on our Ceph and Proxmox hosts with 2 Cisco Nexus > 3064PQ (48 x 10G SFP+ & 4 x 40G). Same config here. Have set it up with LACP on the C
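The switch side (vPC/MLAG) is vendor-specific, but the host side is plain Linux bonding; a minimal iproute2 sketch (interface names and the address are placeholders, and in practice this is usually persisted via netplan or ifupdown rather than typed live):

    ip link add bond0 type bond mode 802.3ad miimon 100 xmit_hash_policy layer3+4
    ip link set eth0 down && ip link set eth0 master bond0
    ip link set eth1 down && ip link set eth1 master bond0
    ip link set bond0 up
    ip addr add 192.0.2.11/24 dev bond0

layer3+4 hashing matters for Ceph: it spreads the many OSD/client TCP connections across both links instead of pinning them to one.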

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Phil Regnauld
Dave Hall (kdhall) writes: > But the developers aren't out in the field with their deployments > when something weird impacts a cluster and the standard approaches don't > resolve it. And let's face it: Ceph is a marvelously robust solution for > large scale storage, but it is also an amazingly i

[ceph-users] Re: Can single Ceph cluster run on various OS families

2021-07-28 Thread Phil Regnauld
icy chan (icy.kf.chan) writes: > > Are there any practices for OS upgrade/migration that can be found on the > official site? > > My drafted plan is: > 1. [CentOS 7] Adopt the cluster by cephadm via upgrading it from Nautilus > to Octopus. > 2. Reinstall the nodes one by one (with new OS)
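For reference, the adoption step in 1. boils down to cephadm taking over the existing systemd units one daemon at a time; a hedged sketch (daemon names are examples, the official cephadm docs cover the full sequence):

    # on each node, once the cluster is on a cephadm-capable release (Octopus or later)
    cephadm adopt --style legacy --name mon.node1
    cephadm adopt --style legacy --name mgr.node1
    cephadm adopt --style legacy --name osd.0

    ceph orch ps    # confirm the daemons are now managed by the orchestrator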

[ceph-users] Re: RFP for arm64 test nodes

2021-10-11 Thread Phil Regnauld
Martin Verges (martin.verges) writes: > Hello Dan, > > why not use a bit bigger machines and use VMs for tests? We have quite > good experience with that and it works like a charm. If you plan them as > hypervisors, you can run a lot of tests simultaneously. Use the 80 core ARM, Are you t

[ceph-users] iscsi issues with ceph (Nautilus) + tcmu-runner

2020-05-13 Thread Phil Regnauld
So, we've been running with iscsi enabled (tcmu-runner) on our Nautilus ceph cluster for a couple of weeks, and started using it with our vsphere cluster. Things looked good so we put it in production, but yesterday morning we experienced a freeze of all iSCSI I/O on one of the ESXi nodes, and the onl
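(Not part of the original report, but for anyone debugging the same setup, these are the gateway pieces one would typically look at first; service names are as shipped with ceph-iscsi, and this is a starting point, not a diagnosis:)

    systemctl status tcmu-runner rbd-target-api rbd-target-gw
    gwcli ls                                   # target/gateway/LUN state as ceph-iscsi sees it
    journalctl -u tcmu-runner --since "1 hour ago"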

[ceph-users] Re: iscsi issues with ceph (Nautilus) + tcmu-runner

2020-05-14 Thread Phil Regnauld
Mike Christie (mchristi) writes: > > I've never seen this kernel crash before. It might be helpful to send > more of the log before the kernel warning below. These are the messages leading up to the warning (pretty much the same, with the occasional notice about an ongoing deep s
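For anyone collecting the same kind of evidence, a hedged sketch of pulling the messages that precede such a warning (assumes persistent journald; the boot offset is just an example):

    journalctl -k -b -1 | tail -n 200      # kernel messages from the previous boot
    dmesg -T | grep -B 50 WARNING          # or from the running kernel, with context before the warning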

[ceph-users] CEPH failure domain - power considerations

2020-05-28 Thread Phil Regnauld
Hi, in our production cluster, we have the following setup - 10 nodes - 3 drives / server (so far), mix of SSD and HDD (different pools) + NVMe - dual 10G in LACP, linked to two different switches (Cisco vPC) - OSDs, MONs and MGRs are colocated - A + B power feeds, 2 ATS

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Phil Regnauld
Chris Palmer (chris.palmer) writes: > Immediate thought: Forget about crush maps, osds, etc. If you lose half the > nodes (when one power rail fails) your MONs will lose quorum. I don't see > how you can win with that configuration... That's a good point, I'll have to think that one throug
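To make the quorum point concrete (MON count assumed here, the thread doesn't state it): quorum needs floor(n/2)+1 monitors, i.e. 2 of 3 or 3 of 5. With MONs colocated on nodes split evenly across two power feeds, losing one feed can take out 2 of 3 (or 3 of 5) MONs, and the surviving minority cannot form quorum no matter how the OSDs and CRUSH map are arranged.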

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Phil Regnauld
Hans van den Bogert (hansbogert) writes: > I would second that, there's no winning in this case for your requirements > and single PSU nodes. If there were 3 feeds, then yes; you could make an > extra layer in your crushmap much like you would incorporate a rack topology > in the crushmap.
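A minimal sketch of what that extra CRUSH layer could look like, using the built-in pdu bucket type (bucket, host and pool names are placeholders, and as noted above it only helps if there are at least three independent feeds and a replica on each):

    ceph osd crush add-bucket feed-a pdu
    ceph osd crush add-bucket feed-b pdu
    ceph osd crush add-bucket feed-c pdu
    ceph osd crush move feed-a root=default
    ceph osd crush move feed-b root=default
    ceph osd crush move feed-c root=default
    ceph osd crush move node01 pdu=feed-a      # repeat for every host
    ceph osd crush rule create-replicated rep-per-feed default pdu
    ceph osd pool set <pool> crush_rule rep-per-feed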

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-29 Thread Phil Regnauld
Burkhard Linke (Burkhard.Linke) writes: > > Buy some power transfer switches. You can connect those to the two PDUs, and > in case of a power failure on one PDUs they will still be able to use the > second PDU. ATS = power switches (in my original mail). > We only use them for "small" ma

[ceph-users] Re: ceph iscsi latency too high for esxi?

2020-10-04 Thread Phil Regnauld
Yep, and we're still experiencing it every few months. One (and only one) of our ESXi nodes, which are otherwise identical, is experiencing a total freeze of all I/O, and it won't recover - I mean, ESXi is so dead, we have to go into IPMI and reset the box... We're using Croit's software, but the is

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-04 Thread Phil Regnauld
Darren Soothill (darren.soothill) writes: > Hi Fabien, > > ZFS on top of RBD really makes me shudder. ZFS expects to have individual disk > devices that it can manage. It thinks it has them with this config but CEPH > is masking the real data behind it. > > As has been said before why not just u

[ceph-users] Re: Can Ceph Do The Job?

2020-01-30 Thread Phil Regnauld
Bastiaan Visser (bastiaan) writes: > We are making hourly snapshots of ~400 rbd drives in one (spinning-rust) > cluster. The snapshots are made one by one. > Total size of the base images is around 80TB. The entire process takes a > few minutes. > We do not experience any problems doing this.
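For reference, the one-by-one approach amounts to a loop like the following (pool and snapshot naming are invented here, not taken from Bastiaan's setup):

    POOL=rbd-vms
    SUFFIX=$(date +%Y-%m-%d-%H)
    for img in $(rbd ls -p "$POOL"); do
        rbd snap create "$POOL/$img@hourly-$SUFFIX"
    done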

[ceph-users] Re: some ceph general questions about the design

2020-04-20 Thread Phil Regnauld
harald.freid...@gmail.com (harald.freidhof) writes: > > 1. should i use a raid controller and create for example a raid 5 with all disks > on each osd server? or should i pass through all disks to ceph osd? Set up the raid controller in JBOD mode, and pass the disks through. > 2. if i have a
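With the disks passed through as plain block devices, OSD creation is then the usual ceph-volume call; a sketch (device names are examples, and cephadm/ceph-ansible wrap this step for you):

    ceph-volume lvm create --bluestore --data /dev/sdb
    # or several devices in one go:
    ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd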

[ceph-users] Re: Cluster network and public network

2020-05-09 Thread Phil Regnauld
Anthony D'Atri (anthony.datri) writes: > > During heavy recovery or backfill, including healing from failures, > balancing, adding/removing drives, much more will be used. > > Conventional wisdom has been to not let that traffic DoS clients, or clients to > DoS heartbeats. [...] > If yo
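For reference, the knobs this discussion revolves around; a hedged sketch (the subnets are placeholders and the recovery/backfill defaults vary by release):

    # separate front (client) and back (replication/recovery) networks
    ceph config set global public_network 192.168.10.0/24
    ceph config set global cluster_network 192.168.20.0/24

    # keep recovery from starving client I/O
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1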