Michael Wodniok (wodniok) writes:
> Hi all,
>
> Digging around while debugging why our (small: 10 hosts / ~60 OSDs) cluster
> is so slow even while recovering, I found out that one of our key issues is
> some SSDs with SLC cache (in our case Samsung SSD 870 EVO) - which we had
> just recycled from other use cases
Anthony D'Atri (anthony.datri) writes:
> I have firsthand experience migrating multiple clusters from Ubuntu to RHEL,
> preserving the OSDs along the way, with no loss or problems.
>
> It’s not like you’re talking about OpenVMS ;)
:)
We converted a cluster from Ubuntu 18.04 to
Fabien Sirjean (fsirjean) writes:
> Hi,
>
> Yes, Cisco supports vPC (virtual port-channel) for LACP over multiple
> switches.
>
> We're using 2x10G VPC on our Ceph and Proxmox hosts with 2 Cisco Nexus
> 3064PQ (48 x 10G SFP+ & 4 x 40G).
Same config here. Have set it up with LACP on the C
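For reference, the bond can be sanity-checked from the host side; the Linux
bonding driver exposes the LACP state under /proc/net/bonding. A sketch
(bond0 is a placeholder name):

#!/usr/bin/env python3
"""Check that a bond is in 802.3ad (LACP) mode with all slaves up.

Assumes the Linux bonding driver; bond0 is a placeholder name.
"""
from pathlib import Path

text = Path("/proc/net/bonding/bond0").read_text()

mode_ok = "IEEE 802.3ad" in text  # mode string the bonding driver reports
# Each "Slave Interface:" stanza carries its own "MII Status:" line; the
# first chunk of the file describes the bond itself, so skip it.
slaves = text.split("Slave Interface:")[1:]
down = [s.splitlines()[0].strip() for s in slaves if "MII Status: up" not in s]

print(f"LACP mode: {'ok' if mode_ok else 'NOT 802.3ad'}")
print(f"{len(slaves)} slaves, down: {down or 'none'}")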
Dave Hall (kdhall) writes:
> But the developers aren't out in the field with their deployments
> when something weird impacts a cluster and the standard approaches don't
> resolve it. And let's face it: Ceph is a marvelously robust solution for
> large scale storage, but it is also an amazingly i
icy chan (icy.kf.chan) writes:
>
> Are there any practices for OS upgrade/migration that can be found on the
> official site?
>
> My drafted plan is:
> 1. [CentOS 7] Upgrade the cluster from Nautilus to Octopus, then adopt it
> with cephadm.
> 2. Reinstall the nodes one by one (with new OS)
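Step 1 maps onto 'cephadm adopt' run per daemon on each host; a rough
sketch of the loop (the daemon names are hypothetical - list the real ones
with 'cephadm ls' first):

#!/usr/bin/env python3
"""Per-node sketch: adopt legacy (package-deployed) daemons into cephadm.

Daemon names below are hypothetical examples; enumerate the real ones with
'cephadm ls' on each host. Run this only after the upgrade to Octopus.
"""
import subprocess

# assumption: the legacy daemons present on this host
daemons = ["mon.node1", "mgr.node1", "osd.0", "osd.1"]

for name in daemons:
    # 'adopt' converts a legacy deployment into a cephadm-managed container
    subprocess.run(
        ["cephadm", "adopt", "--style", "legacy", "--name", name],
        check=True,
    )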
Martin Verges (martin.verges) writes:
> Hello Dan,
>
> why not use a bit bigger machines and run VMs for tests? We have quite
> good experience with that and it works like a charm. If you plan them as
> hypervisors, you can run a lot of tests simultaneously. Use the 80-core ARM,
Are you t
So, we've been running with iSCSI enabled (tcmu-runner) on our Nautilus Ceph
cluster for a couple of weeks, and started using it with our vSphere cluster.
Things looked good, so we put it in production, but yesterday morning we
experienced a freeze of all iSCSI I/O on one of the ESXi nodes, and the only
Mike Christie (mchristi) writes:
>
> I've never seen this kernel crash before. It might be helpful to send
> more of the log before the kernel warning below.
These are the messages leading up to the warning (pretty much the same, with
the occasional notice about an ongoing deep scrub
Hi, in our production cluster we have the following setup:
- 10 nodes
- 3 drives / server (so far), mix of SSD and HDD (different pools) + NVMe
- dual 10G in LACP, linked to two different switches (Cisco vPC)
- OSDs, MONs and MGRs are colocated
- A + B power feeds, 2 ATS
Chris Palmer (chris.palmer) writes:
> Immediate thought: Forget about crush maps, osds, etc. If you lose half the
> nodes (when one power rail fails) your MONs will lose quorum. I don't see
> how you can win with that configuration...
That's a good point, I'll have to think that one through.
Hans van den Bogert (hansbogert) writes:
> I would second that, there's no winning in this case for your requirements
> and single PSU nodes. If there were 3 feeds, then yes; you could make an
> extra layer in your crushmap much like you would incorporate a rack topology
> in the crushmap.
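To make the quorum arithmetic concrete, a toy calculation (the feed-to-MON
splits are illustrative, not from the thread):

#!/usr/bin/env python3
"""Toy check: do the MONs keep a majority if any single power feed dies?"""

def survives_any_single_feed(mons_per_feed):
    total = sum(mons_per_feed)
    quorum = total // 2 + 1
    # losing feed i takes down exactly the MONs attached to it
    return all(total - lost >= quorum for lost in mons_per_feed)

for split in [(3, 2), (2, 3), (2, 2, 1)]:  # MONs per feed
    verdict = "survives" if survives_any_single_feed(split) else "loses quorum"
    print(split, "->", verdict)

With two feeds, whichever side holds the majority of MONs takes quorum down
with it; a (2, 2, 1) spread over three feeds keeps a majority of five MONs
through any single feed failure, which is the point about needing a third feed.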
Burkhard Linke (Burkhard.Linke) writes:
>
> Buy some power transfer switches. You can connect those to the two PDUs, and
> in case of a power failure on one PDUs they will still be able to use the
> second PDU.
ATS = the power transfer switches (in my original mail).
> We only use them for "small" machines
Yep, and we're still experiencing it every few months. One (and only one) of
our ESXi nodes, which are otherwise identical, experiences a total freeze of
all I/O, and it won't recover - I mean, ESXi is so dead that we have to go
into IPMI and reset the box...
We're using Croit's software, but the is
Darren Soothill (darren.soothill) writes:
> Hi Fabien,
>
> ZFS on top of RBD really makes me shudder. ZFS expects to have individual
> disk devices that it can manage. It thinks it has them with this config, but
> Ceph is masking the real data behind it.
>
> As has been said before why not just u
Bastiaan Visser (bastiaan) writes:
> We are making hourly snapshots of ~400 rbd drives in one (spinning-rust)
> cluster. The snapshots are made one by one.
> Total size of the base images is around 80TB. The entire process takes a
> few minutes.
> We do not experience any problems doing this.
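For reference, that one-by-one pattern is a short loop with the python-rbd
bindings. A sketch (the pool name and the snapshot-name format are
assumptions; requires the python3-rados and python3-rbd packages):

#!/usr/bin/env python3
"""Snapshot every RBD image in a pool, one image at a time (sketch)."""
import datetime
import rados
import rbd

POOL = "rbd"  # assumed pool name
snap_name = datetime.datetime.now().strftime("hourly-%Y%m%d-%H%M")

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        for image_name in rbd.RBD().list(ioctx):
            with rbd.Image(ioctx, image_name) as image:
                image.create_snap(snap_name)  # sequential, as described above
                print(f"{image_name}@{snap_name}")
    finally:
        ioctx.close()
finally:
    cluster.shutdown()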
harald.freid...@gmail.com (harald.freidhof) writes:
>
> 1. Should I use a RAID controller and create, for example, a RAID 5 with all
> disks on each OSD server? Or should I pass all disks through to the Ceph OSDs?
Set up the RAID controller in JBOD mode and pass the disks through.
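A quick way to confirm the controller really hands the disks through as raw
devices is ceph-volume's inventory (a sketch; the JSON field names follow
ceph-volume's output format):

#!/usr/bin/env python3
"""List devices as Ceph sees them, to confirm JBOD passthrough (sketch)."""
import json
import subprocess

# 'ceph-volume inventory' reports each raw device and whether it is usable
out = subprocess.run(
    ["ceph-volume", "inventory", "--format", "json"],
    check=True, capture_output=True, text=True,
).stdout

for dev in json.loads(out):
    if dev["available"]:
        print(dev["path"], "available")
    else:
        print(dev["path"], "rejected:", ", ".join(dev.get("rejected_reasons", [])))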
> 2. if i have a
Anthony D'Atri (anthony.datri) writes:
>
> During heavy recovery or backfill, including healing from failures,
> balancing, adding/removing drives, much more will be used.
>
> Conventional wisdom has been to not let that traffic DoS clients, or clients to
> DoS heartbeats.
[...]
> If yo
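For context, the knobs usually involved in keeping recovery and backfill
from starving clients look like this (the option names are real Ceph
options; the values are illustrative assumptions, not advice from the
thread):

#!/usr/bin/env python3
"""Apply conservative recovery/backfill throttles (sketch)."""
import subprocess

throttles = {
    "osd_max_backfills": "1",        # concurrent backfills per OSD
    "osd_recovery_max_active": "1",  # concurrent recovery ops per OSD
    "osd_recovery_sleep_hdd": "0.1", # pause (seconds) between recovery ops on HDDs
}

for option, value in throttles.items():
    subprocess.run(["ceph", "config", "set", "osd", option, value], check=True)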