[ceph-users] Re: recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-13 Thread Anthony D'Atri
Why use such a card and M.2 drives that I suspect aren’t enterprise-class? Instead of U.2, E1.s, or E3.s ? > On Jan 13, 2024, at 5:10 AM, Mike O'Connor wrote: > > On 13/1/2024 1:02 am, Drew Weaver wrote: >> Hello, >> >> So we were going to replace a Ceph cluster with some hardware we had layin

[ceph-users] Re: Recomand number of k and m erasure code

2024-01-13 Thread Anthony D'Atri
There are nuances, but in general the higher the sum of m+k, the lower the performance, because *every* operation has to hit that many drives, which is especially impactful with HDDs. So there’s a tradeoff between storage efficiency and performance. And as you’ve seen, larger parity groups es
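For a concrete sense of that tradeoff, a minimal sketch (profile name, pool name, and PG count here are illustrative, not from the thread):

    # k=4,m=2 touches 6 OSDs per operation; a wider profile such as k=8,m=3
    # is more space-efficient but touches 11 OSDs, which hurts most on HDDs.
    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 128 128 erasure ec42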

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Anthony D'Atri
> On Jan 12, 2024, at 03:31, Phong Tran Thanh wrote: > > Hi Yang and Anthony, > > I found the solution for this problem on a HDD disk 7200rpm > > When the cluster recovers, one or multiple disk failures because slowop > appears and then affects the cluster, we can change these configurations
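The preview is cut before the actual settings; the usual knobs for throttling recovery on spinners look something like the sketch below (values are examples only, and on mClock-scheduled releases the first two are ignored unless the override flag is set):

    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    # Quincy and later (mClock scheduler) otherwise override the two settings above:
    ceph config set osd osd_mclock_override_recovery_settings true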

[ceph-users] Re: How does mclock work?

2024-01-09 Thread Anthony D'Atri
y from it imho. In an > alternate universe it would have been really neat if Intel could have worked > with the HDD vendors to put like 16GB of user accessible optane on every HDD. > Enough for the WAL and L0 (and maybe L1). > > > Mark > > > On 1/9/24 08:53, Anth

[ceph-users] Re: How does mclock work?

2024-01-09 Thread Anthony D'Atri
Not strictly an answer to your worthy question, but IMHO this supports my stance that hybrid OSDs aren't worth the hassle. > On Jan 9, 2024, at 06:13, Frédéric Nass > wrote: > > With hybrid setups (RocksDB+WAL on SSDs or NVMes and Data on HDD), if mclock > only considers write performance,

[ceph-users] Re: Ceph newbee questions

2024-01-03 Thread Anthony D'Atri
be there after replication. No lost data? >> Correct, *if* nothing happens to the survivors. But unless you take manual >> steps, data will be unavailable. >> Most of the time if a node fails you can replace a DIMM etc. and bring it >> back. >>> Many thanks!! >

[ceph-users] Re: Ceph newbee questions

2024-01-01 Thread Anthony D'Atri
ual steps, data will be unavailable. Most of the time if a node fails you can replace a DIMM etc. and bring it back. > > Many thanks!! > > Regards > Marcus > > > > On fre, dec 22 2023 at 19:12:19 -0500, Anthony D'Atri > wrote: >>>> You can

[ceph-users] Re: Ceph newbee questions

2023-12-22 Thread Anthony D'Atri
>> >> You can do that for a PoC, but that's a bad idea for any production >> workload. You'd want at least three nodes with OSDs to use the default RF=3 >> replication. You can do RF=2, but at the peril of your mortal data. > > I'm not sure I agree - I think size=2, min_size=2 is no worse t

[ceph-users] Re: Ceph newbee questions

2023-12-22 Thread Anthony D'Atri
> I have manually configured a ceph cluster with ceph fs on debian bookworm. Bookworm support is very, very recent I think. > What is the difference from installing with cephadm compared to manuall > install, > any benefits that you miss with manual install? A manual install is dramatically m

[ceph-users] Re: Building new cluster had a couple of questions

2023-12-22 Thread Anthony D'Atri
> Sorry I thought of one more thing. > > I was actually re-reading the hardware recommendations for Ceph and it seems > to imply that both RAID controllers as well as HBAs are bad ideas. Advice I added most likely ;) "RAID controllers" *are* a subset of HBAs BTW. The nomenclature can be co

[ceph-users] Re: Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Anthony D'Atri
[rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$ ceph features { "mon": [ { "features": "0x3f01cfbf7ffd", "release": "luminous", "num": 3 } ], "osd": [ { "features": "0x3f01cfbf7ffd", "release": "lu

[ceph-users] Re: Ceph Cluster Deployment - Recommendation

2023-12-18 Thread Anthony D'Atri
Four servers doth not a quality cluster make. This setup will work, but you can't use a reasonable EC profile for your bucket pool. Aim higher than the party line wrt PG counts esp. for the index pool. > On Dec 18, 2023, at 10:19, Amardeep Singh > wrote: > > Hi Everyone, > > We are in

[ceph-users] Re: Difficulty adding / using a non-default RGW placement target & storage class

2023-12-07 Thread Anthony D'Atri
Following up on my own post from last month, for posterity. The trick was updating the period. I'm not using multisite, but Rook seems to deploy so that one can. -- aad > On Nov 6, 2023, at 16:52, Anthony D'Atri wrote: > > I'm having difficulty adding and using

[ceph-users] Re: How to identify the index pool real usage?

2023-12-01 Thread Anthony D'Atri
>> >> Today we had a big issue with slow ops on the nvme drives which holding >> the index pool. >> >> Why the nvme shows full if on ceph is barely utilized? Which one I should >> belive? >> >> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme >> drive has 4x osds on it):

[ceph-users] Re: Recommended architecture

2023-11-30 Thread Anthony D'Atri
I try to address these ideas in https://www.amazon.com/Learning-Ceph-scalable-reliable-solution-ebook/dp/B01NBP2D9I though as with any tech topic the details change over time. It's difficult to interpret the table the OP included, but I think it shows a 3 node cluster. When you only have 3 nod

[ceph-users] Re: Best Practice for OSD Balancing

2023-11-28 Thread Anthony D'Atri
Sent too quickly — also note that consumer / client SSDs often don’t have powerloss protection, so if your whole cluster were to lose power at the wrong time, you might lose data. > On Nov 28, 2023, at 8:16 PM, Anthony D'Atri wrote: > > >>> >>> 1) They

[ceph-users] Re: Best Practice for OSD Balancing

2023-11-28 Thread Anthony D'Atri
>> >> 1) They’re client aka desktop SSDs, not “enterprise” >> 2) They’re a partition of a larger OSD shared with other purposes > > Yup. They're a mix of SATA SSDs and NVMes, but everything is > consumer-grade. They're only 10% full on average and I'm not > super-concerned with performance. I

[ceph-users] Re: Best Practice for OSD Balancing

2023-11-28 Thread Anthony D'Atri
>> Very small and/or non-uniform clusters can be corner cases for many things, >> especially if they don’t have enough PGs. What is your failure domain — >> host or OSD? > > Failure domain is host, Your host buckets do vary in weight by roughly a factor of two. They naturally will get PGs m

[ceph-users] Re: Best Practice for OSD Balancing

2023-11-28 Thread Anthony D'Atri
> > I'm fairly new to Ceph and running Rook on a fairly small cluster > (half a dozen nodes, about 15 OSDs). Very small and/or non-uniform clusters can be corner cases for many things, especially if they don’t have enough PGs. What is your failure domain — host or OSD? Are your OSDs sized u

[ceph-users] Re: OSDs failing to start due to crc32 and osdmap error

2023-11-27 Thread Anthony D'Atri
The options Wes listed are for data, not RocksDB. > On Nov 27, 2023, at 1:59 PM, Denis Polom wrote: > > Hi, > > no we don't: > > "bluestore_rocksdb_options": > "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268

[ceph-users] Re: easy way to find out the number of allocated objects for a RBD image

2023-11-25 Thread Anthony D'Atri
If there’s a filesystem on the volume, running fstrim or mounting with the discard option might significantly reduce usage and block count. > On Nov 25, 2023, at 1:02 PM, Tony Liu wrote: > > Thank you Eugen! "rbd du" is it. > The used_size from "rbd du" is object count times object size. > T
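A sketch of the reclaim workflow being suggested (pool, image, and mountpoint names are placeholders):

    rbd du rbd/myimage              # allocated vs. provisioned size
    sudo fstrim -v /mnt/myimage     # trim a filesystem mounted on the image
    # or mount with "-o discard" so space is reclaimed continuously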

[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Anthony D'Atri
>>> >>> Should I modify the ceph.conf (vi/emacs) directly ? >> >> vi is never the answer. > > WTF ? You break my dream ;-) ;-) Let line editors die. >> > You're right. > > Currently I'm testing > > 17.2.7 quincy. > > So in my daily life how I would know if I should use ceph config or >

[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Anthony D'Atri
> to change something in the /etc/ceph/ceph.conf. Central config was introduced with Mimic. Since both central config and ceph.conf work and are supported, explicitly mentioning both in the docs every time is a lot of work (and awkward). One day we’ll sort out an effective means to generali
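For reference, a minimal sketch of working with the central config store instead of ceph.conf (option and value chosen for illustration only):

    ceph config dump                                    # everything set centrally
    ceph config set osd osd_memory_target 6442450944    # example: 6 GiB
    ceph config get osd.0 osd_memory_target
    ceph config assimilate-conf -i /etc/ceph/ceph.conf  # import a legacy ceph.conf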

[ceph-users] Re: Erasure vs replica

2023-11-23 Thread Anthony D'Atri
Yes, lots of people are using EC. Which is more “reliable” depends on what you need. If you need to survive 4 failures, there are scenarios where RF=3 won’t do it for you. You could in such a case use an EC 4,4 profile, 8,4, etc. It’s a tradeoff between write speed and raw::usable ratio effi

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Anthony D'Atri
I encountered mgr ballooning multiple times with Luminous, but have not since. At the time, I could often achieve relief by sending the admin socket a heap release - it would show large amounts of memory unused but not yet released. That experience is one reason I got Rook recently to allow pro

[ceph-users] Re: How to use hardware

2023-11-18 Thread Anthony D'Atri
Common motivations for this strategy include the lure of unit economics and RUs. Often ultra dense servers can’t fill racks anyway due to power and weight limits. Here the osd_memory_target would have to be severely reduced to avoid oomkilling. Assuming the OSDs are top load LFF HDDs with e

[ceph-users] Re: blustore osd nearfull but no pgs on it

2023-11-18 Thread Anthony D'Atri
I was thinking the same thing. Very small OSDs can behave unexpectedly because of the relatively high percentage of overhead. > On Nov 18, 2023, at 3:08 AM, Eugen Block wrote: > > Do you have a large block.db size defined in the ceph.conf (or config store)? > > Zitat von Debian : > >> th

[ceph-users] Re: CEPH Cluster performance review

2023-11-11 Thread Anthony D'Atri
I'm going to assume that ALL of your pools are replicated with size 3, since you didn't provide that info, and that all but the *hdd pools are on SSDs. `ceph osd dump | grep pool` Let me know if that isn't the case. With that assumption, I make your pg ratio to be ~ 57, which is way too low. R
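A sketch of how to check and raise the ratio (pool name and target pg_num are placeholders; the right target depends on the cluster):

    ceph osd df                          # the PGS column is the per-OSD PG count
    ceph osd pool set mypool pg_num 256
    # or, with the autoscaler enabled, set a floor instead:
    ceph osd pool set mypool pg_num_min 256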

[ceph-users] Re: Ceph Dashboard - Community News Sticker [Feedback]

2023-11-09 Thread Anthony D'Atri
IMHO we don't need yet another place to look for information, especially one that some operators never see. ymmv. > >> Hello, >> >> We wanted to get some feedback on one of the features that we are planning >> to bring in for upcoming releases. >> >> On the Ceph GUI, we thought it could be in

[ceph-users] Difficulty adding / using a non-default RGW placement target & storage class

2023-11-06 Thread Anthony D'Atri
I'm having difficulty adding and using a non-default placement target & storage class and would appreciate insights. Am I going about this incorrectly? Rook does not yet have the ability to do this, so I'm adding it by hand. Following instructions on the net I added a second bucket pool, place

[ceph-users] Re: resharding RocksDB after upgrade to Pacific breaks OSDs

2023-11-03 Thread Anthony D'Atri
nm, Adam beat me to it. > On Nov 3, 2023, at 11:40, Josh Baergen wrote: > > The ticket has been updated, but it's probably important enough to > state on the list as well: The documentation is currently wrong in a > way that running the command as documented will cause this corruption. > The cor

[ceph-users] Re: resharding RocksDB after upgrade to Pacific breaks OSDs

2023-11-03 Thread Anthony D'Atri
If someone can point me at the errant docs locus I'll make it right. > On Nov 3, 2023, at 11:45, Laura Flores wrote: > > Yes, Josh beat me to it- this is an issue of incorrectly documenting the > command. You can try the solution posted in the tracker issue. > > On Fri, Nov 3, 2023 at 10:43 AM

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Anthony D'Atri
This admittedly is the case throughout the docs. > On Nov 2, 2023, at 07:27, Joachim Kraftmayer - ceph ambassador > wrote: > > Hi, > > another short note regarding the documentation, the paths are designed for a > package installation. > > the paths for container installation look a bit diff

[ceph-users] Re: Stickyness of writing vs full network storage writing

2023-10-28 Thread Anthony D'Atri
ty, strong consistency and > higher failure domains as host we do with Ceph. > > Joachim > > ___ > ceph ambassador DACH > ceph consultant since 2012 > > Clyso GmbH - Premier Ceph Foundation Member > > https://www.clyso.com/

[ceph-users] Re: Stickyness of writing vs full network storage writing

2023-10-27 Thread Anthony D'Atri
Ceph is all about strong consistency and data durability. There can also be a distinction between performance of the cluster in aggregate vs a single client, especially in a virtualization scenario where to avoid the noisy-neighbor dynamic you deliberately throttle iops and bandwidth per client

[ceph-users] Re: Moving devices to a different device class?

2023-10-24 Thread Anthony D'Atri
Ah, our old friend the P5316. A few things to remember about these: * 64KB IU means that you'll burn through endurance if you do a lot of writes smaller than that. The firmware will try to coalesce smaller writes, especially if they're sequential. You probably want to keep your RGW / CephFS

[ceph-users] Re: How to deal with increasing HDD sizes ? 1 OSD for 2 LVM-packed HDDs ?

2023-10-18 Thread Anthony D'Atri
This is one of many reasons for not using HDDs ;) One nuance that is easily overlooked is the CRUSH weight of failure domains. If, say, you have a failure domain of "rack" with size=3 replicated pools and 3x CRUSH racks, and you add the new, larger OSDs to only one rack, you will not increase the

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-13 Thread Anthony D'Atri
ening and how the issue can > be alleviated or resolved, unfortunately monitor RocksDB usage and tunables > appear to be not documented at all. > > /Z > > On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri <mailto:anthony.da...@gmail.com>> wrote: >> cf. Mark's

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-13 Thread Anthony D'Atri
cf. Mark's article I sent you re RocksDB tuning. I suspect that with Reef you would experience fewer writes. Universal compaction might also help, but in the end this SSD is a client SKU and really not suited for enterprise use. If you had the 1TB SKU you'd get much longer life, or you could

[ceph-users] Re: Hardware recommendations for a Ceph cluster

2023-10-08 Thread Anthony D'Atri
> AFAIK the standing recommendation for all flash setups is to prefer fewer > but faster cores Hrm, I think this might depend on what you’re solving for. This is the conventional wisdom for MDS for sure. My sense is that OSDs can use multiple cores fairly well, so I might look at the cores *

[ceph-users] Re: Hardware recommendations for a Ceph cluster

2023-10-06 Thread Anthony D'Atri
> Currently, I have an OpenStack installation with a Ceph cluster consisting of > 4 servers for OSD, each with 16TB SATA HDDs. My intention is to add a second, > independent Ceph cluster to provide faster disks for OpenStack VMs. Indeed, I know from experience that LFF spinners don't cut it fo

[ceph-users] Re: ceph osd down doesn't seem to work

2023-10-03 Thread Anthony D'Atri
And unless you *need* a given ailing OSD to be up because it's the only copy of data, you may get better recovery/backfill results by stopping the service for that OSD entirely, so that the recovery reads all go to healthier OSDs. > On Oct 3, 2023, at 12:21, Josh Baergen wrote: > > Hi Simon, >
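A sketch of what "stopping the service" looks like (OSD id 123 is a placeholder; pick the variant matching the deployment):

    systemctl stop ceph-osd@123       # package-based installs
    ceph orch daemon stop osd.123     # cephadm-managed clusters
    ceph osd out 123                  # optionally move its data elsewhere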

[ceph-users] Re: Balancer blocked as autoscaler not acting on scaling change

2023-09-26 Thread Anthony D'Atri
Note that this will adjust override reweight values, which will conflict with balancer upmaps. > On Sep 26, 2023, at 3:51 AM, c...@elchaka.de wrote: > > Hi an idea is to see what > > Ceph osd test-reweight-by-utilization > shows. > If it looks usefull you can run the above command without "

[ceph-users] Re: Separating Mons and OSDs in Ceph Cluster

2023-09-09 Thread Anthony D'Atri
That may be the very one I was thinking of, though the OP seemed to be preserving the IP addresses, so I suspect containerization is in play. > On Sep 9, 2023, at 11:36 AM, Tyler Stachecki > wrote: > > On Sat, Sep 9, 2023 at 10:48 AM Anthony D'Atri > wrote: >>

[ceph-users] Re: Separating Mons and OSDs in Ceph Cluster

2023-09-09 Thread Anthony D'Atri
Which Ceph release are you running, and how was it deployed? With some older releases I experienced mons behaving unexpectedly when one of the quorum bounced, so I like to segregate them for isolation still. There was also at one point an issue where clients wouldn’t get a runtime update of new m

[ceph-users] Re: Is it possible (or meaningful) to revive old OSDs?

2023-09-06 Thread Anthony D'Atri
Resurrection usually only makes sense if fate or a certain someone resulted in enough overlapping removed OSDs that you can't meet min_size. I've had to a couple of times :-/ If an OSD is down for more than a short while, backfilling a redeployed OSD will likely be faster than waiting for it t

[ceph-users] Re: Critical Information: DELL/Toshiba SSDs dying after 70,000 hours of operation

2023-09-01 Thread Anthony D'Atri
Is a secure-erase suggested after the firmware update? Sometimes manufacturers do that. > On Sep 1, 2023, at 05:16, Frédéric Nass > wrote: > > Hello, > > This message to inform you that DELL has released a new firmwares for these > SSD drives to fix the 70.000 POH issue: > > [ > https:/

[ceph-users] Re: Status of diskprediction MGR module?

2023-08-28 Thread Anthony D'Atri
>> The module don't have new commits for more than two year > > So diskprediction_local is unmaintained. Will it be removed? > It looks like a nice feature but when you try to use it it's useless. IIRC it has only a specific set of drive models, and the binary blob from ProphetStor. >> I sugg

[ceph-users] Re: cephadm to setup wal/db on nvme

2023-08-25 Thread Anthony D'Atri
> Thank you for reply, > > I have created two class SSD and NvME and assigned them to crush maps. You don't have enough drives to keep them separate. Set the NVMe drives back to "ssd" and just make one pool. > > $ ceph osd crush rule ls > replicated_rule > ssd_pool > nvme_pool > > > Runni
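A sketch of collapsing the classes as suggested (OSD ids are placeholders; an existing class must be removed before it can be reassigned):

    ceph osd crush rm-device-class osd.4 osd.5
    ceph osd crush set-device-class ssd osd.4 osd.5
    ceph osd crush rule ls            # then drop the now-unused nvme_pool rule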

[ceph-users] Re: OSD delete vs destroy vs purge

2023-08-19 Thread Anthony D'Atri
> Thanks Eugen for the explanation. To summarize what I understood: > - delete from GUI simply does a drain+destroy; > - destroy will preserve the OSD id so that it will be used by the next OSD > that will be created on that host; > - purge will remove everything, and the next OSD that will be c
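The corresponding commands, for reference (the OSD id is a placeholder):

    ceph osd destroy 12 --yes-i-really-mean-it   # keeps the id and CRUSH entry for reuse
    ceph osd purge 12 --yes-i-really-mean-it     # removes id, auth key, and CRUSH entry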

[ceph-users] Re: Decrepit ceph cluster performance

2023-08-13 Thread Anthony D'Atri
>> As per recent isdct/intelmas/sst? The web site? > > Yes. It's all "Solidigm" now, which has made information harder to > find and firmware harder to get, but these drives aren't exactly > getting regular updates at this point. Exactly. "isdct" more or less became "intelmas", and post-sep

[ceph-users] Re: Decrepit ceph cluster performance

2023-08-13 Thread Anthony D'Atri
> >> The OP implies that the cluster's performance *degraded* with the Quincy >> upgrade.I wonder if there was a kernel change at the same time. > > No, it's never been great. But it's definitely getting worse over > time. That is most likely correlated with increased utilization (both > in term

[ceph-users] Re: Decrepit ceph cluster performance

2023-08-13 Thread Anthony D'Atri
> > Also, 1 CPU core/OSD is definitely undersized. I'm not sure how much > you have -- but you want at least a couple per OSD for SSD, and even > more for NVMe... especially when it comes to small block write > workloads. Think you meant s/SSD/SAS|SATA/ If the OP means physical core, granted

[ceph-users] Re: librbd 4k read/write?

2023-08-13 Thread Anthony D'Atri
Yep. Remember that most Ceph clusters serve a number of simultaneous clients, so the “IO blender” effect more or less presents a random workload to drives. Dedicated single-client node-local drives might benefit from such strategies. But really gymnastics like this for uncertain gain serve

[ceph-users] Re: librbd 4k read/write?

2023-08-11 Thread Anthony D'Atri
> > This is an expected result, and it is not specific to Ceph. Any > storage that consists of multiple disks will produce a performance > gain over a single disk only if the workload allows for concurrent use > of these disks - which is not the case with your 4K benchmark due to > the de-facto

[ceph-users] Re: librbd 4k read/write?

2023-08-10 Thread Anthony D'Atri
> > Good afternoon everybody! > > I have the following scenario: > Pool RBD replication x3 > 5 hosts with 12 SAS spinning disks each Old hardware? SAS is mostly dead. > I'm using exactly the following line with FIO to test: > fio -ioengine=libaio -direct=1 -invalidate=1 -name=test -bs=4M -s

[ceph-users] Re: Disk device path changed - cephadm faild to apply osd service

2023-08-02 Thread Anthony D'Atri
This. You can even constrain placement by size or model number. > On Aug 2, 2023, at 6:53 AM, Eugen Block wrote: > > But that could be done easily like this: > > service_type: osd > service_id: ssd-db > service_name: osd.ssd-db > placement: > hosts: > - storage01 > - storage02 > ... > spec:
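A sketch of constraining placement by size or model in such a spec (the model string and size bounds are made up):

    service_type: osd
    service_id: ssd-db
    placement:
      host_pattern: 'storage*'
    spec:
      data_devices:
        rotational: 1
        size: '10T:'                 # only devices of 10 TB and larger
      db_devices:
        model: 'MZ7L3960HCJR'        # hypothetical model filter
        size: ':2T'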

[ceph-users] Re: Adding datacenter level to CRUSH tree causes rebalancing

2023-07-20 Thread Anthony D'Atri
I can believe the month timeframe for a cluster with multiple large spinners behind each HBA. I’ve witnessed such personally. > On Jul 20, 2023, at 4:16 PM, Michel Jouvin > wrote: > > Hi Niklas, > > As I said, ceph placement is based on more than fulfilling the failure domain > constraint.

[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-20 Thread Anthony D'Atri
Sometimes one can even get away with "ceph osd down 343" which doesn't affect the process. I have had occasions when this goosed peering in a less-intrusive way. I believe it just marks the OSD down in the mons' map, and when that makes it to the OSD, the OSD responds with "I'm not dead yet" a

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Anthony D'Atri
Indeed that's very useful. I improved the documentation for that not long ago, took a while to sort out exactly what it was about. Normally LC only runs once a day as I understand it, there's a debug option that compresses time so that it'll run more frequently, as having to wait for a day to

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Anthony D'Atri
Index pool on Aerospike? Building OSDs on PRAM might be a lot less work than trying to ensure consistency on backing storage while still servicing out of RAM and not syncing every transaction. > On Jul 18, 2023, at 14:31, Peter Grandi wrote: > > [...] S3 workload, that will need to delet

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Anthony D'Atri
I've seen this dynamic contribute to a hypervisor with many attachments running out of system-wide file descriptors. > On Jul 18, 2023, at 16:21, Konstantin Shalygin wrote: > > Hi, > > Check you libvirt limits for qemu open files/sockets. Seems, when you added > new OSD's, your librbd client

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Anthony D'Atri
Index pool distributed over a large number of NVMe OSDs? Multiple, dedicated RGW instances that only run LC? > On Jul 18, 2023, at 12:08, Peter Grandi wrote: > On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van said: > >> [...] S3 workload, that will need to delete 100M file daily [

[ceph-users] Re: Per minor-version view on docs.ceph.com

2023-07-12 Thread Anthony D'Atri
The docs aren't necessarily structured that way, i.e. there isn't a 17.2.6 docs site as such. We try to document changes in behavior in sync with code, but don't currently have a process to ensure that a given docs build corresponds exactly to a given dot release. In fact we sometimes go back

[ceph-users] Re: pg_num != pgp_num - and unable to change.

2023-07-06 Thread Anthony D'Atri
Indeed. For clarity, this process is not the same as the pg_autoscaler. It's real easy to conflate the two, along with the balancer module, so I like to call that out to reduce confusion. > On Jul 6, 2023, at 18:01, Dan van der Ster wrote: > > Since nautilus, pgp_num (and pg_num) will be inc
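For reference, commands that show each of the related-but-separate mechanisms (pool name is a placeholder):

    ceph osd pool get mypool pg_num
    ceph osd pool get mypool pgp_num
    ceph osd pool autoscale-status    # pg_autoscaler view
    ceph balancer status              # the balancer module, a third mechanism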

[ceph-users] Re: Rook on bare-metal?

2023-07-06 Thread Anthony D'Atri
I’m also using Rook on BM. I had never used K8s before, so that was the learning curve, e.g. translating the example YAML files into the Helm charts we needed, and the label / taint / toleration dance to fit the square peg of pinning services to round hole nodes. We’re using Kubespray ; I gath

[ceph-users] Re: What is the best way to use disks with different sizes

2023-07-04 Thread Anthony D'Atri

[ceph-users] Re: What is the best way to use disks with different sizes

2023-07-04 Thread Anthony D'Atri
There aren’t enough drives to split into multiple pools. Deploy 1 OSD on each of the 3.8T devices and 2 OSDs on each of the 7.6s. Or, alternately, 2 and 4. > On Jul 4, 2023, at 3:44 AM, Eneko Lacunza wrote: > > Hi, > > El 3/7/23 a las 17:27, wodel youchi escribió: >> I will be deploying a Pr
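A sketch of how multiple OSDs per device can be deployed (device paths are placeholders):

    # with ceph-volume directly:
    ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1 /dev/nvme1n1
    # or in a cephadm OSD spec:
    #   spec:
    #     data_devices:
    #       size: '7T:'
    #     osds_per_device: 2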

[ceph-users] Re: device class for nvme disk is ssd

2023-06-28 Thread Anthony D'Atri
Even when you factor in density, iops, and the cost of an HBA? SAS is mostly dead, manufacturers are beginning to drop SATA from their roadmaps. > On Jun 28, 2023, at 10:24 AM, Marc wrote: > > > >> >> What would we use instead? SATA / SAS that are progressively withering >> in the market

[ceph-users] Re: device class for nvme disk is ssd

2023-06-28 Thread Anthony D'Atri
That page has mixed info. What would we use instead? SATA / SAS that are progressively withering in the market, less performance for the same money? Why pay extra for an HBA just to use legacy media? You can use NVMe for WAL+DB, with more complexity. You’ll get faster metadata and lower la

[ceph-users] Re: Encryption per user Howto

2023-06-02 Thread Anthony D'Atri
Stefan, how do you have this implemented? Earlier this year I submitted https://tracker.ceph.com/issues/58569 asking to enable just this. > On Jun 2, 2023, at 10:09, Stefan Kooman wrote: > > On 5/26/23 23:09, Alexander E. Patrakov wrote: >> Hello Frank, >> On Fri, May 26, 2023 at 6:27 PM Fran

[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Anthony D'Atri
This is my understanding as well: as with CRUSH tunable sets, features that *happen* to be named after releases don't always correlate 1:1 with those releases. > On May 25, 2023, at 15:49, Wesley Dillingham wrote: > > Fairly confident this is normal. I just checked a pacific cluster and they > all report luminous as

[ceph-users] Re: CEPH Version choice

2023-05-17 Thread Anthony D'Atri
The release of Reef has been delayed in part due to issues that sidelined the testing / validation infrastructure. > On May 15, 2023, at 05:40, huy nguyen wrote: > > Hi, as I understand, Pacific+ is having a performance issue that does not > exist in older releases? So that why Ceph's new rele

[ceph-users] Re: Octopus on Ubuntu 20.04.6 LTS with kernel 5

2023-05-11 Thread Anthony D'Atri
As a KRBD client, I believe that 5.4 also introduces better support for RBD features including fast-diff > On May 11, 2023, at 3:59 AM, Gerdriaan Mulder wrote: > > As a data point: we've been running Octopus (solely for CephFS) on Ubuntu > 20.04 with 5.4.0(-122) for some time now, with packag

[ceph-users] Re: architecture help (iscsi, rbd, backups?)

2023-04-27 Thread Anthony D'Atri
There is also a direct RBD client for MS Windows, though it's relatively young. > On Apr 27, 2023, at 18:20, Bailey Allison wrote: > > Hey Angelo, > > Just to make sure I'm understanding correctly, the main idea for the use > case is to be able to present Ceph storage to windows clients as SMB?

[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-04-27 Thread Anthony D'Atri
> Indeed! Every Ceph instance I have seen (not many) and almost every HPC > storage system I have seen have this problem, and that's because they were > never setup to have enough IOPS to support the maintenance load, never mind > the maintenance load plus the user load (and as a rule not even

[ceph-users] Re: Could you please explain the PG concept

2023-04-25 Thread Anthony D'Atri
Absolutely. Moreover, PGs are not a unit of size, they are a logical grouping of smaller RADOS objects, because a few thousand PGs are a lot easier and less expensive to manage than tens or hundreds of millions of small underlying RADOS objects. They’re for efficiency, and are not any set size

[ceph-users] Re: Veeam backups to radosgw seem to be very slow

2023-04-25 Thread Anthony D'Atri
>> >> >> We have a customer that tries to use veeam with our rgw objectstorage and >> it seems to be blazingly slow. >> What also seems to be strange, that veeam sometimes show "bucket does not >> exist" or "permission denied". >> I've tested parallel and everything seems to work fine from the

[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-19 Thread Anthony D'Atri
Actually there was a firmware bug around that a while back. The HBA and storcli claimed to not touch drive cache, but actually were enabling it and lying. > On Apr 19, 2023, at 1:41 PM, Marco Gaiarin wrote: > > Mandi! Mario Giammarco > In chel di` si favelave... > >> The disk cache is: >

[ceph-users] Re: HBA or RAID-0 + BBU

2023-04-19 Thread Anthony D'Atri
LSI 9266/9271 as well in an affected range unless ECO’d > On Apr 19, 2023, at 3:13 PM, Sebastian wrote: > > I want add one thing to what other says, we discussed this between > Cephalocon sessions, avoid HP controllers p210/420, or upgrade firmware to > latest. > These controllers has strang

[ceph-users] Re: HBA or RAID-0 + BBU

2023-04-18 Thread Anthony D'Atri
Are you baiting me? ;) HBA. Always. RAID HBAs are the devil. > On Apr 19, 2023, at 12:56 AM, Murilo Morais wrote: > > Good evening everyone! > > Guys, about the P420 RAID controller, I have a question about the operation > mode: What would be better: HBA or RAID-0 with BBU (active write c

[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-15 Thread Anthony D'Atri
With the LSI HBAs I’ve used, HBA cache seemed to only be used for VDs, not for passthrough drives. And then with various nasty bugs. Be careful not to conflate HBA cache with cache on the HDD itself. > On Apr 15, 2023, at 11:51 AM, Konstantin Shalygin wrote: > > Hi, > > Current controller

[ceph-users] Re: Live migrate RBD image with a client using it

2023-04-13 Thread Anthony D'Atri
I've used a similar process with great success for capacity management -- moving volumes from very full clusters to ones with more free space. There was a weighting system to direct new volumes where there was space, but, to forestall full ratio problems due to organic growth of existing thi

[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-11 Thread Anthony D'Atri
> > The truth is that: > - hdd are too slow for ceph, the first time you need to do a rebalance or > similar you will discover... Depends on the needs. For cold storage, or sequential use-cases that aren't performance-sensitive ... Can't say "too slow" without context. In Marco's case, I

[ceph-users] Re: Some hint for a DELL PowerEdge T440/PERC H750 Controller...

2023-04-06 Thread Anthony D'Atri
How bizarre, I haven’t dealt with this specific SKU before. Some Dell / LSI HBAs call this passthrough mode, some “personality”, some “jbod mode”, dunno why they can’t be consistent. > We are testing an experimental Ceph cluster with server and controller at > subject. > > The controller have

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Anthony D'Atri
Any chance you ran `rados bench` but didn’t fully clean up afterward? > On Apr 3, 2023, at 9:25 PM, Work Ceph > wrote: > > Hello guys! > > > We noticed an unexpected situation. In a recently deployed Ceph cluster we > are seeing a raw usage, that is a bit odd. We have the following setup: >
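A quick way to check for and remove such leftovers (pool name is a placeholder):

    rados -p mypool ls | grep benchmark_data | head   # leftover rados bench objects
    rados -p mypool cleanup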

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Anthony D'Atri
Mark Nelson's space amp sheet visualizes this really well. A nuance here is that Ceph always writes a full stripe, so with a 9,6 profile, on conventional media, a minimum of 15x4KB=60KB underlying storage will be consumed, even for a 1KB object. A 22 KB object would similarly tie up ~18KB of

[ceph-users] Re: Set the Quality of Service configuration.

2023-04-02 Thread Anthony D'Atri
I think those only work for librbd clients, not for Ceph-CSI or other KRBD clients. > On Apr 2, 2023, at 3:47 PM, Danny Webb wrote: > > for RBD workloads you can set QOS values on a per image basis (and maybe on > an entire pool basis): > > https://docs.ceph.com/en/latest/rbd/rbd-config-re
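For reference, the librbd-level QoS settings referred to (pool/image names and limits are examples only; KRBD clients do not honor these):

    rbd config pool set mypool rbd_qos_iops_limit 1000
    rbd config image set mypool/myimage rbd_qos_iops_limit 500
    rbd config image set mypool/myimage rbd_qos_bps_limit 52428800   # ~50 MB/s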

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-03-27 Thread Anthony D'Atri
> >> >> What I also see is that I have three OSDs that have quite a lot of OMAP >> data, in compare to other OSDs (~20 time higher). I don't know if this >> is an issue: > > I have on 2TB ssd's with 2GB - 4GB omap data, while on 8TB hdd's the omap > data is only 53MB - 100MB. > Should I manu

[ceph-users] Re: EC profiles where m>k (EC 8+12)

2023-03-24 Thread Anthony D'Atri
A custom CRUSH rule can have two steps to enforce that. > On Mar 24, 2023, at 11:04, Danny Webb wrote: > > The question I have regarding this setup is, how can you guarantee that the > 12 m chunks will be located evenly across the two rooms. What would happen > if by chance all 12 chunks were
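A sketch of what such a two-step rule might look like for an 8+12 profile split across two rooms (rule id, bucket names, and counts are illustrative, not a drop-in rule):

    rule ec_two_rooms {
        id 99
        type erasure
        step take default
        step choose indep 2 type room       # pick two rooms
        step chooseleaf indep 10 type host  # ten chunks per room = 20 total
        step emit
    }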

[ceph-users] Re: With Ceph Quincy, the "ceph" package does not include ceph-volume anymore

2023-03-24 Thread Anthony D'Atri
>> >> I would be surprised if I installed "ceph-osd" and didn't get >> ceph-volume, but I never thought about if "only" installing "ceph" did >> or did not provide it. I — perhaps naively - think of `ceph` as just the CLI, usually installing `ceph-common` too. >> >> (from one of the remaining

[ceph-users] Re: Concerns about swap in ceph nodes

2023-03-15 Thread Anthony D'Atri
With CentOS/Rocky 7-8 I’ve observed unexpected usage of swap when there is plenty of physmem available. Swap IMHO is a relic of a time when RAM capacities were lower and much more expensive. In years beginning with a 2, and with Ceph explicitly, I assert that swap should never be enabled duri

[ceph-users] Re: Theory about min_size and its implications

2023-03-04 Thread Anthony D'Atri
> so size 4 / min_size 2 would be a lot better (of course) More copies (or parity) are always more reliable, but one quickly gets into diminishing returns. In your scenario you might look into stretch mode, which currently would require 4 replicas. In the future maybe it could support EC wit

[ceph-users] Re: Theory about min_size and its implications

2023-03-03 Thread Anthony D'Atri
This is not speculation: I have personally experienced this with an inherited 2R cluster. > On Mar 3, 2023, at 04:07, Janne Johansson wrote: > > > Do not assume the last PG needs to die in a horrible fire, killing > several DC operators with it, it only takes a REALLY small outage, a > fluke

[ceph-users] Re: Theory about min_size and its implications

2023-03-02 Thread Anthony D'Atri
> but what is the problem with only one active PG? > someone pointed out "split brain" but I am unsure about this. I think Paxos will ensure that split-brain doesn’t happen by virtue of needing >50% of the mon quorum to be up. > i think what happens in the worst case is this: > only 1 PG is

[ceph-users] Re: PG Sizing Question

2023-03-01 Thread Anthony D'Atri
> By the sounds of it, a cluster may be configured for the 100 PG / OSD target; > adding pools to the former configuration scenario will require an increase in > OSDs to maintain that recommended PG distribution target and accommodate an > increase of PGs resulting from additional pools.

[ceph-users] Re: PG Sizing Question

2023-02-28 Thread Anthony D'Atri
This can be subtle and is easy to mix up. The “PG ratio” is intended to be the number of PGs hosted on each OSD, plus or minus a few. Note how I phrased that, it’s not the number of PGs divided by the number of OSDs. Remember that PGs are replicated. While each PG belongs to exactly one pool,
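Worked with made-up numbers to make the distinction concrete:

    PGs per OSD  ~=  sum over pools of (pg_num x size or k+m)  /  number of OSDs
    e.g. one pool, pg_num=512, size=3, 30 OSDs:  512 x 3 / 30  ~=  51 PGs per OSD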

[ceph-users] Re: ceph noout vs ceph norebalance, which is better for minor maintenance

2023-02-17 Thread Anthony D'Atri
> * if rebalance will starts due EDAC or SFP degradation, is faster to fix the > issue via DC engineers and put node back to work A judicious mon_osd_down_out_subtree_limit setting can also do this by not rebalancing when an entire node is detected down. > * noout prevents unwanted OSD's fi
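For reference, the setting in question (the value shown is an example; the default is "rack"):

    ceph config set mon mon_osd_down_out_subtree_limit host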

[ceph-users] Re: Deep scrub debug option

2023-02-07 Thread Anthony D'Atri
Documented here: https://github.com/ceph/ceph/blob/9754cafc029e1da83f5ddd4332b69066fe6b3ffb/src/common/options/global.yaml.in#L3202 Introduced back here with a bunch of other scrub tweaks: https://github.com/ceph/ceph/pull/18971/files Are your OSDs HDDs? Using EC? How many deep scrubs do you h

[ceph-users] Re: How to get RBD client log?

2023-02-01 Thread Anthony D'Atri
When the client is libvirt/librbd/QEMU virtualization, IIRC one must set these values in the hypervisor’s ceph.conf > On Feb 1, 2023, at 11:05, Ruidong Gao wrote: > > Hi, > > You can use environment variable to set log level to what you want as below: > bash-4.4$ export CEPH_ARGS="--debug-rbd=
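A sketch of what that hypervisor-side ceph.conf stanza might contain (paths are examples; the QEMU process needs write access to them):

    [client]
        debug rbd = 20
        log file = /var/log/ceph/qemu-guest-$pid.log
        admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok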
