[ceph-users] Ceph bluestore performance on 4kn vs. 512e?

2019-02-25 Thread Oliver Schulz
Dear all, in real-world use, is there a significant performance benefit in using 4kn instead of 512e HDDs (using Ceph bluestore with block-db on NVMe-SSD)? Cheers and thanks for any advice, Oliver
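For reference, a quick way to check whether a given drive reports itself as 512e (512-byte logical, 4 KiB physical sectors) or 4kn (4 KiB for both); device names below are placeholders:

    # logical vs. physical sector size for all block devices
    lsblk -o NAME,LOG-SEC,PHY-SEC
    # per-drive detail; 512e drives report "512 bytes logical, 4096 bytes physical"
    smartctl -i /dev/sdX | grep -i 'sector size'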

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-19 Thread Oliver Schulz
about 2x faster than the P3700 we had, and allow us to get more out of our flash drives. -------- *From:* Oliver Schulz *Sent:* Wednesday, 18 July 2018 12:00:14 PM *To:* Linh Vu; ceph-users *Subject:* Re: [ceph-users] CephFS wi

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-19 Thread Oliver Schulz
It's about 2x faster than the P3700 we had, and allow us to get more out of our flash drives. ---- *From:* Oliver Schulz *Sent:* Wednesday, 18 July 2018 12:00:14 PM *To:* Linh Vu; ceph-users *Subject:* Re: [ceph-users] C

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
on't do any such special allocation. 😊 Cheers, Linh -------- *From:* Oliver Schulz *Sent:* Tuesday, 17 July 2018 11:39:26 PM *To:* Linh Vu; ceph-users *Subject:* Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
On 18.07.2018 00:43, Gregory Farnum wrote: > But you could also do workaround like letting it choose (K+M)/2 racks > and putting two shards in each rack. Oh yes, you are more susceptible to top-of-rack switch failures in this case or whatever. It's just one option — many people are
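Gregory's workaround (choose (K+M)/2 racks and place two shards in each) translates into a CRUSH rule roughly like the sketch below; the rule name, id and the k=4/m=2 layout are assumptions for illustration, not values from this thread:

    rule cephfs_ec_two_per_rack {
            id 2
            type erasure
            min_size 3
            max_size 6
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default
            # pick 3 racks, then two separate hosts (one shard each) inside every rack
            step choose indep 3 type rack
            step chooseleaf indep 2 type host
            step emit
    }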

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
you more throughput but increase latency especially for small files, so it also depends on how important performance is and what kind of file size you store on your CephFS. Cheers, Linh *From:* ceph-users on behalf of O

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
ind of file size you store on your CephFS. Cheers, Linh *From:* ceph-users on behalf of Oliver Schulz *Sent:* Sunday, 15 July 2018 9:46:16 PM *To:* ceph-users *Subject:* [ceph-users] CephFS with erasure coding, do

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
Hi Greg, On 17.07.2018 03:01, Gregory Farnum wrote: Since Luminous, you can use an erasure coded pool (on bluestore) directly as a CephFS data pool, no cache pool needed. More than that, we'd really prefer you didn't use cache pools for anything. Just Say No. :) Thanks for the confirm

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-16 Thread Oliver Schulz
Dear John, On 16.07.2018 16:25, John Spray wrote: Since Luminous, you can use an erasure coded pool (on bluestore) directly as a CephFS data pool, no cache pool needed. Great! I'll be happy to go without a cache pool then. Thanks for your help, John, Oliver
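For later readers, the Luminous-era steps for an EC data pool look roughly like this; the profile, pool and filesystem names and the PG counts are placeholders, not values from the thread:

    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create cephfs_data_ec 256 256 erasure ec42
    # required before an EC pool may back CephFS (enables partial overwrites)
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs_data_ec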

[ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-15 Thread Oliver Schulz
Dear all, we're planning a new Ceph cluster, with CephFS as the main workload, and would like to use erasure coding to use the disks more efficiently. Access pattern will probably be more read- than write-heavy, on average. I don't have any practical experience with erasure-coded pools so far.

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-23 Thread Oliver Schulz
Hi Konstantin, thanks! "set-all-straw-buckets-to-straw2" was what I was looking for. Didn't see it in the docs. Thanks again! Cheers, Oliver On 23.06.2018 06:39, Konstantin Shalygin wrote: Yes, I know that section of the docs, but can't find how to change the crush rules after "ceph osd cru
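The command referenced above, for completeness (Luminous and later); it is worth checking client compatibility before converting, since straw2 needs reasonably recent kernels and librados:

    # inspect current bucket algorithms
    ceph osd crush dump | grep '"alg"'
    # convert all straw buckets to straw2 in one step (triggers some data movement)
    ceph osd crush set-all-straw-buckets-to-straw2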

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-22 Thread Oliver Schulz
#tunables 2018-06-20 18:27 GMT+02:00 Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>>: Thanks, Paul - I could probably activate the Jewel tunables profile without losing too many clients - most are running at least kernel 4.2, I think. I'll go hunting for older clien

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-20 Thread Oliver Schulz
without breaking the oldest clients. Incrementing choose*tries in the crush rule or tunables is probably sufficient. But since you are apparently running into data balance problems you'll have to update that to something more modern sooner or later. You can also play around with crushtool
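The crushtool experiments Paul alludes to can be run offline against a copy of the map; a sketch, assuming the EC rule has id 1 and the pool size is 6:

    # replay the rule many times without touching the cluster; bad mappings indicate
    # CRUSH giving up before finding enough distinct OSDs (the choose_tries problem)
    crushtool -i crushmap --test --rule 1 --num-rep 6 --show-bad-mappings
    # optionally look at the simulated distribution across OSDs
    crushtool -i crushmap --test --rule 1 --num-rep 6 --show-utilization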

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-20 Thread Oliver Schulz
re common). Values used for EC are set_chooseleaf_tries = 5 and set_choose_tries = 100. You can configure them by adding them as the first steps of the rule. You can also configure an upmap exception. But in general it is often not the best idea to have only 3 racks for

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-20 Thread Oliver Schulz
On 06/20/2018 04:00 PM, Paul Emmerich wrote: Can you post the full output of "ceph -s", "ceph health detail, and ceph osd df tree Also please run "ceph pg X.YZ query" on one of the PGs not backfilling. Paul 2018-06-20 15:25 GMT+02:00 Oliver Schulz mailto:oliver.sch...@tu
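For completeness, the diagnostics Paul asked for; the PG id is a placeholder:

    ceph -s
    ceph health detail
    ceph osd df tree
    # replace 1.2ab with one of the PGs that is stuck and not backfilling
    ceph pg 1.2ab query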

[ceph-users] Backfill stops after a while after OSD reweight

2018-06-20 Thread Oliver Schulz
Dear all, we (somewhat) recently extended our Ceph cluster, and updated it to Luminous. By now, the fill level on some OSDs is quite high again, so I'd like to re-balance via "OSD reweight". I'm running into the following problem, however: No matter what I do (reweight a little, or a lot, or onl

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
ephfs disaster recovery tools. Your cluster was offline here because it couldn't do some writes, but it should still be self-consistent. On Thu, Jun 14, 2018 at 4:52 PM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>> wrote: They are recovered now, looks like it just took a bit

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
shmap -o crushmap crushtool -d crushmap -o crushmap.txt Paul 2018-06-14 22:39 GMT+02:00 Oliver Schulz <mailto:oliver.sch...@tu-dortmund.de>>: Thanks, Greg!! I reset all the OSD weights to 1.00, and I think I'm in a much better state now. The only trouble left in
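The truncated snippet above is the usual crushmap round trip, spelled out here for reference (file names are arbitrary):

    ceph osd getcrushmap -o crushmap
    crushtool -d crushmap -o crushmap.txt
    # edit crushmap.txt (rules, tunables, bucket types), then recompile and inject
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new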

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
gs with it in various versions and forcing that kind of priority without global decision making is prone to issues. But yep, looks like things will eventually become all good now. :) On Thu, Jun 14, 2018 at 4:39 PM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>> wrote: Thanks, Gre

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
quot;recovery_wait", after. Cheers, Oliver On 14.06.2018 22:09, Gregory Farnum wrote: On Thu, Jun 14, 2018 at 4:07 PM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>> wrote: Hi Greg, I increased the hard limit and rebooted everything. The PG without acting OSDs still

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
ndoing the remap-by-utilization > changes. How do I do that, best? Just set all the weights back to 1.00? Cheers, Oliver P.S.: Thanks so much for helping! On 14.06.2018 21:37, Gregory Farnum wrote: On Thu, Jun 14, 2018 at 3:26 PM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>> w
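Undoing reweight-by-utilization amounts to setting the override reweight of the affected OSDs back to 1.0; a sketch (the OSD id is an example), and note this is "ceph osd reweight", not "ceph osd crush reweight":

    # spot OSDs whose REWEIGHT column is below 1.00000
    ceph osd df tree
    # reset each affected OSD
    ceph osd reweight 12 1.0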

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
Ah, I see some OSDs actually are over the 200 PG limit - I'll increase the hard limit and restart everything. On 14.06.2018 21:26, Oliver Schulz wrote: But the contents of the remapped PGs should still be OK, right? What confuses me is that they don't backfill - why don't the

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
he pg overdose protection that was added in luminous. Check the list archives for the exact name, but you’ll want to increase the pg hard limit and restart the osds that exceeded the previous/current setting. -Greg On Thu, Jun 14, 2018 at 2:33 PM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de
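The overdose-protection knobs Greg leaves unnamed are, to the best of my knowledge, mon_max_pg_per_osd (default 200 in Luminous, matching the 200 PG limit mentioned in this thread) and osd_max_pg_per_osd_hard_ratio; treat the names and values here as assumptions to verify for your release:

    # ceph.conf, then restart the OSDs that exceeded the old limit
    [global]
    mon_max_pg_per_osd = 300
    osd_max_pg_per_osd_hard_ratio = 3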

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
t the pg reporting, but I believe if it’s reporting the state as unknown that means *no* running osd which contains any copy of that pg. That’s not something which ceph could do on its own without failures of osds. What’s the output of “ceph -s”? On Thu, Jun 14, 2018 at 2:15 PM Oliver Schulz

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
hould still be getting reported as inactive. On Thu, Jun 14, 2018 at 8:40 AM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>> wrote: Dear all, I have a serious problem with our Ceph cluster: One of our PGs somehow ended up in this state (reported by "ceph health deta

[ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
Dear all, I have a serious problem with our Ceph cluster: One of our PGs somehow ended up in this state (reported by "ceph health detail"): pg 1.XXX is stuck inactive for ..., current state unknown, last acting [] Also, "ceph pg map 1.xxx" reports: osdmap e525812 pg 1.721 (1.721) -> up

[ceph-users] Increasing number of PGs by not a factor of two?

2018-05-16 Thread Oliver Schulz
Dear all, we have a Ceph cluster that has slowly evolved over several years and Ceph versions (started with 18 OSDs and 54 TB in 2013, now about 200 OSDs and 1.5 PB, still the same cluster, with data continuity). So there are some "early sins" in the cluster configuration, left over from the earl
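Since the question is about mechanics as much as policy, this is what raising the PG count looks like in the pre-Nautilus era of this thread (pool name and target value are placeholders); pgp_num has to be raised to match before data actually moves:

    ceph osd pool get cephfs_data pg_num
    # increase in moderate steps to keep backfill load manageable
    ceph osd pool set cephfs_data pg_num 2048
    ceph osd pool set cephfs_data pgp_num 2048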

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-12 Thread Oliver Schulz
that much more longevity to the life of the drive. You cannot change the size of any part of a bluestore osd after creation. On Sat, May 12, 2018, 3:09 PM Oliver Schulz mailto:oliver.sch...@tu-dortmund.de>> wrote: Dear David, On 11.05.2018 22:10, David Turner wrote: > F

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-12 Thread Oliver Schulz
Dear David, On 11.05.2018 22:10, David Turner wrote: For if you should do WAL only on the NVMe vs use a filestore journal, that depends on your write patterns, use case, etc. we mostly use CephFS, for scientific data processing. It's mainly larger files (10 MB to 10 GB, but sometimes also a bu

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
Dear David, thanks a lot for the detailed answer(s) and clarifications! Can I ask just a few more questions? On 11.05.2018 18:46, David Turner wrote: partitions is 10GB per 1TB of OSD.  If your OSD is a 4TB disk you should be looking closer to a 40GB block.db partition.  If your block.db parti

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
ng your question, you could do something like: $ ceph-deploy osd create --bluestore --data=/dev/sdb --block-db /dev/nvme0n1p1 $HOSTNAME $ ceph-deploy osd create --bluestore --data=/dev/sdc --block-db /dev/nvme0n1p1 $HOSTNAME On Fri, May 11, 2018 at 10:35 AM Oliver Schulz mailto:oliver.sch...@tu-d

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
(quoted lsblk output: nvme0n1 partitions nvme0n1p15 through nvme0n1p20 and onward, alternating between 1G and 576M in size)

[ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
Dear Ceph Experts, I'm trying to set up some new OSD storage nodes, now with bluestore (our existing nodes still use filestore). I'm a bit unclear on how to specify WAL/DB devices: Can several OSDs share one WAL/DB partition? So, can I do ceph-deploy osd create --bluestore --osd-db=/dev/nvme
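As the replies in this thread make clear, each OSD gets its own block-db partition on the shared NVMe rather than all OSDs pointing at one partition; a sketch under that assumption, with placeholder devices and sizes:

    # carve one ~40 GB DB partition per OSD out of the shared NVMe beforehand
    sgdisk -n 1:0:+40G /dev/nvme0n1
    sgdisk -n 2:0:+40G /dev/nvme0n1
    # then one ceph-deploy call per OSD, each pointing at its own partition
    ceph-deploy osd create --bluestore --data /dev/sdb --block-db /dev/nvme0n1p1 $HOSTNAME
    ceph-deploy osd create --bluestore --data /dev/sdc --block-db /dev/nvme0n1p2 $HOSTNAME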

Re: [ceph-users] Using ceph deploy with mon.a instead of mon.hostname?

2018-04-20 Thread Oliver Schulz
find 'MON_HOSTNAME' in monmap Any ideas? Cheers, Oliver On 04/20/2018 11:46 AM, Stefan Kooman wrote: Quoting Oliver Schulz (oliver.sch...@tu-dortmund.de): Dear Ceph Experts, I'm trying to switch an old Ceph cluster from manual administration to ceph-deploy, but I'm running

[ceph-users] Using ceph deploy with mon.a instead of mon.hostname?

2018-04-20 Thread Oliver Schulz
Dear Ceph Experts, I'm trying to switch an old Ceph cluster from manual administration to ceph-deploy, but I'm running into the following error: # ceph-deploy gatherkeys HOSTNAME [HOSTNAME][INFO ] Running command: /usr/bin/ceph --connect-timeout=25 --cluster=ceph --admin-daemon=/var/run/ceph/cep

[ceph-users] How to repair MDS damage?

2017-02-14 Thread Oliver Schulz
Dear Ceph Experts, after upgrading our Ceph cluster from Hammer to Jewel, the MDS (after a few days) found some metadata damage: # ceph status [...] health HEALTH_ERR mds0: Metadata damage detected [...] The output of # ceph tell mds.0 damage ls is: [ {

Re: [ceph-users] How to identify MDS client failing to respond to capability release?

2015-07-30 Thread Oliver Schulz
d consider using either a more recent kernel or a fuse client. John On 30/07/15 08:32, Oliver Schulz wrote: Hello Ceph Experts, lately, "ceph status" on our cluster often states: mds0: Client CLIENT_ID failing to respond to capability release How can I identify which client is at

[ceph-users] How to identify MDS client failing to respond to capability release?

2015-07-30 Thread Oliver Schulz
Hello Ceph Experts, lately, "ceph status" on our cluster often states: mds0: Client CLIENT_ID failing to respond to capability release How can I identify which client is at fault (hostname or IP address) from the CLIENT_ID? What could be the source of the "failing to respond to capability
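One way to map the CLIENT_ID in that health message back to a machine, described here as the usual approach rather than anything confirmed in this thread (the MDS name is a placeholder):

    # on the host running the active MDS, via its admin socket
    ceph daemon mds.<mds-name> session ls
    # look for the entry whose "id" equals CLIENT_ID; the "inst" field carries the
    # client address, and (for newer clients) "client_metadata" includes the hostname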

Re: [ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Oliver Schulz
Dear Greg, On 18.02.2015 23:41, Gregory Farnum wrote: is it possible to define a Ceph user/key with privileges that allow for read-only CephFS access but do not allow ...and deletes, unfortunately. :( I don't think this is presently a thing it's possible to do until we get a much better user au

Re: [ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Oliver Schulz
Hi Florian, On 18.02.2015 22:58, Florian Haas wrote: is it possible to define a Ceph user/key with privileges that allow for read-only CephFS access but do not allow All you should need to do is [...] However, I've just tried the above with ceph-fuse on firefly, and [...] So I believe you've un

[ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Oliver Schulz
Dear Ceph Experts, is it possible to define a Ceph user/key with privileges that allow for read-only CephFS access but do not allow write or other modifications to the Ceph cluster? I would like to export a sub-tree of our CephFS via HTTPS. Alas, web-servers are inviting targets, so in the (hope
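For context, the cap set one would try for this looks like the sketch below (client name and pool are placeholders); as the replies in the thread note, at the time this did not reliably prevent deletes, because the client still talks directly to the data pool:

    ceph auth get-or-create client.ro-cephfs \
        mon 'allow r' \
        mds 'allow r' \
        osd 'allow r pool=cephfs_data'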

[ceph-users] Blocked requests during and after CephFS delete

2013-12-08 Thread Oliver Schulz
Hello Ceph-Gurus, a short while ago I reported some trouble we had with our cluster suddenly going into a state of "blocked requests". We did a few tests, and we can reproduce the problem: During / after deleting of a substantial chunk of data on CephFS (a few TB), ceph health shows blocked requ

Re: [ceph-users] Constant slow / blocked requests with otherwise healthy cluster

2013-11-28 Thread Oliver Schulz
> our Ceph cluster suddenly went into a state of OSDs constantly having > blocked or slow requests, rendering the cluster unusable. This happened > during normal use, there were no updates, etc. our cluster seems to have recovered overnight and is back to normal behaviour. This morning, everything

Re: [ceph-users] Constant slow / blocked requests with otherwise healthy cluster

2013-11-28 Thread Oliver Schulz
Hi Michael, > Sounds like what I was having starting a couple of days ago, played [...] yes, that sounds only too familiar. :-( > Updated to 3.12 kernel and restarted all of the ceph nodes and it's now > happily churning through a rados -p rbd bench 300 write -t 120 that Weird - but if that s

[ceph-users] Constant slow / blocked requests with otherwise healthy cluster

2013-11-27 Thread Oliver Schulz
Dear Ceph Experts, our Ceph cluster suddenly went into a state of OSDs constantly having blocked or slow requests, rendering the cluster unusable. This happened during normal use, there were no updates, etc. All disks seem to be healthy (smartctl, iostat, etc.). A complete hardware reboot includ

Re: [ceph-users] CRUSH tunables for production system? / Data Distribution?

2013-11-13 Thread Oliver Schulz
Dear Greg, I believe 3.8 is after CRUSH_TUNABLES v1 was implemented in the kernel, so it shouldn't hurt you to turn them on if you need them. (And the crush tool is just out of date; we should update that text!) However, if you aren't having distribution issues on your cluster I wouldn't bother

[ceph-users] CRUSH tunables for production system?

2013-11-13 Thread Oliver Schulz
Dear Ceph Experts, We're running a production Ceph cluster with Ceph Dumpling, with Ubuntu 12.04.3 (kernel 3.8) on the cluster nodes and all clients. We're mainly using CephFS (kernel) and RBD (kernel and user-space/libvirt). Would you recommend to activate CRUSH_TUNABLES (1, not 2) for our use
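For reference, inspecting and changing the tunables profile looks like this; whether enabling a newer profile is safe depends on the oldest kernel client, so the profile name below is only an example, not a recommendation for this cluster:

    # show the currently active tunables
    ceph osd crush show-tunables
    # switch to a named profile (causes data movement; verify client compatibility first)
    ceph osd crush tunables bobtail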

[ceph-users] Mounting RBD or CephFS on Ceph-Node?

2013-07-23 Thread Oliver Schulz
Dear Ceph Experts, I remember reading that at least in the past it wasn't recommended to mount Ceph storage on a Ceph cluster node. Given a recent kernel (3.8/3.9) and sufficient CPU and memory resources on the nodes, would it now be safe to * Mount RBD or CephFS on a Ceph cluster node? * Run a

[ceph-users] XFS or btrfs for production systems with modern Kernel?

2013-06-07 Thread Oliver Schulz
Hello, the CEPH "Hard disk and file system recommendations" page states that XFS is the recommend OSD file system for production systems. Does that still hold true for the last kernels versions (e.g. Ubuntu 12.04 with lts-raring kernel 3.8.5)? Would btrfs provide a significant performance incre