Re: [ceph-users] fio test rbd - single thread - qd1

2019-03-19 Thread Piotr Dałek
...Avg Stddev x 39 26.18 181.51 48.16 50.574872 24.01572 Same here - should be cached in the bluestore cache, as it is 16GB x 84 OSDs, with a 1GB test file. Any thoughts - suggestions - insights? Jesper
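It isn't clear from the excerpt whether the test used fio's rbd ioengine or a file on a mounted RBD; a single-thread, queue-depth-1 random-read job against an image would look roughly like the sketch below (pool, image and client names are placeholders, and fio must be built with rbd engine support):

    fio --name=rbd-qd1 --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=testimg --rw=randread --bs=4k --iodepth=1 --numjobs=1 \
        --runtime=60 --time_based

With iodepth=1 every read waits for the previous one to complete, so the result is dominated by per-op latency rather than cluster throughput.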

Re: [ceph-users] ceph block - volume with RAID#0

2019-01-31 Thread Piotr Dałek
...? Exclusive lock on RBD images will kill any (theoretical) performance gains. Without exclusive lock, you lose some RBD features. Plus, using two or more clients with a single image doesn't sound like a good idea.
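For completeness: if someone still wanted to experiment with several clients sharing one image, the exclusive-lock feature (and the features that depend on it) can be disabled per image. The image name below is hypothetical, and journaling, if enabled, would have to be disabled first as well:

    rbd feature disable rbd/testimg fast-diff
    rbd feature disable rbd/testimg object-map
    rbd feature disable rbd/testimg exclusive-lock

As the reply notes, losing those features is part of the trade-off.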

Re: [ceph-users] Fwd: what are the potential risks of mixed cluster and client ms_type

2018-11-18 Thread Piotr Dałek
...firewall or network hardware issues.

Re: [ceph-users] Fwd: what are the potential risks of mixed cluster and client ms_type

2018-11-18 Thread Piotr Dałek
...both messengers use the same protocol.
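A quick way to confirm what a running daemon actually uses, plus the ceph.conf knob involved (osd.0 and the async value are just examples):

    # what the daemon is currently running with
    ceph daemon osd.0 config get ms_type
    # selecting the messenger explicitly would go into ceph.conf, e.g.:
    #   [global]
    #   ms type = async+posix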

[ceph-users] RBD image "lightweight snapshots"

2018-08-09 Thread Piotr Dałek
...coexist with regular snapshots. Removal of these "lightweight" snapshots would be instant (or nearly instant). So, what do others think about this?

Re: [ceph-users] Safe to use rados -p rbd cleanup?

2018-07-16 Thread Piotr Dałek
...benchmark metadata object and remove only the objects indexed by that metadata. "--prefix" is used when the metadata is lost or overwritten.
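In command form the two cases look like this; the prefix below is only an illustration of the benchmark_data_<hostname>_<pid> naming that rados bench uses:

    # metadata objects still present: cleanup finds its own objects
    rados -p rbd cleanup
    # metadata lost or overwritten: fall back to an explicit prefix
    rados -p rbd cleanup --prefix benchmark_data_myhost_12345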

Re: [ceph-users] Safe to use rados -p rbd cleanup?

2018-07-16 Thread Piotr Dałek
...it's safe, as objects for RBD images are named differently.
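A quick sanity check along those lines, assuming format 2 images (whose data objects are prefixed with rbd_data.):

    # benchmark objects and RBD data objects are easy to tell apart by name
    rados -p rbd ls | egrep '^(benchmark_data|rbd_data)' | sort | head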

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Piotr Dałek
...write test was running and how heavy it was.

Re: [ceph-users] Prioritize recovery over backfilling

2018-06-07 Thread Piotr Dałek
...scrubbing/deep scrubbing is fine). > Shut down all activity to the Ceph cluster before that moment? Depends on whether that's actually possible in your case and what load your users generate - you have to decide.

Re: [ceph-users] Prioritize recovery over backfilling

2018-06-06 Thread Piotr Dałek
...to ask particular PGs to recover first, even if there are other PGs to backfill and/or recover.
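On recent releases (Luminous and later, as far as I know) this is exposed directly in the CLI; the PG id below is a placeholder:

    # push a specific PG to the front of the recovery queue
    ceph pg force-recovery 2.1f
    # and the backfill equivalent
    ceph pg force-backfill 2.1f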

Re: [ceph-users] Reduced productivity because of slow requests

2018-06-06 Thread Piotr Dałek
I tried to find something in the OSD logs but there is nothing about it. Any thoughts on how to avoid it? Have you tried disabling scrub and deep scrub?
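Disabling scrubbing cluster-wide is a cheap way to test whether scrub load is behind the slow requests; remember to re-enable it afterwards:

    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ...observe whether the slow requests go away...
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub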

Re: [ceph-users] a big cluster or several small

2018-05-15 Thread Piotr Dałek
...- and their users - are unaffected. For us this has already proved useful in the past.

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Piotr Dałek
...in the "rbd" utility. So what can I do to make "rbd ls -l" faster, or to get comparable information regarding the snapshot hierarchy? Can you run this command with the extra argument "--rbd_concurrent_management_ops=1" and share the timing of that?
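The timing comparison being asked for amounts to something like this (the pool name is a placeholder):

    time rbd ls -l rbd
    time rbd ls -l rbd --rbd_concurrent_management_ops=1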

Re: [ceph-users] High apply latency

2018-02-02 Thread Piotr Dałek
...settings command. You may want to try the above as well.

Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread Piotr Dałek
...users expect that non-size counters - like object counts - use base-10 units, and size counters use base-2 units. Ceph's "standard" of using base-2 everywhere was confusing for me as well initially, but I got used to it... Still, I wouldn't mind if that got sorted out once and for all.

Re: [ceph-users] Snap trim queue length issues

2017-12-18 Thread Piotr Dałek
On 17-12-15 03:58 PM, Sage Weil wrote: On Fri, 15 Dec 2017, Piotr Dałek wrote: On 17-12-14 05:31 PM, David Turner wrote: I've tracked this in a much more manual way.  I would grab a random subset [..] This was all on a Hammer cluster.  The changes to the snap trimming queues going int

Re: [ceph-users] Snap trim queue length issues

2017-12-15 Thread Piotr Dałek
...of serious service disruption once disk space is all used up. Hopefully it'll be convincing enough for devs. ;)

[ceph-users] Snap trim queue length issues

2017-12-14 Thread Piotr Dałek
...myself. But having some support from users would be helpful in pushing this into the next Jewel release. Thanks! [1] One of our guys hacked a bash one-liner that printed out snap trim queue lengths for all PGs, but a full run takes over an hour to complete on a cluster with over 20k PGs... [2] https...
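A hedged sketch of such a one-liner, assuming Jewel-era output in which "ceph pg <pgid> query" exposes a top-level snap_trimq field (adjust the jq path for your release); it is slow precisely because it queries every PG:

    ceph pg dump pgs_brief 2>/dev/null \
      | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}' \
      | while read pg; do
          printf '%s %s\n' "$pg" \
            "$(ceph pg "$pg" query -f json | jq -r '.snap_trimq // empty')"
        done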

Re: [ceph-users] ceph.conf tuning ... please comment

2017-12-06 Thread Piotr Dałek
...3 lowest, or if that's not acceptable, then at least set "osd heartbeat min size" to 0.
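The suggestion maps to a single option; whether it takes effect at runtime or needs an OSD restart may depend on the release, so treat this as a sketch:

    # stop padding heartbeat messages up to the (large) default minimum size
    ceph tell osd.* injectargs '--osd_heartbeat_min_size 0'
    # to persist it, add under [osd] in ceph.conf:  osd heartbeat min size = 0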

Re: [ceph-users] Single disk per OSD ?

2017-12-01 Thread Piotr Dałek
...ly extend the damage area once such an OSD fails).

Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Piotr Dałek
...don't have a *big* problem with this (we haven't upgraded to Luminous yet, so we can skip to the next point release and move to ceph-volume together with Luminous). It's still a problem, though - now we have more of our infrastructure to migrate and test, meaning even more delays...

Re: [ceph-users] Restart is required?

2017-11-16 Thread Piotr Dałek
...es in a subdir before merging into parent. NOTE: A negative value means to disable subdir merging." Will a variable definition like "filestore_merge_threshold = -50" (a negative value) work? (In Jewel it worked like a charm.) Yes, I don't see any changes to that.

Re: [ceph-users] Restart is required?

2017-11-16 Thread Piotr Dałek
..."filestore split multiple" is not observed for runtime changes, meaning that the new value will be stored in the osd.0 process memory but not used at all. Do I really need to restart the OSD to make the changes take effect? ceph version 12.2.1 luminous (stable) Yes.
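A minimal way to observe the behaviour being described (osd.0 and the value 8 are placeholders):

    # injectargs accepts the change but warns that it is not observed at runtime
    ceph tell osd.0 injectargs '--filestore_split_multiple 8'
    # the in-memory value changes...
    ceph daemon osd.0 config get filestore_split_multiple
    # ...but only a restart makes filestore actually use it
    systemctl restart ceph-osd@0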

Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-07 Thread Piotr Dałek
[..] Why would you want to *stop* (as in, freeze) a process instead of killing it? Anyway, with the processes still there, it may take a few minutes before the cluster realizes that the daemons are stopped and kicks them out of the cluster, restoring normal behavior (assuming correctly set CRUSH rules).
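If the daemons are frozen rather than dead, one way to shorten that wait is to mark them down (and optionally out) by hand; the OSD id is a placeholder:

    ceph osd down 12
    ceph osd out 12    # only if you also want recovery to start right away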

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Piotr Dałek
...g properly refreshed. I'd love to, but that would require us to restart that client - not an option. We'll try to reproduce this somehow anyway and let you know if something interesting shows up.

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Piotr Dałek
...snapshot exists. Thanks, that makes things clear. It seems we have some Cinder instances utilizing Infernalis (9.2.1) librbd. Are you aware of any bugs in 9.2.x that could cause such behavior? We've seen that for the first time...

[ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Piotr Dałek
...snapshots but not remove them when an exclusive lock on the image is taken? (Jewel bug?) 2. Why is the error transformed and then ignored?
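A hedged way to inspect who is holding the image open while the snapshot removal fails (pool, image and snapshot names are placeholders):

    rbd snap create rbd/vmdisk@backup1
    rbd snap rm rbd/vmdisk@backup1   # the operation that failed under the exclusive lock
    rbd status rbd/vmdisk            # lists watchers, i.e. clients that have the image open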

Re: [ceph-users] A new SSD for journals - everything sucks?

2017-10-11 Thread Piotr Dałek
...goes down). You may want to look at this: https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
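The test described in that post boils down to a single-job, synchronous, sequential 4k write against the raw device; the device name is a placeholder and the run is destructive to data on it:

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=ssd-journal-test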

Re: [ceph-users] why sudden (and brief) HEALTH_ERR

2017-10-04 Thread Piotr Dałek
...rious :-) Since Jewel (AFAIR), when (re)starting OSDs, the PG status is reset to "never contacted", resulting in "pgs are stuck inactive for more than 300 seconds" being reported until the OSDs regain connections between themselves.

Re: [ceph-users] RBD: How many snapshots is too many?

2017-09-18 Thread Piotr Dałek
...g too much at once - not a final solution, but it should make life a bit more tolerable until an actual, working solution is in place.
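The excerpt is cut off, but if the advice is about keeping snap trimming from doing too much at once, these are the usual knobs (values are illustrative, not recommendations):

    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5 --osd_snap_trim_priority 1'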

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
On 17-07-06 09:39 PM, Jason Dillaman wrote: On Thu, Jul 6, 2017 at 3:25 PM, Piotr Dałek wrote: Is that deep copy an equivalent of what Jewel librbd did at an unspecified point in time, or an extra one? It's an equivalent/replacement -- not an additional copy. This was changed to support sc...

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
On 17-07-06 04:40 PM, Jason Dillaman wrote: On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek wrote: So I really see two problems here: lack of API docs and backwards-incompatible change in API behavior. Docs are always in need of update, so any pull requests would be greatly appreciated

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
...aio_write? To stress-test the memory bus? So I really see two problems here: a lack of API docs and a backwards-incompatible change in API behavior.

Re: [ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
On 17-07-06 03:03 PM, Jason Dillaman wrote: On Thu, Jul 6, 2017 at 8:26 AM, Piotr Dałek wrote: Hi, if you're using "rbd_aio_write()" in your code, be aware of the fact that before the Luminous release, this function expects the buffer to remain unchanged until the write op ends, and on Lum...

[ceph-users] Note about rbd_aio_write usage

2017-07-06 Thread Piotr Dałek
...penalty for unnecessary memory allocation and copying on your side (though it's probably unavoidable with the current state of Luminous).

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Piotr Dałek
On 17-06-21 03:35 PM, Jason Dillaman wrote: On Wed, Jun 21, 2017 at 3:05 AM, Piotr Dałek wrote: I saw that RBD (librbd) does that - replacing writes with discards when the buffer contains only zeros. Some code that does the same in librados could be added, and it shouldn't impact performance

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Piotr Dałek
On 17-06-21 03:24 PM, Sage Weil wrote: On Wed, 21 Jun 2017, Piotr Dałek wrote: On 17-06-14 03:44 PM, Sage Weil wrote: On Wed, 14 Jun 2017, Paweł Sadowski wrote: On 04/13/2017 04:23 PM, Piotr Dałek wrote: On 04/06/2017 03:25 PM, Sage Weil wrote: On Thu, 6 Apr 2017, Piotr Dałek wrote: [snip]

Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-21 Thread Piotr Dałek
...prioritize recovery on a pool if that would work for you (as others wrote), or +1 this PR: https://github.com/ceph/ceph/pull/13723 (it's a bit outdated, as I'm constantly low on time, but I promise to push it forward!).
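The pool-level alternative mentioned above looks like this on recent releases (the pool name is a placeholder; higher values should be recovered first):

    ceph osd pool set important-pool recovery_priority 5
    ceph osd pool get important-pool recovery_priority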

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-21 Thread Piotr Dałek
On 17-06-14 03:44 PM, Sage Weil wrote: On Wed, 14 Jun 2017, Paweł Sadowski wrote: On 04/13/2017 04:23 PM, Piotr Dałek wrote: On 04/06/2017 03:25 PM, Sage Weil wrote: On Thu, 6 Apr 2017, Piotr Dałek wrote: [snip] I think the solution here is to use sparse_read during recovery. The PushOp

Re: [ceph-users] Socket errors, CRC, lossy con messages

2017-04-10 Thread Piotr Dałek
...calculated by the sending side. Try gathering some more examples of such CRC errors and isolate the OSD/host that sends the malformed data, then do the usual diagnostics, like a memory test on that machine.
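Assuming the messenger logs these at your current debug level, a crude way to see which peers keep showing up in the CRC complaints (log path and message wording may differ by release):

    grep -i 'bad crc' /var/log/ceph/ceph-osd.*.log | tail -n 50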

Re: [ceph-users] slow performance: sanity check

2017-04-06 Thread Piotr Dałek
...use non-SMR drives, because Ceph is not optimized for those.

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-24 Thread Piotr Dałek
...te binaries, a restart of the Ceph daemons is still required.

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread Piotr Dałek
...that's what I'm trying to figure out. Yes, I understand that. But wouldn't it be faster and/or more convenient to just recompile the binaries in place (or use network symlinks) instead of packaging all of Ceph and (re)installing its packages each time you make a change? Generating R...

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread Piotr Dałek
...symlink them via NFS (or whatever) to the build machine and build once there.
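A sketch of that workflow on a test node, with entirely hypothetical paths; the point is that only the rebuilt binary moves, rather than a whole set of RPMs:

    systemctl stop ceph-osd@0
    cp /mnt/buildhost/ceph/build/bin/ceph-osd /usr/bin/ceph-osd
    systemctl start ceph-osd@0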

Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Piotr Dałek
...be a problem (at least we don't see it anymore).

Re: [ceph-users] Issue with upgrade from 0.94.9 to 10.2.5

2017-01-26 Thread Piotr Dałek
...in ceph -w. I haven't dug into it much, but just wanted to second that I've seen this happen on a recent Hammer to recent Jewel upgrade. Thanks for the confirmation. We've prepared a patch which fixes the issue for us: https://github.com/ceph/ceph/pull/13131

Re: [ceph-users] Issue with upgrade from 0.94.9 to 10.2.5

2017-01-18 Thread Piotr Dałek
On 01/17/2017 12:52 PM, Piotr Dałek wrote: During our testing we found out that during the upgrade from 0.94.9 to 10.2.5 we're hitting issue http://tracker.ceph.com/issues/17386 ("Upgrading 0.94.6 -> 0.94.9 saturating mon node networking"). Apparently, there are a few commits fo...

[ceph-users] Issue with upgrade from 0.94.9 to 10.2.5

2017-01-17 Thread Piotr Dałek
...re supposed to fix this issue for upgrades from 0.94.6 to 0.94.9 (and possibly others), but we're still seeing it when upgrading to Jewel, and the symptoms are exactly the same - after upgrading the MONs, each not-yet-upgraded OSD takes a full OSDMap from the monitors after failing the CRC check. Anyone else...

Re: [ceph-users] Any librados C API users out there?

2017-01-12 Thread Piotr Dałek
On 01/11/2017 07:01 PM, Sage Weil wrote: On Wed, 11 Jan 2017, Jason Dillaman wrote: On Wed, Jan 11, 2017 at 11:44 AM, Piotr Dałek wrote: As the subject says - are there any users/consumers of the librados C API? I'm asking because we're researching if this PR: https://github.com/ceph...

[ceph-users] Any librados C API users out there?

2017-01-11 Thread Piotr Dałek
...es without an intermediate data copy, which will reduce CPU and memory load on clients. If you're using the librados C API for object writes, feel free to comment here or in the pull request.