[ceph-users] ceph v10.2.9 - rbd cli deadlock ?

2017-07-25 Thread Kjetil Jørgensen
Hi, I'm not sure yet whether this is made worse by config; however, if I do something along the lines of: > seq 100 | xargs -P100 -n1 bash -c 'exec rbd.original showmapped' I'll end up with at least one of the invocations deadlocked like below. Doing the same on our v10.2.7 clusters
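For reference, the reproducer described above, spelled out (the binary name rbd.original is specific to the poster's environment; plain rbd is the usual name):

    # run 100 concurrent "rbd showmapped" invocations; per the report,
    # on the affected v10.2.9 clients at least one of them ends up hung
    seq 100 | xargs -P100 -n1 bash -c 'exec rbd showmapped'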

Re: [ceph-users] 答复: How's cephfs going?

2017-07-19 Thread Kjetil Jørgensen
Hi, While not necessarily CephFS specific, we somehow seem to manage to frequently end up with objects that have inconsistent omaps. This seems to be a replication issue (anecdotally it's a replica that ends up diverging, and at least a few times it's something that happened after the osd that held
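A hedged sketch of how one might surface such omap inconsistencies on jewel (the pg id below is a placeholder):

    # scrub flags the affected PGs as inconsistent
    ceph health detail | grep inconsistent
    # show which objects in one PG are inconsistent, including omap digest mismatches
    rados list-inconsistent-obj 1.2f --format=json-pretty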

Re: [ceph-users] rbd kernel client fencing

2017-04-25 Thread Kjetil Jørgensen
r fencing image access. It's all about arbitrating modification, in support of i.e. object-map). > > Thanks. > > >> On 20 Apr 2017, at 6:31 AM, Kjetil Jørgensen <kje...@medallia.com> wrote: >> >> Hi, >> >> As long as you blacklist the old owner by ip,

Re: [ceph-users] rbd kernel client fencing

2017-04-19 Thread Kjetil Jørgensen
Hi, As long as you blacklist the old owner by ip, you should be fine. Do note that rbd lock remove also implicitly blacklists unless you pass it the --rbd_blacklist_on_break_lock=false option. (That is, I think "ceph osd blacklist add a.b.c.d interval" translates into
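A minimal sketch of that fencing sequence; the address, pool/image name, lock id and client id below are made up for illustration:

    # blacklist the old owner by ip, with an optional expiry in seconds
    ceph osd blacklist add 10.0.0.12 3600
    # then break its lock, without triggering the implicit blacklist a second time
    rbd lock list mypool/myimage
    rbd lock remove mypool/myimage "the-lock-id" client.4567 --rbd_blacklist_on_break_lock=false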

Re: [ceph-users] How to cut a large file into small objects

2017-04-11 Thread Kjetil Jørgensen
Hi, rados - does not shard your object (as far as I know, there may be a striping API, although it may not do quite what you want); cephfs - implemented on top of rados - does its own object sharding (I'm fuzzy on the details); rbd - implemented on top of rados - does shard into 2^order sized
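To see the 2^order object size and the data-object prefix for an rbd image, something like this works (pool and image names are placeholders):

    # "order 22" means the image is striped over 2^22 byte (4 MB) objects
    rbd info mypool/myimage
    # the backing objects carry the block_name_prefix reported above
    rados -p mypool ls | grep '^rbd_data\.' | head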

Re: [ceph-users] Modification Time of RBD Images

2017-03-24 Thread Kjetil Jørgensen
Hi, YMMV, riddled with assumptions (image is image-format=2, has one ext4 filesystem, no partition table, ext4 superblock starts at 0x400 and probably a whole boatload of other stuff, I don't know when ext4 updates s_wtime of its superblock, nor if it's actually the superblock last write or last
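A rough sketch of the idea, under the same caveats (it additionally assumes the image's first data object exists, and that s_wtime is the 32-bit little-endian field 0x30 bytes into the superblock, i.e. at byte 0x430 of the image; pool/image names are placeholders):

    # block_name_prefix (e.g. rbd_data.86fd2ae8944a) comes from "rbd info"
    prefix=$(rbd info mypool/myimage | awk '/block_name_prefix/ {print $2}')
    # fetch the first data object and pull s_wtime out of the ext4 superblock
    rados -p mypool get "${prefix}.0000000000000000" /tmp/obj0
    wtime=$(dd if=/tmp/obj0 bs=1 skip=$((0x430)) count=4 2>/dev/null | od -An -tu4 | tr -d ' ')
    date -d "@${wtime}"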

Re: [ceph-users] Object Map Costs (Was: Snapshot Costs (Was: Re: Pool Sizes))

2017-03-24 Thread Kjetil Jørgensen
Hi, Depending on how you plan to use the omap - you might also want to avoid a large number of key/value pairs as well. CephFS got its directory fragment size capped due to large omaps being painful to deal with (see: http://tracker.ceph.com/issues/16164 and
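For gauging how large an object's omap has grown, something along these lines (pool and object names are placeholders):

    # count the key/value pairs in one object's omap
    rados -p mypool listomapkeys some_object | wc -l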

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-22 Thread Kjetil Jørgensen
reasoning is - two or more machines failing at the same instant, not caused by switch/power, is unlikely enough that we'll happily live with it; it has so far served us well. -KJ On Wed, Mar 22, 2017 at 7:06 PM, Kjetil Jørgensen <kje...@medallia.com> wrote: > > For the most part -

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-22 Thread Kjetil Jørgensen
for writes. On Wed, Mar 22, 2017 at 8:05 AM, Adam Carheden <carhe...@ucar.edu> wrote: > On Tue, Mar 21, 2017 at 1:54 PM, Kjetil Jørgensen <kje...@medallia.com> > wrote: > > >> c. Reads can continue from the single online OSD even in pgs that > >>

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Kjetil Jørgensen
aving 1 > replica > > > min_size dictates that IO freezes for those objects until min_size > is > > > achieved. http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas > <http://docs.ceph.com/docs/jewel/rados/oper
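The knob being discussed, for reference (pool name is a placeholder; lowering min_size trades availability for the risk of acknowledging writes with a single surviving copy):

    ceph osd pool get mypool size
    ceph osd pool get mypool min_size
    # allow IO to continue with a single surviving replica
    ceph osd pool set mypool min_size 1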

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-20 Thread Kjetil Jørgensen
Hi, rbd_id.vm-100-disk-1 is only a "meta object"; IIRC, its contents will get you a "prefix", which then gets you on to rbd_header.prefix; rbd_header.prefix contains block size, striping, etc. The actual data bearing objects will be named something like rbd_data.prefix.%016x. Example - vm-100-disk-1
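A sketch of walking that chain by hand with rados (pool name is a placeholder, and <id> stands for the image id recovered in the first step):

    # rbd_id.<image-name> holds the image id (the "prefix")
    rados -p mypool get rbd_id.vm-100-disk-1 /tmp/rbd_id && strings /tmp/rbd_id
    # rbd_header.<id> keeps order, size, features, snapshots etc. in its omap
    rados -p mypool listomapvals rbd_header.<id>
    # the data objects are rbd_data.<id>.<object index as 16-digit hex>
    rados -p mypool ls | grep '^rbd_data.<id>' | head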

[ceph-users] cephfs-data-scan scan_links cross version from master on jewel ?

2017-01-12 Thread Kjetil Jørgensen
Hi, I want/need cephfs-data-scan scan_links, it's in master, although we're currently on jewel (10.2.5). Am I better off cherry-picking the relevant commit onto the jewel branch rather than just using master ? Cheers, -- Kjetil Joergensen SRE, Medallia Inc Phone: +1 (650)

[ceph-users] jewel/ceph-osd/filestore: Moving omap to separate filesystem/device

2016-12-08 Thread Kjetil Jørgensen
Hi, so - we're considering moving omap out to faster media than our rather slow spinning rust. There's been some discussion around this here: https://github.com/ceph/ceph/pull/6421 Since neither this nor the ceph-disk convenience bits have landed in jewel, we're thinking of "other ways" of doing
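One possible shape of such a workaround, purely as a sketch (it assumes filestore keeps its omap leveldb under current/omap in the OSD data dir, that the OSD is stopped while data is moved, and made-up osd id, device and mountpoint names):

    systemctl stop ceph-osd@12              # or the sysvinit equivalent on older installs
    mkdir -p /mnt/osd12-omap
    mount /dev/fast-ssd-partition /mnt/osd12-omap
    # copy the omap leveldb to the fast device and leave a symlink behind
    rsync -a /var/lib/ceph/osd/ceph-12/current/omap/ /mnt/osd12-omap/
    mv /var/lib/ceph/osd/ceph-12/current/omap /var/lib/ceph/osd/ceph-12/current/omap.old
    ln -s /mnt/osd12-omap /var/lib/ceph/osd/ceph-12/current/omap
    systemctl start ceph-osd@12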

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-07 Thread Kjetil Jørgensen
Hi On Fri, Oct 7, 2016 at 6:31 AM, Yan, Zheng <uker...@gmail.com> wrote: > On Fri, Oct 7, 2016 at 8:20 AM, Kjetil Jørgensen <kje...@medallia.com> > wrote: > > And - I just saw another recent thread - > > http://tracker.ceph.com/issues/17177 - can be an explanation

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-07 Thread Kjetil Jørgensen
On Fri, Oct 7, 2016 at 4:46 AM, John Spray <jsp...@redhat.com> wrote: > On Fri, Oct 7, 2016 at 1:05 AM, Kjetil Jørgensen <kje...@medallia.com> > wrote: > > Hi, > > > > context (i.e. what we're doing): We're migrating (or trying to) migrate > off > >

Re: [ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-06 Thread Kjetil Jørgensen
daemon mds.foo scrub_path ? -KJ On Thu, Oct 6, 2016 at 5:05 PM, Kjetil Jørgensen <kje...@medallia.com> wrote: > Hi, > > context (i.e. what we're doing): We're migrating (or trying to) migrate > off of an nfs server onto cephfs, for a workload that's best described as
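The admin-socket command hinted at above, for reference (mds name and path are placeholders; exact flags vary by version):

    # forward-scrub a single path via the mds admin socket
    ceph daemon mds.foo scrub_path /some/directory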

[ceph-users] jewel/CephFS - misc problems (duplicate strays, mismatch between head items and fnode.fragst)

2016-10-06 Thread Kjetil Jørgensen
Hi, context (i.e. what we're doing): We're migrating (or trying to migrate) off of an nfs server onto cephfs, for a workload that's best described as "big piles" of hardlinks. Essentially, we have a set of "sources": foo/01/ foo/0b/<0b> .. and so on bar/02/.. bar/0c/.. .. and so on

[ceph-users] HitSet - memory requirement

2016-08-31 Thread Kjetil Jørgensen
Hi, http://docs.ceph.com/docs/master/rados/operations/cache-tiering/ states > > Note A larger hit_set_count results in more RAM consumed by the ceph-osd > process. By how much - what order - KB? MB? GB? After some spelunking - there's osd_hit_set_max_size, is it fair to make the following
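A rough back-of-envelope, under the assumption that a bloom-type HitSet sized for N insertions at false-positive probability p needs about -N*ln(p)/(ln 2)^2 bits, and that every PG on the OSD keeps hit_set_count of them in memory (all numbers below are illustrative, not measured):

    # 100k entries at fpp 0.05, 4 hit sets per PG, 200 PGs per OSD
    awk 'BEGIN {
      n = 100000; p = 0.05; count = 4; pgs = 200
      bits = -n * log(p) / (log(2)^2)      # ~6.2 bits per entry
      per_set_kb = bits / 8 / 1024         # ~76 KB per hit set
      printf "per hit set: ~%.0f KB, per OSD: ~%.1f MB\n", per_set_kb, per_set_kb * count * pgs / 1024
    }'

So for those illustrative numbers the answer lands in the tens of MB per OSD, i.e. closer to MB than GB.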

Re: [ceph-users] ceph-mon cpu usage

2015-07-24 Thread Kjetil Jørgensen
It sounds slightly similar to what I just experienced. I had one monitor out of three which seemed to essentially run one core at full tilt continuously, and had its virtual address space allocated at the point where top started calling it TB. Requests hitting this monitor did not get very