Re: [ceph-users] Ceph memory overhead when used with KVM

2017-05-15 Thread nick
Hi Jason, did you have some time to check if you can reproduce the high memory usage? I am not sure if I should create a bug report for this or if this is expected behaviour. Cheers Nick On Monday, May 08, 2017 08:55:55 AM you wrote: > Thanks. One more question: was the image a clone or a stand

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Stefan Priebe - Profihost AG
> 3.) it still happens on pre jewel images even when they got restarted > / killed and reinitialized. In that case they've the asok socket > available > for now. Should I issue any command to the socket to get log out of > the hanging VM? Qemu is still responding just ceph / disk I/O gets > stall
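While a hung VM still exposes its admin socket, the client side can be inspected without touching the guest. A minimal sketch of the usual diagnostics (the socket path is an assumption; it depends on the client's `admin socket` setting in ceph.conf):

```shell
# Illustrative socket path; check "admin socket" in the client section of ceph.conf.
SOCK=/var/run/ceph/ceph-client.admin.asok

# Dump in-flight RADOS operations -- stuck entries here point at stalled OSD I/O
ceph --admin-daemon "$SOCK" objecter_requests

# Client-side performance counters (op latencies, queue depths)
ceph --admin-daemon "$SOCK" perf dump

# Raise log verbosity on the running client to capture what librbd is doing
ceph --admin-daemon "$SOCK" config set debug_rbd 20
ceph --admin-daemon "$SOCK" config set debug_objecter 20
```

`objecter_requests` is usually the most telling: requests that sit there across repeated invocations identify the OSD the client is waiting on.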

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Stefan Priebe - Profihost AG
Hello Jason, I got some further hints. Please see below. On 15.05.2017 at 22:25, Jason Dillaman wrote: > On Mon, May 15, 2017 at 3:54 PM, Stefan Priebe - Profihost AG > wrote: >> Would it be possible that the problem is the same you fixed? > > No, I would not expect it to be related to the ot

Re: [ceph-users] Cephalocon Cancelled

2017-05-15 Thread Blair Bethwaite
On 15 May 2017 at 23:21, Danny Al-Gaaf wrote: > What about moving the event to the next OpenStack Summit in Sydney, let's > say directly following the Summit. +1! The Ceph day just gone at the Boston OpenStack Summit felt a lot like I imagined Cephalocon would be anyway, and as far as I know the O

Re: [ceph-users] Extremely high OSD memory utilization on Kraken 11.2.0 (with XFS -or- bluestore)

2017-05-15 Thread Aaron Ten Clay
Hi Sage, No problem. I thought this would take a lot longer to resolve so I waited to find a good chunk of time, then it only took a few minutes! Here are the respective backtrace outputs from gdb: https://aarontc.com/ceph/dumps/core.ceph-osd.150.082e9ca887c34cfbab183366a214a84c.6742.14926344930

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Gregory Farnum
7f0186e28700 0 > log_channel(cluster) log [INF] : 36.277b scrub starts > ceph-osd.244.log-20170514.gz:2017-05-13 20:50:42.085718 7f0186e28700 0 > log_channel(cluster) log [INF] : 36.277b scrub ok > ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700 0 > log_channel

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Lincoln Bryant
0186e28700 0 log_channel(cluster) log [INF] : 36.277b scrub ok ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700 0 log_channel(cluster) log [INF] : 36.277b scrub starts ceph-osd.244.log-20170515.gz:2017-05-15 00:11:26.189777 7f0186e28700 0 log_channel(cluster) log [INF] : 36.

Re: [ceph-users] Odd cyclical cluster performance

2017-05-15 Thread Gregory Farnum
Did you try correlating it with PG scrubbing or other maintenance behaviors? -Greg On Thu, May 11, 2017 at 12:47 PM, Patrick Dinnen wrote: > Seeing some odd behaviour while testing using rados bench. This is on > a pre-split pool, two node cluster with 12 OSDs total. > > ceph osd pool create newe
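One quick way to test Greg's suggestion is to take scrubbing out of the picture entirely while the benchmark runs. A sketch, with the pool name and durations as placeholders:

```shell
# Temporarily disable scrubbing cluster-wide (remember to re-enable afterwards)
ceph osd set noscrub
ceph osd set nodeep-scrub

# Re-run the benchmark against the test pool; if the cyclical dips disappear,
# scrubbing was the culprit
rados bench -p <pool> 300 write --no-cleanup

# Watch for scrub events in the cluster log while testing
ceph -w | grep -i scrub

# Restore normal scrubbing behaviour
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```

If performance is flat with scrubbing disabled, the scrub interval and load settings (`osd_scrub_*`) can then be tuned rather than left at defaults.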

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Gregory Farnum
On Mon, May 1, 2017 at 9:28 AM, Lincoln Bryant wrote: > Hi all, > > I’ve run across a peculiar issue on 10.2.7. On my 3x replicated cache tiering > cache pool, routine scrubbing suddenly found a bunch of PGs with > size_mismatch_oi errors. From the “rados list-inconsistent-pg tool”[1], I see >
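For reference, the inspection workflow discussed in this thread looks roughly like the following, using the PG id `36.277b` quoted in the scrub logs above (pool name is a placeholder):

```shell
# List PGs flagged inconsistent in the cache pool
rados list-inconsistent-pg <cache-pool>

# Show per-object detail, including the size_mismatch_oi errors and
# which shard/replica disagrees with the object info
rados list-inconsistent-obj 36.277b --format=json-pretty

# Only after confirming which copy is wrong should a repair be issued,
# since repair trusts the primary's view of the object
ceph pg repair 36.277b
```

On a cache tier it is worth checking whether the object was recently promoted or flushed before repairing, as in-flight tiering can explain transient mismatches.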

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Jason Dillaman
On Mon, May 15, 2017 at 3:54 PM, Stefan Priebe - Profihost AG wrote: > Would it be possible that the problem is the same you fixed? No, I would not expect it to be related to the other issues you are seeing. The issue I just posted a fix against only occurs when a client requests the lock from th

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Stefan Priebe - Profihost AG
Hi, great thanks. I'm still trying but it's difficult for me as well. As it happens only sometimes there must be an unknown additional factor. For the future I've enabled client sockets for all VMs as well. But this does not help in this case - as it seems to be fixed after migration. Would it be

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Jason Dillaman
I was able to re-create the issue where "rbd feature disable" hangs if the client experienced a long comms failure with the OSDs, and I have a proposed fix posted [1]. Unfortunately, I haven't been successful in repeating any stalled I/O, discard issues, or export-diff logged errors. I'll keep tryi

Re: [ceph-users] ceph-objectstore-tool apply-layout-settings

2017-05-15 Thread Anton Dmitriev
Unfortunately, there are no setuser and setgroup options for ceph-objectstore-tool. $ ceph-objectstore-tool --setuser ceph --setgroup ceph --data-path /var/lib/ceph/osd/ceph-177 --journal-path /var/lib/ceph/osd/ceph-177/journal --log-file=/var/log/ceph/objectstore_tool.177.log --op apply-layou
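Since the tool itself cannot drop privileges, the usual workaround is to run it as the `ceph` user so any files it creates get the right ownership. A sketch, using the OSD id from the message above (the `--pool` argument is a placeholder):

```shell
# The OSD must be stopped before ceph-objectstore-tool touches its store
systemctl stop ceph-osd@177

# Run the tool as the ceph user instead of root
sudo -u ceph ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-177 \
    --journal-path /var/lib/ceph/osd/ceph-177/journal \
    --op apply-layout-settings --pool <pool>

# Alternatively, run as root and fix ownership afterwards
chown -R ceph:ceph /var/lib/ceph/osd/ceph-177

systemctl start ceph-osd@177
```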

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Stefan Priebe - Profihost AG
Hello Jason, > Just so I can attempt to repeat this: Thanks. > (1) you had an image that was built using Hammer clients and OSDs with > exclusive lock disabled Yes. It was created with the hammer rbd defaults. > (2) you updated your clients and OSDs to Jewel > (3) you restarted your OSDs and liv

Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-05-15 Thread Jens Rosenboom
2017-05-12 2:55 GMT+00:00 Ben Hines : > It actually seems like these values aren't being honored, i actually see > many more objects being processed by gc (as well as kraken object > lifecycle), even though my values are at the default 32 objs. > > 19:52:44 root@<> /var/run/ceph $ ceph --admin-daem
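To confirm which gc values a running radosgw actually honors, the daemon's admin socket can be queried directly, and the gc queue inspected. A sketch (the socket name is an assumption and varies per instance):

```shell
# Socket name depends on the rgw instance name; ls /var/run/ceph to find it
SOCK=/var/run/ceph/ceph-client.rgw.<name>.asok

# Show the gc tuning in effect on the live daemon, not just in ceph.conf
ceph --admin-daemon "$SOCK" config show | grep rgw_gc

# List objects queued for garbage collection, including not-yet-expired entries
radosgw-admin gc list --include-all

# Force an immediate gc pass instead of waiting for the next cycle
radosgw-admin gc process
```

Comparing `gc list --include-all` output before and after a `gc process` run shows whether deletions are actually draining or merely being re-queued.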

Re: [ceph-users] RGW: removal of support for fastcgi

2017-05-15 Thread Yehuda Sadeh-Weinraub
On Mon, May 15, 2017 at 8:35 AM, Ken Dreyer wrote: > On Fri, May 5, 2017 at 1:51 PM, Yehuda Sadeh-Weinraub > wrote: >> >> TL;DR: Does anyone care if we remove support for fastcgi in rgw? > > Please remove it as soon as possible. The old libfcgi project's code > is a security liability. When upst

Re: [ceph-users] RGW: removal of support for fastcgi

2017-05-15 Thread Ken Dreyer
On Fri, May 5, 2017 at 1:51 PM, Yehuda Sadeh-Weinraub wrote: > > TL;DR: Does anyone care if we remove support for fastcgi in rgw? Please remove it as soon as possible. The old libfcgi project's code is a security liability. When upstream died, there was a severe lack of coordination around distri

Re: [ceph-users] num_caps

2017-05-15 Thread Ranjan Ghosh
Ah, I understand it much better now. Thank you so much for explaining. I hope/assume the caps don't prevent other clients from accessing the stuff in some way, right? +1, though, for the idea to be able to specify a timeout. We have an rsync backup job which runs over the whole filesystem every fe

Re: [ceph-users] Cephalocon Cancelled

2017-05-15 Thread Danny Al-Gaaf
On 13.05.2017 at 21:28, Joao Eduardo Luis wrote: > On 05/13/2017 09:06 AM, John Spray wrote: >> On Fri, May 12, 2017 at 9:45 PM, Wido den Hollander [...] >>> Sad to hear, especially the reasoning behind it. But understandable! >>> >>> Let's move this event to Europe :-) >> >> My dining table seat

Re: [ceph-users] corrupted rbd filesystems since jewel

2017-05-15 Thread Jason Dillaman
Just so I can attempt to repeat this: (1) you had an image that was built using Hammer clients and OSDs with exclusive lock disabled (2) you updated your clients and OSDs to Jewel (3) you restarted your OSDs and live-migrated your VMs to pick up the Jewel changes (4) you enabled exclusive-lock, ob
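The upgrade sequence in steps (1)-(4) can be reproduced on a test image roughly as follows. Pool, image, and the exact feature set are assumptions for illustration; an image's current features should be checked first:

```shell
# Inspect which features the hammer-era image currently has
rbd info <pool>/<image>

# Enable the jewel features on the live image, as in step (4);
# object-map/fast-diff require exclusive-lock to be enabled first
rbd feature enable <pool>/<image> exclusive-lock
rbd feature enable <pool>/<image> object-map fast-diff

# Verify which client, if any, currently owns the exclusive lock
rbd lock ls <pool>/<image>
rbd status <pool>/<image>
```

`rbd status` lists the watchers on the image header, which is useful for confirming that exactly one live-migrated QEMU client is attached.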

Re: [ceph-users] num_caps

2017-05-15 Thread John Spray
On Mon, May 15, 2017 at 1:36 PM, Henrik Korkuc wrote: > On 17-05-15 13:40, John Spray wrote: >> >> On Mon, May 15, 2017 at 10:40 AM, Ranjan Ghosh wrote: >>> >>> Hi all, >>> >>> When I run "ceph daemon mds. session ls" I always get a fairly >>> large >>> number for num_caps (200.000). Is this norm

Re: [ceph-users] num_caps

2017-05-15 Thread Henrik Korkuc
On 17-05-15 13:40, John Spray wrote: On Mon, May 15, 2017 at 10:40 AM, Ranjan Ghosh wrote: Hi all, When I run "ceph daemon mds. session ls" I always get a fairly large number for num_caps (200.000). Is this normal? I thought caps are sth. like open/locked files meaning a client is holding a ca

Re: [ceph-users] num_caps

2017-05-15 Thread John Spray
On Mon, May 15, 2017 at 10:40 AM, Ranjan Ghosh wrote: > Hi all, > > When I run "ceph daemon mds. session ls" I always get a fairly large > number for num_caps (200.000). Is this normal? I thought caps are sth. like > open/locked files meaning a client is holding a cap on a file and no other > clie
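When one session's num_caps dominates, it helps to break the total down per client. A minimal sketch that sums caps from `session ls` output; the JSON shape and sample data here are assumptions (the real dump carries many more fields per session):

```python
import json

# Hypothetical sample shaped like "ceph daemon mds.<id> session ls" output
SESSION_LS = """
[
  {"id": 4101, "num_caps": 185000, "client_metadata": {"hostname": "web1"}},
  {"id": 4205, "num_caps": 15000,  "client_metadata": {"hostname": "backup1"}}
]
"""

def caps_by_client(session_ls_json):
    """Return {hostname: num_caps}, sorted by cap count descending."""
    sessions = json.loads(session_ls_json)
    counts = {
        s["client_metadata"].get("hostname", str(s["id"])): s["num_caps"]
        for s in sessions
    }
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))

if __name__ == "__main__":
    for host, caps in caps_by_client(SESSION_LS).items():
        print(f"{host}: {caps} caps")
```

Feeding it the live dump (`ceph daemon mds.<id> session ls` piped to a file) quickly shows whether one client, e.g. a filesystem-wide rsync job, holds the bulk of the ~200,000 caps.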

[ceph-users] num_caps

2017-05-15 Thread Ranjan Ghosh
Hi all, When I run "ceph daemon mds. session ls" I always get a fairly large number for num_caps (200.000). Is this normal? I thought caps are sth. like open/locked files meaning a client is holding a cap on a file and no other client can access it during this time. How can I debug this if it