[ceph-users] running xfs_fsr on ceph OSDs

2016-10-24 Thread mj
Hi, We have been running xfs on our servers for many years, and we are used to running a scheduled xfs_fsr during the weekend. Lately we have started using proxmox / ceph, and I'm wondering if we would benefit (like 'the old days') from scheduled xfs_fsr runs? Our OSDs are xfs, plus the VMs are
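
A minimal sketch of the kind of weekend job being described, assuming the OSD filesystems are mounted under the usual /var/lib/ceph/osd/ceph-* paths and that a two-hour limit per OSD filesystem is acceptable (both are assumptions, not recommendations):

    # /etc/cron.d/xfs_fsr-osds -- hypothetical example
    # Saturdays at 02:00: reorganise each OSD filesystem, at most 2 hours each
    0 2 * * 6  root  for d in /var/lib/ceph/osd/ceph-*; do xfs_fsr -t 7200 "$d"; done

The -t limit keeps each defrag pass bounded so it does not compete with client and recovery I/O indefinitely.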

[ceph-users] Re: tgt with ceph

2016-10-24 Thread Lu Dillon
Sorry for spam again. According to the tgtadm man page, I tried to add the "bsopts" option to the tgt configuration, but it failed. I then tried to add a "client.user" section at the bottom of ceph.conf, but this still doesn't work. The section looks like this: [client] name = iscsiuser keyring = /et
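
For reference, a CephX keyring is normally tied to a named client section rather than a bare [client] block; a minimal ceph.conf sketch, assuming the user was created as client.iscsiuser and with the keyring path only as an example:

    # ceph.conf -- sketch only; the section name must match the CephX user
    [client.iscsiuser]
        keyring = /etc/ceph/ceph.client.iscsiuser.keyring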

Re: [ceph-users] running xfs_fsr on ceph OSDs

2016-10-24 Thread Christian Balzer
Hello, On Mon, 24 Oct 2016 09:41:37 +0200 mj wrote: > Hi, > > We have been running xfs on our servers for many years, and we are used > to run a scheduled xfs_fsr during the weekend. > > Lately we have started using proxmox / ceph, and I'm wondering if we > would benefit (like 'the old days'

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-24 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Christian Balzer > Sent: 24 October 2016 02:30 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] New cephfs cluster performance issues- Jewel - > cache pressure, capability release,

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Yan, Zheng
On Sat, Oct 22, 2016 at 4:14 AM, Gregory Farnum wrote: > On Fri, Oct 21, 2016 at 7:56 AM, Nick Fisk wrote: >>> -Original Message- >>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >>> Haomai Wang >>> Sent: 21 October 2016 15:40 >>> To: Nick Fisk >>> Cc: ceph-u

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Nick Fisk
> -Original Message- > From: Yan, Zheng [mailto:uker...@gmail.com] > Sent: 24 October 2016 10:19 > To: Gregory Farnum > Cc: Nick Fisk ; Zheng Yan ; Ceph Users > > Subject: Re: [ceph-users] Ceph and TCP States > > X-Assp-URIBL failed: 'ceph-users-ceph.com'(black.uribl.com ) > X-Assp-Spam

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Ilya Dryomov
On Mon, Oct 24, 2016 at 11:29 AM, Nick Fisk wrote: >> -Original Message- >> From: Yan, Zheng [mailto:uker...@gmail.com] >> Sent: 24 October 2016 10:19 >> To: Gregory Farnum >> Cc: Nick Fisk ; Zheng Yan ; Ceph Users >> >> Subject: Re: [ceph-users] Ceph and TCP States >> >> X-Assp-URIBL f

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Yan, Zheng
> On 24 Oct 2016, at 17:29, Nick Fisk wrote: > >> -Original Message- >> From: Yan, Zheng [mailto:uker...@gmail.com] >> Sent: 24 October 2016 10:19 >> To: Gregory Farnum >> Cc: Nick Fisk ; Zheng Yan ; Ceph Users >> >> Subject: Re: [ceph-users] Ceph and TCP States >> >> X-Assp-URIBL fa

Re: [ceph-users] Monitoring Overhead

2016-10-24 Thread John Spray
On Mon, Oct 24, 2016 at 4:21 AM, Ashley Merrick wrote: > Hello, > > > > This may come across as a simple question but just wanted to check. > > > > I am looking at importing live data from my cluster via ceph -s etc. into a > graphical graph interface so I can monitor performance / iops / etc. >
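
For polling of this kind it is usually easier to ask for machine-readable output than to scrape the human-readable "ceph -s"; a small sketch (the osd.0 admin-socket example assumes it is run on the OSD host itself):

    # Cluster-wide status as JSON, suitable for a collector to parse
    ceph -s -f json-pretty | head -n 20
    # Per-daemon performance counters via the local admin socket
    ceph daemon osd.0 perf dump | head -n 20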

Re: [ceph-users] cache tiering deprecated in RHCS 2.0

2016-10-24 Thread Dietmar Rieder
On 10/24/2016 03:10 AM, Christian Balzer wrote: [...] > There are several items here and I very much would welcome a response from > a Ceph/RH representative. > > 1. Is that deprecation only in regards to RHCS, as Nick seems to hope? > Because I very much doubt that, why develop code you just "

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 24 October 2016 10:33 > To: Nick Fisk > Cc: Yan, Zheng ; Gregory Farnum ; > Zheng Yan ; Ceph Users us...@lists.ceph.com> > Subject: Re: [ceph-users] Ceph and TCP States > > On Mon, Oct 24, 2016 at 11:29 AM, Ni

Re: [ceph-users] Monitoring Overhead

2016-10-24 Thread Christian Balzer
Hello, On Mon, 24 Oct 2016 10:46:31 +0100 John Spray wrote: > On Mon, Oct 24, 2016 at 4:21 AM, Ashley Merrick wrote: > > Hello, > > > > > > > > This may come across as a simple question but just wanted to check. > > > > > > > > I am looking at importing live data from my cluster via ceph -s e.t

Re: [ceph-users] Monitoring Overhead

2016-10-24 Thread Ashley Merrick
Hello, Thanks both for your responses, definitely looking at collectd + graphite, just wanted to see what overheads were like, far from in a situation that would choke the cluster but wanted to check first. Thanks, Ashley -Original Message- From: Christian Balzer [mailto:ch...@gol.com]

Re: [ceph-users] Ceph Very Small Cluster

2016-10-24 Thread Ranjan Ghosh
Thanks JC & Greg, I've changed the "mon osd min down reporters" to 1. According to this: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/ the default is already 1, though. I don't remember the value before I changed it everywhere, so I can't say for sure now. But I thin

Re: [ceph-users] effect of changing ceph osd primary affinity

2016-10-24 Thread Ilya Dryomov
On Fri, Oct 21, 2016 at 10:35 PM, Ridwan Rashid Noel wrote: > Thank you for your reply Greg. Is there any detailed resource that describe > about how the primary affinity changing works? All I got from searching was > one paragraph from the documentation. No, probably nothing detailed. There isn
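
For readers following along, primary affinity is set per OSD as a weight between 0 and 1; a one-line sketch (osd.3 and 0.5 are placeholders):

    # Make osd.3 half as likely to be chosen as primary for the PGs it holds.
    # Older releases require "mon osd allow primary affinity = true" before
    # the monitors will accept this.
    ceph osd primary-affinity osd.3 0.5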

Re: [ceph-users] cache tiering deprecated in RHCS 2.0

2016-10-24 Thread Oliver Dzombic
Hi, if Ceph removes cache tiering and does not replace it with something similar, it will fall behind other existing solutions. I don't know what strategy stands behind this decision. But we all can't start advertising and announcing the caching, dividing hot and cold stores, to customers all the

Re: [ceph-users] cephfs page cache

2016-10-24 Thread Yan, Zheng
I finally reproduced this issue. Adding following lines to httpd.conf can workaround this issue. EnableMMAP off EnableSendfile off On Sat, Sep 3, 2016 at 11:07 AM, Yan, Zheng wrote: > On Fri, Sep 2, 2016 at 5:10 PM, Sean Redmond wrote: >> I have checked all the servers in scope running 'dmes
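
In context, those two directives belong in the main server (or virtual host) configuration of the Apache instance serving files from CephFS; the workaround as a snippet:

    # httpd.conf (or a conf.d/ drop-in) -- workaround from the message above:
    # avoid mmap()/sendfile() for content served from CephFS
    EnableMMAP off
    EnableSendfile off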

Re: [ceph-users] Re: tgt with ceph

2016-10-24 Thread Jason Dillaman
I think you are looking for the "id" option -- not "name". [1] https://github.com/fujita/tgt/blob/master/doc/README.rbd#L36 On Mon, Oct 24, 2016 at 3:58 AM, Lu Dillon wrote: > Sorry for spam again. > > > By the tgtadm's man, I tried to add "bsopts" option in the tgt's > configuration, but failed
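
A sketch of how that "id" option ends up in a tgt target definition; the IQN, pool/image and user name are placeholders, and the exact bsopts keys and separators should be checked against the README.rbd linked above:

    # /etc/tgt/conf.d/ceph-example.conf -- illustrative only
    <target iqn.2016-10.com.example:rbd-test>
        driver iscsi
        bs-type rbd
        backing-store rbd/myimage
        bsopts "conf=/etc/ceph/ceph.conf;id=iscsiuser"
    </target>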

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Ilya Dryomov
On Mon, Oct 24, 2016 at 11:50 AM, Nick Fisk wrote: >> -Original Message- >> From: Ilya Dryomov [mailto:idryo...@gmail.com] >> Sent: 24 October 2016 10:33 >> To: Nick Fisk >> Cc: Yan, Zheng ; Gregory Farnum ; >> Zheng Yan ; Ceph Users > us...@lists.ceph.com> >> Subject: Re: [ceph-users] C

Re: [ceph-users] Ceph and TCP States

2016-10-24 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 24 October 2016 14:45 > To: Nick Fisk > Cc: Yan, Zheng ; Gregory Farnum ; > Zheng Yan ; Ceph Users us...@lists.ceph.com> > Subject: Re: [ceph-users] Ceph and TCP States > > On Mon, Oct 24, 2016 at 11:50 AM, Ni

Re: [ceph-users] Ceph Very Small Cluster

2016-10-24 Thread Gregory Farnum
On Mon, Oct 24, 2016 at 3:31 AM, Ranjan Ghosh wrote: > Thanks JC & Greg, I've changed the "mon osd min down reporters" to 1. > According to this: > > http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/ > > the default is already 1, though. I don't remember the value before I >

Re: [ceph-users] reliable monitor restarts

2016-10-24 Thread Wes Dillingham
What do the logs of the monitor service say? Increase their verbosity and check the logs at the time of the crash. Are you doing any sort of monitoring on the nodes such that you can forensically check what the system was up to prior to the crash? As others have said, systemd can handle this via un
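
As a concrete sketch of the systemd route mentioned, a drop-in for the stock monitor unit (unit name and values are assumptions to adapt):

    # /etc/systemd/system/ceph-mon@.service.d/restart.conf -- example only
    [Service]
    Restart=on-failure
    RestartSec=10

    # then: systemctl daemon-reload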

Re: [ceph-users] Replica count

2016-10-24 Thread Gregory Farnum
On Sun, Oct 23, 2016 at 10:45 PM, David Turner < david.tur...@storagecraft.com> wrote: > 1/3 of your raw data on the osds will be deleted and then it will move a bunch > around. I haven't done it personally, but I would guess somewhere in the > range of 50-70% data movement. It will depend on how many
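
For context, the replica-count change under discussion is a per-pool setting; a short sketch (the pool name and sizes are placeholders):

    # Example only: drop a pool from 3 to 2 replicas; this triggers the data
    # movement discussed above
    ceph osd pool set rbd size 2
    # make sure min_size is still at or below the new size
    ceph osd pool get rbd min_size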

Re: [ceph-users] Memory leak in radosgw

2016-10-24 Thread Trey Palmer
Updating to libcurl 7.44 fixed the memory leak issue. Thanks for the tip, Ben. FWIW this was a massive memory leak; it rendered the system untenable in my testing. RGW multisite simply will not work with the current CentOS/RHEL 7 libcurl. Seems like there are a lot of different problems caused b

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-24 Thread Dan Jakubiec
Thanks Kostis, great read. We also had a Ceph disaster back in August and a lot of this experience looked familiar. Sadly, in the end we were not able to recover our cluster, but we're glad to hear that you were successful. LevelDB corruptions were one of our big problems. Your note below about r

Re: [ceph-users] Memory leak in radosgw

2016-10-24 Thread Ken Dreyer
Hi Trey, If you run the upstream curl releases, please note that curl has a poor security record and it's important to stay on top of updates. https://curl.haxx.se/docs/security.html indicates that 7.44 has security problems, and in fact there are eleven more security announcements coming soon (ht

[ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread Wido den Hollander
Hi, On a cluster running Hammer 0.94.9 (upgraded from Firefly) I have 29 remapped PGs according to the OSDMap, but all PGs are active+clean. osdmap e111208: 171 osds: 166 up, 166 in; 29 remapped pgs pgmap v101069070: 6144 pgs, 2 pools, 90122 GB data, 22787 kobjects 264 TB used, 184 TB / 448
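
One way to see what the OSDMap still considers remapped is to look for pg_temp entries in the map dump and compare a PG's up and acting sets; a sketch (1.2f is a placeholder PG id):

    # Temporary PG mappings still recorded in the OSDMap
    ceph osd dump | grep pg_temp
    # Compare the up and acting sets of one suspect PG
    ceph pg map 1.2f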

Re: [ceph-users] Issue with Ceph padding files out to ceph.dir.layout.stripe_unit size

2016-10-24 Thread Kate Ward
I've upgraded to 10.2.3 from the Ubuntu 16.04 proposed source, and unfortunately still have the issue. It so far seems less frequent, but I will continue monitoring, and provide notes on this thread if I am ever able to find a root cause, and a consistently reproducible scenario. On Fri, Oct 21,
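
For anyone trying to reproduce this, the directory layout in question is exposed through the CephFS virtual xattrs; a sketch, assuming a CephFS mount at /mnt/cephfs with the attr tools installed (the path and the 4 MiB value are placeholders, and reading ceph.dir.layout only works once a directory has an explicit layout):

    # Inspect the layout new files in this directory will inherit
    getfattr -n ceph.dir.layout /mnt/cephfs/some/dir
    # Set just the stripe_unit, in bytes
    setfattr -n ceph.dir.layout.stripe_unit -v 4194304 /mnt/cephfs/some/dir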

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread David Turner
Are you running a replica size of 4? If not, these might be errantly reported as being on 10. David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread Dan van der Ster
Hi Wido, This seems similar to what our dumpling tunables cluster does when a few particular osds go down... Though in our case the remapped pgs are correctly shown as remapped, not clean. The fix in our case will be to enable the vary_r tunable (which will move some data). Cheers, Dan On 24 Oc
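
For anyone wanting to see what enabling that tunable involves, the usual route is to edit the decompiled CRUSH map and re-inject it; a sketch with placeholder file names, and with the caveat from the message above that this moves data:

    # Export, decompile, edit, recompile, re-inject (example flow)
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt: set "tunable chooseleaf_vary_r 1" in the tunables section
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new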

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread David Turner
More to my curiosity on this. Our clusters leave behind /var/lib/ceph/osd/ceph-##/current/pg_temp folders on occasion. If you check all of the pg_temp folders for osd.10, you might find something that's holding onto the pg even if it's really moved on.
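
A quick way to hunt for such leftovers across FileStore OSDs on one host; a sketch only, since the exact on-disk directory naming is an assumption to compare against what your OSDs actually show:

    # Look for leftover temporary PG directories under each OSD's current/
    find /var/lib/ceph/osd/ceph-*/current -maxdepth 1 -type d \
         \( -name '*_TEMP' -o -name '*pg_temp*' \)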

Re: [ceph-users] effect of changing ceph osd primary affinity

2016-10-24 Thread Ridwan Rashid Noel
Thank you Ilya for the detailed explanation. Regards, Ridwan Noel On Mon, Oct 24, 2016 at 6:30 AM, Ilya Dryomov wrote: > On Fri, Oct 21, 2016 at 10:35 PM, Ridwan Rashid Noel > wrote: > > Thank you for your reply Greg. Is there any detailed resource that > describe > > about how the primary af

Re: [ceph-users] cache tiering deprecated in RHCS 2.0

2016-10-24 Thread Christian Balzer
Hello, On Mon, 24 Oct 2016 11:49:15 +0200 Dietmar Rieder wrote: > On 10/24/2016 03:10 AM, Christian Balzer wrote: > > [...] > > There are several items here and I very much would welcome a response from > > a Ceph/RH representative. > > > > 1. Is that deprecation only in regards to RHCS, as N

[ceph-users] v0.94 OSD crashes

2016-10-24 Thread Zhang Qiang
Hi, One of several OSDs on the same machine crashed several times within days. It's always that one; the other OSDs are all fine. Below is the dumped message; since it's too long, I only pasted the head and tail of the recent events. If it's necessary to inspect the full log, please see https://g

Re: [ceph-users] v0.94 OSD crashes

2016-10-24 Thread Haomai Wang
Could you check dmesg? I think there is a disk EIO error. On Tue, Oct 25, 2016 at 9:58 AM, Zhang Qiang wrote: > Hi, > > One of several OSDs on the same machine crashed several times within days. > It's always that one, other OSDs are all fine. Below is the dumped message, > since it's too long
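
A quick check along those lines, to see whether the kernel logged I/O errors for the disk backing that OSD (sdX is a placeholder, and smartctl assumes smartmontools is installed):

    # Kernel log: block-layer errors around the crash time
    dmesg -T | grep -iE 'i/o error|medium error|sd[a-z]+.*error'
    # SMART health of the suspect device
    smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrectable'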

Re: [ceph-users] v0.94 OSD crashes

2016-10-24 Thread Zhang Qiang
Thanks Wang, looks like so, not Ceph to blame :) On 25 October 2016 at 09:59, Haomai Wang wrote: > could you check dmesg? I think there exists disk EIO error > > On Tue, Oct 25, 2016 at 9:58 AM, Zhang Qiang > wrote: > >> Hi, >> >> One of several OSDs on the same machine crashed several times wi

Re: [ceph-users] Deep scrubbing

2016-10-24 Thread kefu chai
Posting this to the ceph-users mailing list. On Tue, Oct 25, 2016 at 2:02 AM, Andrzej Jakowski wrote: > Hi, > > Wanted to learn more about the Ceph community's take on the deep > scrubbing process. > It seems that deep scrubbing is expected to read data from physical > media: NAND dies or magnetic
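
For experimenting with the behaviour being asked about, deep scrubs can be triggered by hand or temporarily disabled cluster-wide; a short sketch (the PG id is a placeholder):

    # Trigger a deep scrub of a single PG
    ceph pg deep-scrub 1.2f
    # Temporarily prevent new deep scrubs from starting, then re-enable them
    ceph osd set nodeep-scrub
    ceph osd unset nodeep-scrub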

[ceph-users] Does anyone know why pg_temp still exists when the cluster state changes to active+clean

2016-10-24 Thread Wangwenfeng
Hello all, I have a Ceph cluster running Hammer 0.94.5 with 3 hosts; the cluster has 3 MONs and 8 OSDs per host. When testing data recovery and backfill of an EC pool (K=2, M=1), with each chunk on one host, I have an issue with the PG state. When the Ceph cluster state changes to active+clean according to "ceph -s",

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread Wido den Hollander
> On 24 October 2016 at 22:29, Dan van der Ster wrote: > > > Hi Wido, > > This seems similar to what our dumpling tunables cluster does when a few > particular osds go down... Though in our case the remapped pgs are > correctly shown as remapped, not clean. > > The fix in our case will be to

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-10-24 Thread Wido den Hollander
> On 24 October 2016 at 22:41, David Turner wrote: > > > More to my curiosity on this. Our clusters leave behind > /var/lib/ceph/osd/ceph-##/current/pg_temp folders on occasion. If you check > all of the pg_temp folders for osd.10, you might find something that's > holding onto the pg