[ceph-users] Delay time in Multi-site sync

2019-08-06 Thread Hoan Nguyen Van
Hi all. I want to delay the sync process from the primary zone to the secondary zone, so that if someone deletes my data I have enough time to react. How can I do it? Is there a config option, or should I install more proxies? Any solutions? Thanks. Regards

Re: [ceph-users] New CRUSH device class questions

2019-08-06 Thread Konstantin Shalygin
Is it possible to add a new device class like 'metadata'? Yes, but you don't need this. Just use your existing class with another CRUSH ruleset. If I set the device class manually, will it be overwritten when the OSD boots up? Nope. Classes are assigned automatically when the OSD is created, not
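Konstantin's suggestion of reusing the existing class with a dedicated rule might look like this (a sketch; the rule, class, and pool names are placeholders):

```shell
# Replicated rule selecting only OSDs of the existing "nvme" class
# (root "default", failure domain "host"; available since Luminous)
ceph osd crush rule create-replicated metadata-rule default host nvme

# Point the metadata pool at the new rule; no new device class needed
ceph osd pool set cephfs_metadata crush_rule metadata-rule
```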

Re: [ceph-users] New CRUSH device class questions

2019-08-06 Thread Robert LeBlanc
On Tue, Aug 6, 2019 at 11:11 AM Paul Emmerich wrote: > On Tue, Aug 6, 2019 at 7:45 PM Robert LeBlanc > wrote: > > We have a 12.2.8 luminous cluster with all NVMe and we want to take some > of the NVMe OSDs and allocate them strictly to metadata pools (we have a > problem with filling up this

Re: [ceph-users] 14.2.2 - OSD Crash

2019-08-06 Thread Brad Hubbard
-63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150 You hit a suicide timeout, that's fatal. On line 80 the process kills the thread based on the assumption it's hung. src/common/HeartbeatMap.cc: 66
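The thresholds involved are configurable (a sketch; the values shown are the defaults, not a tuning recommendation, and raising them only hides whatever is stalling the thread):

```shell
# Soft threshold: logged as "had timed out" by heartbeat_map is_healthy
ceph config set osd osd_op_thread_timeout 15

# Hard threshold: "had suicide timed out" aborts the OSD, as seen here
ceph config set osd osd_op_thread_suicide_timeout 150
```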

Re: [ceph-users] RadosGW (Ceph Object Gateway) Pools

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi, I think -> default.rgw.buckets.index; for us it reaches 2k-6k IOPS for an index size of 23GB. Regards Manuel -Original Message- From: ceph-users On behalf of dhils...@performair.com Sent: Wednesday, August 7, 2019 1:41 To: ceph-users@lists.ceph.com Subject: [ceph-users]

[ceph-users] RadosGW (Ceph Object Gateway) Pools

2019-08-06 Thread DHilsbos
All; Based on the PG Calculator on the Ceph website, I have this list of pools to pre-create for my Object Gateway: .rgw.root default.rgw.control default.rgw.data.root default.rgw.gc default.rgw.log default.rgw.intent-log default.rgw.meta default.rgw.usage default.rgw.users.keys
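The PG calculator's per-pool sizing reduces to a simple rule: multiply target PGs per OSD by the OSD count and the pool's expected data share, divide by the replica count, and round the result up to the next power of two. A minimal sketch of that rounding step (the input is the pre-computed raw estimate, since plain shell has no floats):

```shell
# Round a raw PG estimate up to the next power of two, as the PG
# calculator does. Example raw estimate:
#   12 OSDs * 100 target PGs/OSD * 0.90 data share / 3 replicas = 360
pg_count() {
  local raw=$1 pg=1
  while [ "$pg" -lt "$raw" ]; do pg=$((pg * 2)); done
  echo "$pg"
}

pg_count 360   # prints 512
```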

[ceph-users] Error Mounting CephFS

2019-08-06 Thread DHilsbos
All; I have a server running CentOS 7.6 (1810), that I want to set up with CephFS (full disclosure, I'm going to be running samba on the CephFS). I can mount the CephFS fine when I use the option secret=, but when I switch to secretfile=, I get an error "No such process." I installed
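One common cause of that error, assuming a kernel-client mount: the secretfile= option is parsed by the mount.ceph helper from the ceph-common package, not by the kernel itself, so the helper must be installed and the file must contain only the bare base64 key. A sketch (client name and paths are examples):

```shell
# mount.ceph (package ceph-common) handles the secretfile= option
yum install -y ceph-common

# The secret file must hold only the key value,
# with no "[client.admin]" or "key =" lines around it
chmod 600 /etc/ceph/admin.secret

mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```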

Re: [ceph-users] How to maximize the OSD effective queue depth in Ceph?

2019-08-06 Thread Anthony D'Atri
> However, I'm starting to think that the problem isn't with the number > of threads that have work to do... the problem may just be that the > OSD & PG code has enough thread locking happening that there is no > possible way to have more than a few things happening on a single OSD > (or perhaps a

[ceph-users] 14.2.2 - OSD Crash

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi, We got a pair of OSDs located in a node that crash randomly since 14.2.2. OS Version: CentOS 7.6. There are a ton of lines before the crash; this one is unexpected: -- 3045> 2019-08-07 00:39:32.013 7fe9a4996700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed

Re: [ceph-users] How to maximize the OSD effective queue depth in Ceph?

2019-08-06 Thread Mark Lehrer
Thanks, that looks quite useful. I did a few tests and got basically a null result. In fact, when I put the RBDs on different pools on the same SSDs or pools on different SSDs, performance was a few percent worse than leaving them on the same pool. I definitely wasn't expecting this! It looks

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Janek Bevendorff
> Your parallel rsync job is only getting 150 creates per second? What > was the previous throughput? I am actually not quite sure what the exact throughput was or is or what I can expect. It varies so much. I am copying from a 23GB file list that is split into 3000 chunks which are then

Re: [ceph-users] New CRUSH device class questions

2019-08-06 Thread Paul Emmerich
On Tue, Aug 6, 2019 at 7:45 PM Robert LeBlanc wrote: > We have a 12.2.8 luminous cluster with all NVMe and we want to take some of > the NVMe OSDs and allocate them strictly to metadata pools (we have a problem > with filling up this cluster and causing lingering metadata problems, and > this

[ceph-users] New CRUSH device class questions

2019-08-06 Thread Robert LeBlanc
We have a 12.2.8 luminous cluster with all NVMe and we want to take some of the NVMe OSDs and allocate them strictly to metadata pools (we have a problem with filling up this cluster and causing lingering metadata problems, and this will guarantee space for metadata operations). In the past, we
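For completeness, manually pinning a custom class and building a rule over it might look like this (a sketch; device-class names are free-form strings, and osd.0 and the pool/rule names are placeholders):

```shell
# Drop the auto-assigned class first, then set the custom one
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class metadata osd.0

# Replicated rule restricted to OSDs of that class
ceph osd crush rule create-replicated meta-only default host metadata
ceph osd pool set cephfs_metadata crush_rule meta-only
```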

Re: [ceph-users] How to maximize the OSD effective queue depth in Ceph?

2019-08-06 Thread Mark Nelson
You may be interested in using my wallclock profiler to look at lock contention: https://github.com/markhpc/gdbpmp It will greatly slow down the OSD but will show you where time is being spent and so far the results appear to at least be relatively informative.  I used it recently when

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Patrick Donnelly
On Tue, Aug 6, 2019 at 7:57 AM Janek Bevendorff wrote: > > > > 4k req/s is too fast for a create workload on one MDS. That must > > include other operations like getattr. > > That is rsync going through millions of files checking which ones need > updating. Right now there are not actually any

Re: [ceph-users] How to maximize the OSD effective queue depth in Ceph?

2019-08-06 Thread Mark Lehrer
I have a few more cycles this week to dedicate to the problem of making OSDs do more than maybe 5 simultaneous operations (as measured by the iostat effective queue depth of the drive). However, I'm starting to think that the problem isn't with the number of threads that have work to do... the

Re: [ceph-users] tcmu-runner: "Acquired exclusive lock" every 21s

2019-08-06 Thread Mike Christie
On 08/06/2019 11:28 AM, Mike Christie wrote: > On 08/06/2019 07:51 AM, Matthias Leopold wrote: >> >> >> Am 05.08.19 um 18:31 schrieb Mike Christie: >>> On 08/05/2019 05:58 AM, Matthias Leopold wrote: Hi, I'm still testing my 2 node (dedicated) iSCSI gateway with ceph 12.2.12

Re: [ceph-users] tcmu-runner: "Acquired exclusive lock" every 21s

2019-08-06 Thread Mike Christie
On 08/06/2019 07:51 AM, Matthias Leopold wrote: > > > Am 05.08.19 um 18:31 schrieb Mike Christie: >> On 08/05/2019 05:58 AM, Matthias Leopold wrote: >>> Hi, >>> >>> I'm still testing my 2 node (dedicated) iSCSI gateway with ceph 12.2.12 >>> before I dare to put it into production. I installed

Re: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi Felix, You can increase the debug option with debug rgw on your rgw nodes. We got it to 10. But at least in our case we switched back to civetweb because beast doesn't provide a clear log without a lot of verbosity. Regards Manuel From: ceph-users On behalf of Félix Barbeira

[ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-06 Thread Félix Barbeira
Hi, I'm testing radosgw with the beast backend and I did not find a way to view more information in the logfile. This is an example: 2019-08-06 16:59:14.488 7fc808234700 1 == starting new request req=0x5608245646f0 = 2019-08-06 16:59:14.496 7fc808234700 1 == req done req=0x5608245646f0 op
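Raising the rgw log level as suggested in the replies might look like this (a sketch; the daemon name is an example, and 10 is very verbose):

```shell
# At runtime via the admin socket on the gateway node:
ceph daemon client.rgw.gateway1 config set debug_rgw 10/10

# Or persistently in ceph.conf on the gateway node:
# [client.rgw.gateway1]
#     debug rgw = 10
```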

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Janek Bevendorff
4k req/s is too fast for a create workload on one MDS. That must include other operations like getattr. That is rsync going through millions of files checking which ones need updating. Right now there are not actually any create operations, since I restarted the copy job. I wouldn't

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Patrick Donnelly
On Tue, Aug 6, 2019 at 12:48 AM Janek Bevendorff wrote: > > However, now my client processes are basically in constant I/O wait > > state and the CephFS is slow for everybody. After I restarted the copy > > job, I got around 4k reqs/s and then it went down to 100 reqs/s with > > everybody waiting

Re: [ceph-users] tcmu-runner: "Acquired exclusive lock" every 21s

2019-08-06 Thread Matthias Leopold
Am 05.08.19 um 18:31 schrieb Mike Christie: On 08/05/2019 05:58 AM, Matthias Leopold wrote: Hi, I'm still testing my 2 node (dedicated) iSCSI gateway with ceph 12.2.12 before I dare to put it into production. I installed latest tcmu-runner release (1.5.1) and (like before) I'm seeing that

[ceph-users] OSDs keep crashing after cluster reboot

2019-08-06 Thread Ansgar Jazdzewski
Hi folks, we had to move one of our clusters, so we had to reboot all servers. Now we see an error on all OSDs with the EC pool. Are we missing some options? Will an upgrade to 13.2.6 help? Thanks, Ansgar 2019-08-06 12:10:16.265 7fb337b83200 -1 /build/ceph-13.2.4/src/osd/ECUtil.h: In function

Re: [ceph-users] bluestore write iops calculation

2019-08-06 Thread nokia ceph
On Mon, Aug 5, 2019 at 6:35 PM wrote: > > Hi Team, > > @vita...@yourcmc.ru , thank you for the information and could you please > > clarify the below queries as well, > > > > 1. The average object size we use will be 256KB to 512KB; will there be a > > deferred write queue? > > With the default
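For context on the deferred-write question: BlueStore only defers (journals through the RocksDB WAL) writes at or below bluestore_prefer_deferred_size for the device class, so with default thresholds 256-512KB object writes would normally go straight to the data device. A sketch of inspecting the thresholds (osd.0 is a placeholder):

```shell
# Per-device-class thresholds; writes larger than this bypass the
# deferred-write queue and are written directly to the data device
ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd
ceph daemon osd.0 config get bluestore_prefer_deferred_size_ssd
```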

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Janek Bevendorff
However, now my client processes are basically in constant I/O wait state and the CephFS is slow for everybody. After I restarted the copy job, I got around 4k reqs/s and then it went down to 100 reqs/s with everybody waiting their turn. So yes, it does seem to help, but it increases

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Janek Bevendorff
Thanks that helps. Looks like the problem is that the MDS is not automatically trimming its cache fast enough. Please try bumping mds_cache_trim_threshold: bin/ceph config set mds mds_cache_trim_threshold 512K That did help. Somewhat. I removed the aggressive recall settings I set before
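For reference, the trim and recall knobs discussed in this thread can be set like this (a sketch; the recall option names are the Nautilus-era ones and the values are illustrative, not recommendations):

```shell
ceph config set mds mds_cache_trim_threshold 512K

# Client-cap recall settings referred to as "aggressive" above:
ceph config set mds mds_recall_max_caps 10000
ceph config set mds mds_recall_max_decay_rate 1.0
```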