Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi! @Paul Thanks! I know, I read the whole topic about size 2 some months ago. But this has not been my decision, I had to set it up like that. In the meantime, I did a reboot of node1001 and node1002 with flag "noout" set and now peering has finished and only 0.0x% are rebalanced. IO is flowing
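For reference, the noout-and-restart sequence described above usually looks roughly like this (a sketch; node names and the systemd unit are assumptions about this particular setup):

    ceph osd set noout                 # keep OSDs from being marked out while a node reboots
    # reboot node1001, wait for its OSDs to rejoin and PGs to peer, then repeat for node1002
    ceph -s                            # confirm peering finished and PGs are active+clean
    ceph osd unset noout               # re-enable automatic out-marking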

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Paul Emmerich
Check ceph pg query, it will (usually) tell you why something is stuck inactive. Also: never do min_size 1. Paul 2018-05-17 15:48 GMT+02:00 Kevin Olbrich : > I was able to obtain another NVMe to get the HDDs in node1004 into the > cluster. > The number of disks (all 1TB) is now balanced betwe
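The checks Paul suggests can be run roughly as follows (a sketch; the PG ID and pool name are placeholders):

    ceph pg 1.2a query                 # the recovery_state section usually says why the PG is stuck
    ceph osd pool set <pool> size 3    # and keep min_size at 2; size 2 / min_size 1 risks data loss
    ceph osd pool set <pool> min_size 2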

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
I was able to obtain another NVMe to get the HDDs in node1004 into the cluster. The number of disks (all 1TB) is now balanced between racks, still some inactive PGs: data: pools: 2 pools, 1536 pgs objects: 639k objects, 2554 GB usage: 5167 GB used, 14133 GB / 19300 GB avail p

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Ok, I just waited some time but I still got some "activating" issues: data: pools: 2 pools, 1536 pgs objects: 639k objects, 2554 GB usage: 5194 GB used, 11312 GB / 16506 GB avail pgs: 7.943% pgs not active 5567/1309948 objects degraded (0.425%) 1

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
PS: The cluster currently is size 2. I used PGCalc on the Ceph website which, by default, will place 200 PGs on each OSD. I read about the protection in the docs and later realized I would have been better off with only 100 PGs per OSD. 2018-05-17 13:35 GMT+02:00 Kevin Olbrich : > Hi! > > Thanks for your quick reply. > B

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi! Thanks for your quick reply. Before I read your mail, I applied the following conf to my OSDs: ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32' Status is now: data: pools: 2 pools, 1536 pgs objects: 639k objects, 2554 GB usage: 5211 GB used, 11295 GB / 16506
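Note that injectargs only changes the running daemons; to keep the relaxed limit across restarts it would also have to go into ceph.conf (a sketch, assuming the luminous option name used above):

    ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'   # runtime change only
    # to persist it, add to the [osd] section of ceph.conf on the OSD hosts:
    #   osd max pg per osd hard ratio = 32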

Re: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Burkhard Linke
Hi, On 05/17/2018 01:09 PM, Kevin Olbrich wrote: Hi! Today I added some new OSDs (nearly doubled) to my luminous cluster. I then changed pg(p)_num from 256 to 1024 for that pool because it was complaining about too few PGs. (I have since noticed this would have been better done in smaller steps.) This is the cu

[ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi! Today I added some new OSDs (nearly doubled) to my luminous cluster. I then changed pg(p)_num from 256 to 1024 for that pool because it was complaining about too few PGs. (I have since noticed this would have been better done in smaller steps.) This is the current status: health: HEALTH_ERR 336
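Growing the PG count in smaller steps, as later suggested in this thread, would look roughly like this (a sketch; the pool name is a placeholder, and each step should be allowed to settle before the next):

    ceph osd pool set <pool> pg_num 512
    ceph osd pool set <pool> pgp_num 512
    # wait until "ceph -s" shows no activating/peering PGs, then continue
    ceph osd pool set <pool> pg_num 1024
    ceph osd pool set <pool> pgp_num 1024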

Re: [ceph-users] Blocked Requests

2018-04-25 Thread Shantur Rathore
Hi all, So using ceph-ansible, I built the below-mentioned cluster with 2 OSD nodes and 3 mons. Just after creating the OSDs I started benchmarking the performance using "rbd bench" and "rados bench" and started seeing the performance drop. Checking the status shows slow requests. [root@storage-28-1

Re: [ceph-users] Blocked requests

2017-12-14 Thread Fulvio Galeazzi
instability you mention, experimenting with BlueStore looks like a better alternative. Thanks again Fulvio Original Message Subject: Re: [ceph-users] Blocked requests From: Matthew Stroud To: Fulvio Galeazzi , Brian Andrus CC: "ceph-

Re: [ceph-users] Blocked requests

2017-12-13 Thread Matthew Stroud
al Message Subject: Re: [ceph-users] Blocked requests From: Matthew Stroud To: Brian Andrus CC: "ceph-users@lists.ceph.com" Date: 09/07/2017 11:01 PM > After some troubleshooting, the issues appear to be caused by gnocchi > using rados. I’

Re: [ceph-users] Blocked requests

2017-12-13 Thread Fulvio Galeazzi
your help Fulvio Original Message Subject: Re: [ceph-users] Blocked requests From: Matthew Stroud To: Brian Andrus CC: "ceph-users@lists.ceph.com" Date: 09/07/2017 11:01 PM After some troubleshooting, the issues appear to be caused by gnoc

Re: [ceph-users] Blocked requests

2017-09-07 Thread Brad Hubbard
roud > > > > From: Brian Andrus > Date: Thursday, September 7, 2017 at 1:53 PM > To: Matthew Stroud > Cc: David Turner , "ceph-users@lists.ceph.com" > > > > Subject: Re: [ceph-users] Blocked requests > > > > "ceph osd blocked-by" can do t
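The command referenced here takes no arguments and summarizes which OSDs others are waiting on (output details vary by release):

    ceph osd blocked-by                # lists blocking OSDs and how many PGs each one blocks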

Re: [ceph-users] Blocked requests

2017-09-07 Thread Matthew Stroud
Date: Thursday, September 7, 2017 at 1:17 PM To: Matthew Stroud , "ceph-users@lists.ceph.com" Subject: Re: [ceph-users] Blocked requests I would recom

Re: [ceph-users] Blocked requests

2017-09-07 Thread Brian Andrus
ster [WRN] 73 slow requests, 1 included > below; oldest blocked for > 1821.342499 secs > > /var/log/ceph/ceph.log:2017-09-07 13:29:43.979559 osd.10 > 10.20.57.15:6806/7029 9371 : cluster [WRN] slow request 30.452344 seconds > old, received at 2017-09-07 13:29:13.527157: osd_op(client.1

Re: [ceph-users] Blocked requests

2017-09-07 Thread Matthew Stroud
: Matthew Stroud , "ceph-users@lists.ceph.com" Subject: Re: [ceph-users] Blocked requests I would recommend pushing forward with the update instead of rolling back. Ceph doesn't have a track record of rolling back to a previous version. I don't have enough information to really make

Re: [ceph-users] Blocked requests

2017-09-07 Thread David Turner
> > > 1 ops are blocked > 524.288 sec on osd.2 > > 1 ops are blocked > 262.144 sec on osd.2 > > 2 ops are blocked > 65.536 sec on osd.21 > > 9 ops are blocked > 1048.58 sec on osd.5 > > 9 ops are blocked > 524.288 sec on osd.5 > > 71 ops are blocke

Re: [ceph-users] Blocked requests

2017-09-07 Thread Matthew Stroud
have slow requests recovery 4678/1097738 objects degraded (0.426%) recovery 10364/1097738 objects misplaced (0.944%) From: David Turner Date: Thursday, September 7, 2017 at 11:33 AM To: Matthew Stroud , "ceph-users@lists.ceph.com" Subject: Re: [ceph-users] Blocked requests To be

Re: [ceph-users] Blocked requests

2017-09-07 Thread David Turner
To be fair, other times I have to go in and tweak configuration settings and timings to resolve chronic blocked requests. On Thu, Sep 7, 2017 at 1:32 PM David Turner wrote: > `ceph health detail` will give a little more information into the blocked > requests. Specifically which OSDs are the re

Re: [ceph-users] Blocked requests

2017-09-07 Thread David Turner
`ceph health detail` will give a little more information into the blocked requests. Specifically which OSDs are the requests blocked on and how long have they actually been blocked (as opposed to '> 32 sec'). I usually find a pattern after watching that for a time and narrow things down to an OSD
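A minimal version of that workflow, with a placeholder OSD ID, might be:

    ceph health detail | grep -i blocked     # which OSDs the requests are blocked on, and for how long
    ceph daemon osd.5 dump_ops_in_flight     # on osd.5's host: what each stuck op is currently waiting for
    ceph daemon osd.5 dump_historic_ops      # recent slow ops with per-phase timings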

[ceph-users] Blocked requests

2017-09-07 Thread Matthew Stroud
After updating from 10.2.7 to 10.2.9 I have a bunch of blocked requests for ‘currently waiting for missing object’. I have tried bouncing the osds and rebooting the osd nodes, but that just moves the problems around. Previous to this upgrade we had no issues. Any ideas of what to look at? Thank

Re: [ceph-users] Blocked requests problem

2017-08-23 Thread Ramazan Terzi
Finally the problem was solved. First, I set the noscrub, nodeep-scrub, norebalance, nobackfill, norecover, noup and nodown flags. Then I restarted the OSD which had the problem. When the OSD daemon started, blocked requests increased (up to 100) and some misplaced PGs appeared. Then I unset the flags in order, starting with noup,
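For reference, that flag handling roughly corresponds to the following sketch (the OSD ID and systemd unit name are placeholders):

    for f in noscrub nodeep-scrub norebalance nobackfill norecover noup nodown; do ceph osd set $f; done
    systemctl restart ceph-osd@<id>          # restart the problematic OSD
    for f in noup nodown norecover nobackfill norebalance; do ceph osd unset $f; done
    ceph osd unset nodeep-scrub              # re-enable scrubbing last, once HEALTH_OK is restored
    ceph osd unset noscrub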

Re: [ceph-users] Blocked requests problem

2017-08-23 Thread Manuel Lausch
Hi, Sometimes we have the same issue on our 10.2.9 cluster (24 nodes with 60 OSDs each). I think there is some race condition or something like that which results in this state. The blocked requests start exactly at the time the PG begins to scrub. You can try the following. The OSD will automatically

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ranjan Ghosh
Hm. That's quite weird. On our cluster, when I set "noscrub" and "nodeep-scrub", scrubbing will always stop pretty quickly (within a few minutes). I wonder why this doesn't happen on your cluster. When exactly did you set the flag? Perhaps it just needs some more time... Or there might be a disk problem w

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ramazan Terzi
Hi Ranjan, Thanks for your reply. I did set the noscrub and nodeep-scrub flags, but the active scrubbing operation is not completing properly. The scrubbing operation is always on the same pg (20.1e). $ ceph pg dump | grep scrub dumped all in format plain pg_stat objects mip degr misp unf bytes log
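One way to inspect the stuck scrub on that PG, using the PG ID from the message above, might be:

    ceph pg dump | grep -i scrub             # which PGs are (deep-)scrubbing and since when
    ceph pg 20.1e query | grep -A5 scrub     # scrub state of the affected PG
    ceph pg map 20.1e                        # which OSDs hold it, to check their disks and logs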

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ranjan Ghosh
Hi Ramazan, I'm no Ceph expert, but what I can say from my experience using Ceph is: 1) During "Scrubbing", Ceph can be extremely slow. This is probably where your "blocked requests" are coming from. BTW: Perhaps you can even find out which processes are currently blocking with: ps aux | grep

[ceph-users] Blocked requests problem

2017-08-22 Thread Ramazan Terzi
Hello, I have a Ceph Cluster with specifications below: 3 x Monitor node 6 x Storage Node (6 disk per Storage Node, 6TB SATA Disks, all disks have SSD journals) Distributed public and private networks. All NICs are 10Gbit/s osd pool default size = 3 osd pool default min size = 2 Ceph version is

Re: [ceph-users] Blocked requests after "osd in"

2015-12-11 Thread Christian Kauhaus
Am 10.12.2015 um 06:38 schrieb Robert LeBlanc: > Since I'm very interested in > reducing this problem, I'm willing to try and submit a fix after I'm > done with the new OP queue I'm working on. I don't know the best > course of action at the moment, but I hope I can get some input for > when I do t

Re: [ceph-users] Blocked requests after "osd in"

2015-12-10 Thread Jan Schermer
Just try to give the booting OSD and all MONs the resources they ask for (CPU, memory). Yes, it causes disruption but only for a select group of clients, and only for a moment (<20s with my extremely high number of PGs). From a service provider perspective this might break SLAs, but until you get

Re: [ceph-users] Blocked requests after "osd in"

2015-12-10 Thread Christian Kauhaus
Am 10.12.2015 um 06:38 schrieb Robert LeBlanc: > I noticed this a while back and did some tracing. As soon as the PGs > are read in by the OSD (very limited amount of housekeeping done), the > OSD is set to the "in" state so that peering with other OSDs can > happen and the recovery process can beg

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I noticed this a while back and did some tracing. As soon as the PGs are read in by the OSD (very limited amount of housekeeping done), the OSD is set to the "in" state so that peering with other OSDs can happen and the recovery process can begin. Th

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Christian Kauhaus
Am 09.12.2015 um 11:21 schrieb Jan Schermer: > Are you seeing "peering" PGs when the blocked requests are happening? That's > what we see regularly when starting OSDs. Mostly "peering" and "activating". > I'm not sure this can be solved completely (and whether there are major > improvements in

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Jan Schermer
Are you seeing "peering" PGs when the blocked requests are happening? That's what we see regularly when starting OSDs. I'm not sure this can be solved completely (and whether there are major improvements in newer Ceph versions), but it can be sped up by 1) making sure you have free (and not dirt

[ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Christian Kauhaus
Hi, I'm getting blocked requests (>30s) every time when an OSD is set to "in" in our clusters. Once this has happened, backfills run smoothly. I have currently no idea where to start debugging. Has anyone a hint what to examine first in order to narrow this issue? TIA Christian -- Dipl-Inf. C

Re: [ceph-users] Blocked requests/ops?

2015-05-28 Thread Christian Balzer
Hello, On Thu, 28 May 2015 12:05:03 +0200 Xavier Serrano wrote: > On Thu May 28 11:22:52 2015, Christian Balzer wrote: > > > > We are testing different scenarios before making our final decision > > > (cache-tiering, journaling, separate pool,...). > > > > > Definitely a good idea to test thing

Re: [ceph-users] Blocked requests/ops?

2015-05-28 Thread Xavier Serrano
On Thu May 28 11:22:52 2015, Christian Balzer wrote: > > We are testing different scenarios before making our final decision > > (cache-tiering, journaling, separate pool,...). > > > Definitely a good idea to test things out and get an idea what Ceph and > your hardware can do. > > From my experi

Re: [ceph-users] Blocked requests/ops?

2015-05-27 Thread Christian Balzer
On Wed, 27 May 2015 15:38:26 +0200 Xavier Serrano wrote: > Hello, > > On Wed May 27 21:20:49 2015, Christian Balzer wrote: > > > > > Hello, > > > > On Wed, 27 May 2015 12:54:04 +0200 Xavier Serrano wrote: > > > > > Hello, > > > > > > Slow requests, blocked requests and blocked ops occur quit

Re: [ceph-users] Blocked requests/ops?

2015-05-27 Thread Xavier Serrano
Hello, On Wed May 27 21:20:49 2015, Christian Balzer wrote: > > Hello, > > On Wed, 27 May 2015 12:54:04 +0200 Xavier Serrano wrote: > > > Hello, > > > > Slow requests, blocked requests and blocked ops occur quite often > > in our cluster; too often, I'd say: several times during one day. > >

Re: [ceph-users] Blocked requests/ops?

2015-05-27 Thread Christian Balzer
Hello, On Wed, 27 May 2015 12:54:04 +0200 Xavier Serrano wrote: > Hello, > > Slow requests, blocked requests and blocked ops occur quite often > in our cluster; too often, I'd say: several times during one day. > I must say we are running some tests, but we are far from pushing > the cluster to

Re: [ceph-users] Blocked requests/ops?

2015-05-27 Thread Xavier Serrano
Hello, Slow requests, blocked requests and blocked ops occur quite often in our cluster; too often, I'd say: several times during one day. I must say we are running some tests, but we are far from pushing the cluster to the limit (or at least, that's what I believe). Every time a blocked request/

Re: [ceph-users] Blocked requests/ops?

2015-05-26 Thread Christian Balzer
Hello, On Tue, 26 May 2015 10:00:13 -0600 Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > I've seen I/O become stuck after we have done network torture tests. > It seems that after so many retries that the OSD peering just gives up > and doesn't retry any more. An

Re: [ceph-users] Blocked requests/ops?

2015-05-26 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I've seen I/O become stuck after we have done network torture tests. It seems that after so many retries that the OSD peering just gives up and doesn't retry any more. An OSD restart kicks off another round of retries and the I/O completes. It seems

Re: [ceph-users] Blocked requests/ops?

2015-05-26 Thread Xavier Serrano
Hello, Thanks for your detailed explanation, and for the pointer to the "Unexplainable slow request" thread. After investigating osd logs, disk SMART status, etc., the disk under osd.71 seems OK, so we restarted the osd... And voilà, problem seems to be solved! (or at least, the "slow request" me
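The checks and restart described here would roughly be (a sketch; the device path is a placeholder, and pre-systemd deployments use the old service script instead):

    smartctl -a /dev/sdX                               # SMART health of the disk backing osd.71
    grep -i 'slow request' /var/log/ceph/ceph-osd.71.log | tail
    systemctl restart ceph-osd@71                      # or "service ceph restart osd.71" on older setups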

Re: [ceph-users] Blocked requests/ops?

2015-05-25 Thread Christian Balzer
Hello, Firstly, find my "Unexplainable slow request" thread in the ML archives and read all of it. On Tue, 26 May 2015 07:05:36 +0200 Xavier Serrano wrote: > Hello, > > We have observed that our cluster is often moving back and forth > from HEALTH_OK to HEALTH_WARN states due to "blocked reque

[ceph-users] Blocked requests/ops?

2015-05-25 Thread Xavier Serrano
Hello, We have observed that our cluster is often moving back and forth from HEALTH_OK to HEALTH_WARN states due to "blocked requests". We have also observed "blocked ops". For instance: # ceph status cluster 905a1185-b4f0-4664-b881-f0ad2d8be964 health HEALTH_WARN 1 requests
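Beyond ceph status, the usual next step for pinning down where blocked requests sit is roughly:

    ceph health detail                       # lists the blocked requests and the OSDs involved
    ceph osd perf                            # per-OSD commit/apply latencies, to spot a slow disk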

Re: [ceph-users] blocked requests question

2014-08-03 Thread Christian Balzer
Hello, On Mon, 4 Aug 2014 11:03:37 +0800 飞 wrote: > hello, I have been running a ceph cluster (RBD) in a production environment to host 200 VMs. Under normal circumstances, ceph's performance is quite good, but when I delete a snapshot or image, the ceph cluster shows a lot of blocked request

[ceph-users] blocked requests question

2014-08-03 Thread 飞
hello, I have been running a ceph cluster (RBD) in a production environment to host 200 VMs. Under normal circumstances, ceph's performance is quite good, but when I delete a snapshot or image, the ceph cluster shows a lot of blocked requests (generally more than 1000), then the whole cluster hav
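A common mitigation here is to throttle snapshot-trim and recovery work on the OSDs; as a rough sketch (option names as they existed around firefly/hammer, values only illustrative):

    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5'   # pause between snap-trim work items
    ceph tell osd.* injectargs '--osd_max_backfills 1'       # keep backfill pressure low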

[ceph-users] blocked requests question

2014-07-23 Thread 飞
hello, I have been running a ceph cluster (RBD) in a production environment to host 200 VMs. Under normal circumstances, ceph's performance is quite good, but when I delete a snapshot or image, the ceph cluster shows a lot of blocked requests (generally more than 1000), then the whole cluster hav

Re: [ceph-users] Blocked requests during and after CephFS delete

2013-12-09 Thread Gregory Farnum
[ Re-added the list since I don't have log files. ;) ] On Mon, Dec 9, 2013 at 5:52 AM, Oliver Schulz wrote: > Hi Greg, > > I'll send this privately, maybe better not to post log-files, etc. > to the list. :-) > > >> Nobody's reported it before, but I think the CephFS MDS is sending out >> too man

Re: [ceph-users] Blocked requests during and after CephFS delete

2013-12-08 Thread Gregory Farnum
On Sun, Dec 8, 2013 at 7:16 AM, Oliver Schulz wrote: > Hello Ceph-Gurus, > > a short while ago I reported some trouble we had with our cluster > suddenly going into a state of "blocked requests". > > We did a few tests, and we can reproduce the problem: > During / after deleting of a substantial c

[ceph-users] Blocked requests during and after CephFS delete

2013-12-08 Thread Oliver Schulz
Hello Ceph-Gurus, a short while ago I reported some trouble we had with our cluster suddenly going into a state of "blocked requests". We did a few tests, and we can reproduce the problem: During / after deleting of a substantial chunk of data on CephFS (a few TB), ceph health shows blocked requ