Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-23 Thread Alejandro Comisario
Definitely in our case the OSDs were not the guilty ones, since all the OSDs that were blocking requests, always from the same pool, worked flawlessly (and still do) after we deleted the pool where we always saw the blocked PGs. Since the pool was accessed by just one client, and had almost no ops to it,

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-23 Thread Peter Maloney
I think Greg (who appears to be a Ceph committer) basically said he would be interested in looking at it, if only you still had the pool that failed this way. Why not try to reproduce it, and keep a log of your procedure so he can reproduce it too? What caused the slow requests... copy on write from snapshot

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-22 Thread Alejandro Comisario
Any thoughts? On Tue, Mar 14, 2017 at 10:22 PM, Alejandro Comisario wrote: > Greg, thanks for the reply. > True that I can't provide enough information to know what happened since > the pool is gone. > > But based on your experience, can I please take some of your time, and > give me the TOP 5 f

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-14 Thread Alejandro Comisario
Greg, thanks for the reply. True that I can't provide enough information to know what happened, since the pool is gone. But based on your experience, could I please take some of your time and ask you for the top 5 reasons that could explain what happened to that pool (or any pool

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-10 Thread Gregory Farnum
On Tue, Mar 7, 2017 at 10:18 AM Alejandro Comisario wrote: > Gregory, thanks for the response, what you've said is by far the most > enlightening thing I have learned about Ceph in a long time. > > What raises an even greater doubt is that this "non-functional" pool was > only 1.5GB large, vs 50-150GB o

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-10 Thread Alejandro Comisario
Any thoughts? On Tue, Mar 7, 2017 at 3:17 PM, Alejandro Comisario wrote: > Gregory, thanks for the response, what you've said is by far the most > enlightening thing I have learned about Ceph in a long time. > > What raises an even greater doubt is that this "non-functional" pool was > only 1.5GB larg

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-07 Thread Alejandro Comisario
Gregory, thanks for the response, what you've said is by far the most enlightening thing I have learned about Ceph in a long time. What raises an even greater doubt is that this "non-functional" pool was only 1.5GB large, vs 50-150GB on the other affected pools; the tiny pool was still being used, and ju

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-07 Thread Gregory Farnum
Some facts: The OSDs use a lot of gossip protocols to distribute information. The OSDs limit how many client messages they let into the system at a time. The OSDs do not distinguish between client ops for different pools (the blocking happens before they have any idea what the target is). So, yes
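
A toy sketch of the effect described in these facts, not Ceph's actual code: a single admission limit is shared by all client ops, and ops are admitted without looking at their target pool. The function name `simulate`, the slot count of 4, and the tick-based costs are all hypothetical values chosen for illustration; only the pool names CUSTOMERS and PRIVATE come from the thread.

```python
from collections import deque


def simulate(ops, max_in_flight, ticks):
    """Toy model of one OSD with a pool-agnostic admission limit.

    ops: list of (pool_name, service_ticks) in arrival order.
    Returns a dict mapping pool_name -> number of ops completed
    within `ticks` time steps.
    """
    queue = deque(ops)
    in_flight = []  # each entry is [pool_name, remaining_ticks]
    completed = {}
    for _ in range(ticks):
        # Admission happens before the target pool is considered:
        # any free slot goes to whichever op arrived first.
        while queue and len(in_flight) < max_in_flight:
            pool, cost = queue.popleft()
            in_flight.append([pool, cost])
        # Every admitted op makes one tick of progress.
        for op in in_flight:
            op[1] -= 1
        # Finished ops release their slot.
        for pool, remaining in in_flight:
            if remaining <= 0:
                completed[pool] = completed.get(pool, 0) + 1
        in_flight = [op for op in in_flight if op[1] > 0]
    return completed


# Hypothetical workload: 4 stuck ops on PRIVATE arrive first,
# followed by 10 fast ops on CUSTOMERS.
stuck = [("PRIVATE", 1000)] * 4
fast = [("CUSTOMERS", 1)] * 10

blocked_run = simulate(stuck + fast, max_in_flight=4, ticks=50)
healthy_run = simulate(fast, max_in_flight=4, ticks=50)
```

In this model the four stuck PRIVATE ops occupy every slot, so no CUSTOMERS op is admitted in `blocked_run`, while the same CUSTOMERS ops all complete in `healthy_run`.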

[ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-06 Thread Alejandro Comisario
Hi, we have a 7-node Ubuntu Ceph Hammer cluster (78 OSDs to be exact). This weekend we experienced a huge outage of our customers' VMs (located on pool CUSTOMERS, replica size 3) when lots of OSDs started to report slow requests/blocked PGs on pool PRIVATE (replica size 1); basically all PGs blocked wh

[ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-05 Thread Alejandro Comisario
Hi, we have a 7-node Ubuntu Ceph Hammer cluster (78 OSDs to be exact). This weekend we experienced a huge outage of our customers' VMs (located on pool CUSTOMERS, replica size 3) when lots of OSDs started to report slow requests/blocked PGs on pool PRIVATE (replica size 1); basically all PGs blocked wh