Re: [ceph-users] Replication vs Erasure Coding with only 2 elements in the failure-domain.

2017-03-07 Thread Burkhard Linke
Hi, On 03/07/2017 05:53 PM, Francois Blondel wrote: Hi all, We have (only) 2 separate "rooms" (crush bucket) and would like to build a cluster being able to handle the complete loss of one room. *snipsnap* Second idea would be to use Erasure Coding, as it fits our performance require

Re: [ceph-users] PG active+remapped even I have three hosts

2017-03-07 Thread Stefan Lissmats
Hello! To me it looks like you have one OSD on host Ceph-Stress-02 and therefore only a weight of 1 on that host and 7 on the other. If you want three replicas on only three hosts, you need about the same storage space on all hosts. On Wed, Mar 8, 2017 at 4:50 AM +0100, "TYLin" mailto:woo

[ceph-users] PG active+remapped even I have three hosts

2017-03-07 Thread TYLin
Hi all, We got 4 PGs active+remapped in our cluster. We set the pool’s ruleset to ruleset 0 and got HEALTH_OK. After we set it to ruleset 1, 4 PGs are active+remapped. The testing result from crushtool also shows some bad mappings exist. Does anyone happen to know the reason? pool 0 'rbd
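For anyone wanting to reproduce that check, a crushtool dry run along these lines will list mappings that fail to place all replicas (the map filename, rule number and replica count are illustrative, not taken from the poster's cluster):
  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings
If --show-bad-mappings prints any lines, the rule cannot find enough distinct failure domains for the requested number of replicas.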

[ceph-users] MDS assert failed when shutting down

2017-03-07 Thread Xusangdi
Hi Cephers, We occasionally meet an assertion failure when trying to shut down an MDS, as follows:
-14> 2017-01-22 14:13:46.833804 7fd210c58700 2 -- 192.168.36.11:6801/2188363 >> 192.168.36.48:6800/42546 pipe(0x558ff3803400 sd=17 :52412 s=4 pgs=227 cs=1 l=1 c=0x558ff3758900).fault (0) Su

Re: [ceph-users] hammer to jewel upgrade experiences? cache tier experience?

2017-03-07 Thread Christian Balzer
[re-adding ML, so others may benefit] On Tue, 7 Mar 2017 13:14:14 -0700 Mike Lovell wrote: > On Mon, Mar 6, 2017 at 8:18 PM, Christian Balzer wrote: > > > On Mon, 6 Mar 2017 19:57:11 -0700 Mike Lovell wrote: > > > > > has anyone on the list done an upgrade from hammer (something later than >

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Christian Balzer
Hello, as Adrian pointed out, this is not really Ceph specific. That being said, there are literally dozens of threads on this ML about this issue and about speeding things up in general; use your google-fu. In particular, Nick Fisk's articles are a good source for understanding what is happening and h

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Adrian Saul
The problem is not so much ceph, but the fact that sync workloads tend to mean you have an effective queue depth of 1 because it serialises the IO from the application, as it waits for the last write to complete before issuing the next one. From: Matteo Dacrema [mailto:mdacr...@enter.eu] Sent

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Matteo Dacrema
Thank you Adrian! I’d forgotten this option, and I can reproduce the problem. Now, what could be the problem on the Ceph side with O_DSYNC writes? Regards, Matteo

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Adrian Saul
Possibly MySQL is doing sync writes, whereas your fio could be doing buffered writes. Try enabling the sync option in fio and compare results. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Matteo Dacrema > Sent: Wednesday, 8 March 20
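For reference, a fio run of that kind might look roughly like this (device path, block size and runtime are placeholders):
  fio --name=dsync-test --filename=/dev/vdb --rw=randwrite --bs=16k \
      --ioengine=libaio --direct=1 --sync=1 --iodepth=1 --runtime=60 --time_based
With --sync=1 and an iodepth of 1, every write waits for the previous one to complete, which is much closer to a database doing O_DSYNC commits than a buffered, deeply queued benchmark.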

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Deepak Naidu
I hope you used a 1-minute interval for iostat. Based on your iostat & disk info: avgrq-sz is showing 750.49 and avgqu-sz is showing 17.39, i.e. 375.245 KB is your average request size. That said, your disk is showing a queue length of 17.39. Typically a higher queue length will inc
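For context: iostat reports avgrq-sz in 512-byte sectors, so 750.49 x 512 is roughly 384,251 bytes, about 375.2 KB, which is where the average-request-size figure above comes from.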

Re: [ceph-users] Snapshot Costs (Was: Re: Pool Sizes)

2017-03-07 Thread Kent Borg
On 03/07/2017 04:35 PM, Gregory Farnum wrote: Creating a snapshot generally involves a round-trip to the monitor, which requires a new OSDMap epoch (although it can coalesce) — ie, the monitor paxos commit and processing the new map on all the OSDs/PGs. Destroying a snapshot involves adding the s

Re: [ceph-users] Snapshot Costs (Was: Re: Pool Sizes)

2017-03-07 Thread Gregory Farnum
On Tue, Mar 7, 2017 at 12:43 PM, Kent Borg wrote: > On 01/04/2017 03:41 PM, Brian Andrus wrote: >> >> Think "many objects, few pools". The number of pools do not scale well >> because of PG limitations. Keep a small number of pools with the proper >> number of PGs. > > > I finally got it through m

Re: [ceph-users] replica questions

2017-03-07 Thread Matteo Dacrema
Hi, thank you all. I’m using Mellanox switches with ConnectX-3 Pro 40 Gbit NICs, bonded with balance-xor and a layer3+4 transmit hash policy. It’s a bit expensive, but it’s very hard to saturate. I’m using a single NIC for both the replica and the access network. > On 3 Mar 2017, at 14:52, Vy Nguyen Tan >
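For anyone curious, a minimal Debian/Ubuntu ifupdown stanza for that kind of bond could look roughly like this (interface names and addressing are placeholders, assuming the ifenslave package):
  auto bond0
  iface bond0 inet static
      address 10.0.0.11
      netmask 255.255.255.0
      bond-slaves enp4s0 enp4s0d1
      bond-mode balance-xor
      bond-xmit-hash-policy layer3+4
      bond-miimon 100
The layer3+4 policy hashes on IP addresses and ports, so different Ceph connections can be spread across the bond members.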

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Matteo Dacrema
Hi Deepak, thank you. Here is an example of iostat:
avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
           5.16    0.00     2.64    15.74    0.00   76.45
Device:  rrqm/s  wrqm/s  r/s  w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz

Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Deepak Naidu
My response is without any context to Ceph or any SDS; it is purely about how to check for the IO bottleneck. You can then determine whether it's Ceph, another process, or the disk. >> MySQL can reach only 150 iops both read or writes showing 30% of IOwait. Lower IOPS is not an issue in itself, as your block size migh

[ceph-users] MySQL and ceph volumes

2017-03-07 Thread Matteo Dacrema
Hi All, I have a Galera cluster running on OpenStack with data on Ceph volumes capped at 1500 IOPS for read and write (3000 total). I can’t understand why with fio I can reach 1500 IOPS without iowait, while MySQL can reach only 150 IOPS for both reads and writes, showing 30% iowait. I tried with fi

[ceph-users] Strange read results using FIO inside RBD QEMU VM ...

2017-03-07 Thread Xavier Trilla
Hi, We have a pure SSD-based Ceph cluster (100+ OSDs with enterprise SSDs and IT-mode cards) running Hammer 0.94.9 over 10G. It's really stable and we are really happy with the performance we are getting. But after a customer ran some tests, we noticed something quite strange. Our user did some

[ceph-users] Snapshot Costs (Was: Re: Pool Sizes)

2017-03-07 Thread Kent Borg
On 01/04/2017 03:41 PM, Brian Andrus wrote: Think "many objects, few pools". The number of pools do not scale well because of PG limitations. Keep a small number of pools with the proper number of PGs. I finally got it through my head; it seems the larger answer is: Not only is it okay to have a

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-07 Thread Alejandro Comisario
Gregory, thanks for the response; what you've said is by far the most enlightening thing I've learned about Ceph in a long time. What raises even greater doubt is that this "non-functional" pool was only 1.5GB large, vs 50-150GB on the other affected pools; the tiny pool was still being used, and ju

Re: [ceph-users] purging strays faster

2017-03-07 Thread Patrick Donnelly
Hi Dan, On Tue, Mar 7, 2017 at 11:10 AM, Daniel Davidson wrote: > When I try this command, I still get errors: > > ceph daemon mds.0 config show > admin_socket: exception getting command descriptions: [Errno 2] No such file > or directory > admin_socket: exception getting command descriptions: [E

Re: [ceph-users] can a OSD affect performance from pool X when blocking/slow requests PGs from pool Y ?

2017-03-07 Thread Gregory Farnum
Some facts: The OSDs use a lot of gossip protocols to distribute information. The OSDs limit how many client messages they let into the system at a time. The OSDs do not distinguish between client ops for different pools (the blocking happens before they have any idea what the target is). So, yes
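For those who want to see those limits on their own cluster, the message throttles can be inspected through the admin socket (the option names below are from memory, so verify them against the full output):
  ceph daemon osd.0 config show | grep client_message
This typically shows osd_client_message_cap (count) and osd_client_message_size_cap (bytes), which bound how much client work an OSD accepts at once, regardless of which pool the ops target.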

[ceph-users] Replication vs Erasure Coding with only 2 elements in the failure-domain.

2017-03-07 Thread Francois Blondel
Hi all, We have (only) 2 separate "rooms" (crush bucket) and would like to build a cluster being able to handle the complete loss of one room. The first idea would be to use replication: -> As we read the mail thread "2x replication: A BIG warning", we would choose a replication size of 3. -> We
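For illustration, a CRUSH rule of the following shape is one way to force copies into both rooms (names, ruleset number and sizes are purely illustrative):
  rule replicated_two_rooms {
          ruleset 1
          type replicated
          min_size 2
          max_size 4
          step take default
          step choose firstn 2 type room
          step chooseleaf firstn 2 type host
          step emit
  }
With pool size 3 this places two copies in one room and one in the other, so losing either room still leaves at least one copy (keep the pool's min_size in mind, since the surviving room may hold only one copy until recovery).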

Re: [ceph-users] purging strays faster

2017-03-07 Thread Daniel Davidson
When I try this command, I still get errors:
ceph daemon mds.0 config show
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
I am guessing there is a path set up i
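If the name "mds.0" does not match the running daemon, pointing ceph daemon at the admin socket path directly usually works; the socket name below is a guess, so check what actually exists on the MDS host:
  ls /var/run/ceph/
  ceph daemon /var/run/ceph/ceph-mds.<name>.asok config show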

Re: [ceph-users] Much more dentries than inodes, is that normal?

2017-03-07 Thread Xiaoxi Chen
Thanks John. Very likely; note that mds_mem::ino + mds_cache::strays_created ~= mds::inodes, plus the MDS was the active-standby one and became active days ago due to a failover.
"mds": {
    "inodes": 1291393,
}
"mds_cache": {
    "num_strays": 3559,
    "strays_created": 706120,
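That relation does line up with the figures quoted earlier in the thread: 1,291,393 (mds::inodes) minus 706,120 (strays_created) is about 585,273, i.e. the ~585K inodes from the original post.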

Re: [ceph-users] Much more dentries than inodes, is that normal?

2017-03-07 Thread John Spray
On Tue, Mar 7, 2017 at 9:17 AM, Xiaoxi Chen wrote: > Hi, > > From the admin socket of mds, I got following data on our > production cephfs env, roughly we have 585K inodes and almost same > amount of caps, but we have >2x dentries than inodes. > > I am pretty sure we don't use hard link

Re: [ceph-users] osds crashing during hit_set_trim and hit_set_remove_all

2017-03-07 Thread kefu chai
On Tue, Mar 7, 2017 at 3:30 PM, kefu chai wrote: > On Fri, Mar 3, 2017 at 11:40 PM, Sage Weil wrote: >> On Fri, 3 Mar 2017, Mike Lovell wrote: >>> i started an upgrade process to go from 0.94.7 to 10.2.5 on a production >>> cluster that is using cache tiering. this cluster has 3 monitors, 28 stor

Re: [ceph-users] RBD device on Erasure Coded Pool with kraken and Ubuntu Xenial.

2017-03-07 Thread Francois Blondel
On Tuesday, 07.03.2017 at 11:14 +0100, Ilya Dryomov wrote: On Tue, Mar 7, 2017 at 10:27 AM, Francois Blondel wrote: Hi all, I have been trying to use RBD devices on an Erasure Coded data-pool on Ubuntu Xenial. I created my block device "blockec2" with : rbd c

Re: [ceph-users] RBD device on Erasure Coded Pool with kraken and Ubuntu Xenial.

2017-03-07 Thread Ilya Dryomov
On Tue, Mar 7, 2017 at 10:27 AM, Francois Blondel wrote: > Hi all, > > I have been trying to use RBD devices on an Erasure Coded data-pool on Ubuntu > Xenial. > > I created my block device "blockec2" with : > rbd create blockec2 --size 300G --data-pool ecpool --image-feature > layering,data-pool >

Re: [ceph-users] ceph/hammer - debian7/wheezy repository doesnt work correctly

2017-03-07 Thread linux-ml
They are there; as a workaround, download them manually from: http://download.ceph.com/debian-hammer/pool/main/c/ceph/ *0.94.9-1~bpo70+1 http://download.ceph.com/debian-hammer/pool/main/c/curl/ *7.29.0-1~bpo70+1 and install what you need with dpkg -i *.deb. Best regards, Rainer On 06/03/17 22:50,

Re: [ceph-users] A Jewel in the rough? (cache tier bugs and documentation omissions)

2017-03-07 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John > Spray > Sent: 07 March 2017 01:45 > To: Christian Balzer > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] A Jewel in the rough? (cache tier bugs and > documentation omissions

[ceph-users] RBD device on Erasure Coded Pool with kraken and Ubuntu Xenial.

2017-03-07 Thread Francois Blondel
Hi all, I have been trying to use RBD devices on an Erasure Coded data-pool on Ubuntu Xenial. I created my block device "blockec2" with: rbd create blockec2 --size 300G --data-pool ecpool --image-feature layering,data-pool (same issue with "rbd create blockec2 --size 300G --data-pool ecpool")
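For context, the overall sequence being attempted is roughly the following sketch; pool names and PG counts are illustrative, and the overwrite-enabling step is written as it appears in later releases (on kraken, EC overwrites were still experimental, so the exact knobs may differ):
  ceph osd pool create ecpool 64 64 erasure
  ceph osd pool set ecpool allow_ec_overwrites true
  rbd create blockec2 --size 300G --data-pool ecpool
  rbd map blockec2
The data-pool image feature also has to be understood by whatever client maps the image; if the kernel client refuses it, rbd-nbd is a userspace alternative worth trying.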

[ceph-users] Much more dentries than inodes, is that normal?

2017-03-07 Thread Xiaoxi Chen
Hi, From the admin socket of the MDS, I got the following data on our production CephFS env: roughly, we have 585K inodes and almost the same amount of caps, but we have >2x more dentries than inodes. I am pretty sure we don't use hard links intensively (if at all). And the #ino matches "rados ls --pool $
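For reference, the counters above are typically pulled like this (the MDS name is a placeholder for whatever the local admin socket or "ceph mds stat" reports):
  ceph daemon mds.<name> perf dump | python -m json.tool | egrep '"inodes"|"num_strays"|"strays_created"'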