Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread HEWLETT, Paul (Paul)** CTR **
Hi Wido, If I disable the epel repo then the error changes: [root@ninja ~]# yum install --disablerepo=epel ceph Loaded plugins: langpacks, priorities, product-id, subscription-manager 10 packages excluded due to repository priority protections Resolving Dependencies . -- Finished Dependency

Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread HEWLETT, Paul (Paul)** CTR **
Hi Wido, Has something broken with this move? The following has worked for me repeatedly over the last 2 months: This a.m. I tried to install ceph using the following repo file: [root@citrus ~]# cat /etc/yum.repos.d/ceph.repo [ceph] name=Ceph packages for $basearch
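
For context, a working ceph.repo pointing at the EU mirror looks roughly like the sketch below; the release path (rpm-giant) and the gpgkey URL are illustrative, not copied from Paul's actual file:

    [ceph]
    name=Ceph packages for $basearch
    baseurl=http://eu.ceph.com/rpm-giant/el7/$basearch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
    priority=1

    [ceph-noarch]
    name=Ceph noarch packages
    baseurl=http://eu.ceph.com/rpm-giant/el7/noarch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
    priority=1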

Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread Wido den Hollander
On 03/09/2015 12:54 PM, HEWLETT, Paul (Paul)** CTR ** wrote: Hi Wido Has something broken with this move? The following has worked for me repeatedly over the last 2 months: It shouldn't have broken anything, but you never know. The machine rsyncs the data from ceph.com directly. The

[ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread joel.merr...@gmail.com
Hi, I'm trying to fix an issue within 0.93 on our internal cloud related to incomplete pg's (yes, I realise the folly of having the dev release - it's a not-so-test env now, so I need to recover this really). I'll detail the current outage info; 72 initial (now 65) OSDs 6 nodes * Update to 0.92

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Mohamed Pakkeer
Hi Karan, We faced the same issue and resolved it after increasing the open file limit and the maximum number of threads. Config reference: /etc/security/limits.conf root hard nofile 65535 sysctl -w kernel.pid_max=4194303 http://tracker.ceph.com/issues/10554#change-47024 Cheers Mohamed Pakkeer On Mon, Mar
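
Spelled out, the fix Mohamed describes amounts to something like the following (the numbers are taken from his mail; the sysctl.d file name is just an example for making the setting persistent):

    # /etc/security/limits.conf -- raise the open file limit
    root  hard  nofile  65535
    root  soft  nofile  65535

    # raise the kernel pid/thread limit now, and persist it across reboots
    sysctl -w kernel.pid_max=4194303
    echo "kernel.pid_max = 4194303" > /etc/sysctl.d/90-ceph-pid-max.conf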

Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread Wido den Hollander
On 03/09/2015 02:47 PM, HEWLETT, Paul (Paul)** CTR ** wrote: Hi Wido If I disable the epel repo then the error changes: [root@ninja ~]# yum install --disablerepo=epel ceph Loaded plugins: langpacks, priorities, product-id, subscription-manager 10 packages excluded due to repository

Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread HEWLETT, Paul (Paul)** CTR **
When did you make the change? It worked on Friday albeit with these extra lines in ceph.repo: [Ceph-el7] name=Ceph-el7 baseurl=http://eu.ceph.com/rpms/rhel7/noarch/ enabled=1 gpgcheck=0 which I removed when I discovered this no longer existed. Regards Paul Hewlett Senior Systems Engineer

Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread Wido den Hollander
On 03/09/2015 02:27 PM, HEWLETT, Paul (Paul)** CTR ** wrote: When did you make the change? Yesterday It worked on Friday albeit with these extra lines in ceph.repo: [Ceph-el7] name=Ceph-el7 baseurl=http://eu.ceph.com/rpms/rhel7/noarch/ enabled=1 gpgcheck=0 which I removed when I

[ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread Wido den Hollander
Hi, Since the recent reports of rsync failing on eu.ceph.com I moved eu.ceph.com to a new machine. It went from physical to a KVM VM backed by RBD, so it's now running on Ceph. URLs or rsync paths haven't changed, it's still eu.ceph.com and available over IPv4 and IPv6. This Virtual Machine is

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Christian Eichelmann
Hi Karan, as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has, in an idle situation, about 66,000 threads
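
A quick way to compare the thread count Christian mentions against the kernel limits (not from his mail, just a generic check):

    ps -eLf | wc -l                     # total threads currently running (plus one header line)
    cat /proc/sys/kernel/pid_max        # must stay comfortably above the thread count
    cat /proc/sys/kernel/threads-max    # the other ceiling worth checking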

[ceph-users] Disk serial number from OSD

2015-03-09 Thread Nick Fisk
Hi All, I just created this little bash script to retrieve the /dev/disk/by-id string for each OSD on a host. Our disks are internally mounted so have no concept of drive bays, this should make it easier to work out what disk has failed. #!/bin/bash DISKS=`ceph-disk list | grep ceph data`
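
The preview cuts the script off. A rough reconstruction of the idea follows; it is a sketch assuming ceph-disk list prints lines like "/dev/sdb1 ceph data, active, cluster ceph, osd.3, ...", not Nick's original code:

    #!/bin/bash
    # For every "ceph data" partition, print the OSD id and the persistent
    # /dev/disk/by-id name of the underlying disk.
    ceph-disk list 2>/dev/null | grep "ceph data" | while read -r line; do
        part=$(echo "$line" | awk '{print $1}')          # e.g. /dev/sdb1
        osd=$(echo "$line" | grep -o 'osd\.[0-9]*')      # e.g. osd.3
        disk=${part%%[0-9]*}                             # strip the partition number
        for id in /dev/disk/by-id/*; do
            [ "$(readlink -f "$id")" = "$disk" ] && echo "$osd $disk $id"
        done
    done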

Re: [ceph-users] New eu.ceph.com mirror machine

2015-03-09 Thread HEWLETT, Paul (Paul)** CTR **
Hi Wido, It seems that your move coincided with yet another change in the EPEL repo. For anyone who is interested, I fixed this by: 1. ensuring that check_obsoletes=1 is in /etc/yum/pluginconf.d/priorities.conf 2. installing libunwind explicitly: yum install libunwind
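
In shell form, Paul's workaround is roughly the following (the conditional append assumes the option is simply missing from priorities.conf):

    grep -q '^check_obsoletes' /etc/yum/pluginconf.d/priorities.conf || \
        echo 'check_obsoletes=1' >> /etc/yum/pluginconf.d/priorities.conf
    yum install -y libunwind
    yum install -y ceph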

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Karan Singh
Thanks Guys kernel.pid_max=4194303 did the trick. - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread Samuel Just
You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam On Mon, 2015-03-09 at 12:24 +, joel.merr...@gmail.com wrote: Hi, I'm
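
A rough sketch of that sequence for one missing id (osd.12 here is hypothetical; paths and the init commands depend on the distro and deployment tool, so treat this as an outline rather than a recipe):

    ceph osd create                        # hands out the lowest free id; repeat until you get the one you need
    mkdir -p /var/lib/ceph/osd/ceph-12
    ceph-osd -i 12 --mkfs --mkkey          # initialise an empty data directory
    ceph auth add osd.12 osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-12/keyring
    service ceph start osd.12              # let it boot and register in the osdmap
    service ceph stop osd.12
    ceph osd lost 12 --yes-i-really-mean-it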

Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-09 Thread Yehuda Sadeh-Weinraub
- Original Message - From: Steffen Winther ceph.u...@siimnet.dk To: ceph-users@lists.ceph.com Sent: Monday, March 9, 2015 12:43:58 AM Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP Steffen W Sørensen stefws@... writes: Response: HTTP/1.1 200 OK Date: Fri, 06 Mar

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Nicheal
Umm.. too many threads are created in SimpleMessenger: every pipe creates two worker threads, one for sending and one for receiving messages. Thus AsyncMessenger would be promising, but it is still in development. Regards Ning Yao 2015-03-09 20:48 GMT+08:00 Christian Eichelmann

Re: [ceph-users] tgt and krbd

2015-03-09 Thread Nick Fisk
Hi Mike, I was using bs_aio with the krbd and still saw a small caching effect. I'm not sure if it was on the ESXi or tgt/krbd page cache side, but I was definitely seeing the IO's being coalesced into larger ones on the krbd device in iostat. Either way, it would make me potentially nervous to

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-09 Thread mad Engineer
Thank you Nick for explaining the problem with 4k writes. The queue depth used in this setup is 256, the maximum supported. Can you clarify why adding more nodes will not increase IOPS? In general, how do we increase the IOPS of a Ceph cluster? Thanks for your help On Sat, Mar 7, 2015 at 5:57 PM, Nick

[ceph-users] [ANN] ceph-deploy 1.5.22 released

2015-03-09 Thread Travis Rhoden
Hi All, This is a new release of ceph-deploy that changes a couple of behaviors. On RPM-based distros, ceph-deploy will now automatically enable check_obsoletes in the Yum priorities plugin. This resolves an issue many community members hit where package dependency resolution was breaking due to

Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-09 Thread Yehuda Sadeh-Weinraub
- Original Message - From: Steffen Winther ceph.u...@siimnet.dk To: ceph-users@lists.ceph.com Sent: Monday, March 9, 2015 1:25:43 PM Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP Yehuda Sadeh-Weinraub yehuda@... writes: If you're using apache, then it filters out

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Tony Harris
I know I'm not even close to this type of problem yet with my small cluster (both test and production clusters) - but it would be great if something like that could appear in the cluster HEALTHWARN, if Ceph could determine the number of processes in use and compare it against the current limit

Re: [ceph-users] EC Pool and Cache Tier Tuning

2015-03-09 Thread Nick Fisk
Either option #1 or #2 depending on if your data has hot spots or you need to use EC pools. I'm finding that the cache tier can actually slow stuff down depending on how much data is in the cache tier vs on the slower tier. Writes will be about the same speed for both solutions, reads will be a

Re: [ceph-users] Ceph repo - RSYNC?

2015-03-09 Thread Jesus Chavez (jeschave)
Hi David, also for the Calamari or GUI monitoring interface: is there any way to get a user account and passwd from Inktank, since the repo to install Calamari seems to be only for people inside of Inktank? Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52

Re: [ceph-users] EC Pool and Cache Tier Tuning

2015-03-09 Thread Steffen Winther
Nick Fisk nick@... writes: My Ceph cluster comprises 4 nodes, each with the following: 10x 3TB WD Red Pro disks - EC pool k=3 m=3 (7200rpm) 2x S3700 100GB SSDs (20k write IOPS) for HDD journals 1x S3700 400GB SSD (35k write IOPS) for cache tier - 3x replica If I have the following 4x node

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Udo Lembke
Hi Tony, sounds like an good idea! Udo On 09.03.2015 21:55, Tony Harris wrote: I know I'm not even close to this type of a problem yet with my small cluster (both test and production clusters) - but it would be great if something like that could appear in the cluster HEALTHWARN, if Ceph could

Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

2015-03-09 Thread Nick Fisk
Can you run the Fio test again but with a queue depth of 32. This will probably show what your cluster is capable of. Adding more nodes with SSD's will probably help scale, but only at higher io depths. At low queue depths you are probably already at the limit as per my earlier email. From:
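
For reference, a 4k random-write run at queue depth 32 against a mapped RBD device could look like this (device name and runtime are just examples):

    fio --name=qd32 --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --ioengine=libaio --direct=1 --iodepth=32 --numjobs=1 \
        --runtime=60 --time_based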

Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-09 Thread koukou73gr
On 03/05/2015 07:19 PM, Josh Durgin wrote: client.libvirt key: caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rw class-read pool=rbd This includes everything except class-write on the pool you're using. You'll need that so that a copy_up
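
One way to grant the missing class-write capability that Josh points out (pool name taken from the quoted caps; rwx covers both class-read and class-write):

    ceph auth caps client.libvirt \
        mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'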

Re: [ceph-users] Prioritize Heartbeat packets

2015-03-09 Thread Robert LeBlanc
I've found commit 9b9a682fe035c985e416ee1c112fa58f9045a27c and I see that when 'osd heartbeat use min delay socket = true' it will mark the packet with DSCP CS6. Based on the setting of the socket in msg/simple/Pipe.cc is it possible that this can apply to both OSD and monitor? I don't understand

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Sage Weil
On Mon, 9 Mar 2015, Karan Singh wrote: Thanks Guys kernel.pid_max=4194303 did the trick. Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow. Is there somewhere in the docs we can put this to catch

Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-09 Thread Steffen Winther
Yehuda Sadeh-Weinraub yehuda@... writes: If you're using apache, then it filters out zero Content-Length. Nothing much radosgw can do about it. You can try using the radosgw civetweb frontend, see if it changes anything. Thanks, only no difference... Req: PUT /mssCl/ HTTP/1.1 Host:
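
Switching to the civetweb frontend that Yehuda suggests is a ceph.conf change along these lines (the client section name and port are examples), followed by a restart of the radosgw service:

    [client.radosgw.gateway]
    rgw frontends = civetweb port=7480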

Re: [ceph-users] Tr : RadosGW - Bucket link and ACLs

2015-03-09 Thread Italo Santos
Yeah, I was thinking about that, and that will be the alternative for me too... Regards. Italo Santos http://italosantos.com.br/ On Friday, March 6, 2015 at 18:20, ghislain.cheval...@orange.com wrote: Original message From: CHEVALIER Ghislain IMT/OLPS

Re: [ceph-users] RadosGW - Create bucket via admin API

2015-03-09 Thread Italo Santos
Hello Georgios, I thought there was some admin alternative to do that, but I realised there isn't one, since the bucket belongs to a specific user. So the alternative is, after creating the user, to authenticate with the created credentials and create the bucket. Thanks Regards. Italo Santos

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-09 Thread joel.merr...@gmail.com
On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote: You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam Thanks

[ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Karan Singh
Hello Community, I need help to fix a long-standing Ceph problem. The cluster is unhealthy, multiple OSDs are DOWN. When I am trying to restart OSDs I am getting this error: 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time

Re: [ceph-users] ceph mds zombie

2015-03-09 Thread Francois Lafont
Hi, On 09/03/2015 04:06, kenmasida wrote: I have resolved the problem, thank you very much. When I use ceph-fuse to mount the client, it works well. Good news, but can you give the kernel version of your cephfs client OS? Like you, I had one problem with cephfs on the client side and it come

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Azad Aliyar
*Check Max Threadcount:* If you have a node with a lot of OSDs, you may be hitting the default maximum number of threads (e.g., usually 32k), especially during recovery. You can increase the number of threads using sysctl to see if increasing the maximum number of threads to the maximum possible

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Azad Aliyar
Great Karan. On Mon, Mar 9, 2015 at 9:32 PM, Karan Singh karan.si...@csc.fi wrote: Thanks Guys kernel.pid_max=4194303 did the trick. - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you are actually writing in your own book,

[ceph-users] how to improve seek time using hammer-test release

2015-03-09 Thread kevin parrikar
hello All, I just set up a single-node Ceph cluster with no replication to familiarize myself with Ceph, using 2 Intel S3500 800 GB SSDs, 8 GB RAM and a 16-core CPU. The OS is Ubuntu 14.04 64-bit; rbd is loaded (modprobe rbd). When running bonnie++ against /dev/rbd0 it shows a seek rate of 892.2/s. How
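
A minimal way to reproduce that kind of benchmark, assuming an image named bench in the default pool (names and sizes are examples, not from Kevin's setup):

    modprobe rbd
    rbd create bench --size 102400        # 100 GB image
    rbd map bench                         # the first mapping appears as /dev/rbd0
    mkfs.xfs /dev/rbd0
    mkdir -p /mnt/bench && mount /dev/rbd0 /mnt/bench
    bonnie++ -d /mnt/bench -u root        # bonnie++ wants a directory, not the raw device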

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Nicheal
2015-03-10 3:01 GMT+08:00 Sage Weil s...@newdream.net: On Mon, 9 Mar 2015, Karan Singh wrote: Thanks Guys kernel.pid_max=4194303 did the trick. Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow.

Re: [ceph-users] Prioritize Heartbeat packets

2015-03-09 Thread Robert LeBlanc
Jian, Thanks for the clarification. I'll mark traffic destined for the monitors as well. We are getting ready to put our first cluster into production. If you are interested we will be testing the heartbeat priority to see if we can saturate the network (not an easy task for 40 Gb) and keep the

Re: [ceph-users] Prioritize Heartbeat packets

2015-03-09 Thread Jian Wen
Only OSD calls set_socket_priority(). See https://github.com/ceph/ceph/pull/3353 On Tue, Mar 10, 2015 at 3:36 AM, Robert LeBlanc rob...@leblancnet.us wrote: I've found commit 9b9a682fe035c985e416ee1c112fa58f9045a27c and I see that when 'osd heartbeat use min delay socket = true' it will mark

[ceph-users] rados import error: short write

2015-03-09 Thread Leslie Teo
We use `rados export poolA /opt/zs.rgw-buckets` to export the Ceph cluster pool named poolA into the local directory /opt/, and then import the directory /opt/zs.rgw-buckets into another Ceph cluster's pool named hello with `rados import /opt/zs.rgw-buckets hello --create`, which fails with the following error: [ERROR]

Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-09 Thread Steffen Winther
Steffen W Sørensen stefws@... writes: Response: HTTP/1.1 200 OK Date: Fri, 06 Mar 2015 10:41:14 GMT Server: Apache/2.2.22 (Fedora) Connection: close Transfer-Encoding: chunked Content-Type: application/xml This response makes the App say: S3.createBucket, class S3, code

[ceph-users] Ceph node operating system high availability and osd restoration best practices.

2015-03-09 Thread Vivek Varghese Cherian
Hi, I have a 4 node ceph cluster and the operating system used on the nodes is Ubuntu 14.04. The ceph cluster currently has 12 osds spread across the 4 nodes. Currently one of the nodes has been restored after an operating system file system corruption which basically made the node and the osds