[ceph-users] Hammer cache behavior

2015-05-18 Thread Brian Rak
We just enabled a small cache pool on one of our clusters (v 0.94.1) and have run into some issues: 1) Cache population appears to happen via the public network (not the cluster network). We're seeing basically no traffic on the cluster network, and multiple gigabits inbound to our cache OSDs

Re: [ceph-users] Fwd: ceph-deploy : Certificate Error using wget on Debian

2015-03-27 Thread Brian Rak
It looks like ceph.com is having some major issues with their git repository right now.. https://ceph.com/git/ gives a 500 error On 3/27/2015 8:11 AM, Vasilis Souleles wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hello, I'm trying to create a 4-node Ceph Storage Cluster using ceph-dep

[ceph-users] Ceph repo - RSYNC?

2015-03-05 Thread Brian Rak
Do any of the Ceph repositories run rsync? We generally mirror the repository locally so we don't encounter any unexpected upgrades. eu.ceph.com used to run this, but it seems to be down now. # rsync rsync://eu.ceph.com rsync: failed to connect to eu.ceph.com: Connection refused (111) rsync er

Re: [ceph-users] v0.87.1 Giant released

2015-02-26 Thread Brian Rak
On 2/26/2015 9:46 AM, Sage Weil wrote: This is the first (and possibly final) point release for Giant. Our focus on stability fixes will be directed towards Hammer and Firefly. Is this something that was decided beforehand? Can we tell if a major version is going to be maintained or not, bef

Re: [ceph-users] PG stuck degraded, undersized, unclean

2015-02-18 Thread Brian Rak
On 2/18/2015 3:24 PM, Florian Haas wrote: On Wed, Feb 18, 2015 at 9:09 PM, Brian Rak wrote: What does your crushmap look like (ceph osd getcrushmap -o /tmp/crushmap; crushtool -d /tmp/crushmap)? Does your placement logic prevent Ceph from selecting an OSD for the third replica? Cheers

Re: [ceph-users] PG stuck degraded, undersized, unclean

2015-02-18 Thread Brian Rak
On 2/18/2015 3:01 PM, Florian Haas wrote: On Wed, Feb 18, 2015 at 7:53 PM, Brian Rak wrote: We're running ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), and seeing this: HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 1 pgs stuck unclean; 1 pgs stuck undersized;

[ceph-users] PG stuck degraded, undersized, unclean

2015-02-18 Thread Brian Rak
We're running ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), and seeing this: HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 1 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized pg 4.2af is stuck unclean for 77192.522960, current state active+undersized+degraded, las

[ceph-users] osd crush create-or-move doesn't move things?

2015-01-26 Thread Brian Rak
I have an existing cluster where all the hosts were just added directly, for example:
# ceph osd tree
# id    weight  type name       up/down reweight
-1      60.06   root default
...
-14     1.82            host OSD75
12      1.82                    osd.12  up      1
-15     1.82            hos

Re: [ceph-users] 4 GB mon database?

2015-01-22 Thread Brian Rak
On 1/21/2015 5:56 PM, Gregory Farnum wrote: On Mon, Jan 19, 2015 at 2:48 PM, Brian Rak wrote: A while ago, I ran into this issue: http://tracker.ceph.com/issues/10411 I did manage to solve that by deleting the PGs; however, ever since that issue my mon databases have been growing indefinitely

[ceph-users] 4 GB mon database?

2015-01-21 Thread Brian Rak
A while ago, I ran into this issue: http://tracker.ceph.com/issues/10411 I did manage to solve that by deleting the PGs; however, ever since that issue my mon databases have been growing indefinitely. At the moment, I'm up to 3404 sst files, totaling 7.4GB of space. This appears to be causing
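The file count and total size quoted in this post can be gathered with a short script. This is a minimal sketch, not an official Ceph tool: the function name `summarize_sst_files` is hypothetical, and the directory you pass in is an assumption about where the monitor keeps its store on your system.

```python
from pathlib import Path


def summarize_sst_files(store_dir):
    """Return (file_count, total_bytes) for the *.sst files under store_dir."""
    # rglob walks subdirectories too, so the exact store layout does not matter.
    files = [p for p in Path(store_dir).rglob("*.sst") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Example call (the path is hypothetical; substitute the real monitor store):
# count, total = summarize_sst_files("/path/to/mon/store")
# print(f"{count} sst files, {total / 1e9:.1f} GB")
```

Running it periodically and logging the result is one way to tell whether the store is still growing or has levelled off.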

[ceph-users] subscribe

2015-01-19 Thread Brian Rak
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] Rsync mirror for repository?

2014-12-01 Thread Brian Rak
Is there a place I can download the entire repository for giant? I'm really just looking for an rsync server that presents all the files here: http://download.ceph.com/ceph/giant/centos6.5/ I know that eu.ceph.com runs one, but I'm not sure how up to date that is (because of http://eu.ceph.com

Re: [ceph-users] ceph-announce list

2014-11-25 Thread Brian Rak
Hmm, that doesn't seem to be linked to from http://ceph.com/resources/mailing-list-irc/ On 11/25/2014 4:08 AM, JuanFra Rodriguez Cardoso wrote: Sorry.. as mentioned above, it's now open to sign up: http://lists.ceph.com/listinfo.cgi/ceph-announce-ceph.com

[ceph-users] ceph-announce list

2014-10-29 Thread Brian Rak
Would it be possible to establish an announcement mailing list, used only for announcing new versions? Many other projects have similar lists, and they're very helpful for keeping up on changes, while not being particularly noisy.

Re: [ceph-users] Ceph storage pool definition with KVM/libvirt

2014-10-17 Thread Brian Rak
to do something like this: Paraphrasing of course, but can we leverage the contents of the pool definitions to abstract them at runtime on VMs or are they purely there for generation of the vm settings when a vm is instantiated? Thanks Dan - Original Message --

Re: [ceph-users] Ceph storage pool definition with KVM/libvirt

2014-10-16 Thread Brian Rak
What I've found is the nicest way of handling this is to add all the mons to your ceph.conf file. The QEMU client will use these if you don't define any in the libvirt config. Similarly, define a libvirt 'secret' and you can use that for auth, so you only have one place to change it. My enti

Re: [ceph-users] time out of sync after power failure

2014-09-26 Thread Brian Rak
Make sure you've configured ntpd with 'iburst' too. Basically, it sends a burst of packets to the time server at startup, which reduces the time it takes to get a valid time. On 9/26/2014 6:00 PM, Craig Lewis wrote: First, make sure you're running ntpd on all of the nodes. I prefer to config
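To check an existing configuration for the option mentioned in this reply, a small parser is enough. The sketch below is an illustration only: the function name `servers_missing_iburst` and the sample configuration are hypothetical, and it only looks at `server` and `pool` lines.

```python
def servers_missing_iburst(ntp_conf_text):
    """Return the server/pool hosts in an ntp.conf text that lack 'iburst'."""
    missing = []
    for raw in ntp_conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(tokens) >= 2 and tokens[0] in ("server", "pool"):
            if "iburst" not in tokens[2:]:
                missing.append(tokens[1])
    return missing


# Hypothetical sample configuration, for illustration only.
sample = """
server 0.pool.ntp.org iburst
server 1.pool.ntp.org
"""
print(servers_missing_iburst(sample))  # ['1.pool.ntp.org']
```

Feeding it the contents of the real configuration file on each node gives a quick list of time sources that would still start slowly after a power failure.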

[ceph-users] RBD import slow

2014-09-24 Thread Brian Rak
I've been doing some testing of importing virtual machine images, and I've found that 'rbd import' is at least 2x as slow as 'qemu-img convert'. Is there anything I can do to speed this process up? I'd like to use rbd import because it gives me a little additional flexibility. My test setup

Re: [ceph-users] OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left

2014-09-12 Thread Brian Rak
That's not how ulimit works. Check the `ulimit -a` output. On 9/12/2014 10:15 AM, Christian Eichelmann wrote: Hi, I am running all commands as root, so there are no limits for the processes. Regards, Christian ___ From: Mariusz Gronczewski [mariusz.gronczew.

Re: [ceph-users] OSDs are crashing with "Cannot fork" or "cannot create thread" but plenty of memory is left

2014-09-12 Thread Brian Rak
What are your ulimit settings? You could be hitting the max process count. On 9/12/2014 9:06 AM, Christian Eichelmann wrote: Hi Ceph-Users, I have absolutely no idea what is going on on my systems... Hardware: 45 x 4TB Harddisks 2 x 6 Core CPUs 256GB Memory When initializing all disks and jo
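The per-process limits in question can also be read programmatically. The following is a minimal sketch using Python's standard `resource` module on Linux; the helper name `describe_limit` is hypothetical. The values reported are those of the process running the script, so it should be started from the same context (user, init script, shell) that launches the OSDs.

```python
import resource


def describe_limit(name, limit_id):
    """Format the soft and hard values of one resource limit for this process."""
    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    soft, hard = resource.getrlimit(limit_id)
    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


# RLIMIT_NPROC caps processes/threads per user; RLIMIT_NOFILE caps open files.
for name in ("RLIMIT_NPROC", "RLIMIT_NOFILE"):
    print(describe_limit(name, getattr(resource, name)))
```

This prints the same kind of information as `ulimit -a`, restricted to the two limits most likely to matter when a daemon cannot fork or create threads.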

Re: [ceph-users] Best Practice to Copy/Move Data Across Clusters

2014-08-20 Thread Brian Rak
We do it with rbd volumes. We're using rbd export/import and netcat to transfer it across clusters. This was the most efficient solution that did not require one cluster to have access to the other cluster (though it does require some way of starting the process on the different machines).

Re: [ceph-users] PGs stuck creating

2014-08-08 Thread Brian Rak
Ahh figured it out. I hadn't removed the dead OSDs from the crush map, which was apparently confusing ceph. I just did 'ceph osd crush rm XXX' for all of them, restarted all the online OSDs, and the pg got created! On 8/8/2014 4:51 PM, Brian Rak wrote: ceph

[ceph-users] PGs stuck creating

2014-08-08 Thread Brian Rak
ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f) I recently managed to cause some problems for one of our clusters: we had 1/3 of the OSDs fail and lose all the data. I removed all the failed OSDs from the crush map, and did 'ceph osd rm'. Once it finished recovering, I was lef

Re: [ceph-users] Firefly OSDs stuck in creating state forever

2014-08-01 Thread Brian Rak
What happens if you remove nodown? I'd be interested to see what OSDs it thinks are down. My next thought would be tcpdump on the private interface. See if the OSDs are actually managing to connect to each other. For comparison, when I bring up a cluster of 3 OSDs it goes to HEALTH_OK nearly

Re: [ceph-users] Firefly OSDs stuck in creating state forever

2014-08-01 Thread Brian Rak
Why do you have an MDS active? I'd suggest getting rid of that at least until you have everything else working. I see you've set nodown on the OSDs; did you have problems with the OSDs flapping? Do the OSDs have broken connectivity between themselves? Do you have some kind of firewall interf

Re: [ceph-users] Problem with RadosGW and special characters

2014-07-14 Thread Brian Rak
he code enough to tell. On 6/30/2014 5:41 PM, Brian Rak wrote: Just for reference, I've opened http://tracker.ceph.com/issues/8702 On 6/26/2014 10:18 PM, Brian Rak wrote: My current workaround plan is to just upload both versions of the file... I think this is probably the simplest solutio

Re: [ceph-users] radosgw issues

2014-06-30 Thread Brian Rak
That sounds like you have some kind of odd situation going on. We only use radosgw with nginx/tengine so I can't comment on the apache part of it. My understanding is this: You start ceph-radosgw, this creates a fastcgi socket somewhere (verify this is created with lsof, there are some permis

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-30 Thread Brian Rak
Just for reference, I've opened http://tracker.ceph.com/issues/8702 On 6/26/2014 10:18 PM, Brian Rak wrote: My current workaround plan is to just upload both versions of the file... I think this is probably the simplest solution with the least possibility of breaking later on. On 6/26/2

Re: [ceph-users] PXE Booting from RBD

2014-06-29 Thread Brian Rak
This had come up on the iPXE lists a while ago, and I had the following suggestion (at least for Linux): Set up radosgw, store your kernel and initrd there. Create an iPXE script to boot off this stored kernel/initrd, and have the kernel know how to mount a RBD volume directly. This is a litt

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-27 Thread Brian Rak
" or "\" (I don't know which), authentication fails using python-swiftclient. Is it an issue? On 06/25/2014 11:58 PM, Brian Rak wrote: I'm trying to find an issue with RadosGW and special characters in filenames. Specifically, it seems that filenames wi

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-26 Thread Brian Rak
want to watch the release notes. Once you work around a bug, someone will fix the bug and break your hack. On Thu, Jun 26, 2014 at 8:54 AM, Brian Rak <b...@gameservers.com> wrote: Going back to my first post, I linked to this http://stackoverflow.com/question

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-26 Thread Brian Rak
Going back to my first post, I linked to this http://stackoverflow.com/questions/1005676/urls-and-plus-signs Per the definition of application/x-www-form-urlencoded: http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 "Control names and values are escaped. Space characters are replace
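The ambiguity discussed in this message can be reproduced with Python's standard `urllib.parse` module. This is only an illustration of the two decoding conventions, not a description of what RadosGW or any web server in front of it actually does; the file name is the fragment visible in the original post.

```python
from urllib.parse import quote, unquote, unquote_plus

name = "adduser_3.113+nmu3"  # fragment of the file name from the original post

# Percent-encode the plus so no layer can mistake it for an encoded space.
encoded = quote(name, safe="")
print(encoded)                # adduser_3.113%2Bnmu3

# Form-style decoding turns a literal "+" into a space; path-style does not.
print(unquote_plus(name))     # adduser_3.113 nmu3
print(unquote(name))          # adduser_3.113+nmu3
print(unquote_plus(encoded))  # adduser_3.113+nmu3
```

Sending `%2B` explicitly is the only form that both conventions decode to the same name, which is why escaping the plus on the client side works as a workaround.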

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-26 Thread Brian Rak
rote: The gateway itself supports these kinds of characters. Usually we see this issue when there's something in front of the web server (like a load balancer) that modifies the requests. Another possibility is the web server configuration that might be rewriting the requests. In this case i

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-26 Thread Brian Rak
e web server configuration that might be rewriting the requests. In this case it seems that you're using nginx which is outside of our usual test environment, so it might be related. Yehuda On Jun 25, 2014 5:39 PM, "Brian Rak" wrote: Unfortunately, both the client and actual files a

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-25 Thread Brian Rak
filenames or percentage encode the URL explicitly. Rgds, G> On Wed, Jun 25, 2014 at 8:41 PM, Brian Rak <b...@gameservers.com> wrote: ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74) I'll try to take a look through the bug tracker, but I didn'

Re: [ceph-users] Problem with RadosGW and special characters

2014-06-25 Thread Brian Rak
, but it sounds familiar so I think you probably want to search the list archives and the bug tracker (http://tracker.ceph.com/projects/rgw). What version precisely are you on? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Jun 25, 2014 at 2:58 PM, Brian Rak wrote: I'

[ceph-users] Problem with RadosGW and special characters

2014-06-25 Thread Brian Rak
I'm trying to find an issue with RadosGW and special characters in filenames. Specifically, it seems that filenames with a + in them are not being handled correctly, and that I need to explicitly escape them. For example: ---request begin--- HEAD /ubuntu/pool/main/a/adduser/adduser_3.113+nmu3

Re: [ceph-users] Firefly RPMs broken on CentOS 6

2014-06-03 Thread Brian Rak
.1-2.el6.x86_64 As you can see the package comes from ceph repo. I also notice that python-ceph-0.80.1-0 and ceph-0.80.1-2 have different versions. Maybe this is the problem? Thanks. Pedro Sousa On Tue, Jun 3, 2014 at 7:46 PM, Brian Rak <b...@gameservers.com> wrote:

Re: [ceph-users] Firefly RPMs broken on CentOS 6

2014-06-03 Thread Brian Rak
?repo=epel-$releasever&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-$releasever
exclude=*ceph*
I've gotten several emails about this, so it's definitely something other people are running into. On 6/2/2014 1:15 PM, B

Re: [ceph-users] Firefly RPMs broken on CentOS 6

2014-06-02 Thread Brian Rak
fredo Deza wrote: Brian Where is that ceph repo coming from? I don't see any 0.80.1-2 in http://ceph.com/rpm-firefly/el6/x86_64/ On Mon, Jun 2, 2014 at 10:01 AM, Brian Rak wrote: Also the 0.80.1-2.el6 ceph-radosgw RPM no longer includes an init script. Where is the proper place to report

Re: [ceph-users] Firefly RPMs broken on CentOS 6

2014-06-02 Thread Brian Rak
un 2, 2014 at 10:01 AM, Brian Rak wrote: Also the 0.80.1-2.el6 ceph-radosgw RPM no longer includes an init script. Where is the proper place to report issues with the RPMs? On 6/2/2014 9:53 AM, Brian Rak wrote: Did the python-ceph package go away or something? Upgrading from 0.80.1-0.el6 to 0.

Re: [ceph-users] Firefly RPMs broken on CentOS 6

2014-06-02 Thread Brian Rak
Also the 0.80.1-2.el6 ceph-radosgw RPM no longer includes an init script. Where is the proper place to report issues with the RPMs? On 6/2/2014 9:53 AM, Brian Rak wrote: Did the python-ceph package go away or something? Upgrading from 0.80.1-0.el6 to 0.80.1-2.el6 does not work. # yum

[ceph-users] Firefly RPMs broken on CentOS 6

2014-06-02 Thread Brian Rak
Did the python-ceph package go away or something? Upgrading from 0.80.1-0.el6 to 0.80.1-2.el6 does not work. # yum install ceph python-ceph Package python-ceph-0.80.1-0.el6.x86_64 already installed and latest version Resolving Dependencies --> Running transaction check ---> Package ceph.x86_64

Re: [ceph-users] nginx (tengine) and radosgw

2014-05-29 Thread Brian Rak
the ssl work so that nginx acts as an ssl proxy in front of the radosgw? Cheers Andrei From: "Brian Rak" <b...@gameservers.com> To: ceph-users@lists.ceph.com

Re: [ceph-users] NGINX and 100-Continue

2014-05-29 Thread Brian Rak
Don't use nginx. The current version buffers all the uploads to the local disk, which causes all sorts of problems with radosgw (timeouts, clock skew errors, etc). Use tengine instead (or apache). I sent the mailing list some info on tengine a couple weeks ago. On 5/29/2014 6:11 AM, Michael

Re: [ceph-users] nginx (tengine) and radosgw

2014-05-20 Thread Brian Rak
use nginx, but from what I recall it had some ssl related issues. Have you tried to make the ssl work so that nginx acts as an ssl proxy in front of the radosgw? Cheers Andrei From: "Brian Rak" To: c

[ceph-users] nginx (tengine) and radosgw

2014-05-20 Thread Brian Rak
I've just finished converting from nginx/radosgw to tengine/radosgw, and it's fixed all the weird issues I was seeing (uploads failing, random clock skew errors, timeouts). The problem with nginx and radosgw is that nginx insists on buffering all the uploads to disk. This causes a significant

Re: [ceph-users] Flapping OSDs. Safe to upgrade?

2014-05-14 Thread Brian Rak
Anything in dmesg? When you say restart, do you mean a physical restart, or just restarting the daemon? If it takes a physical restart and you're using Intel NICs, it might be worth upgrading network drivers. Old versions have some bugs that cause them to just drop traffic. On 5/14/2014 9:0

[ceph-users] cephx authentication defaults

2014-05-14 Thread Brian Rak
Why are the defaults for 'cephx require signatures' and similar still false? Is it still necessary to maintain backwards compatibility with very old clients by default? It seems like from a security POV, you'd want everything to be more secure out of the box, and require the user to explicitl

Re: [ceph-users] Lost access to radosgw after crash?

2014-05-13 Thread Brian Rak
This turns out to have been a configuration change to nginx that I forgot I had made. It wasn't passing all the HTTP options through any more, so authentication was failing. On 5/13/2014 12:43 PM, Brian Rak wrote: On 5/13/2014 12:29 PM, Yehuda Sadeh wrote: On Tue, May 13, 2014 at 8:

Re: [ceph-users] Lost access to radosgw after crash?

2014-05-13 Thread Brian Rak
On 5/13/2014 12:29 PM, Yehuda Sadeh wrote: On Tue, May 13, 2014 at 8:52 AM, Brian Rak wrote: I hit a "bug" where radosgw crashed with -101> 2014-05-13 15:26:07.188494 7fde82886820 0 ERROR: FCGX_Accept_r returned -24 too many files opened. You probably need to adjust your limi

Re: [ceph-users] Lost access to radosgw after crash?

2014-05-13 Thread Brian Rak
lock on gc.14 2014-05-13 16:27:49.792050 7f51828fa700 0 ERROR: can't read user header: ret=-2 2014-05-13 16:27:49.792055 7f51828fa700 0 ERROR: sync_user() failed, user=centosmirror2 ret=-2 Any ideas? On 5/13/2014 11:52 AM, Brian Rak wrote: I hit a "bug" where radosgw crashed wit

[ceph-users] Lost access to radosgw after crash?

2014-05-13 Thread Brian Rak
I hit a "bug" where radosgw crashed with -101> 2014-05-13 15:26:07.188494 7fde82886820 0 ERROR: FCGX_Accept_r returned -24 0> 2014-05-13 15:26:07.193772 7fde82886820 -1 rgw/rgw_main.cc: In function 'virtual void RGWProcess::RGWWQ::_clear()' thread 7fde82886820 time 2014-05-13 15:26:07.1

Re: [ceph-users] Migrate system VMs from local storage to CEPH

2014-05-05 Thread Brian Rak
This would be a better question for the CloudStack community. On 5/2/2014 10:06 AM, Andrija Panic wrote: Hi. I was wondering what would be the correct way to migrate system VMs (storage, console, VR) from local storage to CEPH. I'm on CS 4.2.1 and will be soon updating to 4.3... Is it enough to j

Re: [ceph-users] Copying RBD images between clusters?

2014-04-29 Thread Brian Rak
al 0m21.877s user 0m0.675s sys 0m0.637 On 4/26/2014 1:58 AM, Vladislav Gorbunov wrote: rbd -m mon-cluster1 export rbd/one-1 - | rbd -m mon-cluster2 import - rbd/one-1 On Friday, April 25, 2014, Brian Rak wrote: Is there a recommended way to copy an RBD image betwe
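The one-line pipeline quoted in this reply can be wrapped in a script when several images need copying. The sketch below is a hedged illustration: the function names are hypothetical, the `rbd` arguments simply mirror the quoted command, and it assumes the machine running it can reach the monitors of both clusters.

```python
import subprocess


def build_copy_commands(src_mon, dst_mon, image):
    """Return the export and import argv lists for copying one RBD image."""
    export_cmd = ["rbd", "-m", src_mon, "export", image, "-"]
    import_cmd = ["rbd", "-m", dst_mon, "import", "-", image]
    return export_cmd, import_cmd


def copy_image(src_mon, dst_mon, image):
    """Stream 'rbd export' into 'rbd import' without a temporary file."""
    export_cmd, import_cmd = build_copy_commands(src_mon, dst_mon, image)
    exporter = subprocess.Popen(export_cmd, stdout=subprocess.PIPE)
    importer = subprocess.Popen(import_cmd, stdin=exporter.stdout)
    exporter.stdout.close()  # let the exporter see SIGPIPE if the importer exits
    import_rc = importer.wait()
    export_rc = exporter.wait()
    if export_rc != 0 or import_rc != 0:
        raise RuntimeError(f"copy failed: export={export_rc} import={import_rc}")
```

Calling `copy_image("mon-cluster1", "mon-cluster2", "rbd/one-1")` would run the same two commands as the quoted one-liner, with the added benefit that a failure on either side raises an error instead of passing silently.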

Re: [ceph-users] apple support automated mail ? WTF!?

2014-04-28 Thread Brian Rak
I thought that was just me.. I guess someone from Apple is subscribed? On 4/28/2014 10:06 AM, Alphe Salas Michels wrote: Hello, each time I send a mail to the ceph user mailing list I receive an email from Apple support?! Is that a joke? Alphe Salas

[ceph-users] CentOS 6 Yum repository broken / tampered with?

2014-04-28 Thread Brian Rak
Were there any changes to the EL6 yum packages at http://ceph.com/rpm/el6/x86_64/ ? There are a number of files showing a modification date of '25-Apr-2014', but it seems that no one regenerated the repository metadata. This breaks installations using the repository, you'll get errors like th

[ceph-users] Copying RBD images between clusters?

2014-04-24 Thread Brian Rak
Is there a recommended way to copy an RBD image between two different clusters? My initial thought was 'rbd export - | ssh "rbd import -"', but I'm not sure if there's a more efficient way.