Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Ta Ba Tuan
Hi Craig, Thanks for replying. When I started that OSD, the Ceph log from "ceph -w" warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the "active+degraded" state. #ceph pg map 7.9d8 osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (When start osd.21 then pg
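
For readers who want to compare the up and acting sets of several pgs, the "ceph pg map" line quoted above can be picked apart with a small script. This is only a sketch: the regular expression is derived from the single output line in this message, and the function name parse_pg_map is hypothetical, not part of any Ceph tooling.

```python
import re

# Pattern assumed from the one "ceph pg map" line quoted in the message above.
PG_MAP_RE = re.compile(
    r"osdmap e(?P<epoch>\d+) pg (?P<pgid>\S+) \([^)]+\) -> "
    r"up \[(?P<up>[\d,]*)\] acting \[(?P<acting>[\d,]*)\]"
)


def parse_pg_map(line):
    """Return the epoch, pg id, up set and acting set found in one output line."""
    match = PG_MAP_RE.search(line)
    if match is None:
        raise ValueError("unrecognised pg map line: %r" % line)

    def osd_ids(text):
        # "93,49" -> [93, 49]; an empty set "[]" yields an empty list.
        return [int(part) for part in text.split(",") if part]

    return {
        "epoch": int(match.group("epoch")),
        "pgid": match.group("pgid"),
        "up": osd_ids(match.group("up")),
        "acting": osd_ids(match.group("acting")),
    }


if __name__ == "__main__":
    sample = "osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]"
    print(parse_pg_map(sample))
```

A pg whose "up" and "acting" lists differ is one whose mapping is still changing, which is the first thing to check for the pgs named above.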

[ceph-users] journals relabeled by OS, symlinks broken

2014-10-24 Thread Steve Anthony
Hello, I was having problems with a node in my cluster (Ceph v0.80.7/Debian Wheezy/Kernel 3.12), so I rebooted it and the disks were relabeled when it came back up. Now all the symlinks to the journals are broken. The SSDs are now sda, sdb, and sdc but the journals were sdc, sdd, and sde: root@cep
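
A quick way to see which journal symlinks no longer resolve after such a relabel is to walk the OSD data directory and report every dangling link together with its old target. The sketch below makes one assumption not stated in the message: that the OSD directories live under /var/lib/ceph/osd. The function name find_broken_symlinks is hypothetical.

```python
import os
import sys


def find_broken_symlinks(root):
    """Return sorted (link_path, stale_target) pairs for dangling symlinks under root."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks can show up in either list, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() is true for the link itself; exists() follows the link.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append((path, os.readlink(path)))
    return sorted(broken)


if __name__ == "__main__":
    # Default path is an assumption; pass another directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ceph/osd"
    for link, target in find_broken_symlinks(root):
        print("%s -> %s (missing)" % (link, target))
```

The script only reports; it does not repoint anything. Choosing the new target for each link is left to the operator.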

Re: [ceph-users] librados crash in nova-compute

2014-10-24 Thread Xu (Simon) Chen
I am actually curious about one more thing. In the image -> rbd case, is the rbd_secret_uuid config option really used? I am running nova-compute as a non-root user, so virsh secret shouldn't be accessible unless we get it via rootwrap. I had to make the ceph keyring file readable to the nova-compute user

Re: [ceph-users] librados crash in nova-compute

2014-10-24 Thread Xu (Simon) Chen
Thanks. I found the commit on git and can confirm that 0.80.7 fixes the issue. On Friday, October 24, 2014, Josh Durgin wrote: > On 10/24/2014 08:21 AM, Xu (Simon) Chen wrote: > >> Hey folks, >> >> I am trying to enable OpenStack to use RBD as image backend: >> https://bugs.launchpad.net/nova/+bug/12263

Re: [ceph-users] Fio rbd stalls during 4M reads

2014-10-24 Thread Mark Kirkwood
Yeah, looks like it. If I disable the rbd cache: $ tail /etc/ceph/ceph.conf ... [client] rbd cache = false then the 2-4M reads work fine (no invalid reads in valgrind either). I'll let the fio guys know. Cheers Mark On 25/10/14 06:56, Gregory Farnum wrote: There's an issue in master branch

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Craig Lewis
It looks like you're running into http://tracker.ceph.com/issues/5699 You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring. It doesn't work around or repair bad snapshots created on older versions of Ceph. Wer

Re: [ceph-users] can we deploy multi-rgw on one ceph cluster?

2014-10-24 Thread Craig Lewis
You can deploy multiple RadosGW in a single cluster. You'll need to set up zones (see http://ceph.com/docs/master/radosgw/federated-config/). Most people seem to be using zones for geo-replication, but local replication works even better. Multiple zones don't have to be replicated either. For ex

Re: [ceph-users] get/put files with radosgw once MDS crash

2014-10-24 Thread Craig Lewis
No, MDS and RadosGW store their data in different pools. There's no way for them to access the other's data. All of the data is stored in RADOS, and can be accessed via the rados CLI. It's not easy, and you'd probably have to spend a lot of time reading the source code to do it. On Fri, Oct 24,

[ceph-users] How to recover Incomplete PGs from "lost time" symptom?

2014-10-24 Thread Chris Kitzmiller
I have a number of PGs which are marked as incomplete. I'm at a loss for how to go about recovering these PGs and believe they're suffering from the "lost time" symptom. How do I recover these PGs? I'd settle for sacrificing the "lost time" and just going with what I've got. I've lost the abilit

[ceph-users] Ceph and hadoop

2014-10-24 Thread Matan Safriel
Hi, Given HDFS is far from ideal for small files, I am examining the possibility of using Hadoop on top of Ceph. I found mainly one online resource about it: https://ceph.com/docs/v0.79/cephfs/hadoop/. I am wondering whether there is any reference implementation or blog post you are aware of, about ha

Re: [ceph-users] Object Storage Statistics

2014-10-24 Thread Yehuda Sadeh
On Fri, Oct 24, 2014 at 8:17 AM, Dane Elwell wrote: > Hi list, > > We're using the object storage in production and billing people based > on their usage, much like S3. We're also trying to produce things like > hourly bandwidth graphs for our clients. > > We're having some issues with the API not

Re: [ceph-users] librados crash in nova-compute

2014-10-24 Thread Josh Durgin
On 10/24/2014 08:21 AM, Xu (Simon) Chen wrote: Hey folks, I am trying to enable OpenStack to use RBD as image backend: https://bugs.launchpad.net/nova/+bug/1226351 For some reason, nova-compute segfaults due to librados crash: ./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::sh

Re: [ceph-users] Fio rbd stalls during 4M reads

2014-10-24 Thread Mark Nelson
FWIW the specific fio read problem appears to have started after 0.86 and before commit 42bcabf. Mark On 10/24/2014 12:56 PM, Gregory Farnum wrote: There's an issue in master branch temporarily that makes rbd reads greater than the cache size hang (if the cache was on). This might be that. (Ja

Re: [ceph-users] Fio rbd stalls during 4M reads

2014-10-24 Thread Gregory Farnum
There's an issue in master branch temporarily that makes rbd reads greater than the cache size hang (if the cache was on). This might be that. (Jason is working on it: http://tracker.ceph.com/issues/9854) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Thu, Oct 23, 2014 at 5

Re: [ceph-users] RGW Federated Gateways and Apache 2.4 problems

2014-10-24 Thread Craig Lewis
Thanks! I'll continue with Apache 2.2 until the next release. On Fri, Oct 24, 2014 at 8:58 AM, Yehuda Sadeh wrote: > On Thu, Oct 23, 2014 at 3:51 PM, Craig Lewis > wrote: > > I'm having a problem getting RadosGW replication to work after upgrading > to > > Apache 2.4 on my primary test cluster

Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-24 Thread Yan, Zheng
On Fri, Oct 24, 2014 at 8:47 AM, Sergey Nazarov wrote: > Any update? > The short answer is that when the command is executed for the second time, the MDS needs to truncate the file to zero length. The speed of truncating a file is limited by the OSD speed. (creating file and write data to the file are asy

Re: [ceph-users] Lost monitors in a multi mon cluster

2014-10-24 Thread Loic Dachary
Bonjour, Maybe http://ceph.com/docs/giant/rados/troubleshooting/troubleshooting-mon/ can help? Joao wrote that a few months ago and it covers a number of scenarios. Cheers On 24/10/2014 08:27, HURTEVENT VINCENT wrote: > Hello, > > I was running a multi mon (3) Ceph cluster and in a migration m

Re: [ceph-users] Lost monitors in a multi mon cluster

2014-10-24 Thread Dan van der Ster
Hi, October 24 2014 5:28 PM, "HURTEVENT VINCENT" wrote: > Hello, > > I was running a multi mon (3) Ceph cluster and in a migration move, I > reinstall 2 of the 3 monitors > nodes without deleting them properly into the cluster. > > So, there is only one monitor left which is stuck in probing

Re: [ceph-users] RGW Federated Gateways and Apache 2.4 problems

2014-10-24 Thread Yehuda Sadeh
On Thu, Oct 23, 2014 at 3:51 PM, Craig Lewis wrote: > I'm having a problem getting RadosGW replication to work after upgrading to > Apache 2.4 on my primary test cluster. Upgrading the secondary cluster to > Apache 2.4 doesn't cause any problems. Both Ceph's apache packages and > Ubuntu's package

Re: [ceph-users] Extremely slow small files rewrite performance

2014-10-24 Thread Sergey Nazarov
Any update? On Tue, Oct 21, 2014 at 3:32 PM, Sergey Nazarov wrote: > Ouch, I think client log is missing. > Here it goes: > https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0 > > On Tue, Oct 21, 2014 at 3:22 PM, Sergey Nazarov wrote: >> I enabled logging and performed same t

[ceph-users] Lost monitors in a multi mon cluster

2014-10-24 Thread HURTEVENT VINCENT
Hello, I was running a multi mon (3) Ceph cluster and, during a migration, I reinstalled 2 of the 3 monitor nodes without properly removing them from the cluster. So, there is only one monitor left, which is stuck in the probing phase, and the cluster is down. As I can only connect to mon socket, I d

[ceph-users] librados crash in nova-compute

2014-10-24 Thread Xu (Simon) Chen
Hey folks, I am trying to enable OpenStack to use RBD as image backend: https://bugs.launchpad.net/nova/+bug/1226351 For some reason, nova-compute segfaults due to librados crash: ./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7f1b477fe7

[ceph-users] Object Storage Statistics

2014-10-24 Thread Dane Elwell
Hi list, We're using the object storage in production and billing people based on their usage, much like S3. We're also trying to produce things like hourly bandwidth graphs for our clients. We're having some issues with the API not returning the correct statistics. I can see that there is a --sy

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-24 Thread Jasper Siero
Hello Greg and John, I used the patch on the ceph cluster and tried it again: /usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001 undump journaldumptgho-mon001 start 9483323613 len 134213311 writing header 200. writing 948332361

Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-24 Thread Haomai Wang
kvstore is not stable in Firefly. In the master branch, though, there should be no existing/known bugs. On Fri, Oct 24, 2014 at 7:41 PM, Andrey Korolyov wrote: > Hi, > > during recovery testing on a latest firefly with leveldb backend we > found that the OSDs on a selected host may crash at once,

[ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-24 Thread Andrey Korolyov
Hi, during recovery testing on the latest firefly with the leveldb backend we found that the OSDs on a selected host may crash at once, leaving the attached backtrace. Otherwise, recovery goes more or less smoothly for hours. Timestamps show how the issue is correlated between different processes on s

[ceph-users] get/put files with radosgw once MDS crash

2014-10-24 Thread 廖建锋
Dear cepher, Today I use MDS to put/get files from the Ceph storage cluster, as it is very easy to use for each side of a company. But Ceph MDS is not very stable, so my question is: is it possible to get the file names and contents from the OSDs with radosgw once the MDS crashes, and how?