Hello,
I tried to upgrade to 0.56.1 this morning as it could help with
recovery. No luck so far...
What's wrong with your primary OSD?
I don't know what's really wrong. The disk seems fine.
In general they shouldn't really be crashing that frequently and if you've got
a new bug we'd
Hi,
We recently experienced an issue with the backplane of our server, resulting in
losing half of our OSDs.
During that period the rados gateway failed to initialize (timeout).
We found that the gateway was hanging in the init_watch function.
We recreated our OSDs and we still have this issue, but
Hello,
I'm wondering if I can get every rb.0.8e10.3e2219d7.* from the OSD
drive and cat them together and get back a usable raw volume from which
I could get back my data?
Everything seems to be there but I don't know the order of the rbd
objects. Are the last bytes of the file name the
On 01/08/2013 01:57 PM, Denis Fondras wrote:
Hello,
I'm wondering if I can get every rb.0.8e10.3e2219d7.* from the OSD
drive and cat them together and get back a usable raw volume from which
I could get back my data?
Yes, that is doable. The only problem is that RBD is sparse. So you'd
On 01/08/2013 02:10 PM, Wido den Hollander wrote:
On 01/08/2013 01:57 PM, Denis Fondras wrote:
Hello,
I'm wondering if I can get every rb.0.8e10.3e2219d7.* from the OSD
drive and cat them together and get back a usable raw volume from which
I could get back my data?
Yes, that is doable.
The notify and gc objects were unfound; we marked them as lost and now the rados
gateway starts.
But this means that if some notify objects are not fully available, the
radosgateway stops responding.
If you know the prefix (which it seems you do) and the original size of the rbd,
you should be able to use my utility.
https://github.com/smmoore/ceph/blob/master/rbd_restore.sh
You will need all the rados files in the current working directory you execute
the script from. We have used it many
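For anyone reassembling by hand, the rough idea (a minimal sketch, not the linked script; it assumes the default 4 MB object size and that the pieces have been copied out under their bare object names) is that the hex suffix of each object name is its index into the image, so each piece can be dd'd into place:

    # reassemble an rbd image from its rados objects (sketch; assumes 4 MB objects)
    obj_size=$((4 * 1024 * 1024))
    out=restored.img
    for f in rb.0.8e10.3e2219d7.*; do
        idx=$((16#${f##*.}))                  # hex suffix of the name -> object index
        dd if="$f" of="$out" bs=$obj_size seek=$idx conv=notrunc 2>/dev/null
    done
    # missing objects are just holes in a sparse image; afterwards extend the file
    # to the original rbd size, e.g.: truncate -s <original-size> restored.img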
On 08/01/13 15:51, Chen, Xiaoxi wrote:
I would like to upgrade to 0.56-1 but there is no package for the 3.7
kernel (raring)
I uploaded 0.56.1 to Ubuntu Raring this morning - it's published and
should ripple through archive mirrors in the next few
On Mon, Dec 31, 2012 at 10:27 AM, Wido den Hollander w...@widodh.nl wrote:
Just make sure you use cephx (enabled by default in 0.55) so that you don't
accidentally connect to the wrong cluster.
Use of cephx will provide an additional layer of protection for the
clients, but the OSDs and
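For reference, the explicit settings in ceph.conf look roughly like this (a sketch; 0.55 and later already default to cephx, so this just spells it out):

    [global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx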
I'm confused. Isn't the HeartbeatMap all about local thread
heartbeating (so, not pings with other OSDs)? I would assume the
upgrade and restart just caused a bunch of work and the CPUs got
overloaded.
-Greg
On Thu, Jan 3, 2013 at 8:52 AM, Sage Weil s...@inktank.com wrote:
Hi Wido,
On Thu, 3
On 2013/01/08 at 10:08, Gregory Farnum g...@inktank.com wrote:
On Mon, Jan 7, 2013 at 9:36 PM, Cesar Mello cme...@gmail.com
wrote:
Hi,
I have been playing with ceph and reading the docs/thesis the last
couple of nights just to learn something during my vacation. I was not
expecting to find
I'd like to adjust the branches we're maintaining in ceph.git. Currently:
master
- active development
next
- frozen for next release
- bug fixes only
- regularly merged back into master
testing
- last development release
- cherry-pick -x'd critical fixes
- packages at
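(The cherry-pick -x'd fixes mentioned above are ordinary git cherry-picks; a sketch of the mechanics only, not a statement of project policy:)

    # apply a critical fix from master onto the testing branch,
    # recording its origin in the commit message
    git checkout testing
    git cherry-pick -x <commit-sha>   # appends "(cherry picked from commit ...)"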
On 01/08/2013 12:39 PM, Sage Weil wrote:
I'd like to adjust the branches we're maintaining in ceph.git. Currently:
master
- active development
next
- frozen for next release
- bug fixes only
- regularly merged back into master
testing
- last development release
- cherry-pick -x'd
Hello,
What error message do you get when you try and turn it on? If the
daemon is crashing, what is the backtrace?
The daemon is crashing. Here is the full log if you want to take a look:
http://vps.ledeuns.net/ceph-osd.0.log.gz
The RBD rebuild script helped to get the data back. I will
I have been testing ceph for a little over a month now. Our design goal is to
have 3 datacenters in different buildings all tied together over 10GbE.
Currently there are 10 servers each serving 1 osd in 2 of the datacenters. In
the third is one large server with 16 SAS disks serving 8 osds.
We lost data in notify and gc. What bothers me is that the rados gateway can only
start if we deactivate the cache.
I think the availability of the cache objects shouldn't take down the rados
gateway. The option should be more of an 'I want the cache if available'.
On Tue, Jan 8, 2013 at 1:11 PM, Yann ROBIN yann.ro...@youscribe.com wrote:
We lost data in notify and gc. What bothers me is that the rados gateway can only
start if we deactivate the cache.
I think the availability of the cache objects shouldn't take down the rados
gateway. The option should be
On Tue, 8 Jan 2013, Gregory Farnum wrote:
I'm confused. Isn't the HeartbeatMap all about local thread
heartbeating (so, not pings with other OSDs)? I would assume the
upgrade and restart just caused a bunch of work and the CPUs got
overloaded.
It is. In #3714's case, the OSD was down for a
I tried increasing the number of metadata replicas from 2 to 3 on my
test cluster with the following command:
ceph osd pool set metadata size 3
Afterwards it appears that all the metadata placement groups switched to
a degraded state and don't seem to be attempting to recover:
2013-01-08
What are your CRUSH rules? Depending on how you set this cluster up,
it might not be placing more than one replica in a single host, and
you've only got two hosts so it couldn't satisfy your request for 3
copies.
-Greg
On Tue, Jan 8, 2013 at 2:11 PM, Bryan Stillwell
bstillw...@photobucket.com
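(For reference, one way to see the rules is to pull and decompile the CRUSH map; the output paths here are arbitrary:)

    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt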
That would make sense. Here's what the metadata rule looks like:
rule metadata {
ruleset 1
type replicated
min_size 2
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
On Tue, Jan 8, 2013 at 3:23 PM, Gregory
Yep! The step chooseleaf firstn 0 type host means choose n nodes of
type host, and select a leaf under each one of them, where n is the
pool size. You only have two hosts so it can't do more than 2 with
that rule type.
You could do step chooseleaf firstn 0 type device, but that won't
guarantee a
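(A sketch of the osd-level variant being described; only the chooseleaf step changes, and whether the leaf type is named osd or device depends on the map. With only two hosts this lets all three replicas be placed, but two of them can end up on the same host:)

    rule metadata {
        ruleset 1
        type replicated
        min_size 2
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd    # pick individual OSDs, ignoring hosts
        step emit
    }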
I appreciate you giving more detail on this. I plan on expanding the
test cluster to 5 servers soon, so I'll just wait until then before
changing the number of replicas.
Thanks,
Bryan
On Tue, Jan 8, 2013 at 3:49 PM, Gregory Farnum g...@inktank.com wrote:
Yep! The step chooseleaf firstn 0 type
On Tue, Jan 8, 2013 at 11:44 AM, Denis Fondras c...@ledeuns.net wrote:
Hello,
What error message do you get when you try and turn it on? If the
daemon is crashing, what is the backtrace?
The daemon is crashing. Here is the full log if you want to take a look:
Hi,
Setting rep size to 3 only makes the data triple-replicated; that means
even if you fail all OSDs in 2 out of 3 DCs, the data is still accessible.
But the monitors are another story: a monitor cluster with 2N+1 nodes
requires at least N+1 nodes alive, and indeed this is why you
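(To make the arithmetic concrete: with 3 monitors, 2N+1 = 3, so N+1 = 2 must stay up. One monitor per DC survives the loss of any single DC, while putting 2 monitors in one DC means losing that DC loses quorum.)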
After several months, we have an important update for the argonaut v0.48.x
series. This release contains a critical fix that can prevent data loss or
corruption in a power loss or kernel panic situation. There are also
several fixes for the OSDs and for the radosgw. We recommend all v0.48.x
On Mon, 7 Jan 2013, Filippos Giannakos wrote:
Hi Josh,
On 01/05/2013 02:08 AM, Josh Durgin wrote:
On 01/04/2013 05:01 AM, Filippos Giannakos wrote:
Hi Team,
Is there any progress or any comments regarding the librados aio stat
patch ?
They look good to me. I put them in the
Hello Mark,
OK, adding another OSD is a good option; however, my initial plan was
to raise the full ratio watermark and remove unnecessary data. It's clear
to me that overfilling one of the OSDs will cause big problems for fs
consistency.
But... the 2 other OSDs still have plenty of space. What is the
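(The knob in question is presumably something along these lines; this is an assumption on my part, the thread doesn't quote the actual command:)

    ceph pg set_full_ratio 0.98       # assumed command: temporarily raise the full threshold
    ceph pg set_nearfull_ratio 0.95   # and the near-full warning threshold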
Thanks a lot Greg,
that was the black magic command I was looking for )
I deleted some obsolete data and reached those figures:
chef@cephgw:~$ ./clu.sh exec df -kh|grep osd
/dev/mapper/vg00-osd 252G 153G 100G 61% /var/lib/ceph/osd/ceph-0
/dev/mapper/vg00-osd 252G 180G 73G 72%
On Wed, 9 Jan 2013, Roman Hlynovskiy wrote:
Thanks a lot Greg,
that was the black magic command I was looking for )
I deleted some obsolete data and reached those figures:
chef@cephgw:~$ ./clu.sh exec df -kh|grep osd
/dev/mapper/vg00-osd 252G 153G 100G 61% /var/lib/ceph/osd/ceph-0
On Tuesday, January 8, 2013 at 10:52 PM, Sage Weil wrote:
On Wed, 9 Jan 2013, Roman Hlynovskiy wrote:
Thanks a lot Greg,
that was the black magic command I was looking for )
I deleted some obsolete data and reached those figures:
chef@cephgw:~$ ./clu.sh exec df