possible bug in blacklist

2013-09-24 Thread Mandell Degerness
See trace below. We run this command on system restart in order to clear any blacklist which was created while the node was misbehaving. Now, rather than giving a reasonable error, it causes a traceback: [root@node-172-20-0-13 ~]# ceph osd blacklist rm 172.20.0.13 Traceback (most recent call last):
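
For context, the cleanup being attempted amounts to deleting any blacklist entries left over for the recovering node. A minimal sketch of that flow, assuming the standard ceph CLI (the address variable and the parsing of the listing are illustrative, not from the thread; entries are normally removed by their full addr:port/nonce form):

    ADDR=172.20.0.13
    # "ceph osd blacklist ls" prints one entry per line as addr:port/nonce plus an expiry time.
    ceph osd blacklist ls | awk '{print $1}' | grep "^${ADDR}:" | \
    while read entry; do
        ceph osd blacklist rm "${entry}"
    done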

Re: OSD failure on start

2013-02-13 Thread Mandell Degerness
manual process to fix it if you can't wipe this OSD. > Good Luck, > Mike > On 2/13/2013 2:57 PM, Mandell Degerness wrote: >> I'm getting this error on one of my OSDs when I try to start it. >> I can gather more complete log

OSD failure on start

2013-02-13 Thread Mandell Degerness
I'm getting this error on one of my OSDs when I try to start it. I can gather more complete log data if no one recognizes the error from this: Feb 13 19:30:04 node-192-168-8-14 ceph-osd: 2013-02-13 19:30:04.612847 7f4f607e7780 0 filestore(/mnt/osd96) mount found snaps <> Feb 13 19:30:04 node-19

Re: File exists not handled in 0.48argonaut1

2013-02-12 Thread Mandell Degerness
e [osd] section of the ceph.conf. > -Sam > > On Mon, Feb 11, 2013 at 2:21 PM, Mandell Degerness > wrote: >> Since the attachment didn't work, apparently, here is a link to the log: >> >> http://dl.dropbox.com/u/766198/error17.log.gz >> >> On Mon, Fe

Re: File exists not handled in 0.48argonaut1

2013-02-11 Thread Mandell Degerness
Since the attachment didn't work, apparently, here is a link to the log: http://dl.dropbox.com/u/766198/error17.log.gz On Mon, Feb 11, 2013 at 1:42 PM, Samuel Just wrote: > I don't see the more complete log. > -Sam > > On Mon, Feb 11, 2013 at 11:12 AM, Mandell Degerness

Re: File exists not handled in 0.48argonaut1

2013-02-11 Thread Mandell Degerness
Anyone have any thoughts on this??? It looks like I may have to wipe out the OSDs affected and rebuild them, but I'm afraid that may result in data loss because of the old OSD-first crush map in place :(. On Fri, Feb 8, 2013 at 1:36 PM, Mandell Degerness wrote: > We ran into an err

Re: Increase number of pg in running system

2013-02-05 Thread Mandell Degerness
I would like very much to specify pg_num and pgp_num for the default pools, but they are defaulting to 64 (no OSDs are defined in the config file). I have tried using the options indicated by Artem, but they didn't seem to have any effect on the data and rbd pools which are created by default. Is
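
For reference, the knobs usually tried for this are the pool-default options in ceph.conf, along the lines of the sketch below; whether they take effect for the pools created automatically at cluster creation is exactly the open question here, so treat this as the attempted configuration rather than a confirmed fix:

    [global]
        osd pool default pg num = 256
        osd pool default pgp num = 256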

Re: v0.56.2 released

2013-02-05 Thread Mandell Degerness
OSD first in test scenarios. On Tue, Feb 5, 2013 at 12:37 PM, Sage Weil wrote: > On Tue, 5 Feb 2013, Mandell Degerness wrote: >> We are doing the latter. What is the best way to force it to host >> first selection? > > Checking the code, you should be seeing either

Re: v0.56.2 released

2013-02-05 Thread Mandell Degerness
We are doing the latter. What is the best way to force it to host first selection? On Tue, Feb 5, 2013 at 11:34 AM, Sage Weil wrote: > On Tue, 5 Feb 2013, Mandell Degerness wrote: >> We are using v0.56.2 and it still seems that the default crushmap is >> osd centered. Here is
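
For context, the difference between osd-first and host-first placement comes down to the chooseleaf step in the crush rule. A sketch of inspecting and editing the map by hand with the standard tools (file names are placeholders):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit the rule so replicas are separated per host rather than per device:
    #   step chooseleaf firstn 0 type host
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new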

Re: v0.56.2 released

2013-02-05 Thread Mandell Degerness
(586538e22afba85c59beda49789ec42024e7a061) We do not run any explicit crushtool commands as part of our start up at this time. Should we be? Regards, Mandell Degerness On Wed, Jan 30, 2013 at 3:46 PM, Sage Weil wrote: > The next bobtail point release is ready, and it's looking pretty good. >

Re: how to protect rbd from multiple simultaneous mapping

2013-01-24 Thread Mandell Degerness
The advisory locks are nice, but it would be really nice to have the fencing. If a node is temporarily off the network and a heartbeat monitor attempts to bring up a service on a different node, there is no way to ensure that the first node will not write data to the rbd after the rbd is mounted o
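
The fencing being asked for here is essentially blacklisting the silent client before the service is brought up elsewhere, so it can no longer write to the image. A rough sketch, with the address and image name as placeholders and no claim that this was available in the release under discussion:

    # on an admin node: fence the unresponsive client
    ceph osd blacklist add 172.16.0.13:0/0
    # on the standby node, only after the old client is fenced
    rbd map myvolume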

ceph osd crush set command under 0.53

2012-11-12 Thread Mandell Degerness
Did the syntax and behavior of the "ceph osd crush set ..." command change between 0.48 and 0.53? When trying out ceph 0.53, I get the following in my log when trying to add the first OSD to a new cluster (similar behavior for osds 2 and 3). It appears that the ceph osd crush command fails, but s
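
For reference, the later (bobtail-and-onward) form of the command is shown below; whether 0.53 already expected this form or the older argonaut one is the question being raised, so this is a sketch of the newer syntax only (the osd name, weight, and location keys are placeholders and depend on the bucket types in the map):

    ceph osd crush set osd.0 1.0 root=default host=node-a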

rbd map command hangs for 15 minutes during system start up

2012-11-08 Thread Mandell Degerness
l.patch 21-ceph-avoid-32-bit-page-index-overflow.patch 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch Any suggestions? One thought is that the following patch (which we could not apply) is what is required: 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch Regards, Mandell Degerness hang

Re: Add monitor problems

2012-11-06 Thread Mandell Degerness
Sorry, all. It turns out the problems were entirely on our side (bad ceph.conf files on the new servers). On Tue, Nov 6, 2012 at 11:09 AM, Mandell Degerness wrote: > I'm seeing some weird errors when I add multiple monitors and I'm > hoping the list can shed some light to let m

Re: Problem with "ceph osd create "

2012-10-08 Thread Mandell Degerness
3351a0f0-f6e8-430a-b7a4-ea613a3ddf35 osd.2 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) :/0 :/0 :/0 exists,new 3f04cdbe-a468-42d3-a465-2487cc369d90 On Mon, Oct 8, 2012 at 3:49 PM, Sage Weil wrote: > On Mon, 8 Oct 2012, Mandell Degerness wrote: >> Sorry, I should have u

Re: Problem with "ceph osd create "

2012-10-08 Thread Mandell Degerness
Sorry, I should have used the https link: https://gist.github.com/af546ece91be0ba268d3 On Mon, Oct 8, 2012 at 3:20 PM, Mandell Degerness wrote: > Here is the log I got when running with the options suggested by Sage: > > git@gist.github.com:af546ece91be0ba268d3.git > > On Mon,

Re: Problem with "ceph osd create "

2012-10-08 Thread Mandell Degerness
Here is the log I got when running with the options suggested by Sage: git@gist.github.com:af546ece91be0ba268d3.git On Mon, Oct 8, 2012 at 11:34 AM, Sage Weil wrote: > Hi Mandell, > > On Mon, 8 Oct 2012, Mandell Degerness wrote: >> Hi list, >> >> I've run int

Problem with "ceph osd create "

2012-10-08 Thread Mandell Degerness
is disk came from a previous cluster, somehow) - should be just a sanity check 'ceph', 'osd', 'create', '32895846-ca1c-4265-9ce7-9f2a42b41672' (Returns 1!) This is clearly a race condition because we have several cluster creations without this happening and t

kernel oops on 0.47.2, kernel 3.4.4

2012-08-23 Thread Mandell Degerness
I know this is an old build, but I just want to verify that this isn't an unknown bug. For context, the attached log covers the time from when server .15 dropped off the net (we think power failure at this point). OSDs 72, 73, 74, and 75 are on the node which apparently lost power. Ceph version

Re: ceph osd create

2012-08-21 Thread Mandell Degerness
Found it (digging through the source code to find a guess, since it is in no way obvious): --osd-uuid On Tue, Aug 21, 2012 at 4:38 PM, Mandell Degerness wrote: > Thanks, Sage. This is what I was looking for, but what version of > ceph do I need for this to work (it isn't there
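
For reference, the flow this option enables: generate a uuid, register it with the cluster to get an id, then pass the same uuid to mkfs so the on-disk fsid matches what the monitors recorded. A sketch (paths are placeholders):

    UUID=$(uuidgen)
    OSD_ID=$(ceph osd create ${UUID})
    ceph-osd -i ${OSD_ID} --mkfs --osd-uuid ${UUID} \
        --osd-data /var/lib/ceph/osd/ceph-${OSD_ID}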

Re: ceph osd create

2012-08-21 Thread Mandell Degerness
need finer control of the block devices. Regards, Mandell Degerness On Tue, Aug 21, 2012 at 11:15 AM, Sage Weil wrote: > On Tue, 21 Aug 2012, Mandell Degerness wrote: >> OK. I think I'm getting there. >> >> I want to be able to generate the fsid to be used in the OS

Re: ceph osd create

2012-08-21 Thread Mandell Degerness
mount the OSD in a temp dir to read the fsid file, determine the OSD number, and then re-mount it where it belongs, which seems the wrong way to go. Regards, Mandell Degerness On Mon, Aug 20, 2012 at 4:26 PM, Tommi Virtanen wrote: > On Mon, Aug 20, 2012 at 3:53 PM, Mandell Degerness >

Re: ceph osd create

2012-08-20 Thread Mandell Degerness
We're running Argonaut and it only has the OSD id in the whoami file and nothing else. -Mandell On Mon, Aug 20, 2012 at 1:37 PM, Tommi Virtanen wrote: > On Mon, Aug 20, 2012 at 1:30 PM, Mandell Degerness > wrote: >> Is there now, or will there be a migration path that works t

Re: ceph osd create

2012-08-20 Thread Mandell Degerness
n each OSD to register the information with the monitors (I am assuming that is where the correlation is stored). Regards, Mandell Degerness On Wed, Aug 1, 2012 at 4:43 PM, Tommi Virtanen wrote: > On Wed, Aug 1, 2012 at 4:27 PM, Mandell Degerness > wrote: >> As of this time, we are

Re: [PATCH] make mkcephfs and init-ceph osd filesystem handling more flexible

2012-08-10 Thread Mandell Degerness
Comment below in-line: On Fri, Aug 10, 2012 at 9:12 AM, Sage Weil wrote: > On Fri, 10 Aug 2012, Danny Kukawka wrote: >> Am 10.08.2012 17:54, schrieb Sage Weil: >> > On Thu, 9 Aug 2012, Danny Kukawka wrote: >> >> Remove btrfs specific keys and replace them by more generic >> >> keys to be able to
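
For context, the upshot of the change being reviewed is that the filesystem used for OSD data is driven by generic keys rather than btrfs-specific ones; the resulting ceph.conf shape is roughly the following sketch (values are placeholders):

    [osd]
        osd mkfs type = xfs
        osd mkfs options xfs = -f
        osd mount options xfs = rw,noatime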

ceph osd create

2012-08-01 Thread Mandell Degerness
is unique, with no worries about race conditions? 2. Does this eliminate the need to worry about the value of max_osd? 3. Are there any other side effects of running the command? Regards, Mandell Degerness
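
On point 1, the behaviour that makes the uuid form safe is that calling the command again with the same uuid is expected to return the id already allocated for it rather than a new one. A small sketch of that property (assuming the uuid-taking form of the command discussed in this thread):

    UUID=$(uuidgen)
    ID1=$(ceph osd create ${UUID})
    ID2=$(ceph osd create ${UUID})   # same uuid: the same id should come back
    test "${ID1}" = "${ID2}" && echo "create is idempotent for this uuid"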

Re: Documentation and Versions

2012-08-01 Thread Mandell Degerness
"ceph osd crush get" (i.e. trying to see what other options work), you get the result "unknown command crush" which implies to me that none of the crush subcommands are implemented, rather than that the "get" subcommand is not implemented. Regards, Mandell Degerness

Documentation and Versions

2012-07-31 Thread Mandell Degerness
version of ceph to run the "ceph osd crush ..." commands? Regards, Mandell Degerness

monitor start up question

2012-07-25 Thread Mandell Degerness
cluster incarnation. With the OSDs, I can just check the cluster_fsid file. Regards, Mandell Degerness
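
The OSD-side check described in the message is reading the cluster fsid file out of the data directory; one possible monitor-side equivalent (an assumption, not something settled in this thread) is to extract the monmap, which records the cluster fsid, with the monitor stopped:

    # OSD, as described above (file name as given in the message; path is a placeholder):
    cat /var/lib/ceph/osd/ceph-0/cluster_fsid
    # Monitor (one possible approach):
    ceph-mon -i a --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap | grep fsid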

Re: Possible deadlock condition

2012-06-20 Thread Mandell Degerness
57 PM, Dan Mick wrote: >>> Does the xfs on the OSD have plenty of free space left, or could this be an allocation deadlock? >>> On 06/18/2012 03:17 PM, Mandell Degerness wrote:

Re: Possible deadlock condition

2012-06-18 Thread Mandell Degerness
allocation deadlock? > On 06/18/2012 03:17 PM, Mandell Degerness wrote: >> Here is, perhaps, a more useful traceback from a different run of tests that we just ran into: >> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task

Re: Possible deadlock condition

2012-06-18 Thread Mandell Degerness
rker_fn+0x13b/0x13b Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694232] [] ? gs_change+0xb/0xb On Mon, Jun 18, 2012 at 11:37 AM, Mandell Degerness wrote: > We've been seeing random issues of apparent deadlocks.  We are running > ceph 0.47 on kernel 3.2.18.  OSDs are running on X

Recovery from loss of monitors

2012-06-07 Thread Mandell Degerness
I am thinking about data reliability issues and I'd like to know if we can recover a cluster if we have most of the OSD data intact (i.e. there are enough copies of all of the PGs), but we have lost all of the monitor data.

write-through cache

2012-05-23 Thread Mandell Degerness
I would like to test the effect of using the new write-through cache on RBD volumes mounted to Openstack VMs. What, precisely, are the changes I need to make to the volume XML in order to do so? -Mandell
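
The libvirt-level change being asked about is the cache attribute on the disk's driver element. For an rbd-backed disk the relevant fragment looks roughly like the sketch below (device names, the image name, and the omitted auth/monitor elements are placeholders; how nova's generated XML picks this up was part of the question):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source protocol='rbd' name='rbd/myvolume'/>
      <target dev='vdb' bus='virtio'/>
    </disk>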

Re: Make Openstack work with Ceph

2012-03-08 Thread Mandell Degerness
I, also, am watching this closely as I'd like to see it integrated into OpenStack (and thereby into PentOS). -Mandell On Thu, Mar 8, 2012 at 10:15 AM, Josh Durgin wrote: > On 03/08/2012 12:58 AM, Tomasz Paszkowski wrote: >> This is good enough but I'm working on a solution to implement volume

Size of monitor data dir

2012-02-14 Thread Mandell Degerness
How big can the monitor data directory grow? Is there a rule of thumb that can be used? -Mandell Degerness

Best plan for recovery if disk holding monitor data dies

2012-02-13 Thread Mandell Degerness
like confirmation. Any information on gotchas would be useful as well. Regards, Mandell Degerness

Nova on RBD Device

2012-02-07 Thread Mandell Degerness
nova/virt/libvirt/connection.py. Regards, Mandell Degerness

Re: ceph-mon blocked error

2011-11-07 Thread Mandell Degerness
So, does this apply only to Posix, or to RBD mounts as well? If so, I think we may have to rethink using Ceph in our environment at all. On Mon, Nov 7, 2011 at 1:45 PM, Tommi Virtanen wrote: > On Mon, Nov 7, 2011 at 11:48, Gregory Farnum > wrote: >> (Hopefully this email still makes sense; I re

Re: ceph-mon blocked error

2011-11-07 Thread Mandell Degerness
Can someone give me the bug number for this? On Sat, Nov 5, 2011 at 7:48 PM, Alexandre Oliva wrote: > On Nov  5, 2011, Mandell Degerness wrote: > >> Yes, we are using kernel module for ceph and there was a posix file >> system and an RBD mounted on the node at the time.  

Re: ceph-mon blocked error

2011-11-03 Thread Mandell Degerness
] host = 172.16.0.129 On Thu, Nov 3, 2011 at 12:23 PM, Tommi Virtanen wrote: > On Thu, Nov 3, 2011 at 12:09, Mandell Degerness > wrote: >> We are currently running all services on each of the three nodes. >> (mon, mds, and several osd processes). > > Is the osd da

Re: ceph-mon blocked error

2011-11-03 Thread Mandell Degerness
We are currently running all services on each of the three nodes. (mon, mds, and several osd processes). On Wed, Nov 2, 2011 at 9:47 PM, Gregory Farnum wrote: > On Wed, Nov 2, 2011 at 9:30 PM, Mandell Degerness > wrote: >> Should this error showing up in the log worry me? > &

ceph-mon blocked error

2011-11-02 Thread Mandell Degerness
Should this error showing up in the log worry me? Nov 2 21:28:12 node-172-16-0-128 kernel: INFO: task ceph-mon:5715 blocked for more than 120 seconds. Nov 2 21:28:12 node-172-16-0-128 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 2 21:28:12 node-172-16-0

Re: rbd hangs

2011-10-20 Thread Mandell Degerness
We see this error occur on the system running the OSD, at the time that the rbd call is made (an rbd create call, if that helps). On Thu, Oct 20, 2011 at 12:31 AM, Wido den Hollander wrote: > Hi, > > On 10/20/2011 01:41 AM, Mandell Degerness wrote: >> >> I'm having an o

rbd hangs

2011-10-19 Thread Mandell Degerness
I'm having an occasional bug where rbd is hanging. This trace is in the logs: Oct 19 16:33:04 node-172-16-0-130 kernel: [ cut here ] Oct 19 16:33:04 node-172-16-0-130 kernel: kernel BUG at fs/btrfs/inode.c:3653! Oct 19 16:33:04 node-172-16-0-130 kernel: invalid opcode: 00