RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-07-31 Thread James Harper
I'm about to start trying this out. Has anything changed since this email http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg13984.html ? Thanks James -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More

ocf script for ceph quorum check and fs

2013-08-02 Thread James Harper
I want to mount ceph fs (using fuse) but /etc/fstab treats it as a local filesystem and so tries to mount it before ceph is started, or indeed before the network is even up. Also, ceph tries to start before the network is up and fails because it can't bind to an address. I think this is

bug in /etc/init.d/ceph debian

2013-08-02 Thread James Harper
I'm running ceph 0.61.7-1~bpo70+1 and I think there is a bug in /etc/init.d/ceph The heartbeat RA expects that the init.d script will return 3 for not running, but if there is no agent (eg mds) defined for that host it will return 0 instead, so pacemaker thinks the agent is running on a node

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread James Harper
Yes the procedure didn't change. If you're on debian I could also send you prebuilt .deb for blktap and for a patched xen version that includes userspace RBD support. It's working great so far. I just pulled the source and built it then copied blktap in. For some reason I already had

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-05 Thread James Harper
For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what. What distribution are you using ? Debian Wheezy James

RE: ocf script for ceph quorum check and fs

2013-08-06 Thread James Harper
I want to mount ceph fs (using fuse) but /etc/fstab treats it as a local filesystem and so tries to mount it before ceph is started, or indeed before the network is even up. Also, ceph tries to start before the network is up and fails because it can't bind to an address. I think this is

RE: bug in /etc/init.d/ceph debian

2013-08-06 Thread James Harper
I'm running ceph 0.61.7-1~bpo70+1 and I think there is a bug in /etc/init.d/ceph The heartbeat RA expects that the init.d script will return 3 for not running, but if there is no agent (eg mds) defined for that host it will return 0 instead, so pacemaker thinks the agent is running on

RE: bug in /etc/init.d/ceph debian

2013-08-07 Thread James Harper
Hi James, Here is a somewhat simpler patch; does this work for you? Note that if you do something like /etc/init.d/ceph status osd.123 where osd.123 isn't in ceph.conf then you get a status 1 instead of 3. But for the /etc/init.d/ceph status mds (or osd or mon) case where there are no

RE: bug in /etc/init.d/ceph debian

2013-08-08 Thread James Harper
On Wed, 7 Aug 2013, James Harper wrote: Hi James, Here is a somewhat simpler patch; does this work for you? Note that if you do something like /etc/init.d/ceph status osd.123 where osd.123 isn't in ceph.conf then you get a status 1 instead of 3. But for the /etc/init.d/ceph

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-08 Thread James Harper
Yes the procedure didn't change. If you're on debian I could also send you prebuilt .deb for blktap and for a patched xen version that includes userspace RBD support. If you have any issue, I can be found on ceph's IRC under 'tnt' nick. I've had a few occasions where tapdisk has

RE: bug in /etc/init.d/ceph debian

2013-08-09 Thread James Harper
I haven't tried your patch yet, but can it ever return 0? It seems to set it to 3 initially, and then change it to 1 if it finds an error. I can't see that it ever sets it to 0 indicating that daemons are running. Easy enough to fix by setting the EXIT_STATUS=0 after the check of
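The exit-status logic discussed in this message can be sketched as follows. This is a minimal Python sketch, not the actual /etc/init.d/ceph shell code or the patch under discussion. The function name `aggregate_status` and the per-daemon state strings are hypothetical. It assumes the LSB status conventions (0 = running, 1 = dead, 3 = not running) and follows the description above: start at 3, switch to 1 on an error, and set 0 once a daemon is found running.

```python
# LSB "status" exit codes that heartbeat/pacemaker look at.
LSB_RUNNING = 0      # service is running
LSB_DEAD = 1         # service is dead but left state behind (treated as an error)
LSB_NOT_RUNNING = 3  # service is not running


def aggregate_status(daemon_states):
    """Combine per-daemon states into one exit code.

    daemon_states is a list of hypothetical state strings:
    "running", "stopped" or "dead".
    """
    exit_status = LSB_NOT_RUNNING  # nothing defined on this host => 3
    for state in daemon_states:
        if state == "dead":
            exit_status = LSB_DEAD  # an error always wins
        elif state == "running" and exit_status == LSB_NOT_RUNNING:
            exit_status = LSB_RUNNING  # the step the message says is missing
    return exit_status
```

In this sketch a mix of running and stopped daemons reports 0. That is only one possible policy; how the mixed case should be reported is a separate design decision.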

RE: bug in /etc/init.d/ceph debian

2013-08-09 Thread James Harper
But I think this still won't have the desired outcome if you have 2 OSD's. The possible situations if the resource is supposed to be running are: . Both running = all good, pacemaker will do nothing . Both stopped = all good, pacemaker will start the services . One stopped one running =

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-10 Thread James Harper
Hi, I've had a few occasions where tapdisk has segfaulted: tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp 7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000] tapdisk:9180 blocked for more than 120 seconds. tapdisk D 88043fc13540 0

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread James Harper
tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp 7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000] tapdisk:9180 blocked for more than 120 seconds. tapdisk D 88043fc13540 0 9180 1 0x You can try generating a core file by

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread James Harper
Here's an (untested yet) patch in the rbd error path: diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c index 68fbed7..ab2d2c5 100644 --- a/drivers/block-rbd.c +++ b/drivers/block-rbd.c @@ -560,6 +560,9 @@ err: if (c) rbd_aio_release(c); +

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
FWIW, I can confirm via printf's that this error path is never hit in at least some of the crashes I'm seeing. Ok thanks. Are you using cache btw ? I hope not. How could I tell? It's not something I've explicitly enabled. Thanks James

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
-company.com] Sent: Tuesday, 13 August 2013 7:20 PM To: James Harper Cc: Pasi Kärkkäinen; ceph-devel@vger.kernel.org; xen-de...@lists.xen.org Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p Hi, I hope not. How could I tell? It's not something I've explicitly

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours to a day before it starts working again. My Windows DomU's appear to be able to start normally though.

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
On Wed, Aug 14, 2013 at 1:39 AM, James Harper james.har...@bendigoit.com.au wrote: I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours to a day

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
On Wed, Aug 14, 2013 at 1:39 AM, James Harper james.har...@bendigoit.com.au wrote: I think I have a separate problem too - tapdisk will segfault almost immediately upon starting but seemingly only for Linux PV DomU's. Once it has started doing this I have to wait a few hours

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread James Harper
Hi, I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot continue

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-15 Thread James Harper
Hi, I just tested with tap2:aio and that worked (had an old image of the VM on lvm still so just tested with that). Switching back to rbd and it crashes every time, just as postgres is starting in the vm. Booting into single user mode, waiting 30 seconds, then letting the boot

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-15 Thread James Harper
I just had a crash since upgrading to dumpling, and will disable merging tonight. Still crashes with merging disabled. James

RE: debugging librbd async

2013-08-15 Thread James Harper
On Fri, 16 Aug 2013, James Harper wrote: I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have been having all sorts of problems as the tapdisk process is segfaulting. To make matters worse, any attempt to use gdb on the resulting core just tells me it can't find

RE: debugging librbd async

2013-08-16 Thread James Harper
Of course, the old standby is to just crank up the logging detail and try to narrow down where the crash happens. Have you tried that yet? I haven't touched the rbd code. Is increased logging a compile-time option or a config option? That is probably the first you should try then.

RE: debugging librbd async

2013-08-16 Thread James Harper
I'm also testing valgrind at the moment, just basic memtest, but suddenly everything is quite stable even though it's under reasonable load right now. Stupid heisenbugs. Valgrind makes things go very slow (~10x?), which can have a huge effect on timing. Sometimes that reveals new races,

RE: debugging librbd async

2013-08-17 Thread James Harper
I can now reliably reproduce this with fio (see config following email), but never under valgrind so far. James
[global]
directory=/tmp/fio
size=128M
ioengine=libaio
[randwrite1]
rw=randwrite
iodepth=32
[randread1]
rw=randread
iodepth=32
[randwrite2]
rw=randwrite
iodepth=32
[randread2]

RE: v0.61.8 Cuttlefish released

2013-08-19 Thread James Harper
We've made another point release for Cuttlefish. This release contains a number of fixes that are generally not individually critical, but do trip up users from time to time, are non-intrusive, and have held up under testing. Notable changes include: * librados: fix async aio

RE: v0.61.8 Cuttlefish released

2013-08-19 Thread James Harper
On Mon, 19 Aug 2013, James Harper wrote: We've made another point release for Cuttlefish. This release contains a number of fixes that are generally not individually critical, but do trip up users from time to time, are non-intrusive, and have held up under testing. Notable

trivial bug in aio_write

2013-08-27 Thread James Harper
In internal.cc function aio_write, the ldout(cct, ...) statement writes out buf instead of buf. I assume this is not intentional. James

RE: debugging librbd async

2013-08-27 Thread James Harper
Still having crashes with the rbd module for blktap. I think I can't get consistent debugging info out of librbd. When it writes to a file the logging is buffered so the tail is always missing. When it logs to syslog I thought I was getting everything but now I'm not so sure. What is the best

RE: debugging librbd async

2013-08-28 Thread James Harper
I have set the logfile to be opened with SYNC and that seems to be giving me more consistent output I see the crash is mostly happening around queue_aio_write. Most of the time the last thing I see is this entry librados: queue_aio_write 0x7f0928004390 completion 0x1ea65d0 write_seq 147. I've
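The idea of opening a log file synchronously so that the tail survives a crash can be sketched as below. This is a minimal Python sketch of the general technique, not librbd's or Ceph's logging code. The helper names `open_sync_log` and `log_line` are hypothetical, and it assumes a Unix-like system where `os.O_SYNC` is available.

```python
import os


def open_sync_log(path):
    """Open a log file for appending with O_SYNC.

    With O_SYNC each write() returns only after the data has been handed
    to the storage device, so a crash right after a write does not lose
    that line to user-space or page-cache buffering.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    return os.open(path, flags, 0o644)


def log_line(fd, message):
    """Write one complete line with a single unbuffered write() call."""
    os.write(fd, (message + "\n").encode("utf-8"))
```

Synchronous writes are much slower than buffered ones, which can itself change the timing of a race, so this is best treated as a debugging aid.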

RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread James Harper
I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick Basically it looks like an instance of AioRead is being accessed after being free'd. I need some hints

RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread James Harper
On Fri, 30 Aug 2013, James Harper wrote: I finally got a valgrind memtest hit... output attached below email. I recompiled all of tapdisk and ceph without any -O options (thought I had already...) and it seems to have done the trick What version is this? The line numbers don't seem

still crashes with tapdisk rbd

2013-09-12 Thread James Harper
I'm still getting crashes with tapdisk rbd. Most of the time it crashes gdb if I try. When I do get something, the crashing thread is always segfaulting in pthread_cond_wait and the stack is always corrupt: (gdb) bt #0 0x7faae20c52d7 in pthread_cond_wait@@GLIBC_2.3.2 () from

RE: still crashes with tapdisk rbd

2013-09-12 Thread James Harper
There isn't a simple magic string I can point to except for struct ceph_msg_header, but I doubt that will help, since it is reading the headers and message bodies into different buffers. Ok thanks. I forget: are you able to reproduce any of this with debugging enabled? I would suggest

RE: still crashes with tapdisk rbd

2013-09-13 Thread James Harper
You might also want to just look at read_message, connect, and accept in Pipe.cc as I think those are the only places where data is read off the network into a buffer/struct on the stack. After adding the following to the [client] section of the config file the problem seems to have gone

RE: writing a ceph cliente for MS windows

2013-11-05 Thread James Harper
Good day developers! I would like to propose to the one interested work with me to develop a ceph cliente for MS windows world, Basing us on dokanFS. I've looked at porting the rbd client to windows a little while back. That would require a kernel driver and all the rbd stuff is C++

libuuid vs boost uuid

2013-11-08 Thread James Harper
Just out of curiosity (recent thread about windows port) I just had a quick go at compiling librados under mingw (win32 cross compile), and one of the errors that popped up was the lack of libuuid under mingw. Ceph appears to use libuuid, but I notice boost appears to include a uuid class too,

portability issue with gmtime method in utime_t

2013-11-09 Thread James Harper
utime.h defines a utime_t class with a gmtime() method, and also calls the library function gmtime_r(). mingw implements gmtime_r() as a macro in pthread.h that in turn calls gmtime(), and gcc bails because it gets confused about which is being called: utime.h: In member function 'utime_t

RE: libuuid vs boost uuid

2013-11-09 Thread James Harper
On Sat, 9 Nov 2013, James Harper wrote: Just out of curiosity (recent thread about windows port) I just had a quick go at compiling librados under mingw (win32 cross compile), and one of the errors that popped up was the lack of libuuid under mingw. Ceph appears to use libuuid, but I

RE: libuuid vs boost uuid

2013-11-13 Thread James Harper
Hi James, I just wanted to follow up on this thread. I'd like to bring this patch into the wip-port portability branch. Were you able to get the boost::uuid to work as a drop-in replacement? I have it compiling but haven't tested. I'll send through what I have. James

RE: libuuid vs boost uuid

2013-11-13 Thread James Harper
Patch follows. When I wrote it I was just thinking it would be used for win32 build, hence the #ifdef. As I said before, it compiles but I haven't tested it. I can clean it up a bit and resend it with a signed-off-by if anyone wants to pick it up and follow it through sooner than I can. I don't

does librbd actually need leveldb?

2013-11-16 Thread James Harper
Does librbd actually need leveldb? And if so, does it need it in all cases? I'm trying to figure out what dependencies are required for librbd for win32, and leveldb doesn't build cleanly at first go. Thanks James

adminsocket

2013-11-22 Thread James Harper
What is an adminsocket used for? Would librbd use one in normal operation? Thanks James

RE: adminsocket

2013-11-22 Thread James Harper
On 11/22/2013 06:42 PM, James Harper wrote: What is an adminsocket used for? Would librbd use one in normal operation? It's a way to send administrative and informational commands directly to a Ceph entity (usually a daemon, but sometimes a client). Almost all the ceph entities create

RE: win32 build of librbd

2013-11-24 Thread James Harper
Does this list block attachments? I sent this through yesterday with the patch as an attachment but it doesn't appear to have arrived. With the attached patch I can use mingw32 on Debian, with a few extra bits (boost, cryptopp, libatomic-ops), and build librados.dll, librbd.dll, and

RE: libuuid vs boost uuid

2013-11-25 Thread James Harper
James, I'm using uuid.begin()/end() to grab the 16-byte representation of the UUID. Did you figure out how to populate a boost::uuid_t from the bytes? In particular, I'm referring to FileJournal::decode. Actually, I suppose that any Ceph usage of the 16-byte representation should be

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-29 Thread James Harper
Sylvain, Are you still working on this in any way? It's been working great for me but seems to use an excessive amount of memory, like 300MB per process. Is that expected? Thanks James

RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-30 Thread James Harper
Hi James, Are you still working on this in any way? Well I'm using it, but I haven't worked on it. I never was able to reproduce any issue with it locally ... In prod, I do run it with cache disabled though since I never took the time to check using the cache was safe in the various

link errors

2013-12-20 Thread James Harper
I'm trying to compile qemu with rbd support, but it throws up a heap of errors like '/usr/local/src/ceph-0.67.4/src/./osd/osd_types.h:2442: undefined reference to `__cxa_throw'' when it tries to test for rbd support. It's breaking when doing static link of a dummy file in configure. Is this

SMART monitoring

2013-12-26 Thread James Harper
What would be the best approach to integrate SMART with ceph, for the predictive failure case? Assuming you agree with SMART diagnosis of an impending failure, would it be better to automatically start migrating data off the OSD (reduce the weight to 0?), or to just prompt the user to replace

ceph and scsi reservation-like locking

2013-12-28 Thread James Harper
Is the rbd locking feature-compatible with scsi3 persistent reservations? To use ceph as a storage backend for a Hyper-V cluster directly (rather than through an iscsi gateway), it looks like I'll need a virtual scsi3 device that supports persistent reservations. Thanks James

documentation error for adding monitor?

2014-01-05 Thread James Harper
When I follow the steps for adding a new monitor at http://ceph.com/docs/master/rados/operations/add-or-rm-mons/, the 'ceph mon add <mon-id> <ip>[:<port>]' step always tells me that the monitor already exists. Is this step actually necessary? Thanks James

poor write performance

2013-04-18 Thread James Harper
I'm doing some basic testing so I'm not really fussed about poor performance, but my write performance appears to be so bad I think I'm doing something wrong. Using dd to test gives me kbytes/second for write performance for 4kb block sizes, while read performance is acceptable (for testing at

RE: poor write performance

2013-04-18 Thread James Harper
Where should I start looking for performance problems? I've tried running some of the benchmark stuff in the documentation but I haven't gotten very far... Hi James! Sorry to hear about the performance trouble! Is it just sequential 4KB direct IO writes that are giving you troubles?

RE: poor write performance

2013-04-19 Thread James Harper
Where should I start looking for performance problems? I've tried running some of the benchmark stuff in the documentation but I haven't gotten very far... Hi James! Sorry to hear about the performance trouble! Is it just sequential 4KB direct IO writes that are giving you

RE: poor write performance

2013-04-19 Thread James Harper
I did an strace -c to gather some performance info, if that helps: Oops. Forgot to say that that's an strace -c of the osd process! % time seconds usecs/call calls errors syscall -- --- --- - - 78.13 39.589549

RE: poor write performance

2013-04-19 Thread James Harper
I just tried a 3.8 series kernel and can now get 25mbytes/second using dd with a 4mb block size, instead of the 700kbytes/second I was getting with the debian 3.2 kernel. I'm still getting 120kbytes/second with a dd 4kb block size though... is that expected? James
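The figures in this message can be turned into per-write latencies with a little arithmetic. The sketch below assumes dd issues one write at a time and waits for it to complete before issuing the next, which is an assumption about how the test was run. The function name `serial_write_stats` is hypothetical.

```python
def serial_write_stats(throughput_kb_per_s, block_kb):
    """Return (writes per second, milliseconds per write) for serial writes."""
    ops_per_s = throughput_kb_per_s / block_kb
    ms_per_op = 1000.0 / ops_per_s
    return ops_per_s, ms_per_op


# 120 kbytes/second with 4 KB blocks
small_ops, small_ms = serial_write_stats(120, 4)

# 25 mbytes/second with 4 MB blocks, both expressed in KB
large_ops, large_ms = serial_write_stats(25 * 1024, 4 * 1024)
```

Under that assumption the 4 KB case works out to 30 writes per second, roughly 33 ms per write, while the 4 MB case works out to 6.25 writes per second, 160 ms per write. A block 1024 times larger costs only about five times longer per write, which would be consistent with the small writes being limited by per-operation latency rather than by bandwidth.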

RE: poor write performance

2013-04-19 Thread James Harper
On 04/19/2013 06:09 AM, James Harper wrote: I just tried a 3.8 series kernel and can now get 25mbytes/second using dd with a 4mb block size, instead of the 700kbytes/second I was getting with the debian 3.2 kernel. That's unexpected. Was this the kernel on the client, the OSDs

RE: poor write performance

2013-04-20 Thread James Harper
Hi James, do you have VLAN interfaces configured on your bonding interfaces? Because I saw a similar situation in my setup. No VLAN's on my bonding interface, although extensively used elsewhere. Thanks James

clean shutdown and failover of osd

2013-04-20 Thread James Harper
I'm doing some testing with ceph trying to figure out why my performance is so bad, and have noticed that there doesn't seem to be a way to cleanly stop an osd, or at least under debian /etc/init.d/ceph stop seems to just kill the OSD resulting in the client also stopping io until it figures

RE: clean shutdown and failover of osd

2013-04-20 Thread James Harper
[ This is a good query for ceph-users. ] Well... this is embarrassing. In reading the docs at http://ceph.com/docs/master/start/get-involved/ there was no mention of a users list so I just assumed there wasn't one. Looking again I see that if I go to the link from the main page

RE: poor write performance

2013-04-21 Thread James Harper
Hi, My goal is 4 OSD's, each on separate machines, with 1 drive in each for a start, but I want to see performance of at least the same order of magnitude as the theoretical maximum on my hardware before I think about replacing my existing setup. My current understanding is that it's

RE: poor write performance

2013-04-21 Thread James Harper
On 04/19/2013 08:30 PM, James Harper wrote:
rados -p pool -b 4096 bench 300 seq -t 64
sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
  0        0        0         0         0         0         -        0
read got -2
error during benchmark: -5
error 5

RE: poor write performance

2013-04-22 Thread James Harper
Hi, Correct, but that's the theoretical maximum I was referring to. If I calculate that I should be able to get 50MB/second then 30MB/second is acceptable but 500KB/second is not :) I have written a small benchmark for RBD : https://gist.github.com/smunaut/5433222 It uses the

RE: poor write performance

2013-04-22 Thread James Harper
My read speed is consistently around 40MB/second, and my write speed is consistently around 22MB/second. I had expected better of read... You may want to try increasing your read_ahead_kb on the OSD data disks and see if that helps read speeds. Default appears to be 128 and I was
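The read_ahead_kb setting mentioned above is exposed per block device under sysfs on Linux, at /sys/block/<device>/queue/read_ahead_kb. A minimal Python sketch for reading and changing it follows. The helper names are hypothetical, the `sysfs_root` parameter exists only so the code can be exercised against a scratch directory, and writing the real file requires root.

```python
from pathlib import Path


def read_ahead_path(device, sysfs_root="/sys"):
    """Path of the read-ahead setting for a block device such as 'sdb'."""
    return Path(sysfs_root) / "block" / device / "queue" / "read_ahead_kb"


def get_read_ahead_kb(device, sysfs_root="/sys"):
    """Return the current read-ahead size in KB."""
    return int(read_ahead_path(device, sysfs_root).read_text().strip())


def set_read_ahead_kb(device, kb, sysfs_root="/sys"):
    """Set the read-ahead size in KB (not persistent across reboots)."""
    read_ahead_path(device, sysfs_root).write_text(f"{kb}\n")
```

For example, `set_read_ahead_kb("sdb", 512)` would raise the value for a hypothetical OSD data disk sdb. Which value actually helps depends on the workload, so it is worth measuring before and after.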

windows rbd

2013-05-17 Thread James Harper
What would be the bare minimum to be implemented for a windows rbd kernel driver? In the Linux kernel I can see a drivers/block/rbd.c, which in turn uses net/ceph/*. There is also fs/ceph/* for the filesystem stuff. Is a windows client even a worthwhile exercise? I know I can use an iscsi

mon crash

2013-06-19 Thread James Harper
Every time I start up one of my mons it crashes. Two others are running but there seems to be long delays (=several seconds) when doing mon status (maybe this is the behaviour when one mon is down?) The tail of /var/log/ceph/ceph-mon.4.log follows this email. Version is 0.61.3-1~bpo70+1 from

dynamically move busy pg's to fast storage

2013-06-19 Thread James Harper
Suppose you had two classes of OSD, one fast (eg SSD's or 15K SAS drives) and the other slow (eg 7200RPM SATA drives). The fast storage is expensive so you might not have so much of it. Rather than try and map whole volumes to the best class of storage (eg fast for databases, slow for user

RE: mon crash

2013-06-19 Thread James Harper
On Wed, 19 Jun 2013, James Harper wrote: Every time I start up one of my mons it crashes. Two others are running but there seems to be long delays (=several seconds) when doing mon status (maybe this is the behaviour when one mon is down?) The tail of /var/log/ceph/ceph-mon.4.log

RE: [ceph-users] v0.80.4 Firefly released

2014-07-16 Thread James Harper
Can you offer some comments on what the impact is likely to be to the data in an affected cluster? Should all data now be treated with suspicion and restored back to before the firefly upgrade? James