I'm about to start trying this out. Has anything changed since this email
http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg13984.html ?
Thanks
James
I want to mount ceph fs (using fuse) but /etc/fstab treats it as a local
filesystem and so tries to mount it before ceph is started, or indeed before
the network is even up.
Also, ceph tries to start before the network is up and fails because it can't
bind to an address. I think this is
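For illustration, the usual fix for this ordering problem is the _netdev mount option, which makes the init scripts treat the entry as a network filesystem and mount it only once networking is up (or noauto plus an explicit mount late in boot). The mount point, id and exact entry format below are assumptions, not taken from the original message:
  # /etc/fstab - treat the ceph-fuse mount as a network filesystem
  id=admin  /mnt/ceph  fuse.ceph  _netdev,defaults  0 0
  # or leave it out of the normal mount pass and mount it from a late
  # boot script instead, e.g.: ceph-fuse -m mon-host:6789 /mnt/ceph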
I'm running ceph 0.61.7-1~bpo70+1 and I think there is a bug in /etc/init.d/ceph
The heartbeat RA expects that the init.d script will return 3 for not
running, but if there is no agent (eg mds) defined for that host it will
return 0 instead, so pacemaker thinks the agent is running on a node
Yes, the procedure didn't change.
If you're on debian I could also send you prebuilt .debs for blktap
and for a patched xen version that includes userspace RBD support.
If you have any issue, I can be found on ceph's IRC under the 'tnt' nick.
It's working great so far. I just pulled the source and built it then copied
blktap in.
For some reason I already had a tapdisk in /usr/sbin, as well as the one in
/usr/bin, which confused the issue for a while. I must have installed
something manually but I don't remember what.
What distribution are you using?
Debian Wheezy
James
Hi James,
Here is a somewhat simpler patch; does this work for you? Note that if
you do something like /etc/init.d/ceph status osd.123 where osd.123 isn't in
ceph.conf then you get a status 1 instead of 3. But for the
/etc/init.d/ceph status mds (or osd or mon) case where there are no
I haven't tried your patch yet, but can it ever return 0? It seems to
set it to 3 initially, and then change it to 1 if it finds an error. I
can't see that it ever sets it to 0 indicating that daemons are running.
Easy enough to fix by setting the EXIT_STATUS=0 after the check of
But I think this still won't have the desired outcome if you have 2 OSDs.
The possible situations, if the resource is supposed to be running, are:
- Both running = all good, pacemaker will do nothing
- Both stopped = all good, pacemaker will start the services
- One stopped, one running =
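For reference, the LSB convention at issue is: "status" exits 0 when the service is running, 3 when it is stopped, and 1 for unknown/error states. A rough sketch of why a single exit code is ambiguous with two OSDs on one host (an illustration only, not the actual init script or the proposed patch; the helper names are made up):
  EXIT_STATUS=3                      # nothing found running yet
  for id in $local_daemons; do       # e.g. osd.0 osd.1 from ceph.conf
      if daemon_running "$id"; then
          EXIT_STATUS=0              # at least one daemon is up
      else
          SOMETHING_STOPPED=1        # ...but something else may be down
      fi
  done
  exit $EXIT_STATUS                  # one OSD up + one down still reports 0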
Hi,
I've had a few occasions where tapdisk has segfaulted:
tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp
7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
tapdisk:9180 blocked for more than 120 seconds.
tapdisk D 88043fc13540 0 9180 1 0x
You can try generating a core file by
Here's an (untested yet) patch in the rbd error path:
diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
index 68fbed7..ab2d2c5 100644
--- a/drivers/block-rbd.c
+++ b/drivers/block-rbd.c
@@ -560,6 +560,9 @@ err:
if (c)
rbd_aio_release(c);
+
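Regarding the core file suggestion that is cut off above, a generic way to get one out of tapdisk (my own illustration, not necessarily what was being suggested; paths are examples) would be:
  ulimit -c unlimited          # in the environment that ends up starting tapdisk
  echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern
  # then after the next segfault:
  # gdb /usr/bin/tapdisk /var/crash/core.tapdisk.<pid>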
FWIW, I can confirm via printf's that this error path is never hit in at
least some of the crashes I'm seeing.
Ok thanks.
Are you using cache btw?
I hope not. How could I tell? It's not something I've explicitly enabled.
Thanks
James
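For what it's worth, in this version librbd caching should be off unless it was turned on explicitly, so "not explicitly enabled" ought to mean no cache. Two ways to check (the admin socket path is an assumption, and assumes the client has one configured and that this version supports "config show"):
  # ceph.conf - caching would only be on if something like this is present
  [client]
      rbd cache = true
  # or query a running client's admin socket, if it has one:
  # ceph --admin-daemon /var/run/ceph/client.admin.asok config show | grep rbd_cache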
Sent: Tuesday, 13 August 2013 7:20 PM
To: James Harper
Cc: Pasi Kärkkäinen; ceph-devel@vger.kernel.org; xen-de...@lists.xen.org
Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
test ? :p
Hi,
I think I have a separate problem too - tapdisk will segfault almost
immediately upon starting but seemingly only for Linux PV DomU's. Once it has
started doing this I have to wait a few hours to a day before it starts working
again. My Windows DomU's appear to be able to start normally though.
Hi,
I just tested with tap2:aio and that worked (had an old image of the VM on
lvm still so just tested with that). Switching back to rbd and it crashes
every
time, just as postgres is starting in the vm. Booting into single user mode,
waiting 30 seconds, then letting the boot continue
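For context, the only thing changing between the working and crashing case above is the disk backend in the domU config; roughly along these lines, where the rbd spec format is my assumption about this driver rather than documented syntax:
  # working: blktap2 with the plain aio driver against the old LVM image
  disk = [ 'tap2:tapdisk:aio:/dev/vg0/vm-disk,xvda,w' ]
  # crashing: same guest, same virtual device, backed by the rbd tapdisk driver
  disk = [ 'tap2:tapdisk:rbd:rbd/vm-disk,xvda,w' ]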
I just had a crash since upgrading to dumpling, and will disable merging
tonight.
Still crashes with merging disabled.
James
On Fri, 16 Aug 2013, James Harper wrote:
I'm testing out the tapdisk rbd that Sylvain wrote under Xen, and have
been having all sorts of problems as the tapdisk process is segfaulting. To
make matters worse, any attempt to use gdb on the resulting core just tells
me it can't find
Of course, the old standby is to just crank up the logging detail and try
to narrow down where the crash happens. Have you tried that yet?
I haven't touched the rbd code. Is increased logging a compile-time
option or a config option?
That is probably the first you should try then.
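For reference, the increased logging is a config option rather than a compile-time one; turning it up for a librbd client typically looks something like this (the levels and log path are examples, not the settings actually used in this thread):
  [client]
      log file = /var/log/ceph/client.$name.$pid.log
      debug rbd = 20
      debug rados = 20
      debug ms = 1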
I'm also testing valgrind at the moment, just basic memtest, but suddenly
everything is quite stable even though it's under reasonable load right now.
Stupid heisenbugs.
Valgrind makes things go very slow (~10x?), which can have a huge effect
on timing. Sometimes that reveals new races,
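The "basic memtest" run would presumably be something along these lines; note that tapdisk is normally spawned by the blktap control tools, so getting it under valgrind may need a wrapper script, which is glossed over here:
  valgrind --tool=memcheck --track-origins=yes --num-callers=30 \
           --log-file=/tmp/tapdisk-vg.%p.log /usr/bin/tapdisk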
I can now reliably reproduce this with fio (see config following email), but
never under valgrind so far.
James
[global]
directory=/tmp/fio
size=128M
ioengine=libaio
[randwrite1]
rw=randwrite
iodepth=32
[randread1]
rw=randread
iodepth=32
[randwrite2]
rw=randwrite
iodepth=32
[randread2]
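To run the job file above, assuming it is saved as rbd-crash.fio inside the guest and that the target directory exists:
  mkdir -p /tmp/fio
  fio rbd-crash.fio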
We've made another point release for Cuttlefish. This release contains a
number of fixes that are generally not individually critical, but do trip
up users from time to time, are non-intrusive, and have held up under
testing.
Notable changes include:
* librados: fix async aio
In internal.cc function aio_write, the ldout(cct, ...) statement writes out
buf instead of buf. I assume this is not intentional.
James
Still having crashes with the rbd module for blktap. I think I can't get
consistent debugging info out of librbd. When it writes to a file the logging
is buffered so the tail is always missing. When it logs to syslog I thought I
was getting everything but now I'm not so sure.
What is the best
I have set the logfile to be opened with SYNC and that seems to be giving me
more consistent output
I see the crash is mostly happening around queue_aio_write. Most of the time
the last thing I see is this entry librados: queue_aio_write 0x7f0928004390
completion 0x1ea65d0 write_seq 147. I've
I finally got a valgrind memtest hit... output attached below email. I
recompiled all of tapdisk and ceph without any -O options (thought I had
already...) and it seems to have done the trick
Basically it looks like an instance of AioRead is being accessed after being
free'd. I need some hints
On Fri, 30 Aug 2013, James Harper wrote:
What version is this? The line numbers don't seem
I'm still getting crashes with tapdisk rbd. Most of the time it crashes gdb if
I try. When I do get something, the crashing thread is always segfaulting in
pthread_cond_wait and the stack is always corrupt:
(gdb) bt
#0 0x7faae20c52d7 in pthread_cond_wait@@GLIBC_2.3.2 () from
There isn't a simple magic string I can point to except for struct
ceph_msg_header, but I doubt that will help, since it is reading the
headers and message bodies into different buffers.
Ok thanks.
I forget: are you able
to reproduce any of this with debugging enabled? I would suggest
You might also want to just look at read_message, connect, and accept in
Pipe.cc as I think those are the only places where data is read off the
network into a buffer/struct on the stack.
After adding the following to the [client] section of the config file the
problem seems to have gone
Good day developers!
I would like to propose, to anyone interested, working with me to develop a
Ceph client for the MS Windows world, based on DokanFS.
I looked at porting the rbd client to Windows a little while back. That
would require a kernel driver, and all the rbd stuff is C++
Just out of curiosity (recent thread about windows port) I just had a quick go
at compiling librados under mingw (win32 cross compile), and one of the errors
that popped up was the lack of libuuid under mingw. Ceph appears to use
libuuid, but I notice boost appears to include a uuid class too,
utime.h defines a utime_t class with a gmtime() method, and also calls the
library function gmtime_r().
mingw implements gmtime_r() as a macro in pthread.h that in turn calls
gmtime(), and gcc bails because it gets confused about which is being called:
utime.h: In member function 'utime_t
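From memory, the pthreads-w32 header defines gmtime_r() roughly as below, which is why name lookup inside the utime_t class picks up the class's own gmtime() instead of the libc function (the macro text is an approximation, not quoted from mingw):
  /* approximately what mingw's pthread.h provides */
  #define gmtime_r(_clock, _result) \
          (*(_result) = *gmtime((_clock)), (_result))
  /* so a call like gmtime_r(&t, &bdt) inside utime_t expands to a call to */
  /* gmtime(&t), which resolves to utime_t::gmtime() and fails to compile. */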
Hi James,
I just wanted to follow up on this thread. I'd like to bring this patch into
the
wip-port portability branch. Were you able to get the boost::uuid to work as
a drop-in replacement?
I have it compiling but haven't tested. I'll send through what I have.
James
Patch follows. When I wrote it I was just thinking it would be used for win32
build, hence the #ifdef. As I said before, it compiles but I haven't tested it.
I can clean it up a bit and resend it with a signed-off-by if anyone wants to
pick it up and follow it through sooner than I can. I don't
Does librbd actually need leveldb?
And if so, does it need it in all cases?
I'm trying to figure out what dependencies are required for librbd for win32,
and leveldb doesn't build cleanly at first go.
Thanks
James
What is an adminsocket used for? Would librbd use one in normal operation?
Thanks
James
On 11/22/2013 06:42 PM, James Harper wrote:
What is an adminsocket used for? Would librbd use one in normal
operation?
It's a way to send administrative and informational commands directly
to a Ceph entity (usually a daemon, but sometimes a client). Almost
all the ceph entities create
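A concrete illustration of using one (the daemon name and socket path are assumptions):
  # list the commands a daemon's admin socket understands
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok help
  # e.g. dump its internal performance counters
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
  # a librbd client only gets one if it's requested in ceph.conf, e.g.:
  # [client]
  #     admin socket = /var/run/ceph/$name.$pid.asok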
Does this list block attachments? I sent this through yesterday with the patch
as an attachment but it doesn't appear to have arrived.
With the attached patch I can use mingw32 on Debian, with a few extra bits
(boost, cryptopp, libatomic-ops), and build librados.dll, librbd.dll, and
James,
I'm using uuid.begin()/end() to grab the 16-byte representation of the UUID.
Did you figure out how to populate a boost::uuid_t from the bytes? In
particular, I'm referring to FileJournal::decode.
Actually, I suppose that any Ceph usage of the 16-byte representation should
be
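For what it's worth, a minimal sketch of going both directions (my own illustration, not the wip-port code): boost::uuids::uuid is just a 16-byte aggregate with byte iterators, so std::copy works in and out.
  #include <boost/uuid/uuid.hpp>
  #include <boost/uuid/uuid_io.hpp>
  #include <algorithm>
  #include <iostream>
  int main()
  {
      unsigned char raw[16] = {0};          // e.g. bytes decoded from the journal header
      boost::uuids::uuid u;
      std::copy(raw, raw + 16, u.begin());  // populate the uuid from raw bytes
      unsigned char out[16];
      std::copy(u.begin(), u.end(), out);   // grab the 16-byte representation back
      std::cout << u << std::endl;          // canonical text form via uuid_io.hpp
      return 0;
  }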
Sylvain,
Are you still working on this in any way?
It's been working great for me but seems to use an excessive amount of memory,
like 300MB per process. Is that expected?
Thanks
James
Hi James,
Are you still working on this in any way?
Well I'm using it, but I haven't worked on it. I never was able to
reproduce any issue with it locally ...
In prod, I do run it with cache disabled though since I never took the
time to check using the cache was safe in the various
I'm trying to compile qemu with rbd support, but it throws up a heap of errors
like '/usr/local/src/ceph-0.67.4/src/./osd/osd_types.h:2442: undefined
reference to `__cxa_throw'' when it tries to test for rbd support.
It's breaking when doing static link of a dummy file in configure. Is this
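__cxa_throw is a C++ runtime symbol, so the failing probe is almost certainly configure linking the (C++) librbd/librados stack with the C compiler and without libstdc++. A workaround to try (flag names from memory, so treat this as an assumption rather than the known fix):
  ./configure --enable-rbd --extra-ldflags='-lstdc++'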
What would be the best approach to integrate SMART with ceph, for the
predictive failure case?
Assuming you agree with SMART diagnosis of an impending failure, would it be
better to automatically start migrating data off the OSD (reduce the weight to
0?), or to just prompt the user to replace
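For reference, the existing knobs for "migrate data off a suspect OSD" are marking it out or dropping its CRUSH weight, e.g. (the osd id is just an example):
  ceph osd out 12                    # data gets re-replicated elsewhere
  ceph osd crush reweight osd.12 0   # or drain it by removing its CRUSH weight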
Is the rbd locking feature-compatible with scsi3 persistent reservations?
To use ceph as a storage backend for a Hyper-V cluster directly (rather than
through an iscsi gateway), it looks like I'll need a virtual scsi3 device that
supports persistent reservations.
Thanks
James
When I follow the steps for adding a new monitor at
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/, the 'ceph mon add
<mon-id> <ip>[:<port>]' step always tells me that the monitor already exists. Is
this step actually necessary?
Thanks
James
I'm doing some basic testing so I'm not really fussed about poor performance,
but my write performance appears to be so bad I think I'm doing something wrong.
Using dd to test gives me kbytes/second for write performance for 4kb block
sizes, while read performance is acceptable (for testing at
Where should I start looking for performance problems? I've tried running
some of the benchmark stuff in the documentation but I haven't gotten very
far...
Hi James! Sorry to hear about the performance trouble! Is it just
sequential 4KB direct IO writes that are giving you troubles?
I did an strace -c to gather some performance info, if that helps:
Oops. Forgot to say that that's an strace -c of the osd process!
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.13   39.589549
I just tried a 3.8 series kernel and can now get 25mbytes/second using dd with
a 4mb block size, instead of the 700kbytes/second I was getting with the debian
3.2 kernel.
I'm still getting 120kbytes/second with a dd 4kb block size though... is that
expected?
James
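The dd runs being compared are presumably along these lines (the target device is an assumption, and direct I/O is assumed as discussed earlier in the thread):
  dd if=/dev/zero of=/dev/rbd0 bs=4M count=256 oflag=direct     # ~25 MB/s on 3.8
  dd if=/dev/zero of=/dev/rbd0 bs=4k count=10000 oflag=direct   # still ~120 kB/s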
On 04/19/2013 06:09 AM, James Harper wrote:
I just tried a 3.8 series kernel and can now get 25mbytes/second using dd
with a 4mb block size, instead of the 700kbytes/second I was getting with the
debian 3.2 kernel.
That's unexpected. Was this the kernel on the client, the OSDs
Hi James,
do you have VLAN interfaces configured on your bonding interfaces? Because
I saw a similar situation in my setup.
No VLANs on my bonding interface, although extensively used elsewhere.
Thanks
James
I'm doing some testing with ceph trying to figure out why my performance is so
bad, and have noticed that there doesn't seem to be a way to cleanly stop an
osd, or at least under debian /etc/init.d/ceph stop seems to just kill the OSD
resulting in the client also stopping io until it figures
[ This is a good query for ceph-users. ]
Well... this is embarrassing. In reading the docs at
http://ceph.com/docs/master/start/get-involved/ there was no mention of a users
list so I just assumed there wasn't one. Looking again I see that if I go to
the link from the main page
Hi,
My goal is 4 OSD's, each on separate machines, with 1 drive in each for a
start, but I want to see performance of at least the same order of magnitude
as the theoretical maximum on my hardware before I think about replacing
my existing setup.
My current understanding is that it's
On 04/19/2013 08:30 PM, James Harper wrote:
rados -p pool -b 4096 bench 300 seq -t 64
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
read got -2
error during benchmark: -5
error 5
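For what it's worth, "read got -2" is ENOENT: the seq test reads back objects left behind by an earlier write test, so a write run is needed first, e.g. (same pool as above, and assuming this version already has the --no-cleanup flag):
  rados -p pool -b 4096 bench 60 write -t 64 --no-cleanup
  rados -p pool -b 4096 bench 60 seq -t 64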
Hi,
Correct, but that's the theoretical maximum I was referring to. If I
calculate
that I should be able to get 50MB/second then 30MB/second is acceptable
but 500KB/second is not :)
I have written a small benchmark for RBD:
https://gist.github.com/smunaut/5433222
It uses the
My read speed is consistently around 40MB/second, and my write speed is
consistently around 22MB/second. I had expected better of read...
You may want to try increasing your read_ahead_kb on the OSD data disks
and see if that helps read speeds.
Default appears to be 128 and I was
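For example, on each OSD host (the device name and value are only examples):
  cat /sys/block/sdb/queue/read_ahead_kb      # shows the default of 128
  echo 4096 > /sys/block/sdb/queue/read_ahead_kb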
What would be the bare minimum to be implemented for a windows rbd kernel
driver? In the Linux kernel I can see a drivers/block/rbd.c, which in turn uses
net/ceph/*. There is also fs/ceph/* for the filesystem stuff.
Is a windows client even a worthwhile exercise? I know I can use an iscsi
Every time I start up one of my mons it crashes. Two others are running but
there seem to be long delays (>= several seconds) when doing mon status (maybe
this is the behaviour when one mon is down?)
The tail of /var/log/ceph/ceph-mon.4.log follows this email.
Version is 0.61.3-1~bpo70+1 from
Suppose you had two classes of OSD, one fast (eg SSD's or 15K SAS drives) and
the other slow (eg 7200RPM SATA drives). The fast storage is expensive so you
might not have so much of it. Rather than try and map whole volumes to the best
class of storage (eg fast for databases, slow for user
Can you offer some comments on what the impact is likely to be to the data in
an affected cluster? Should all data now be treated with suspicion and restored
back to before the firefly upgrade?
James