Re: RBD fio Performance concerns

2012-11-20 Thread Sébastien Han
@Alexandre: thanks for publishing your results as well :) I also tried different sizes and saw no difference. On Tue, Nov 20, 2012 at 8:32 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Which iodepth did you use for those benchmarks? iodepth = 100, filesize = 1G, 10G, 30G, same result (3

[PATCH] use int64_t for return values from rbd instead of int

2012-11-20 Thread Stefan Priebe
rbd / rados quite often returns the length of writes or discarded blocks. These values might be bigger than an int can hold. Signed-off-by: Stefan Priebe s.pri...@profihost.ag --- block/rbd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index
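To illustrate the concern in the patch description, here is a minimal C sketch. It is not the actual qemu change: the helper names `len_fits_in_int` and `request_succeeded` are hypothetical, and the "negative means -errno, non-negative means bytes done" convention is an assumption about how such return values are read. A 3 GiB discard length (3221225472) exceeds INT_MAX (2147483647); converting it to int is implementation-defined in C, and on common two's-complement ABIs it becomes -1073741824, which a `ret < 0` check would misread as an error.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a byte count returned by a storage
 * call can be stored in a plain int without changing its value. */
static bool len_fits_in_int(int64_t len)
{
    return len >= INT_MIN && len <= INT_MAX;
}

/* Hypothetical completion check, assuming "negative = -errno,
 * non-negative = bytes completed". Keeping ret in int64_t means a
 * large successful length (e.g. a multi-GiB discard) is never
 * narrowed into a value that looks like an error. */
static bool request_succeeded(int64_t ret, int64_t expected_len)
{
    if (ret < 0) {
        return false;            /* negative: error code */
    }
    return ret == expected_len;  /* success only if the full length completed */
}
```

Widening the variable that holds the return value, rather than clamping it, keeps both the error sign and the full length intact.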

Re: librbd discard bug problems - i REALLY got it

2012-11-20 Thread Stefan Priebe - Profihost AG
Hi Josh, the problem was again in the qemu rbd driver. It was using int for several return values instead of int64_t. I sent a second patch to the qemu list and ceph-devel. Could you please make sure that BOTH patches, the one from yesterday AND today's, get into qemu? Greets, Stefan On 20.11.2012

Is it still not recommended to place an rbd device on nodes where an osd daemon is located?

2012-11-20 Thread ruslan usifov
Hello, I can't find the link now where I read this (it was the old ceph wiki), but it said that mapping an rbd device on an osd node can cause hangs. Is this still the case today?

Re: [PATCH 3/3] ceph: fix dentry reference leak in ceph_encode_fh().

2012-11-20 Thread Alex Elder
On 11/11/2012 02:49 PM, Cyril Roelandt wrote: dput() was not called in the error path. Signed-off-by: Cyril Roelandt tipec...@gmail.com This looks good, thanks a lot. I'll apply it. Reviewed-by: Alex Elder el...@inktank.com --- fs/ceph/export.c |4 +++- 1 file changed, 3
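A userspace analogue of the leak class fixed here, under the assumption that the encoder takes a reference it must drop on every exit path. `struct ref_obj`, `ref_get`, `ref_put` and `encode_with_parent` are hypothetical stand-ins, not the fs/ceph/export.c code or the kernel's dget()/dput() API. Routing the error path through a single cleanup label is one common way to keep gets and puts balanced.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted dentry. */
struct ref_obj {
    int refcount;
};

static struct ref_obj *ref_get(struct ref_obj *o)
{
    o->refcount++;
    return o;
}

static void ref_put(struct ref_obj *o)
{
    o->refcount--;
}

/* Hypothetical encoder: takes a reference on the parent and drops it
 * on every exit path. Returns 0 on success, -1 if the output buffer
 * (buf_len) is smaller than the space the encoding needs. */
static int encode_with_parent(struct ref_obj *parent, int buf_len, int needed)
{
    int ret = 0;
    struct ref_obj *p = ref_get(parent);

    if (buf_len < needed) {
        ret = -1;
        goto out;   /* returning here directly would leak the reference */
    }
    /* ... encoding that uses p would go here ... */
out:
    ref_put(p);     /* single put shared by the success and error paths */
    return ret;
}
```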

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-20 Thread Stefan Hajnoczi
On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote: rbd / rados quite often returns the length of writes or discarded blocks. These values might be bigger than an int can hold. Signed-off-by: Stefan Priebe s.pri...@profihost.ag --- block/rbd.c | 4 ++-- 1 file changed, 2

Re: OSD daemon changes port no

2012-11-20 Thread Sage Weil
On Tue, 20 Nov 2012, hemant surale wrote: Hi Community, I have a question about the port number used by the ceph-osd daemon. I observed traffic (inter-osd communication while data ingest was happening) on port 6802, and then, when I ingested a second file after some delay, port 6804 was

Re: librbd discard bug problems - i REALLY got it

2012-11-20 Thread Josh Durgin
On 11/20/2012 04:46 AM, Stefan Priebe - Profihost AG wrote: Hi Josh, the problem was again in the qemu rbd driver. It was using int for several return values instead of int64_t. I sent a second patch to the qemu list and ceph-devel. Could you please make sure that BOTH patches, the one from yesterday AND

Re: Files lost after mds rebuild

2012-11-20 Thread Gregory Farnum
On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang gongfan...@gmail.com wrote: 2012/11/20 Gregory Farnum g...@inktank.com: On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote: I created a ceph cluster for testing; here's the mistake I made: I added a second mds, mds.ab, and executed 'ceph

Re: Can't start ceph mon

2012-11-20 Thread Dave (Bob)
Do you have other monitors in working order? If that's the case, the easiest way to handle it is just to remove this monitor from the cluster and add it back in as a new monitor with a fresh store. If not, we can look into reconstructing it. -Greg Also, if you still have it, could you zip up your

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-20 Thread Stefan Priebe
Hi Stefan, On 20.11.2012 17:29, Stefan Hajnoczi wrote: On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote: rbd / rados quite often returns the length of writes or discarded blocks. These values might be bigger than an int can hold. Signed-off-by: Stefan Priebe s.pri...@profihost.ag ---

Hadoop and Ceph client/mds view of modification time

2012-11-20 Thread Noah Watkins
This is a description of the clock synchronization issue we are facing in Hadoop: Components of Hadoop use mtime as a versioning mechanism. Here is an example where Client B tests the expected 'version' of a file created by Client A: Client A: create file, write data into file. Client A:
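A minimal sketch of why an exact-match mtime check is sensitive to which clock assigned the timestamp, assuming (hypothetically) that the value Client A recorded and the value Client B later observes come from clocks that differ by `skew_ms`. Both functions are illustrative only; they are not Hadoop or libcephfs APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the version check: an exact comparison of
 * modification times, in milliseconds since the epoch. */
static bool mtime_version_matches(int64_t recorded_ms, int64_t observed_ms)
{
    return recorded_ms == observed_ms;
}

/* Hypothetical: the mtime a reader would observe if the stored
 * timestamp came from a clock that is skew_ms away from the clock
 * that produced the recorded value. */
static int64_t observed_with_skew(int64_t recorded_ms, int64_t skew_ms)
{
    return recorded_ms + skew_ms;
}
```

Under this model any nonzero skew, even a few milliseconds, makes an unchanged file look like a new version.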

[PATCH 0/6] rbd: consolidate request operation creation

2012-11-20 Thread Alex Elder
I'm working on getting data structures for every rbd image request initialized in one place, and similarly getting all osd object requests for rbd images initialized at once as well. This series addresses an issue I bumped into while trying to do this, and then goes a little further to implement

[PATCH 3/6] rbd: initialize off and len in rbd_create_rw_op()

2012-11-20 Thread Alex Elder
Move the initialization of a read or write operation's offset, length, and payload length fields into rbd_create_rw_op(). This will actually get removed in the next patch, but it finishes the consolidation of setting these fields at osd op creation time. Signed-off-by: Alex Elder

[PATCH 4/6] rbd: define generalized osd request op routines

2012-11-20 Thread Alex Elder
Create a baseline function to encapsulate the creation of osd requests, along with a matching function to destroy them. For now this just duplicates what rbd_create_rw_op() does for read and write operations, but the next patches will expand on this. Since rbd_create_rw_op() is no longer used

[PATCH 5/6] rbd: move call osd op setup into rbd_osd_req_op_create()

2012-11-20 Thread Alex Elder
Move the initialization of the CEPH_OSD_OP_CALL operation into rbd_osd_req_op_create(). Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 48 ++-- 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/drivers/block/rbd.c

[PATCH 6/6] rbd: move remaining osd op setup into rbd_osd_req_op_create()

2012-11-20 Thread Alex Elder
The two remaining osd ops used by rbd are CEPH_OSD_OP_WATCH and CEPH_OSD_OP_NOTIFY_ACK. Move the setup of those operations into rbd_osd_req_op_create(), and get rid of rbd_create_rw_op() and rbd_destroy_op(). Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 68
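A sketch of the consolidation pattern the series describes: one constructor that initializes every op type and one matching destructor. The enum values, `struct sketch_req_op` and both functions are hypothetical and do not reproduce `rbd_osd_req_op_create()` or Ceph's own op structure; only the op kinds (read, write, call, watch, notify-ack) and the offset/length/payload-length fields come from the patch descriptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical op kinds mirroring those named in the series. */
enum sketch_op { SK_OP_READ, SK_OP_WRITE, SK_OP_CALL, SK_OP_WATCH, SK_OP_NOTIFY_ACK };

/* Hypothetical op descriptor; layout is illustrative only. */
struct sketch_req_op {
    enum sketch_op op;
    uint64_t offset;
    uint64_t length;
    uint32_t payload_len;
    uint64_t cookie;        /* used by watch / notify-ack only */
};

/* One place that allocates and initializes every op type. The meaning
 * of a and b depends on the op (hypothetical convention). */
static struct sketch_req_op *sketch_op_create(enum sketch_op op, uint64_t a, uint64_t b)
{
    struct sketch_req_op *r = calloc(1, sizeof(*r));   /* zero unused fields */
    if (!r)
        return NULL;
    r->op = op;
    switch (op) {
    case SK_OP_READ:
        r->offset = a;
        r->length = b;
        break;
    case SK_OP_WRITE:
        r->offset = a;
        r->length = b;
        r->payload_len = (uint32_t)b;   /* a write carries b bytes of data */
        break;
    case SK_OP_CALL:
        r->payload_len = (uint32_t)a;   /* size of the call's input data */
        break;
    case SK_OP_WATCH:
    case SK_OP_NOTIFY_ACK:
        r->cookie = a;
        break;
    }
    return r;
}

/* The matching destructor, so callers never free ops piecemeal. */
static void sketch_op_destroy(struct sketch_req_op *r)
{
    free(r);
}
```

Keeping all per-op setup behind a single create/destroy pair means a new op type only needs one new case rather than a new helper.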

[PATCH] rbd: assign watch request more directly

2012-11-20 Thread Alex Elder
Both rbd_req_sync_op() and rbd_do_request() have a linger parameter, which is the address of a pointer that should refer to the osd request structure used to issue a request to an osd. Only one case ever supplies a non-null linger argument: a CEPH_OSD_OP_WATCH start. And in that one case it is

Re: Hadoop and Ceph client/mds view of modification time

2012-11-20 Thread Sam Lang
On 11/20/2012 01:44 PM, Noah Watkins wrote: This is a description of the clock synchronization issue we are facing in Hadoop: Components of Hadoop use mtime as a versioning mechanism. Here is an example where Client B tests the expected 'version' of a file created by Client A: Client A:

Re: rbd map command hangs for 15 minutes during system start up

2012-11-20 Thread Nick Bartos
I reproduced the problem and got several sysrq states captured. During this run, the monitor running on the host complained a few times about the clocks being off, but all messages were for under 0.55 seconds. Here are the kernel logs. Note that there are several traces; I thought multiple

Re: libcephfs create file with layout and replication

2012-11-20 Thread Noah Watkins
On Mon, Nov 19, 2012 at 7:28 PM, Sage Weil s...@inktank.com wrote: We could avoid the whole issue by passing 4 arguments to the function... I pushed a new patch that takes each of the 4 new arguments. wip-client-open-layout Thanks, -Noah

Re: [PATCH] make mkcephfs and init-ceph osd filesystem handling more flexible

2012-11-20 Thread Sage Weil
If you haven't gotten to this yet, I'll go ahead and jump on it... let me know! Thanks- sage On Thu, 9 Aug 2012, Danny Kukawka wrote: Remove btrfs-specific keys and replace them with more generic keys so that btrfs can easily be replaced with e.g. xfs or ext4. Add a new key to define the osd fs

Re: rbd map command hangs for 15 minutes during system start up

2012-11-20 Thread Nick Bartos
Since I now have a decent script which can reproduce this, I decided to re-test with the same 3.5.7 kernel, but just not applying the patches from the wip-3.5 branch. With the patches, I can only go 2 builds before I run into a hang. Without the patches, I have gone 9 consecutive builds (and

Re: Files lost after mds rebuild

2012-11-20 Thread Drunkard Zhang
2012/11/21 Gregory Farnum g...@inktank.com: On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang gongfan...@gmail.com wrote: 2012/11/20 Gregory Farnum g...@inktank.com: On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote: I created a ceph cluster for testing; here's the mistake

Re: OSD daemon changes port no

2012-11-20 Thread hemant surale
And one more thing: how is it possible to read from one osd and simultaneously write directly to another osd with little or no traffic? I'm not sure I understand the question... Scenario: I have written file X.txt to some osd which is primary for file X.txt (direct write operation

Re: OSD daemon changes port no

2012-11-20 Thread Sage Weil
On Wed, 21 Nov 2012, hemant surale wrote: And one more thing: how is it possible to read from one osd and simultaneously write directly to another osd with little or no traffic? I'm not sure I understand the question... Scenario: I have written file X.txt to some osd which

Re: Files lost after mds rebuild

2012-11-20 Thread Sage Weil
On Wed, 21 Nov 2012, Drunkard Zhang wrote: 2012/11/21 Gregory Farnum g...@inktank.com: On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang gongfan...@gmail.com wrote: 2012/11/20 Gregory Farnum g...@inktank.com: On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote:

Re: OSD daemon changes port no

2012-11-20 Thread hemant surale
It's a somewhat confusing question, I believe. Actually there are two files, X and Y. When I am reading X from its primary, I want to make sure that a simultaneous write of Y goes to any OSD other than the primary OSD for X (from which my current read is being served). - Hemant Surale On Wed, Nov

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-20 Thread Stefan Hajnoczi
On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe s.pri...@profihost.ag wrote: Hi Stefan, On 20.11.2012 17:29, Stefan Hajnoczi wrote: On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote: rbd / rados quite often returns the length of writes or discarded blocks. These values

Re: OSD daemon changes port no

2012-11-20 Thread Sage Weil
On Wed, 21 Nov 2012, hemant surale wrote: It's a somewhat confusing question, I believe. Actually there are two files, X and Y. When I am reading X from its primary, I want to make sure that a simultaneous write of Y goes to any OSD other than the primary OSD for X (from which my current read is

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-20 Thread Stefan Priebe - Profihost AG
On 21.11.2012 07:41, Stefan Hajnoczi wrote: On Tue, Nov 20, 2012 at 8:16 PM, Stefan Priebe s.pri...@profihost.ag wrote: Hi Stefan, On 20.11.2012 17:29, Stefan Hajnoczi wrote: On Tue, Nov 20, 2012 at 01:44:55PM +0100, Stefan Priebe wrote: rbd / rados quite often returns the length