Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-19 Thread Stefan Priebe - Profihost AG
Hi Josh, I got the following info from the qemu devs. The discards get canceled by the client kernel as they take too long. This happens due to the fact that ceph handles discards as buffered I/O. I see that there are at most 800 pending requests, and rbd returns success only once there are no

Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-19 Thread Stefan Priebe - Profihost AG
Sorry, I meant the building in this case. The building of 900 requests takes too long, so the kernel starts to cancel these I/O requests. void AioCompletion::finish_adding_requests(CephContext *cct) { ldout(cct, 20) << "AioCompletion::finish_adding_requests " << (void*)this << " pending
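
For context, finish_adding_requests() is what ends the request-building phase described above; a sketch of the librbd code of that era (reconstructed from memory, not the verbatim source):

    void AioCompletion::finish_adding_requests(CephContext *cct)
    {
      ldout(cct, 20) << "AioCompletion::finish_adding_requests "
                     << (void*)this << " pending " << pending_count << dendl;
      lock.Lock();
      assert(building);
      building = false;
      // the user-visible completion can only fire once building is done,
      // so a discard split into ~900 object requests finishes strictly
      // after all of them have been submitted and completed
      if (!pending_count)
        complete();
      lock.Unlock();
    }

This is why slow building directly delays completion of the guest's discard.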

Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-19 Thread Stefan Priebe - Profihost AG
Hi Josh, sorry for the bunch of mails. It turns out not to be a bug in RBD or ceph but a bug in the Linux kernel itself. Paolo from qemu told me the Linux kernel should serialize these requests instead of sending the whole bunch and then hoping that all of them get handled in milliseconds.

Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-19 Thread Stefan Priebe - Profihost AG
But strangely enough, this works fine with a normal iSCSI target... no idea why. Stefan On 19.11.2012 11:15, Stefan Priebe - Profihost AG wrote: Hi Josh, sorry for the bunch of mails. It turns out not to be a bug in RBD or ceph but a bug in the Linux kernel itself. Paolo from qemu told me the

Re: rbd tool changed format? (breaks compatibility)

2012-11-19 Thread Constantinos Venetsanopoulos
On 11/16/2012 07:14 PM, Josh Durgin wrote: On 11/16/2012 06:36 AM, Constantinos Venetsanopoulos wrote: Hello ceph team, As you may already know, our team in GRNET is building a complete open source cloud platform called Synnefo [1], which already powers our production public cloud service

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
Hello Mark, First of all, thank you again for another accurate answer :-). I would have expected write aggregation and cylinder affinity to have eliminated some seeks and improved rotational latency resulting in better than theoretical random write throughput. Against those expectations

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
If I remember, you use fio with a 4MB block size for sequential, so it's normal that you have fewer IOPS but more bandwidth. That's correct for some of the benchmarks. However, even with 4K for seq, I still get fewer IOPS. See below my last fio run: # fio rbd-bench.fio seq-read: (g=0): rw=read,
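
For reference, a job file along these lines would drive such a run (a hypothetical reconstruction of rbd-bench.fio; the device path and values are assumptions, not the actual file):

    [global]
    ioengine=libaio
    direct=1
    filename=/dev/rbd0
    runtime=300

    [seq-read]
    rw=read
    bs=4k
    iodepth=100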

Re: RBD fio Performance concerns

2012-11-19 Thread Sage Weil
On Mon, 19 Nov 2012, Sébastien Han wrote: If I remember, you use fio with a 4MB block size for sequential, so it's normal that you have fewer IOPS but more bandwidth. That's correct for some of the benchmarks. However, even with 4K for seq, I still get fewer IOPS. See below my last fio: Small

Re: RBD fio Performance concerns

2012-11-19 Thread Mark Kampe
Recall: 1. RBD volumes are striped (4M wide) across RADOS objects 2. distinct writes to a single RADOS object are serialized Your sequential 4K writes are direct, depth=256, so there are (at all times) 256 writes queued to the same object. All of your writes are waiting through a very
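
The arithmetic makes the problem concrete: with a 4M stripe and 4K writes,

    writes per object = 4 MiB / 4 KiB = 1024

so at iodepth=256 every outstanding write targets the same RADOS object, and rule 2 serializes them: the effective queue depth against that object is 1, leaving throughput bound by per-request round-trip latency.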

Re: Many dns domain names in radosgw

2012-11-19 Thread Yehuda Sadeh
On Sat, Nov 17, 2012 at 1:50 PM, Sławomir Skowron szi...@gmail.com wrote: Welcome, I have a question. Is there any way to support multiple domain names in one radosgw with virtual-host-type connections in S3? Are you aiming at having multiple virtual domain names pointing at the same bucket?

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
@Sage, thanks for the info :) @Mark: If you want to do sequential I/O, you should do it buffered (so that the writes can be aggregated) or with a 4M block size (very efficient and avoiding object serialization). The original benchmark was performed with a 4M block size, and as you can see

Re: Many dns domain names in radosgw

2012-11-19 Thread Sławomir Skowron
Yes. I am looking at using domains x.com and y.com with virtual-host buckets like b.x.com, c.y.com. But if that's not possible I can handle this with a CNAME *.x.com and use only b and c on the x.com domain. Thanks for the response. On 19 Nov 2012 19:02, Yehuda Sadeh yeh...@inktank.com wrote: On Sat,
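
For reference, radosgw of this vintage takes a single virtual-host domain in ceph.conf (a minimal sketch; the section name is an assumption):

    [client.radosgw.gateway]
    rgw dns name = x.com

combined with a wildcard DNS record *.x.com pointing at the gateway, so that bucket b resolves as b.x.com.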

Remote Ceph Install

2012-11-19 Thread Blackwell, Edward
Hi, I work for Harris Corporation, and we are investigating Ceph as a potential solution to a storage problem that one of our government customers is currently having. I've already created a two-node cluster on a couple of VMs with another VM acting as an administrative client. The cluster

Re: [PATCH] rbd: get rid of rbd_{get,put}_dev()

2012-11-19 Thread Dan Mick
Reviewed-by: Dan Mick dan.m...@inktank.com On 11/16/2012 07:43 AM, Alex Elder wrote: The functions rbd_get_dev() and rbd_put_dev() are trivial wrappers that add no value, and their existence suggests they may do more than what they do. Get rid of them. Signed-off-by: Alex Elder

[PATCH] rbd block driver fix race between aio completion and aio cancel

2012-11-19 Thread Stefan Priebe
From: Stefan Priebe s.pri...@profhost.ag This one fixes a race qemu also had in the iscsi block driver between cancellation and I/O completion. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. It also removes the useless cancelled flag and instead introduces a status
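
In outline, the described fix looks like this (a sketch following the patch description, with qemu-era block-driver conventions assumed rather than copied from the actual patch):

    static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
    {
        RBDAIOCB *acb = (RBDAIOCB *) blockacb;

        /* instead of marking the request cancelled and racing with the
         * completion callback, wait synchronously until librbd is done */
        while (acb->status == -EINPROGRESS) {
            qemu_aio_wait();
        }
        qemu_aio_release(acb);
    }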

[no subject]

2012-11-19 Thread Stefan Priebe
From Stefan Priebe s.pri...@profihost.ag # This line is ignored. From: Stefan Priebe s.pri...@profihost.ag Cc: pve-de...@pve.proxmox.com Cc: pbonz...@redhat.com Cc: ceph-devel@vger.kernel.org Subject: QEMU/PATCH: rbd block driver: fix race between completion and cancel In-Reply-To:

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
Which iodepth did you use for those benchmarks? I really don't understand why I can't get more random read IOPS with 4K blocks... Me neither; hope to get some clarification from the Inktank guys. It doesn't make any sense to me... -- Best regards. Sébastien HAN. On Mon, Nov 19, 2012 at 8:11

Re: RBD fio Performance concerns

2012-11-19 Thread Sébastien Han
Hello Mark, See below my benchmark results: - RADOS bench with 4M block size, write: # rados -p bench bench 300 write -t 32 --no-cleanup Maintaining 32 concurrent writes of 4194304 bytes for at least 300 seconds. 2012-11-19 21:35:01.722143 min lat: 0.255396 max lat: 8.40212 avg lat: 1.14076

Can't start ceph mon

2012-11-19 Thread Dave Humphreys (Datatone)
I have a problem in which I can't start my ceph monitor. The log is shown below. The log shows version 0.54. I was running 0.52 when the problem arose, and I moved to the latest in case the newer version fixed the problem. The original failure happened a week or so ago, and could have been as

Cannot Start Ceph Mon

2012-11-19 Thread Dave Humphreys (Datatone)
(Apologies if this is seen to be a repeat posting: I think that the last attempt fell into the void). I can't start my ceph monitor. The log is below. Though this shows version 0.54, the problem arose whilst using 0.52. Something may have become corrupted when the disk space ran out due to an

librbd discard bug problems - I got it

2012-11-19 Thread Stefan Priebe
Hello Josh, after digging around for three days I got it. The problem is in aio_discard in internal.cc. The I/O fails when AioZero or AioTruncate is used. It works fine with AioRemove. It seems to depend on overlapping. Hopefully I'm able to provide a patch tonight. Greets, Stefan -- To
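
For readers following along: librbd's aio_discard chooses one of three object operations per affected object, roughly like the sketch below (the names and exact conditions are simplified assumptions, not the literal internal.cc logic):

    #include <cstdint>

    enum class DiscardOp { Remove, Truncate, Zero };

    // pick the per-object operation for a discard extent
    // (offset/length are relative to the object; all names hypothetical)
    DiscardOp pick_discard_op(uint64_t offset, uint64_t length,
                              uint64_t object_size, bool parent_overlap)
    {
      if (offset == 0 && length == object_size && !parent_overlap)
        return DiscardOp::Remove;   // whole object can be deleted
      if (offset + length == object_size)
        return DiscardOp::Truncate; // trim the object's tail
      return DiscardOp::Zero;       // zero a range in the middle
    }

which matches the observation that the failure shows up on the zero/truncate paths but not on remove, and that the parent overlap decides which path is taken.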

Can't Start Ceph Mon

2012-11-19 Thread Dave Humphreys (Bob)
I can't start my ceph monitor; the log is attached below. Whilst the log shows 0.54, the problem arose with 0.52, and may have been caused when disk space ran out as a result of a huge set of ceph log files. Is there a way to recover? Regards, David bash-4.1# cat

Re: Can't start ceph mon

2012-11-19 Thread Gregory Farnum
On Mon, Nov 19, 2012 at 1:08 PM, Dave Humphreys (Datatone) d...@datatone.co.uk wrote: I have a problem in which I can't start my ceph monitor. The log is shown below. The log shows version 0.54. I was running 0.52 when the problem arose, and I moved to the latest in case the newer version

Re: Files lost after mds rebuild

2012-11-19 Thread Gregory Farnum
On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote: I created a ceph cluster for test; here's the mistake I made: Add a second mds: mds.ab, executed 'ceph mds set_max_mds 2', then removed the mds just added; Then 'ceph mds set_max_mds 1', the first mds.aa crashed, and

Re: Is the disk on MDS used for journal?

2012-11-19 Thread Gregory Farnum
On Sun, Nov 18, 2012 at 7:14 PM, liu yaqi liuyaqiy...@gmail.com wrote: Is the disk on MDS used for journal? Does it have some other use? The MDS doesn't make any use of local disk space — it stores everything in RADOS. You need enough local disk to provide a configuration file, keyring, and debug

Re: OSD network failure

2012-11-19 Thread Gregory Farnum
On Fri, Nov 16, 2012 at 5:56 PM, Josh Durgin josh.dur...@inktank.com wrote: On 11/15/2012 01:51 AM, Gandalf Corvotempesta wrote: 2012/11/15 Josh Durgin josh.dur...@inktank.com: So basically you'd only need a single nic per storage node. Multiple can be useful to separate frontend and backend

Unused doc/images/.jpg files

2012-11-19 Thread Snider, Tim
Hi - There are several jpg files in the doc/images directory of the tarball that don't seem to be used in the html files or man pages after the docs are built. If they are used somewhere, where is that? What am I missing? Some of the .png files are used. root@84Server:~/ceph-ceph-fd4b839# ls

Re: deprecating mkcephfs (the arrival of light-weight deployment tools)

2012-11-19 Thread Sage Weil
On Mon, 19 Nov 2012, Isaac Otsiabah wrote: I am trying to understand the ceph deployment direction, because from this link http://ceph.com/docs/master/rados/deployment/ it is mentioned that mkcephfs is deprecated. It also has the statement below, which mentions light-weight deployment scripts

Re: some snapshot problems

2012-11-19 Thread Gregory Farnum
On Sun, Nov 11, 2012 at 11:02 PM, liu yaqi liuyaqiy...@gmail.com wrote: 2012/11/9 Sage Weil s...@inktank.com Lots of different snapshots: - librados lets you do 'selfmanaged snaps' in its API, which let an application control which snapshots apply to which objects. - you can create a
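
The 'selfmanaged snaps' calls are part of the public librados C API; a minimal sketch of the flow (assuming an already-open rados_ioctx_t io, error handling omitted):

    #include <rados/librados.h>

    void snapshot_example(rados_ioctx_t io)
    {
        rados_snap_t snapid;
        rados_ioctx_selfmanaged_snap_create(io, &snapid);

        // declare the snapshot context that subsequent writes belong to;
        // librbd uses this same mechanism for RBD image snapshots
        rados_snap_t snaps[1] = { snapid };
        rados_ioctx_selfmanaged_snap_set_write_ctx(io, snapid, snaps, 1);

        // ... writes now preserve pre-snapshot object data via clones ...

        rados_ioctx_selfmanaged_snap_remove(io, snapid);
    }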

Re: rbd map command hangs for 15 minutes during system start up

2012-11-19 Thread Nick Bartos
Making 'mon clock drift allowed' very small (0.1) does not reliably reproduce the hang. I started looking at the code for 0.48.2 and it looks like this is only used in Paxos::warn_on_future_time, which only handles the warning, nothing else. On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil
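
For anyone reproducing this, the knob lives in ceph.conf:

    [mon]
    mon clock drift allowed = 0.1

and, as observed above, in 0.48.2 it only feeds the warning in Paxos::warn_on_future_time rather than any decision logic.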

Re: librbd discard bug problems - I got it

2012-11-19 Thread Stefan Priebe
Hmm, the qemu rbd block driver always gets these errors back. As rbd_aio_bh_cb is called directly from librbd, the problem must be there. Strangely I can't find where rbd_aio_bh_cb gets called with -512. Any further ideas? rbd_aio_bh_cb got error back. Code: -512 Error: 0 rbd_aio_bh_cb got error
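
One decoding hint, not from the thread itself: Linux defines

    /* include/linux/errno.h (kernel-internal, never meant for userspace) */
    #define ERESTARTSYS 512

so if that value is leaking through unchanged, -512 would be -ERESTARTSYS from an interrupted request, which would fit the cancellation race discussed in the other thread.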

Re: osd recovery extremely slow with current master

2012-11-19 Thread Gregory Farnum
Which version was this on? There was some fairly significant work done on recovery to introduce a reservation scheme and some other stuff that might need some different defaults. -Greg On Tue, Nov 13, 2012 at 12:33 PM, Stefan Priebe s.pri...@profihost.ag wrote: Hi list, osd recovery seems to

objectcacher lru eviction causes assert

2012-11-19 Thread Sam Lang
Hi All, We've been fixing a number of objectcacher bugs to handle races between slow osd commit replies and various other operations like truncate. I ran into another problem earlier today with a race between an object getting evicted from the lru cache (via readx -> trim) and the osd

Re: Removed directory is back in the Ceph FS

2012-11-19 Thread Gregory Farnum
On Tue, Nov 13, 2012 at 3:23 AM, Franck Marchand fmarch...@agaetis.fr wrote: Hi, I have a weird problem. I removed a folder using a mounted fs partition. I did it and it worked well. What client are you using? How did you delete it? (rm -rf, etc?) Are you using multiple clients or one, and did you

Re: librbd discard bug problems - I got it

2012-11-19 Thread Josh Durgin
On 11/19/2012 03:16 PM, Stefan Priebe wrote: Hmm, the qemu rbd block driver always gets these errors back. As rbd_aio_bh_cb is called directly from librbd, the problem must be there. Strangely I can't find where rbd_aio_bh_cb gets called with -512. Any further ideas? Two ideas: 1) Is

Re: ceph-osd crashing (os/FileStore.cc: 4500: FAILED assert(replaying))

2012-11-19 Thread Stefan Priebe
On 20.11.2012 00:39, Samuel Just wrote: Seems to be a truncated log file... That usually indicates filesystem corruption. Anything in dmesg? -Sam No, everything is fine. On Thu, Nov 15, 2012 at 1:07 PM, Stefan Priebe s.pri...@profihost.ag wrote: Hello list, current master incl.

Re: librbd discard bug problems - I got it

2012-11-19 Thread Stefan Priebe
On 20.11.2012 00:33, Josh Durgin wrote: On 11/19/2012 03:16 PM, Stefan Priebe wrote: Hmm, the qemu rbd block driver always gets these errors back. As rbd_aio_bh_cb is called directly from librbd, the problem must be there. Strangely I can't find where rbd_aio_bh_cb gets called with -512. Any

[PATCH, v2] rbd: do not allow remove of mounted-on image

2012-11-19 Thread Alex Elder
There is no check in rbd_remove() to see if anybody holds open the image being removed. That's not cool. Add a simple open count that goes up and down with opens and closes (releases) of the device, and don't allow an rbd image to be removed if the count is non-zero. Protect the updates of the
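
In outline, the described guard looks like the following sketch (field and lock names are assumptions based on the description, not taken from the patch itself):

    /* bump the count on every open of the block device */
    static int rbd_open(struct block_device *bdev, fmode_t mode)
    {
            struct rbd_device *rbd_dev = bdev->bd_disk->private_data;

            mutex_lock(&ctl_mutex);
            rbd_dev->open_count++;
            mutex_unlock(&ctl_mutex);
            return 0;
    }

    /* ...mirrored by a decrement in rbd_release(), and in rbd_remove(): */
    if (rbd_dev->open_count) {
            ret = -EBUSY;   /* image is mounted/open, refuse removal */
            goto done;
    }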

Re: ceph-osd crashing (os/FileStore.cc: 4500: FAILED assert(replaying))

2012-11-19 Thread Samuel Just
Can you restart one of the affected osds with debug osd = 20, debug filestore = 20, debug ms = 1 and post the log? -Sam On Mon, Nov 19, 2012 at 3:39 PM, Stefan Priebe s.pri...@profihost.ag wrote: Am 20.11.2012 00:39, schrieb Samuel Just: Seems to be a truncated log file... That usually

Re: ceph-osd crashing (os/FileStore.cc: 4500: FAILED assert(replaying))

2012-11-19 Thread Stefan Priebe
I've formatted the cluster since then, but I'll report back if this happens again. Stefan On 20.11.2012 00:43, Samuel Just wrote: Can you restart one of the affected osds with debug osd = 20, debug filestore = 20, debug ms = 1 and post the log? -Sam On Mon, Nov 19, 2012 at 3:39 PM, Stefan

Re: librbd discard bug problems - I got it

2012-11-19 Thread Josh Durgin
On 11/19/2012 03:42 PM, Stefan Priebe wrote: On 20.11.2012 00:33, Josh Durgin wrote: On 11/19/2012 03:16 PM, Stefan Priebe wrote: Hmm, the qemu rbd block driver always gets these errors back. As rbd_aio_bh_cb is called directly from librbd, the problem must be there. Strangely I can't find where

Re: librbd discard bug problems - I got it

2012-11-19 Thread Stefan Priebe
Hi Josh, I don't get it. Every debug line I print is a positive, fine value, but rbd_aio_bh_cb gets called with these values. As you can see there are not many values; I copied all values < 0 from the log for a discard of a whole 30GB device. Stefan On 20.11.2012 00:47, Josh Durgin wrote: On

Re: librbd discard bug problems - I got it

2012-11-19 Thread Josh Durgin
On 11/19/2012 04:00 PM, Stefan Priebe wrote: Hi Josh, I don't get it. Every debug line I print is a positive, fine value, but rbd_aio_bh_cb gets called with these values. As you can see there are not many values; I copied all values < 0 from the log for a discard of a whole 30GB device. Could you

Re: libcephfs create file with layout and replication

2012-11-19 Thread Gregory Farnum
On Sun, Nov 18, 2012 at 12:05 PM, Noah Watkins jayh...@cs.ucsc.edu wrote: Wanna have a look at a first pass on this patch? wip-client-open-layout Thanks, Noah Just glanced over this, and I'm curious: 1) Why symlink another reference to your file_layout.h? 2) There's already a

Re: Can't start ceph mon

2012-11-19 Thread Gregory Farnum
Also, if you still have it, could you zip up your monitor data directory and put it somewhere accessible to us? (I can provide you a drop point if necessary.) We'd like to look at the file layouts a bit since we thought we were properly handling ENOSPC-style issues. -Greg On Mon, Nov 19, 2012 at

Request to join mailing group

2012-11-19 Thread Pat Beadles

Re: libcephfs create file with layout and replication

2012-11-19 Thread Noah Watkins
On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote: Just glanced over this, and I'm curious: 1) Why symlink another reference to your file_layout.h? I followed the same pattern as page.h in librados, but may have misunderstood its use. When libcephfs.h is installed, it

Re: libcephfs create file with layout and replication

2012-11-19 Thread Sage Weil
On Mon, 19 Nov 2012, Noah Watkins wrote: On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote: Just glanced over this, and I'm curious: 1) Why symlink another reference to your file_layout.h? I followed the same pattern as page.h in librados, but may have misunderstood

Re: Remote Ceph Install

2012-11-19 Thread Dan Mick
On 11/19/2012 11:42 AM, Blackwell, Edward wrote: Hi, I work for Harris Corporation, and we are investigating Ceph as a potential solution to a storage problem that one of our government customers is currently having. I've already created a two-node cluster on a couple of VMs with another

Re: RBD fio Performance concerns

2012-11-19 Thread Alexandre DERUMIER
Which iodepth did you use for those benchmarks? iodepth = 100, filesize = 1G, 10G, 30G, same result (3 nodes, 8 cores at 2.5GHz, 32GB RAM, with 6 osds each (15k drives) + journal on tmpfs). Note that I can't get more than 6000 iops on a single rbd device, but with more devices it scales. (each fio is at