Hi Josh,
I got the following info from the qemu devs.
The discards get canceled by the client kernel because they take TOO long.
This happens because Ceph handles discards as buffered I/O.
I see that there are at most 800 pending requests, and rbd returns success
only when there are no
Sorry, I meant the building in this case. The building of 900 requests
takes too long, so the kernel starts to cancel these I/O requests.
void AioCompletion::finish_adding_requests(CephContext *cct)
{
  ldout(cct, 20) << "AioCompletion::finish_adding_requests "
                 << (void*)this << " pending
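For readers following along, here is a minimal sketch (my illustration, not code from the thread) of how a client issues an async discard through the librbd C API and waits on the completion; the pool and image names are made up:

/* Hypothetical sketch: issue one async discard via the librbd C API and
 * wait for its completion.  Error handling is reduced to asserts. */
#include <assert.h>
#include <stdio.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    rbd_image_t image;
    rbd_completion_t comp;

    assert(rados_create(&cluster, NULL) == 0);
    assert(rados_conf_read_file(cluster, NULL) == 0);  /* default ceph.conf */
    assert(rados_connect(cluster) == 0);
    assert(rados_ioctx_create(cluster, "rbd", &ioctx) == 0);
    assert(rbd_open(ioctx, "testimage", &image, NULL) == 0);

    /* no callback here; we wait on the completion instead */
    assert(rbd_aio_create_completion(NULL, NULL, &comp) == 0);
    assert(rbd_aio_discard(image, 0, 4 * 1024 * 1024, comp) == 0);

    rbd_aio_wait_for_complete(comp);
    printf("discard returned %zd\n", rbd_aio_get_return_value(comp));
    rbd_aio_release(comp);

    rbd_close(image);
    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return 0;
}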
Hi Josh,
sorry for the bunch of mails.
It turns out not to be a bug in RBD or Ceph but a bug in the Linux
kernel itself. Paolo from qemu told me the Linux kernel should serialize
these requests instead of sending the whole bunch and then hoping that
all of them get handled in milliseconds.
Strangely enough, this works fine with a normal iscsi target... no idea why.
Stefan
On 19.11.2012 11:15, Stefan Priebe - Profihost AG wrote:
Hi Josh,
sorry for the bunch of mails.
It turns out not to be a bug in RBD or Ceph but a bug in the Linux
kernel itself. Paolo from qemu told me the
On 11/16/2012 07:14 PM, Josh Durgin wrote:
On 11/16/2012 06:36 AM, Constantinos Venetsanopoulos wrote:
Hello ceph team,
As you may already know, our team in GRNET is building a complete open
source cloud platform called Synnefo [1], which already powers our
production public cloud service
Hello Mark,
First of all, thank you again for another accurate answer :-).
I would have expected write aggregation and cylinder affinity to
have eliminated some seeks and improved rotational latency resulting
in better than theoretical random write throughput. Against those
expectations
If I remember, you use fio with a 4MB block size for sequential.
So it's normal that you get fewer IOPS, but more bandwidth.
That's correct for some of the benchmarks. However, even with 4K for
seq, I still get fewer IOPS. See below my last fio:
# fio rbd-bench.fio
seq-read: (g=0): rw=read,
On Mon, 19 Nov 2012, Sébastien Han wrote:
If I remember, you use fio with a 4MB block size for sequential.
So it's normal that you get fewer IOPS, but more bandwidth.
That's correct for some of the benchmarks. However, even with 4K for
seq, I still get fewer IOPS. See below my last fio:
Small
Recall:
1. RBD volumes are striped (4M wide) across RADOS objects
2. distinct writes to a single RADOS object are serialized
Your sequential 4K writes are direct, depth=256, so there are
(at all times) 256 writes queued to the same object. All of
your writes are waiting through a very
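To make points 1 and 2 concrete, a small illustration (mine, not from the thread) of how a byte offset maps to a RADOS object with the default 4M object size; a long run of sequential 4K direct writes stays inside the same object, so at depth 256 they all queue up behind each other:

/* Illustration only: map a byte offset in an RBD image to its object
 * index, assuming the default 4 MiB object size and no custom striping. */
#include <stdint.h>
#include <stdio.h>

#define OBJECT_SIZE (4ULL << 20)   /* 4 MiB */

static uint64_t object_index(uint64_t offset)
{
    return offset / OBJECT_SIZE;
}

int main(void)
{
    /* 1024 sequential 4K writes: offsets 0, 4K, 8K, ... (4 MiB total) */
    for (uint64_t i = 0; i < 1024; i++) {
        uint64_t off = i * 4096;
        if (i < 3 || i == 1023)
            printf("write %4llu at offset %8llu -> object %llu\n",
                   (unsigned long long)i, (unsigned long long)off,
                   (unsigned long long)object_index(off));
    }
    /* Every one of these writes lands in object 0, so per point 2 they
     * are serialized; with a 4M block size each write would start a new
     * object instead. */
    return 0;
}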
On Sat, Nov 17, 2012 at 1:50 PM, Sławomir Skowron szi...@gmail.com wrote:
Welcome,
I have a question. Is there any way to support multiple domain names
in one radosgw with virtual-host-style connections in S3?
Are you aiming at having multiple virtual domain names pointing at the
same bucket?
@Sage, thanks for the info :)
@Mark:
If you want to do sequential I/O, you should do it buffered
(so that the writes can be aggregated) or with a 4M block size
(very efficient and avoiding object serialization).
The original benchmark was performed with a 4M block size. And as
you can see
Yes. I am looking to use the domains x.com and y.com with virtual-host
buckets like b.x.com and c.y.com.
But if it's not possible I can handle this with a CNAME *.x.com and use
only b and c on the x.com domain.
Thanks for the response.
On 19 Nov 2012 19:02, Yehuda Sadeh yeh...@inktank.com wrote:
On Sat,
Hi,
I work for Harris Corporation, and we are investigating Ceph as a potential
solution to a storage problem that one of our government customers is currently
having. I've already created a two-node cluster on a couple of VMs with
another VM acting as an administrative client. The cluster
Reviewed-by: Dan Mick dan.m...@inktank.com
On 11/16/2012 07:43 AM, Alex Elder wrote:
The functions rbd_get_dev() and rbd_put_dev() are trivial wrappers
that add no value, and their existence suggests they may do more
than what they do.
Get rid of them.
Signed-off-by: Alex Elder
From: Stefan Priebe s.pri...@profhost.ag
This one fixes a race, which qemu also had in the iscsi block driver,
between cancellation and I/O completion.
qemu_rbd_aio_cancel was not synchronously waiting for the end of
the command.
It also removes the useless cancelled flag and instead introduces
a status
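A rough sketch of the idea behind the fix (my reconstruction under assumptions, not the actual patch): cancel no longer frees anything itself; it just pumps the AIO event loop until the completion callback has set a final status. The field and helper names below are assumptions in the qemu style of the time:

/* Sketch only: synchronous cancel that waits for the in-flight request
 * instead of freeing it while librbd may still touch it.  acb->status
 * and qemu_aio_wait() are assumed names, not the real patch. */
static void qemu_rbd_aio_cancel(BlockDriverAIOCB *blockacb)
{
    RBDAIOCB *acb = (RBDAIOCB *)blockacb;

    /* The completion callback sets acb->status to the final result;
     * until then the request is still owned by librbd, so just pump
     * the event loop and wait for it. */
    while (acb->status == -EINPROGRESS) {
        qemu_aio_wait();
    }
}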
From Stefan Priebe s.pri...@profihost.ag # This line is ignored.
From: Stefan Priebe s.pri...@profihost.ag
Cc: pve-de...@pve.proxmox.com
Cc: pbonz...@redhat.com
Cc: ceph-devel@vger.kernel.org
Subject: QEMU/PATCH: rbd block driver: fix race between completion and cancel
In-Reply-To:
Which iodepth did you use for those benchmarks?
I really don't understand why I can't get more rand read iops with 4K block
...
Me neither, hope to get some clarification from the Inktank guys. It
doesn't make any sense to me...
--
Best regards.
Sébastien HAN.
On Mon, Nov 19, 2012 at 8:11
Hello Mark,
See below my benchmarks results:
-RADOS Bench with 4M block size write:
# rados -p bench bench 300 write -t 32 --no-cleanup
Maintaining 32 concurrent writes of 4194304 bytes for at least 300 seconds.
2012-11-19 21:35:01.722143 min lat: 0.255396 max lat: 8.40212 avg lat: 1.14076
I have a problem in which I can't start my ceph monitor. The log is shown below.
The log shows version 0.54. I was running 0.52 when the problem arose, and I
moved to the latest in case the newer version fixed the problem.
The original failure happened a week or so ago, and could have been as
(Apologies if this is seen to be a repeat posting: I think that the last
attempt fell into the void).
I can't start my ceph monitor. The log is below.
Though this shows version 0.54, the problem arose whilst using 0.52. Something
may have become corrupted when the disk space ran out due to an
Hello Josh,
after digging around for three days I got it.
The problem is in aio_discard in internal.cc. The I/O fails when AioZero
or AioTruncate is used.
It works fine with AioRemove. It seems to depend on the overlapping.
Hopefully I'm able to provide a patch tonight.
Greets,
Stefan
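For context (my understanding, not stated in the thread beyond the operation names): librbd picks one of those three operations per affected object depending on how the discard range overlaps the object. A rough sketch of that decision; the exact conditions in internal.cc may differ:

/* Rough sketch of the per-object discard dispatch as described in this
 * thread (AioRemove / AioTruncate / AioZero); the real logic lives in
 * librbd internal.cc and may differ in detail. */
#include <stdint.h>

enum discard_op { AIO_REMOVE, AIO_TRUNCATE, AIO_ZERO };

static enum discard_op pick_discard_op(uint64_t obj_off, uint64_t obj_len,
                                       uint64_t object_size)
{
    if (obj_off == 0 && obj_len == object_size)
        return AIO_REMOVE;      /* discard covers the whole object */
    else if (obj_off + obj_len == object_size)
        return AIO_TRUNCATE;    /* discard runs to the end of the object */
    else
        return AIO_ZERO;        /* discard of a region inside the object */
}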
I can't start my ceph monitor, the log is attached below.
Whilst the log shows 0.54, the problem arose with 0.52, and may have been
caused when disk space ran out as a result of a huge set of ceph log files.
Is there a way to recover?
Regards,
David
bash-4.1# cat
On Mon, Nov 19, 2012 at 1:08 PM, Dave Humphreys (Datatone)
d...@datatone.co.uk wrote:
I have a problem in which I can't start my ceph monitor. The log is shown
below.
The log shows version 0.54. I was running 0.52 when the problem arose, and I
moved to the latest in case the newer version
On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang gongfan...@gmail.com wrote:
I created a ceph cluster for testing; here's the mistake I made:
I added a second mds, mds.ab, executed 'ceph mds set_max_mds 2', then
removed the mds just added;
Then with 'ceph mds set_max_mds 1', the first mds.aa crashed, and
On Sun, Nov 18, 2012 at 7:14 PM, liu yaqi liuyaqiy...@gmail.com wrote:
Is the disk on the MDS used for the journal? Does it have some other use?
The MDS doesn't make any use of local disk space — it stores
everything in RADOS. You need enough local disk to provide a
configuration file, keyring, and debug
On Fri, Nov 16, 2012 at 5:56 PM, Josh Durgin josh.dur...@inktank.com wrote:
On 11/15/2012 01:51 AM, Gandalf Corvotempesta wrote:
2012/11/15 Josh Durgin josh.dur...@inktank.com:
So basically you'd only need a single nic per storage node. Multiple
can be useful to separate frontend and backend
Hi - There are several jpg files in the doc/images directory of the tarball
that don't seem to be used in the html files or man pages after docs are built.
If they are used somewhere, where is that? What am I missing?
Some of the .png files are used.
root@84Server:~/ceph-ceph-fd4b839# ls
On Mon, 19 Nov 2012, Isaac Otsiabah wrote:
I am trying to understand the ceph deployment direction, because this link
http://ceph.com/docs/master/rados/deployment/
mentions that mkcephfs is deprecated. It also has the statement
below which mentions light-weight deployment scripts
On Sun, Nov 11, 2012 at 11:02 PM, liu yaqi liuyaqiy...@gmail.com wrote:
2012/11/9 Sage Weil s...@inktank.com
Lots of different snapshots:
- librados lets you do 'selfmanaged snaps' in its API, which let an
application control which snapshots apply to which objects.
- you can create a
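To illustrate the 'selfmanaged snaps' point at the API level, here is a minimal sketch (mine, assuming the librados C API; cluster and ioctx setup omitted):

/* Sketch of librados self-managed snapshots: the application allocates
 * snap ids itself and tells the ioctx which snapshot context to write
 * with.  Error handling and cluster/ioctx setup are omitted. */
#include <rados/librados.h>

void selfmanaged_snap_example(rados_ioctx_t ioctx)
{
    rados_snap_t snap_id;

    /* allocate a new self-managed snapshot id from the cluster */
    rados_ioctx_selfmanaged_snap_create(ioctx, &snap_id);

    /* writes through this ioctx are now tagged with a snapshot context:
     * newest snap id first, plus a matching sequence number */
    rados_snap_t snaps[1] = { snap_id };
    rados_ioctx_selfmanaged_snap_set_write_ctx(ioctx, snap_id, snaps, 1);

    /* ... writes here preserve pre-snapshot object data via clones ... */

    /* drop the snapshot again when it is no longer needed */
    rados_ioctx_selfmanaged_snap_remove(ioctx, snap_id);
}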
Making 'mon clock drift allowed' very small (0.1) does not
reliably reproduce the hang. I started looking at the code for 0.48.2
and it looks like this is only used in Paxos::warn_on_future_time,
which only handles the warning, nothing else.
On Fri, Nov 16, 2012 at 2:21 PM, Sage Weil
Hmm, qemu rbd block driver. It always gets these errors back. As
rbd_aio_bh_cb is directly called from librbd, the problem must be there.
Strangely I can't find where rbd_aio_bh_cb gets called with -512.
Any further ideas?
rbd_aio_bh_cb got error back. Code: -512 Error: 0
rbd_aio_bh_cb got error
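An aside not from the thread: in qemu's rbd driver the logged value presumably comes from rbd_aio_get_return_value() on the librbd completion; for what it's worth, 512 is ERESTARTSYS in the Linux kernel's errno headers, though whether that is really where the -512 originates here is only a guess. A minimal sketch of such a callback, with made-up names:

/* Hypothetical sketch of a librbd completion callback that logs negative
 * return values, similar to the debugging done in this thread.  The
 * function name is made up; releasing the completion is left out. */
#include <stdio.h>
#include <sys/types.h>
#include <rbd/librbd.h>

static void my_aio_callback(rbd_completion_t comp, void *arg)
{
    ssize_t ret = rbd_aio_get_return_value(comp);

    if (ret < 0)
        fprintf(stderr, "aio callback got error back: %zd\n", ret);
}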
Which version was this on? There was some fairly significant work done
on recovery to introduce a reservation scheme and some other stuff
that might need some different defaults.
-Greg
On Tue, Nov 13, 2012 at 12:33 PM, Stefan Priebe s.pri...@profihost.ag wrote:
Hi list,
osd recovery seems to
Hi All,
We've been fixing a number of objectcacher bugs to handle races between
slow osd commit replies and various other operations like truncate. I
ran into another problem earlier today with a race between an object
getting evicted from the lru cache (via readx -> trim) and the osd
On Tue, Nov 13, 2012 at 3:23 AM, Franck Marchand fmarch...@agaetis.fr wrote:
Hi,
I have a weird problem. I removed a folder using a mounted fs partition. I
did it and it worked well.
What client are you using? How did you delete it? (rm -rf, etc?) Are
you using multiple clients or one, and did you
On 11/19/2012 03:16 PM, Stefan Priebe wrote:
Hmm, qemu rbd block driver. It always gets these errors back. As
rbd_aio_bh_cb is directly called from librbd, the problem must be there.
Strangely I can't find where rbd_aio_bh_cb gets called with -512.
Any further ideas?
Two ideas:
1) Is
On 20.11.2012 00:39, Samuel Just wrote:
Seems to be a truncated log file... That usually indicates filesystem
corruption. Anything in dmesg?
-Sam
No. Everything is fine.
On Thu, Nov 15, 2012 at 1:07 PM, Stefan Priebe s.pri...@profihost.ag wrote:
Hello list,
current master incl.
On 20.11.2012 00:33, Josh Durgin wrote:
On 11/19/2012 03:16 PM, Stefan Priebe wrote:
Hmm, qemu rbd block driver. It always gets these errors back. As
rbd_aio_bh_cb is directly called from librbd, the problem must be there.
Strangely I can't find where rbd_aio_bh_cb gets called with -512.
ANy
There is no check in rbd_remove() to see if anybody holds open the
image being removed. That's not cool.
Add a simple open count that goes up and down with opens and closes
(releases) of the device, and don't allow an rbd image to be removed
if the count is non-zero.
Protect the updates of the
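A minimal sketch of that idea (my illustration, not the actual patch; the real field, lock, and function signatures in drivers/block/rbd.c may differ):

/* Illustrative sketch only, not the actual patch: an open_count in
 * struct rbd_device, protected by the device spinlock, bumped in open,
 * dropped in release, and checked before removal.  Real names and
 * signatures in drivers/block/rbd.c may differ. */
static int rbd_open(struct block_device *bdev, fmode_t mode)
{
    struct rbd_device *rbd_dev = bdev->bd_disk->private_data;

    spin_lock_irq(&rbd_dev->lock);
    rbd_dev->open_count++;
    spin_unlock_irq(&rbd_dev->lock);

    return 0;
}

static int rbd_release(struct gendisk *disk, fmode_t mode)
{
    struct rbd_device *rbd_dev = disk->private_data;

    spin_lock_irq(&rbd_dev->lock);
    rbd_dev->open_count--;
    spin_unlock_irq(&rbd_dev->lock);

    return 0;
}

static int rbd_remove_allowed(struct rbd_device *rbd_dev)
{
    int busy;

    spin_lock_irq(&rbd_dev->lock);
    busy = rbd_dev->open_count > 0;
    spin_unlock_irq(&rbd_dev->lock);

    /* rbd_remove() would return -EBUSY here instead of tearing down */
    return !busy;
}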
Can you restart one of the affected osds with debug osd = 20, debug
filestore = 20, debug ms = 1 and post the log?
-Sam
On Mon, Nov 19, 2012 at 3:39 PM, Stefan Priebe s.pri...@profihost.ag wrote:
On 20.11.2012 00:39, Samuel Just wrote:
Seems to be a truncated log file... That usually
I've formatted the cluster since then. But I'll report back if this
happens again.
Stefan
On 20.11.2012 00:43, Samuel Just wrote:
Can you restart one of the affected osds with debug osd = 20, debug
filestore = 20, debug ms = 1 and post the log?
-Sam
On Mon, Nov 19, 2012 at 3:39 PM, Stefan
On 11/19/2012 03:42 PM, Stefan Priebe wrote:
On 20.11.2012 00:33, Josh Durgin wrote:
On 11/19/2012 03:16 PM, Stefan Priebe wrote:
Hmm, qemu rbd block driver. It always gets these errors back. As
rbd_aio_bh_cb is directly called from librbd, the problem must be there.
Strangely I can't find where
Hi Josh,
I don't get it. Every debug line I print shows a positive, fine value, but
rbd_aio_bh_cb gets called with these values. As you can see there are
not many values; I copied all values < 0 from the log for discarding a whole
30GB device.
Stefan
On 20.11.2012 00:47, Josh Durgin wrote:
On
On 11/19/2012 04:00 PM, Stefan Priebe wrote:
Hi Josh,
I don't get it. Every debug line I print shows a positive, fine value, but
rbd_aio_bh_cb gets called with these values. As you can see there are
not many values; I copied all values < 0 from the log for discarding a whole
30GB device.
Could you
On Sun, Nov 18, 2012 at 12:05 PM, Noah Watkins jayh...@cs.ucsc.edu wrote:
Wanna have a look at a first pass on this patch?
wip-client-open-layout
Thanks,
Noah
Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
2) There's already a
Also, if you still have it, could you zip up your monitor data
directory and put it somewhere accessible to us? (I can provide you a
drop point if necessary.) We'd like to look at the file layouts a bit
since we thought we were properly handling ENOSPC-style issues.
-Greg
On Mon, Nov 19, 2012 at
On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote:
Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
I followed the same pattern as page.h in librados, but may have
misunderstood its use. When libcephfs.h is installed, it
On Mon, 19 Nov 2012, Noah Watkins wrote:
On Mon, Nov 19, 2012 at 5:04 PM, Gregory Farnum g...@inktank.com wrote:
Just glanced over this, and I'm curious:
1) Why symlink another reference to your file_layout.h?
I followed the same pattern as page.h in librados, but may have
misunderstood
On 11/19/2012 11:42 AM, Blackwell, Edward wrote:
Hi,
I work for Harris Corporation, and we are investigating Ceph as a potential
solution to a storage problem that one of our government customers is currently
having. I've already created a two-node cluster on a couple of VMs with
another
Which iodepth did you use for those benchmarks?
iodepth = 100
filesize = 1G, 10G, 30G , same result
(3 nodes, 8 cores 2.5GHz, 32GB RAM, with 6 OSDs each (15k drives) + journal on
tmpfs)
Note that I can't get more than 6000 IOPS on an rbd device, but with more
devices it scales. (each fio is at