I do not really understand that network latency argument.
If one can get 40K iops with iSCSI, why can't I get the same with rados/ceph?
Note: network latency is the same in both cases
What do I miss?
Create a helper function to check if a backing device requires stable page
writes and, if so, performs the necessary wait. Then, make it so that all
points in the memory manager that handle making pages writable use the helper
function. This should provide stable page write support to most
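The helper described above can be sketched as follows. This is a simplified userspace model, not the kernel's actual API: the structure names and fields here are illustrative stand-ins for the backing_dev_info flag and the page writeback wait.

```c
#include <stdbool.h>

/* Userspace model of the helper: the backing device declares whether
 * it needs stable pages, and the helper waits for writeback only when
 * it does. */
struct backing_dev {
    bool stable_pages_required;
};

struct page_state {
    bool under_writeback;
    struct backing_dev *bdev;
};

/* Stand-in for waiting on writeback: in this model, "waiting" just
 * means writeback has finished by the time we return. */
static void wait_on_writeback(struct page_state *pg)
{
    pg->under_writeback = false;
}

/* The helper: callers making a page writable invoke this; it blocks
 * only if the page's backing device asked for stable pages. */
static void wait_for_stable_page(struct page_state *pg)
{
    if (pg->bdev->stable_pages_required)
        wait_on_writeback(pg);
}
```

The point of centralizing this is that every page_mkwrite-style path calls one function, and devices that never need stable pages skip the wait entirely.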
Hi all,
This patchset makes some key modifications to the original 'stable page writes'
patchset. First, it provides users (devices and filesystems) of a
backing_dev_info the ability to declare whether or not it is necessary to
ensure that page contents cannot change during writeout, whereas the
On 01.11.2012 08:38, Dietmar Maurer wrote:
I do not really understand that network latency argument.
If one can get 40K iops with iSCSI, why can't I get the same with rados/ceph?
Note: network latency is the same in both cases
What do I miss?
Good question. Also, I've seen 20k iops on ceph
From: Yan, Zheng zheng.z@intel.com
The cap from non-auth mds doesn't have a meaningful max_size value.
Signed-off-by: Yan, Zheng zheng.z@intel.com
---
fs/ceph/caps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index
From: Yan, Zheng zheng.z@intel.com
Both CInode and CDentry's versionlocks are of type LocalLock.
Acquiring LocalLock on a replica object is useless and problematic.
For example, if two requests try acquiring a replica object's
versionlock, the first request succeeds, the second request
is added
From: Yan, Zheng zheng.z@intel.com
ceph_aio_write() has an optimization that marks cap CEPH_CAP_FILE_WR
dirty before data is copied to page cache and inode size is updated.
If ceph_check_caps() flushes the dirty cap before the inode size is
updated, MDS can miss the new inode size. The fix is
From: Yan, Zheng zheng.z@intel.com
Unstable locks hold auth_pins on the object, which prevents a freezing
object from becoming frozen and later unfreezing. So try_eval() should
not wait for a freezing object
Signed-off-by: Yan, Zheng zheng.z@intel.com
---
src/mds/Locker.cc | 4 ++--
1 file changed,
I'm not sure that latency addition is quite correct. Most use cases
do multiple IOs at the same time, and good benchmarks tend to
reflect that.
I suspect the IO limitations here are a result of QEMU's storage
handling (or possibly our client layer) more than anything else — Josh
can talk
On 01.11.2012 11:40, Gregory Farnum wrote:
I'm not sure that latency addition is quite correct. Most use cases
do multiple IOs at the same time, and good benchmarks tend to
reflect that.
I suspect the IO limitations here are a result of QEMU's storage
handling (or possibly our client
On 10/31/2012 08:49 PM, Josh Durgin wrote:
I know you've got a queue of these already, but here's another:
rbd_dev_probe_update_spec() could definitely use some warnings
to distinguish its error cases.
Reviewed-by: Josh Durgin josh.dur...@inktank.com
Finally! I was going to accuse you of
On 10/31/2012 09:07 PM, Josh Durgin wrote:
This all makes sense, but it reminds me of another issue we'll need to
address:
http://www.tracker.newdream.net/issues/2533
I was not aware of that one. That's no good.
We don't need to watch the header of a parent snapshot, since it's
immutable
On Thu 01-11-12 00:58:13, Darrick J. Wong wrote:
This creates a per-backing-device counter that tracks the number of users
which require pages to be held immutable during writeout. Eventually it
will be used to waive wait_for_page_writeback() if nobody requires stable
pages.
As I wrote
These are a handful of fairly minor cleanup items I've
been putting off sending out until some of the meatier
stuff got done.
-Alex
[PATCH 1/5] rbd: document rbd_spec structure
[PATCH 2/5] rbd: kill rbd_spec->image_name_len
[PATCH 3/5] rbd: kill
I promised Josh I would document whether there were any restrictions
needed for accessing fields of an rbd_spec structure. This adds a
big block of comments that documents the structure and how it is
used--including the fact that we don't attempt to synchronize access
to it.
Signed-off-by: Alex
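For illustration, a structure of the kind being documented might look like this. The field names here are representative only, not the actual kernel layout; the point the comment block makes is that an rbd_spec bundles everything identifying one image, and is filled in once and never modified, so readers need no synchronization.

```c
#include <stdint.h>

/* Illustrative sketch (not the real kernel struct): an rbd_spec ties
 * together the pool, image, and snapshot that identify one mapping.
 * Once populated it is treated as immutable, so no locking is needed
 * to read its fields. */
struct rbd_spec {
    int64_t     pool_id;      /* which pool the image lives in */
    const char *pool_name;
    const char *image_id;     /* image identifier and its name */
    const char *image_name;
    uint64_t    snap_id;      /* snapshot, if mapping a snapshot */
    const char *snap_name;
};
```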
This replaces two kmalloc()/memcpy() combinations with a single
call to kmemdup().
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 3378963..cf7b405
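The pattern the patch collapses is easy to show in a userspace sketch. kmemdup() is the kernel's allocate-and-copy helper; the equivalent shape with malloc() is:

```c
#include <stdlib.h>
#include <string.h>

/* Userspace equivalent of the kernel's kmemdup(): allocate a buffer
 * and copy len bytes into it in one step, replacing each open-coded
 * malloc()+memcpy() pair with a single call. */
static void *memdup(const void *src, size_t len)
{
    void *dst = malloc(len);

    if (dst)
        memcpy(dst, src, len);
    return dst;
}
```

Besides being shorter, the single call leaves one fewer place to get the error handling or the length argument wrong.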
It's kind of a silly macro, but ceph_encode_8_safe() is the only one
missing from an otherwise pretty complete set. It's not used, but
neither are a couple of the others in this set.
While in there, insert some whitespace to tidy up the alignment of
the line-terminating backslashes in some of
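The shape of the "_safe" encode macros can be modeled in a standalone way. This is a simplified sketch, not the real macro from the ceph encode/decode headers: the idea is that each macro checks the output buffer has room before writing, and jumps to a caller-supplied label on overflow instead of writing past the end.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified model of ceph_encode_8_safe(): bounds-check, then write
 * one byte and advance the cursor; bail out to `bad` if the buffer
 * is exhausted. */
#define encode_8_safe(p, end, v, bad)          \
    do {                                       \
        if ((end) - (p) < 1)                   \
            goto bad;                          \
        *(p)++ = (uint8_t)(v);                 \
    } while (0)
```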
Currently ceph_pg_poolid_by_name() returns an int, which is used to
encode a ceph pool id. This could be a problem because a pool id
(at least in some cases) is a 64-bit value. We have a defined pool
id value that represents no pool, and that's a very sensible
return value here.
This patch
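The fix described above can be sketched like this. The lookup table and function here are illustrative, and the sentinel is modeled on ceph's defined "no pool" value; the essential change is that the return type is 64-bit, so large pool ids are not truncated.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Sentinel for "no such pool", in the spirit of ceph's defined
 * no-pool value (assumed name; illustrative only). */
#define NOPOOL ((int64_t)-1)

struct pool_entry {
    int64_t     id;
    const char *name;
};

/* Return the pool id as a 64-bit value; the "no pool" sentinel
 * doubles as a sensible "not found" return. */
static int64_t poolid_by_name(const struct pool_entry *pools, size_t n,
                              const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(pools[i].name, name) == 0)
            return pools[i].id;
    return NOPOOL;
}
```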
This series adds a utility function rbd_warn() that will
provide a central and unified way to generate warning messages
from rbd. It then fleshes out some warning messages in a
few areas. There is more to be done, but for now I'm just
getting the mechanism and these initial uses of it in place.
Define a new function rbd_warn() that produces a boilerplate warning
message, identifying in the resulting message the affected rbd
device in the best way available. Use it in a few places that now
use pr_warning().
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 43
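A helper in the spirit of the rbd_warn() described above might look like the sketch below. This is an assumption-laden userspace model: it formats into a caller-supplied buffer so the behavior is easy to check, whereas the kernel version would emit through the printk log; the function name and prefix format are illustrative.

```c
#include <stdarg.h>
#include <stdio.h>

/* Boilerplate warning helper (sketch): prefix every message with the
 * module name and the best available device identifier, then forward
 * a printf-style format string. */
static int rbd_warn_buf(char *buf, size_t size, const char *dev_name,
                        const char *fmt, ...)
{
    va_list ap;
    int n;

    n = snprintf(buf, size, "rbd: %s: ",
                 dev_name ? dev_name : "(no device)");
    if ((size_t)n < size) {
        va_start(ap, fmt);
        n += vsnprintf(buf + n, size - n, fmt, ap);
        va_end(ap);
    }
    return n;
}
```

Centralizing the boilerplate means every warning carries the same identifying prefix without each call site repeating it.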
Tell the user (via dmesg) what was wrong with the arguments provided
via /sys/bus/rbd/add.
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
Add a warning in bio_chain_clone_range() to help a user determine
what exactly might have led to a failure. There is only one; please
say something if you disagree with the following reasoning.
There are three places this can return abnormally:
- Initially, if there is nothing to clone. It
Josh suggested adding warnings to this function to help users
diagnose problems.
Other than memory allocation errors, there are two places where
errors can be returned. Both represent problems that should
have been caught earlier, and as such might well have been
handled with BUG_ON() calls.
In this case he's doing a direct random read, so the ios queue one at
a time on his various multipath channels. He may have defined a depth
that sends a bunch at once, but they still get split up, he could run
a blktrace to verify. If they could merge he could maybe send
multiples, or perhaps he
Actually that didn't illustrate my point very well, since you see
individual requests being sent to the driver without waiting for
individual completion, but if you look at the full output you can see
that once the queue is full, you're at the mercy of waiting for
individual IOs to complete before
For the record, I'm not saying that it's the entire reason why the performance
is lower (obviously since iscsi is better), I'm just saying that when you're
talking about high iops, adding 100us (best case gigabit) to each request and
response is significant
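A back-of-envelope version of that argument: if each request waits for the previous one to complete, the added round-trip sets a hard ceiling on iops no matter how fast the backend is. The 100us figure is the "best case gigabit" number from the message; the service time below is an illustrative assumption.

```c
/* With fully serialized requests, throughput is bounded by the total
 * per-request time: iops = 1 / (service time + added network latency). */
static double serialized_iops(double service_s, double added_net_s)
{
    return 1.0 / (service_s + added_net_s);
}
```

This is also why parallel in-flight IOs recover throughput: the latency still applies per request, but no longer serializes the whole stream.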
iSCSI also uses the network (also
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
This creates a per-backing-device counter that tracks the number of users
which require pages to be held immutable during writeout. Eventually it
will be used to waive wait_for_page_writeback() if nobody requires stable
pages.
There are two
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
Fix up the filesystems that provide their own ->page_mkwrite handlers to
provide stable page writes if necessary.
Signed-off-by: Darrick J. Wong darrick.w...@oracle.com
---
fs/9p/vfs_file.c |1 +
fs/afs/write.c |4 ++--
Hi Noah -
What platform are you building on, and are you building with nss or cryptopp?
Thanks,
Gary
On Oct 31, 2012, at 8:22 PM, Noah Watkins wrote:
Whoops, here is the original error:
CXX    test_idempotent_sequence.o
In file included from ./os/LFNIndex.h:27:0,
from
On Thu, 1 Nov 2012 11:43:26 -0700
Boaz Harrosh bharr...@panasas.com wrote:
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
Fix up the filesystems that provide their own ->page_mkwrite handlers to
provide stable page writes if necessary.
Signed-off-by: Darrick J. Wong
On 11/01/2012 04:18 PM, Gandalf Corvotempesta wrote:
2012/10/31 Stefan Kleijkers ste...@kleijkers.nl:
As far as I know, this is correct. You get an ACK (on the write) back after
it landed on ALL three journals (and/or osds in case of BTRFS in parallel
mode). So if you lose one node, you still
On 11/01/2012 01:22 PM, Jeff Layton wrote:
Hmm...I don't know...
I've never been crazy about using the page lock for this, but in the
absence of a better way to guarantee stable pages, it was what I ended
up with at the time. cifs_writepages will hold the page lock until
kernel_sendmsg
On Thu 01 Nov 2012 11:22:59 AM CDT, Nathan Howell wrote:
We have a small (3 node) Ceph cluster that occasionally has issues. It
loses files and directories, truncates them or fills the contents with
NULL bytes. So far we haven't been able to build a repro case but it
seems to happen when bulk
I'm getting the following assertion failure when running a test that creates a
symlink and then tries to read it using ceph_readlink().
This is the failure, and the test is shown below (and is in wip-java-symlinks).
Also note that if the test below is altered to use relative paths for both
On Thu, Nov 01, 2012 at 04:22:54PM -0400, Jeff Layton wrote:
On Thu, 1 Nov 2012 11:43:26 -0700
Boaz Harrosh bharr...@panasas.com wrote:
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
Fix up the filesystems that provide their own ->page_mkwrite handlers to
provide stable page writes if
On 11/01/2012 11:57 AM, Darrick J. Wong wrote:
On Thu, Nov 01, 2012 at 11:21:22AM -0700, Boaz Harrosh wrote:
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
This creates a per-backing-device counter that tracks the number of users
which require pages to be held immutable during writeout.
On Thu, Nov 1, 2012 at 11:32 PM, Sam Lang sam.l...@inktank.com wrote:
On Thu 01 Nov 2012 11:22:59 AM CDT, Nathan Howell wrote:
We have a small (3 node) Ceph cluster that occasionally has issues. It
loses files and directories, truncates them or fills the contents with
NULL bytes. So far we
On 11/01/2012 05:38 PM, Noah Watkins wrote:
I'm getting the following assertion failure when running a test that creates a
symlink and then tries to read it using ceph_readlink().
This is the failure, and the test is shown below (and is in wip-java-symlinks).
Also note that if the test below
On Thu 01-11-12 15:56:34, Boaz Harrosh wrote:
On 11/01/2012 11:57 AM, Darrick J. Wong wrote:
On Thu, Nov 01, 2012 at 11:21:22AM -0700, Boaz Harrosh wrote:
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
This creates a per-backing-device counter that tracks the number of users which
filepath path(relpath);
Inode *in;
- int r = path_walk(path, &in);
+ int r = path_walk(path, &in, false);
if (r < 0)
return r;
Fixes both cases. Thanks!
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang sam.l...@inktank.com wrote:
Do the writes succeed? I.e. the programs creating the files don't get
errors back? Are you seeing any problems with the ceph mds or osd processes
crashing? Can you describe your I/O workload during these bulk loads? How
Hello list,
does rbd support trim / unmap? Or is it planned to support it?
Greets,
Stefan
On 11/01/2012 04:33 PM, Stefan Priebe wrote:
Hello list,
does rbd support trim / unmap? Or is it planned to support it?
Greets,
Stefan
librbd (and thus qemu) support it. The rbd kernel module does not yet.
See http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim
Josh
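For reference, the linked page shows enabling discard by attaching the rbd-backed drive to a device type that supports it (virtio-blk did not pass discard through at the time). Roughly of this shape, though exact option names can vary by qemu version:

```
qemu -m 1024 -drive file=rbd:data/squeeze,id=drive1,if=none \
     -device driver=ide-hd,drive=drive1,discard_granularity=512
```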
On Thu, 1 Nov 2012 15:47:30 -0700
Darrick J. Wong darrick.w...@oracle.com wrote:
On Thu, Nov 01, 2012 at 04:22:54PM -0400, Jeff Layton wrote:
On Thu, 1 Nov 2012 11:43:26 -0700
Boaz Harrosh bharr...@panasas.com wrote:
On 11/01/2012 12:58 AM, Darrick J. Wong wrote:
Fix up the
On 11/01/2012 06:22 PM, Noah Watkins wrote:
filepath path(relpath);
Inode *in;
- int r = path_walk(path, &in);
+ int r = path_walk(path, &in, false);
if (r < 0)
return r;
Fixes both cases. Thanks!
I discovered a few more bugs in path_walk() for the symlink case while
On Fri, Nov 2, 2012 at 7:30 AM, Nathan Howell nathan.d.how...@gmail.com wrote:
On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang sam.l...@inktank.com wrote:
Do the writes succeed? I.e. the programs creating the files don't get
errors back? Are you seeing any problems with the ceph mds or osd processes
I guess I'm going to have to retract this problem, as I can't
reproduce it today. No clue what happened :)
On Thu, Nov 1, 2012 at 12:05 PM, Gary Lowell gary.low...@inktank.com wrote:
Hi Noah -
What platform are you building on, and are you building with nss or cryptopp?
Thanks,
Gary
Let me know if you see the problem again. It's probably something in the
autotools dependency resolution.
Cheers,
Gary
On Nov 1, 2012, at 8:54 PM, Noah Watkins wrote:
I guess I'm going to have to retract this problem, as I can't
reproduce it today. No clue what happened :)
On Thu, Nov