Re: [PATCH 0/3] patches for rbd

2014-03-12 Thread Guangliang Zhao
On Wed, Mar 12, 2014 at 12:28:16AM -0500, Alex Elder wrote: > On 03/11/2014 11:24 PM, Guangliang Zhao wrote: Hi Alex, Thanks very much for the reviews, version 2 will be coming soon. > > Hi, > > > > I am sorry that I didn't notice Jean-Tiare's mail at all; only Sage's > > reply reminded me. > > > > Ther

Re: Erasure code properties in OSDMap

2014-03-12 Thread John Spray
I am sure all of that will work, but it doesn't explain why these properties must be stored and named separately from crush rulesets. To flesh this out one also needs "get" and "list" operations for the sets of properties, which feels like overkill if there is an existing place we could be storing t

Re: [PATCH 0/3] patches for rbd

2014-03-12 Thread Jean-Tiare LE BIGOT
Hi, I must admit I've put it aside for the moment. So I really don't mind, quite the contrary, if you implement the DISCARD op. On 03/12/2014 05:24 AM, Guangliang Zhao wrote: I am sorry that I didn't notice Jean-Tiare's mail at all; only Sage's reply reminded me. -- Jean-Tiare, shared-hosting team --

Re: Erasure code properties in OSDMap

2014-03-12 Thread Loic Dachary
On 12/03/2014 13:39, John Spray wrote: > I am sure all of that will work, but it doesn't explain why these > properties must be stored and named separately from crush rulesets. To > flesh this out one also needs "get" and "list" operations for the sets > of properties, which feels like overkill if t

Re: Erasure code properties in OSDMap

2014-03-12 Thread John Spray
OK, in chatting about this I've been convinced that it's legitimately separate, because the CRUSH ruleset is mutable during the lifetime of a pool but the EC settings are not. I suppose the way we could explain the logical separation for users is to say that the CRUSH ruleset is mainly about locat
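To make the mutability split concrete, here is a sketch of the firefly-era CLI (the profile name, pool name, and ruleset id are illustrative, and command names may differ in later releases): the erasure-code settings are bound to a pool at creation time, while the crush ruleset can be changed on a live pool.

    # EC parameters are fixed once the pool is created
    ceph osd erasure-code-profile set myprofile k=4 m=2
    ceph osd pool create ecpool 128 128 erasure myprofile

    # The crush ruleset, by contrast, can be switched at any time
    ceph osd pool set ecpool crush_ruleset 4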

Re: Erasure code properties in OSDMap

2014-03-12 Thread Loic Dachary
On 12/03/2014 15:35, John Spray wrote: > OK, in chatting about this I've been convinced that it's legitimately > separate, because the CRUSH ruleset is mutable during the lifetime of > a pool but the EC settings are not. I suppose the way we could > explain the logical separation for users is to

Re: Erasure code properties in OSDMap

2014-03-12 Thread Sage Weil
On Wed, 12 Mar 2014, John Spray wrote: > I am sure all of that will work, but it doesn't explain why these > properties must be stored and named separately from crush rulesets. To > flesh this out one also needs "get" and "list" operations for the sets > of properties, which feels like overkill if t

Re: Erasure code properties in OSDMap

2014-03-12 Thread Sage Weil
On Wed, 12 Mar 2014, John Spray wrote: > OK, in chatting about this I've been convinced that it's legitimately > separate, because the CRUSH ruleset is mutable during the lifetime of > a pool but the EC settings are not. I suppose the way we could > explain the logical separation for users is to s

erasured PG always "peering"

2014-03-12 Thread Lluís Pàmies i Juárez
Hi ceph-devel, I have been playing with the new erasure code functionality and I have noticed that the erasure-coded PG remains in the "peering" state forever. Is that normal? I have a scenario with four servers, each with two OSDs (total of eight OSDs). Then I define an extra crush rule to get four
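For a PG stuck in peering like this, the following firefly-era commands (the PG id below is illustrative) show which OSDs the PG maps to and what is blocking it, which usually reveals whether the custom crush rule can actually satisfy the pool's size:

    ceph pg dump_stuck inactive    # list PGs stuck peering/inactive
    ceph pg 3.0 query              # per-PG peering state and blocking OSDs
    ceph osd crush rule dump       # check what the custom rule selects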

Re: How to build ceph upon zfs filesystem.

2014-03-12 Thread sellers
ramu eppa <...@gmail.com> writes: > > Hi Wiessner, > > Actually I created a zpool on /dev/sdb and then set a > mountpoint for the OSD. The space shown is 144GB, but the OSDs go down while an > rbd image is running. > > Thanks, > Ramu.
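For reference, a ZFS-backed OSD setup along the lines Ramu describes would look roughly like this (the pool name and OSD data path are assumptions, not taken from the thread):

    # Create a pool on the raw device and mount it where the OSD expects its data
    zpool create osdpool /dev/sdb
    zfs set mountpoint=/var/lib/ceph/osd/ceph-0 osdpool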

Re: erasured PG always "peering"

2014-03-12 Thread Loic Dachary
Hi, On 12/03/2014 18:40, Lluís Pàmies i Juárez wrote: > Hi ceph-devel, > > I have been playing with the new erasure code functionality and I have > noticed that the erasure-coded PG remains in the "peering" state forever. Is > that normal? > > I have a scenario with four servers, each with two OSDs (t

O_DIRECT logic in CephFS, ceph-fuse / Performance

2014-03-12 Thread Kasper Dieter
The 'man 2 open' states ---snip--- The behaviour of O_DIRECT with NFS will differ from local file systems. (...) The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O will bypass the page cache only on the client; the server may still cache the I/O. ---snip--- Q1:
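For context, the client-side alignment rules for O_DIRECT are the same whatever the filesystem underneath; a minimal C sketch (error handling trimmed; the file path and the 4096-byte alignment are assumptions):

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        int fd = open("/mnt/cephfs/testfile",
                      O_RDWR | O_CREAT | O_DIRECT, 0644);
        if (fd < 0)
            return 1;

        /* O_DIRECT requires buffer, offset and length alignment;
         * 4096 is a safe assumption for most block devices. */
        if (posix_memalign(&buf, 4096, 4096))
            return 1;
        memset(buf, 0, 4096);

        /* Bypasses the client page cache; whether the server (OSD)
         * also skips its cache is exactly the question here. */
        pwrite(fd, buf, 4096, 0);
        free(buf);
        close(fd);
        return 0;
    }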

Re: O_DIRECT logic in CephFS, ceph-fuse / Performance

2014-03-12 Thread Milosz Tanski
Kasper, I only know about the kernel cephfs... but there are special code paths for O_DIRECT read/writes. Both read and write bypass the page cache and send commands directly to OSDs for the objects; in the write case the object holds a write lock with the MDS. So unlike NFS this seems like it does the

Re: erasured PG always "peering"

2014-03-12 Thread Loic Dachary
Glad to hear it's working for you now ;-) There are important bug fixes daily: it is worth getting the latest Firefly from https://github.com/ceph/ceph/tree/firefly If you run into a problem again, it would be great if you could preserve the environment in which it happens and post a bug report

Re: O_DIRECT logic in CephFS, ceph-fuse / Performance

2014-03-12 Thread Sage Weil
Hi Kasper, In order to do what you want here, we need to make O_DIRECT-initiated requests on the client get a flag that tells the OSD to also bypass its cache. That doesn't happen right now. Assuming we do add that flag, we can either make the IO actually do O_DIRECT, or we can make it do som
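The flag Sage describes did not exist when this was written; as a purely hypothetical C sketch of the client-side half (the flag name, its value, and the request struct are all invented for illustration, not part of the real OSD protocol):

    #include <stdbool.h>

    #define CEPH_OSD_FLAG_BYPASS_CACHE (1 << 15)  /* invented flag */

    struct osd_request {
        unsigned int flags;
    };

    /* An O_DIRECT-initiated IO would tag its OSD requests so the
     * server knows to bypass its own cache as well. */
    static void prepare_osd_request(struct osd_request *req, bool o_direct)
    {
        req->flags = 0;
        if (o_direct)
            req->flags |= CEPH_OSD_FLAG_BYPASS_CACHE;
    }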

[PATCH 2/3 v2] rbd: extend the operation type

2014-03-12 Thread Guangliang Zhao
It can only handle read and write operations now; extend it for the coming discard support. Signed-off-by: Guangliang Zhao --- drivers/block/rbd.c | 96 +-- 1 file changed, 63 insertions(+), 33 deletions(-) diff --git a/drivers/block/rbd.c
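The shape of the change, reconstructed as a sketch from the description (the preview cuts off before the diff, so the names below are illustrative rather than quoted from the patch): the single read/write boolean becomes an operation-type enum with room for a third value.

    /* Sketch: an op type replaces the old write_request boolean so
     * that discard can be added as a third operation. */
    enum obj_operation_type {
        OBJ_OP_READ,
        OBJ_OP_WRITE,
        OBJ_OP_DISCARD,
    };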

[PATCH 1/3 v2] rbd: skip the copyup when an entire object writing

2014-03-12 Thread Guangliang Zhao
A layered write needs to copy up the parent's content first, but an entire-object write would overwrite it, so skip the copyup in that case. Signed-off-by: Guangliang Zhao --- drivers/block/rbd.c | 49 ++--- 1 file changed, 34 insertions(+), 15 deletions(-) diff --git a
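The optimization as a standalone sketch (the helper names are invented; in the driver the check sits in the object request path):

    #include <stdbool.h>
    #include <stdint.h>

    /* A layered write normally copies up the parent's object first,
     * but when the write covers the whole object that copied-up data
     * would be overwritten anyway, so the copyup can be skipped. */
    static bool entire_object_write(uint64_t off, uint64_t len,
                                    uint64_t obj_size)
    {
        return off == 0 && len == obj_size;
    }

    static bool need_copyup(bool layered, uint64_t off, uint64_t len,
                            uint64_t obj_size)
    {
        return layered && !entire_object_write(off, len, obj_size);
    }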

[PATCH 3/3 v2] rbd: add discard support for rbd

2014-03-12 Thread Guangliang Zhao
This patch adds discard support to the rbd driver. There are three types of operations in the driver: 1. Objects are removed if they are completely contained within the discard range. 2. Objects are truncated if they are partly contained within the discard range and aligned with their b
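The three cases map naturally onto existing OSD operations; as a hedged sketch (the real patch drives these through rbd's osd request machinery, the function name is invented, and the truncated third case is assumed here to be zeroing the middle of an object):

    #include <stdint.h>

    enum discard_action { DISCARD_DELETE, DISCARD_TRUNCATE, DISCARD_ZERO };

    /* Pick an action for the slice of a discard that hits one object;
     * off/len are relative to the object, obj_size is its full size. */
    static enum discard_action discard_action_for(uint64_t off, uint64_t len,
                                                  uint64_t obj_size)
    {
        if (off == 0 && len == obj_size)
            return DISCARD_DELETE;    /* object fully inside the range */
        if (off + len == obj_size)
            return DISCARD_TRUNCATE;  /* range runs to the object's end */
        return DISCARD_ZERO;          /* punch a hole in the middle */
    }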

Re: [PATCH 1/3 v2] rbd: skip the copyup when an entire object writing

2014-03-12 Thread Alex Elder
On 03/12/2014 10:21 PM, Guangliang Zhao wrote: > A layered write needs to copy up the parent's content first, > but an entire-object write would overwrite it, so skip the copyup in that case. > > Signed-off-by: Guangliang Zhao > --- > drivers/block/rbd.c | 49 ++--- > 1

Re: [PATCH 2/3 v2] rbd: extend the operation type

2014-03-12 Thread Alex Elder
On 03/12/2014 10:21 PM, Guangliang Zhao wrote: > It can only handle read and write operations now; > extend it for the coming discard support. Wow, it looks like you took all of my suggestions. This looks good. Reviewed-by: Alex Elder > Signed-off-by: Guangliang Zhao > --- > drivers/bl

Re: [PATCH 3/3 v2] rbd: add discard support for rbd

2014-03-12 Thread Alex Elder
On 03/12/2014 10:21 PM, Guangliang Zhao wrote: > This patch adds discard support to the rbd driver. > > There are three types of operations in the driver: > 1. Objects are removed if they are completely contained > within the discard range. > 2. Objects are truncated if they are partly co

Re: [PATCH 2/3 v2] rbd: extend the operation type

2014-03-12 Thread Guangliang Zhao
On Wed, Mar 12, 2014 at 11:47:36PM -0500, Alex Elder wrote: > On 03/12/2014 10:21 PM, Guangliang Zhao wrote: > > It can only handle read and write operations now; > > extend it for the coming discard support. > > Wow, it looks like you took all of my suggestions. > > This looks good. Good