Re: [RFC] add rocksdb support

2014-06-30 Thread Haomai Wang
Hi Sushma, Thanks for your investigations! We already noticed the serializing risk on GenericObjectMap/DBObjectMap. In order to improve performance we added a header cache to DBObjectMap. As for KeyValueStore, a cache branch is under review; it can greatly reduce lookup_header calls. Of course, r

RE: [RFC] add rocksdb support

2014-06-30 Thread Sushma Gurram
Hi Haomai/Greg, I tried to analyze this a bit more and it appears that the GenericObjectMap::header_lock is serializing the READ requests in the following path and hence the low performance numbers with KeyValueStore. ReplicatedPG::do_op() -> ReplicatedPG::find_object_context() -> ReplicatedPG:

Re: Calamari & Calamari-Clients mirrors ?

2014-06-30 Thread David Moreau Simard
OH, I didn’t know there was a Calamari mailing list. Subscribing now, thanks. On a semi-related note, iWeb mirrors a lot of Open source projects and Linux distributions: http://mirror.iweb.com/ I’d love to add Ceph to that list - is there any particular process ? If anything, I’d love to build a

Re: tracing objectstore virtual methods

2014-06-30 Thread Samuel Just
Hmm. You could do a s/method/_method/ kind of thing on the read methods and have method in ObjectStore ping the trace point and then call _method. The writes are slightly trickier/simpler. I suggest you add ObjectStore level _start_transaction(uint64_t seq, Transaction*) and _end_transac

Re: incomplete PG

2014-06-30 Thread Alexey Kurnosov
Hi all. I finally succeeded. Maybe somebody will be interested. A script read the content from the fuse-rbd files (I wonder, what is the actual use case of fuse-rbd?) with "dd" and, in case of a timeout (signalled by a background process), killed the entire fuse daemon, remounted fuse-rbd and resumed at

Re: teuthology task waiting for machines (> 8h)

2014-06-30 Thread Loic Dachary
Hi Zack, Thanks for the tip, I'll try it next time :-) Cheers On 30/06/2014 16:10, Zack Cerza wrote: > Hi Loic, > > At this point I don't really have a way to look back in time to see > what was going on, but in the future when jobs are blocked waiting for > machines for unreasonable periods it

Re: Calamari & Calamari-Clients mirrors ?

2014-06-30 Thread Sage Weil
On Mon, 30 Jun 2014, David Moreau Simard wrote: > Hi, > > Is there any short term plans for Inktank/Redhat to mirror packages for > Calamari and Calamari-Clients under ceph.com or is that currently in the > hands of the community ? John just sent an update about this yesterday to ceph-calamari

ceph branch status

2014-06-30 Thread ceph branch robot
-- All Branches -- Alfredo Deza 2013-09-27 10:33:52 -0400 wip-5900 Dan Mick 2013-07-16 23:00:06 -0700 wip-5634 David Zafman 2014-06-05 00:22:34 -0700 wip-8231 Greg Farnum 2013-02-13 14:46:38 -0800 wip-mds-snap-fix 2013-02-22 19:57:53 -0800 w

Re: [PATCH 09/14] libceph: introduce ceph_osdc_cancel_request()

2014-06-30 Thread Ilya Dryomov
On Mon, Jun 30, 2014 at 5:39 PM, Alex Elder wrote: > On 06/25/2014 12:16 PM, Ilya Dryomov wrote: >> Introduce ceph_osdc_cancel_request() intended for canceling requests >> from the higher layers (rbd and cephfs). Because higher layers are in >> charge and are supposed to know what and when they a

Re: [PATCH 07/14] libceph: unregister only registered linger requests

2014-06-30 Thread Ilya Dryomov
On Mon, Jun 30, 2014 at 5:50 PM, Alex Elder wrote: > On 06/25/2014 12:16 PM, Ilya Dryomov wrote: >> Linger requests that have not yet been registered should not be >> unregistered by __unregister_linger_request(). This messes up ref >> count and leads to use-after-free. >> >> Signed-off-by: Ilya

Re: teuthology task waiting for machines (> 8h)

2014-06-30 Thread Zack Cerza
Hi Loic, At this point I don't really have a way to look back in time to see what was going on, but in the future when jobs are blocked waiting for machines for unreasonable periods it's useful to know what is holding them: teuthology-lock --brief -a --machine-type plana | sort -k +4 Thanks, Zac

Re: teuthology task waiting for machines (> 8h)

2014-06-30 Thread Zack Cerza
On Sat, Jun 28, 2014 at 9:47 AM, Yuri Weinstein wrote: > altho 'teuthology-kill' is not > working for me today :( Uhoh, why not?

Re: [PATCH 07/14] libceph: unregister only registered linger requests

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Linger requests that have not yet been registered should not be > unregistered by __unregister_linger_request(). This messes up ref > count and leads to use-after-free. > > Signed-off-by: Ilya Dryomov > --- > net/ceph/osd_client.c | 15 +++

Re: [PATCH 09/14] libceph: introduce ceph_osdc_cancel_request()

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Introduce ceph_osdc_cancel_request() intended for canceling requests > from the higher layers (rbd and cephfs). Because higher layers are in > charge and are supposed to know what and when they are canceling, the > request is not completed, only unref'

Re: FW: erasure code and coefficients

2014-06-30 Thread Loic Dachary
Hi, On 30/06/2014 14:59, Andreas Joachim Peters wrote: > Hi Loic, > > I was reading through the code and I have got the impression that they have > formulated something simple with a lot of mathematical formulas in this > paper. Indeed all ci = 1 ... which makes it simple ... It makes things

Re: [PATCH 08/14] libceph: fix linger request check in __unregister_request()

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > We should check if request is on the linger request list of any of the > OSDs, not whether request is registered or not. > > Signed-off-by: Ilya Dryomov That was a difficult to spot bug. Very good. Reviewed-by: Alex Elder > --- > net/ceph/osd_cl

Re: [PATCH 07/14] libceph: unregister only registered linger requests

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Linger requests that have not yet been registered should not be > unregistered by __unregister_linger_request(). This messes up ref > count and leads to use-after-free. This makes sense. The problem can occur when updating the OSD map. An OSD *clien

FW: erasure code and coefficients

2014-06-30 Thread Andreas Joachim Peters
Hi Loic, I was reading through the code and I have got the impression that they have formulated something simple with a lot of mathematical formulas in this paper. Indeed all ci = 1 ... which makes it simple ... The code seems to work like this, e.g. 10 + 2 + 4: Data = (1,2,3,4,5,6,7,8,9,10

Re: [PATCH 05/14] libceph: harden ceph_osdc_request_release() a bit

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Add some WARN_ONs to alert us when we try to destroy requests that are > still registered. > > Signed-off-by: Ilya Dryomov Good idea. Especially the RB_CLEAR_NODE() call. Reviewed-by: Alex Elder > --- > net/ceph/osd_client.c |7 +++ > 1

Re: [PATCH 06/14] libceph: assert both regular and lingering lists in __remove_osd()

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > It is important that both regular and lingering requests lists are > empty when the OSD is removed. Looks good. Reviewed-by: Alex Elder > > Signed-off-by: Ilya Dryomov > --- > net/ceph/osd_client.c |2 ++ > 1 file changed, 2 insertions(+) >

Re: [PATCH 04/14] libceph: move and add dout()s to ceph_osdc_request_{get,put}()

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Add dout()s to ceph_osdc_request_{get,put}(). Also move them to .c and > turn kref release callback into a static function. You can pretty much take the identical comments from what I said on [PATCH 03/14]. Reviewed-by: Alex Elder > Signed-off-by:

Re: [PATCH 03/14] libceph: move and add dout()s to ceph_msg_{get,put}()

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Add dout()s to ceph_msg_{get,put}(). Also move them to .c and turn > kref release callback into a static function. > > Signed-off-by: Ilya Dryomov This is all very good. I have one suggestion though, below, but regardless: Reviewed-by: Alex Elder

Re: [PATCH 01/14] libceph: rename ceph_osd_request::r_linger_osd to r_linger_osd_item

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > So that: > > req->r_osd_item --> osd->o_requests list > req->r_linger_osd_item --> osd->o_linger_requests list > > Signed-off-by: Ilya Dryomov This looks good and I prefer it too. Reviewed-by: Alex Elder > --- > include/linux/ceph/osd_client.h |

Re: [PATCH 02/14] libceph: add maybe_move_osd_to_lru() and switch to it

2014-06-30 Thread Alex Elder
On 06/25/2014 12:16 PM, Ilya Dryomov wrote: > Abstract out __move_osd_to_lru() logic from __unregister_request() and > __unregister_linger_request(). > > Signed-off-by: Ilya Dryomov Looks good. Reviewed-by: Alex Elder > --- > net/ceph/osd_client.c | 26 ++ > 1 file

Re: [PATCH] rbd: handle parent_overlap on writes correctly

2014-06-30 Thread Alex Elder
On 06/11/2014 11:40 AM, Ilya Dryomov wrote: > The following check in rbd_img_obj_request_submit() > > rbd_dev->parent_overlap <= obj_request->img_offset > > allows the fall through to the non-layered write case even if both > parent_overlap and obj_request->img_offset belong to the same RADOS

Re: erasure code and coefficients

2014-06-30 Thread Loic Dachary
Hi Andreas, TL;DR: which part of the code chooses the coefficients to maximize the fault tolerance of the code as suggested in the Xorbas paper ? If I understand correctly, the locality is computed (i.e. how many local parity chunks are created) with: https://github.com/madiator/HadoopUSC/blob

[PATCH] rbd: do not leak image_id in rbd_dev_v2_parent_info()

2014-06-30 Thread Ilya Dryomov
image_id is leaked if the parent happens to have been recorded already. Fix it. Signed-off-by: Ilya Dryomov --- drivers/block/rbd.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index d99aa81774f8..adedb393b374 100644 --- a/drivers/block/rbd.c

RE: erasure code and coefficients

2014-06-30 Thread Andreas Joachim Peters
Hi Loic, I think it is best to read through the sources. They are very readable! https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java If there is a high interest in this, you could port the code from Java to C++ and use

Re: erasure code and coefficients

2014-06-30 Thread Loic Dachary
Hi koleosfuscus, It clarifies it enough to raise a question : where can I read code (or an algorithm if not code) that chose the coefficients desirable to implement what is suggested in the Xorbas paper ? Cheers On 30/06/2014 10:18, Koleos Fuscus wrote: > Hi Loic, > > I am happy to contribute

Re: erasure code and coefficients

2014-06-30 Thread Koleos Fuscus
Hi Loic, I am happy to contribute with some clarifications. In fact, erasure/reliability concepts are not blocking my progress with the reliability model at ceph. It is the ceph model itself that has some parts not clear to me, and nobody had time yet to review the state model diagram that I publi