Re: OSD memory leaks?

2013-03-12 Thread Vladislav Gorbunov
> FYI I'm using 450 pgs for my pools.

Please, can you show the number of object replicas?

    ceph osd dump | grep 'rep size'

Vlad Gorbunov

Re: OSD memory leaks?

2013-03-12 Thread Sébastien Han
Replica count has been set to 2. Why?

--
Regards,
Sébastien Han

Re: OSD memory leaks?

2013-03-12 Thread Vladislav Gorbunov
Sorry, I mean pg_num and pgp_num on all pools, shown by:

    ceph osd dump | grep 'rep size'

The default pg_num value of 8 is NOT suitable for a big cluster.
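The sizing rule Vlad is alluding to can be sketched in code. This is an illustrative helper only, not Ceph source; it assumes the common rule of thumb from this era, roughly (OSD count × 100) / replica count, rounded up to the next power of two:

```c
#include <assert.h>

/* Illustrative helper, NOT Ceph code: a rough pg_num suggestion based on
 * the rule of thumb (OSDs * 100) / replicas, rounded up to the next
 * power of two. */
static unsigned int suggested_pg_num(unsigned int osds, unsigned int replicas)
{
    unsigned int target = osds * 100 / replicas;
    unsigned int pg = 1;

    while (pg < target)
        pg <<= 1;
    return pg;
}
```

Under this heuristic even a modest 10-OSD, 2-replica cluster would want on the order of 512 PGs, which is why the default of 8 is far too low for any real deployment.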

Re: OSD memory leaks?

2013-03-12 Thread Sébastien Han
> Sorry, i mean pg_num and pgp_num on all pools. Shown by the
> ceph osd dump | grep 'rep size'

Well, it's still 450 each...

> The default pg_num value 8 is NOT suitable for big cluster.

Thanks, I know, I'm not new to Ceph. What's your point here? I already said that pg_num was 450...

--
Regards,

Re: Comments on Ceph.com's blog article 'Ceph's New Monitor Changes'

2013-03-12 Thread Mark Kampe
It seems to me that the surviving OSDs still remember all of the osdmap and pgmap history back to the last epoch started for all of their PGs. Isn't this enough to enable reconstruction of all of the pgmaps and osdmaps required to find any copy of a currently stored object? My history has given me

Re: OSD memory leaks?

2013-03-12 Thread Dave Spano
Disregard my previous question. I found my answer in the post below. Absolutely brilliant! I thought I was screwed!

http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924

Dave Spano
Optogenics Systems Administrator

Re: OSD memory leaks?

2013-03-12 Thread Sébastien Han
Well, to avoid unnecessary data movement, there is also an _experimental_ feature to change the number of PGs in a pool on the fly:

    ceph osd pool set poolname pg_num numpgs --allow-experimental-feature

Cheers!

--
Regards,
Sébastien Han

Release Cadence

2013-03-12 Thread Patrick McGarry
Hey all, just wanted to link to a few words on the new Ceph release cadence:

http://ceph.com/community/ceph-settles-in-to-aggressive-release-cadence/

Feel free to hit us with any questions. Thanks.

Best Regards,

--
Patrick McGarry
Director, Community
Inktank
http://ceph.com

Re: OSD memory leaks?

2013-03-12 Thread Greg Farnum
On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote:
> Well to avoid un necessary data movement, there is also an _experimental_
> feature to change on fly the number of PGs in a pool.
> ceph osd pool set poolname pg_num numpgs --allow-experimental-feature

Don't do that. We've got a set of

Re: OSD memory leaks?

2013-03-12 Thread Bryan K. Wright
han.sebast...@gmail.com said:
> Well to avoid un necessary data movement, there is also an _experimental_
> feature to change on fly the number of PGs in a pool.
> ceph osd pool set poolname pg_num numpgs --allow-experimental-feature

I've been following the instructions here:

Re: OSD memory leaks?

2013-03-12 Thread Dave Spano
I'd rather shut the cloud down and copy the pool to a new one than take any chances of corruption by using an experimental feature. My guess is that there cannot be any I/O to the pool while copying, otherwise you'll lose the changes that are happening during the copy, correct?

Dave Spano

Re: OSD memory leaks?

2013-03-12 Thread Greg Farnum
Yeah. There's not anything intelligent about that cppool mechanism. :)
-Greg

On Tuesday, March 12, 2013 at 2:15 PM, Dave Spano wrote:
> I'd rather shut the cloud down and copy the pool to a new one than take
> any chances of corruption by using an experimental feature. My guess is
> that there

Re: CephFS Space Accounting and Quotas

2013-03-12 Thread Jim Schutt
On 03/11/2013 02:40 PM, Jim Schutt wrote: If you want I can attempt to duplicate my memory of the first test I reported, writing the files today and doing the strace tomorrow (with timestamps, this time). Also, would it be helpful to write the files with minimal logging, in hopes of

More Messenger Patches

2013-03-12 Thread Alex Elder
I'm about to post another set of patches. As usual I've tried to group them logically, and in this case there are several single patches. This series continues moving the messenger toward supporting multiple chunks of data in a single message. We need this to support osd client requests which

[PATCH] libceph: drop pages parameter

2013-03-12 Thread Alex Elder
The pages parameter in read_partial_message_pages() is unused, so get rid of it.

Signed-off-by: Alex Elder el...@inktank.com
---
 net/ceph/messenger.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 997dacc..0d54ca4

[PATCH] libceph: no outbound zero data

2013-03-12 Thread Alex Elder
There is handling in write_partial_message_data() for the case where only the length of--and no other information about--the data to be sent has been specified. It uses the zero page as the source of data to send in this case. This case doesn't occur. All message senders set up a page array,

[PATCH] libceph: record residual bytes for all message data types

2013-03-12 Thread Alex Elder
All of the data types can use this, not just the page array. Until now, only the bio type has lacked it, and only the initiator of the request (the rbd client) is able to supply the length of the full request without re-scanning the bio list. Change the cursor init routines so the
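The residual-byte ("resid") bookkeeping this patch describes can be sketched in isolation. All names below are hypothetical stand-ins, not the actual libceph cursor API; the point is only that the cursor itself tracks how many bytes of the data item remain:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the real libceph definitions: a cursor that
 * records the residual (not-yet-consumed) byte count of its data item. */
struct data_cursor {
    size_t resid;   /* bytes remaining in the whole data item */
    size_t offset;  /* position within the current piece */
};

static void cursor_init(struct data_cursor *c, size_t total_len)
{
    c->resid = total_len;
    c->offset = 0;
}

/* Consume up to want bytes; returns how many were actually consumed. */
static size_t cursor_advance(struct data_cursor *c, size_t want)
{
    size_t n = want < c->resid ? want : c->resid;

    c->resid -= n;
    c->offset += n;
    return n;
}
```

With the length recorded at init time, no caller has to re-derive it later by walking the bio list.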

[PATCH 0/4] libceph: use cursor for incoming data

2013-03-12 Thread Alex Elder
This series changes the incoming data path for the messenger to use the new data item cursors. -Alex [PATCH 1/4] libceph: use cursor for bio reads [PATCH 2/4] libceph: kill ceph message bio_iter, bio_seg [PATCH 3/4] libceph: use cursor for inbound data

[PATCH 1/4] libceph: use cursor for bio reads

2013-03-12 Thread Alex Elder
Replace the use of the information in con->in_msg_pos for incoming bio data. The old in_msg_pos and the new cursor mechanism do basically the same thing, just slightly differently. The main functional difference is that in_msg_pos keeps track of the length of the complete bio list, and assumed it

[PATCH 2/4] libceph: kill ceph message bio_iter, bio_seg

2013-03-12 Thread Alex Elder
The bio_iter and bio_seg fields in a message are no longer used; we use the cursor instead. So get rid of them and the functions that operate on them. This is related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder el...@inktank.com --- include/linux/ceph/messenger.h

[PATCH 3/4] libceph: use cursor for inbound data pages

2013-03-12 Thread Alex Elder
The cursor code for a page array selects the right page, page offset, and length to use for a ceph_tcp_recvpage() call, so we can use it to replace a block in read_partial_message_pages(). This partially resolves: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder

[PATCH 4/4] libceph: get rid of read helpers

2013-03-12 Thread Alex Elder
Now that read_partial_message_pages() and read_partial_message_bio() are literally identical functions, we can factor them out. They're pretty simple as well, so just move their relevant content into read_partial_msg_data(). This and the previous patches together resolve:

[PATCH] libceph: collapse all data items into one

2013-03-12 Thread Alex Elder
It turns out that only one of the data item types is ever used at any one time in a single message (currently). - A page array is used by the osd client (on behalf of the file system) and by rbd. Only one osd op (and therefore at most one data item) is ever used at a time by rbd.

[PATCH 0/4] libceph: get rid of ceph_msg_pos

2013-03-12 Thread Alex Elder
Previously, a ceph_msg_pos structure contained information for iterating through the data in a message. Now we use information in a data item's cursor for that purpose. This series completes the switchover to use of the cursor and then eliminates the definition of ceph_msg_pos and the functions

[PATCH 1/4] libceph: use cursor resid for loop condition

2013-03-12 Thread Alex Elder
Use the resid field of a cursor rather than finding when the message data position has moved up to meet the data length to determine when all data has been sent or received in write_partial_message_data() and read_partial_msg_data(). This is cleanup of old code related to:
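The cleanup can be illustrated with a toy loop: instead of comparing a running position against the total data length, the loop simply runs until the cursor's resid field reaches zero. Names here are invented for illustration, not the real write_partial_message_data() code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not libceph code: loop on resid rather than
 * comparing a position counter against a separately tracked length. */
struct cursor { size_t resid; };

/* Pretend to transmit up to chunk bytes; returns bytes "sent". */
static size_t send_some(struct cursor *c, size_t chunk)
{
    size_t n = chunk < c->resid ? chunk : c->resid;

    c->resid -= n;
    return n;
}

static size_t send_all(struct cursor *c, size_t chunk)
{
    size_t total = 0;

    while (c->resid > 0)    /* the new, simpler loop condition */
        total += send_some(c, chunk);
    return total;
}
```

The loop terminates exactly when the data item is exhausted, with no second position/length pair to keep in sync.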

[PATCH 2/4] libceph: kill most of ceph_msg_pos

2013-03-12 Thread Alex Elder
All but one of the fields in the ceph_msg_pos structure are now never used (only assigned), so get rid of them. This allows several small blocks of code to go away. This is cleanup of old code related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder el...@inktank.com ---

[PATCH 3/4] libceph: kill last of ceph_msg_pos

2013-03-12 Thread Alex Elder
The only remaining field in the ceph_msg_pos structure is did_page_crc. In the new cursor model of things that flag (or something like it) belongs in the cursor. Define a new field need_crc in the cursor (which applies to all types of data) and initialize it to true whenever a cursor is
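A minimal sketch of where such a flag could live, using invented names (struct cursor, need_crc) rather than the real libceph definitions: the flag is set at init and again whenever the cursor crosses into a new piece, so each piece's CRC is computed exactly once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not libceph code: the CRC flag lives in the
 * cursor, alongside the residual byte count, and applies to any data
 * type the cursor iterates over. */
struct cursor {
    size_t resid;
    bool need_crc;
};

static void cursor_init(struct cursor *c, size_t len)
{
    c->resid = len;
    c->need_crc = true;   /* the first piece always needs its CRC */
}

static void cursor_advance(struct cursor *c, size_t n, bool new_piece)
{
    c->resid -= n;
    c->need_crc = new_piece;  /* recompute only when a new piece begins */
}
```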

[PATCH 4/4] libceph: use only ceph_msg_data_advance()

2013-03-12 Thread Alex Elder
The *_msg_pos_next() functions do little more than call ceph_msg_data_advance(). Replace those wrapper functions with a simple call to ceph_msg_data_advance(). This cleanup is related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder el...@inktank.com ---

[PATCH] libceph: make message data be a pointer

2013-03-12 Thread Alex Elder
Begin the transition from a single message data item to a list of them by replacing the data structure in a message with a pointer to a ceph_msg_data structure. A null pointer will indicate the message has no data; replace the use of ceph_msg_has_data() with a simple check for a null pointer.
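The shape of that change can be sketched as follows; the struct and function names are illustrative only, not the actual libceph declarations:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch, not libceph code: the message holds a pointer to
 * its data item, and a NULL pointer means the message carries no data. */
struct msg_data { size_t length; };

struct msg {
    struct msg_data *data;  /* NULL => no data */
};

/* Replaces a has-data flag/macro with a plain null-pointer check. */
static bool msg_has_data(const struct msg *m)
{
    return m->data != NULL;
}
```

Making the data a pointer is also what lets a later patch turn the single item into a list of them.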

Re: [PATCH 2/2] ceph: use i_release_count to indicate dir's completeness

2013-03-12 Thread Greg Farnum
On Monday, March 11, 2013 at 5:42 AM, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com Current ceph code tracks directory's completeness in two places. ceph_readdir() checks i_release_count to decide if it can set the I_COMPLETE flag in i_ceph_flags. All other places check the

Re: [PATCH 2/2] ceph: use i_release_count to indicate dir's completeness

2013-03-12 Thread Yehuda Sadeh
On Tue, Mar 12, 2013 at 9:50 PM, Yan, Zheng zheng.z@intel.com wrote: On 03/13/2013 09:24 AM, Greg Farnum wrote: On Monday, March 11, 2013 at 5:42 AM, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com Current ceph code tracks directory's completeness in two places. ceph_readdir()

Re: [PATCH 2/2] ceph: use i_release_count to indicate dir's completeness

2013-03-12 Thread Gregory Farnum
On Tuesday, March 12, 2013 at 9:50 PM, Yan, Zheng wrote: On 03/13/2013 09:24 AM, Greg Farnum wrote: On Monday, March 11, 2013 at 5:42 AM, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com Current ceph code tracks directory's completeness in two