Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Oliver Francke
Hi *, well, there was once a comment on our layout in terms of too many pools. Our setup is to have a pool per customer, to simplify the view on used storage capacity. So, if we have - in a couple of months, we hope - more than some hundred customers, this setup was not recommended, cause the

Re: Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Wido den Hollander
Hi, On 02/28/2012 10:35 AM, Oliver Francke wrote: Hi *, well, there was once a comment on our layout in terms of too many pools. Our setup is to have a pool per customer, to simplify the view on used storage capacity. So, if we have - in a couple of months, we hope - more than some hundred

Re: Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Oliver Francke
Well, On 02/28/2012 10:42 AM, Wido den Hollander wrote: Hi, On 02/28/2012 10:35 AM, Oliver Francke wrote: Hi *, well, there was once a comment on our layout in terms of too many pools. Our setup is to have a pool per customer, to simplify the view on used storage capacity. So, if we have -

[PATCH] init: Actually do start the daemons when 'service ceph start type' is specified

2012-02-28 Thread Wido den Hollander
A bug in my previous patch prevented any daemon with auto_start set to false from starting. This patch allows: * /etc/init.d/ceph start osd|mds|mon * service ceph start osd|mds|mon It however does not start daemons if auto_start is disabled when you invoke: * /etc/init.d/ceph start * service

RadosGW problems with copy in s3

2012-02-28 Thread Sławomir Skowron
After some parallel copy commands via boto for many files, everything starts to slow down, and eventually I get a timeout from nginx@radosgw. # ceph -s 2012-02-28 12:16:57.818566 pg v20743: 8516 pgs: 8516 active+clean; 2154 MB data, 53807 MB used, 20240 GB / 21379 GB avail 2012-02-28 12:16:57.845274

Re: ceph v0.42 sync

2012-02-28 Thread Wido den Hollander
Hi, On 02/27/2012 07:01 AM, Laszlo Boszormenyi wrote: On Sun, 2012-02-26 at 21:23 -0800, Sage Weil wrote: v0.42.2 for now.. v0.43 should be out either Friday or the following Monday. OK, will get v0.42.2 from git, as the homepage has only v0.42 for download. I've just submitted a patch

Re: [WRN] map e### wrongly marked me down or wrong addr

2012-02-28 Thread Székelyi Szabolcs
On 2012. February 27. 09:03:11 Sage Weil wrote: On Mon, 27 Feb 2012, Székelyi Szabolcs wrote: whenever I restart osd.0 I see a pair of messages like 2012-02-27 17:26:00.132666 mon.0 osd_1_ip:6789/0 106 : [INF] osd.0 osd_0_ip:6801/29931 failed (by osd.1 osd_1_ip:6806/20125) 2012-02-27

Re: Repeated messages of heartbeat_check: no heartbeat from

2012-02-28 Thread Wido den Hollander
Hi, On 02/24/2012 06:18 AM, Gregory Farnum wrote: On Thu, Feb 23, 2012 at 2:45 AM, Wido den Hollander w...@widodh.nl wrote: Hi, On 02/22/2012 07:08 PM, Gregory Farnum wrote: Wido, Sorry we lost track of this last week — we were all distracted by FAST 12! :) No problem! So it looks

Re: Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Wido den Hollander
Hi, On 02/28/2012 10:50 AM, Oliver Francke wrote: Well, On 02/28/2012 10:42 AM, Wido den Hollander wrote: Hi, On 02/28/2012 10:35 AM, Oliver Francke wrote: Hi *, well, there was once a comment on our layout in terms of too many pools. Our setup is to have a pool per customer, to simplify

Re: [WRN] map e### wrongly marked me down or wrong addr

2012-02-28 Thread Gregory Farnum
2012/2/28 Székelyi Szabolcs szeke...@niif.hu: On 2012. February 27. 09:03:11 Sage Weil wrote: On Mon, 27 Feb 2012, Székelyi Szabolcs wrote: whenever I restart osd.0 I see a pair of messages like 2012-02-27 17:26:00.132666 mon.0 osd_1_ip:6789/0 106 : [INF] osd.0 osd_0_ip:6801/29931 failed

Re: Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Sage Weil
On Tue, 28 Feb 2012, Wido den Hollander wrote: Hi, On 02/28/2012 10:50 AM, Oliver Francke wrote: Well, On 02/28/2012 10:42 AM, Wido den Hollander wrote: Hi, On 02/28/2012 10:35 AM, Oliver Francke wrote: Hi *, well, there was once a comment on our layout in terms

Re: RadosGW problems with copy in s3

2012-02-28 Thread Yehuda Sadeh Weinraub
On Tue, Feb 28, 2012 at 3:43 AM, Sławomir Skowron slawomir.skow...@gmail.com wrote: After some parallel copy commands via boto for many files, everything starts to slow down, and eventually I get a timeout from nginx@radosgw. # ceph -s 2012-02-28 12:16:57.818566 pg v20743: 8516 pgs: 8516

Re: ceph config without ctrfs

2012-02-28 Thread Sage Weil
On Mon, 27 Feb 2012, Tommi Virtanen wrote: On Mon, Feb 27, 2012 at 15:29, Gregory Farnum gregory.far...@dreamhost.com wrote: On Mon, Feb 27, 2012 at 3:18 PM, Matt Weil mw...@genome.wustl.edu wrote: How do you do this type of config (specifying multiple devices per server) on xfs not btrfs?

Re: [WRN] map e### wrongly marked me down or wrong addr

2012-02-28 Thread Székelyi Szabolcs
On 2012. February 28. 08:16:34 Gregory Farnum wrote: 2012/2/28 Székelyi Szabolcs szeke...@niif.hu: On 2012. February 27. 09:03:11 Sage Weil wrote: On Mon, 27 Feb 2012, Székelyi Szabolcs wrote: whenever I restart osd.0 I see a pair of messages like 2012-02-27 17:26:00.132666 mon.0

Implication of using rados_ioctx_locator_set_key

2012-02-28 Thread Noah Watkins
I'm curious about what performance implications there may be when using rados_ioctx_locator_set_key. If a large number of objects are forced into a single PG using a fixed locator key, are there performance implications for subsequent look-ups by object name? Is some type of index structure

Re: Implication of using rados_ioctx_locator_set_key

2012-02-28 Thread Gregory Farnum
On Tue, Feb 28, 2012 at 10:07 AM, Noah Watkins jayh...@cs.ucsc.edu wrote: I'm curious about what performance implications there may be when using rados_ioctx_locator_set_key. If a large number of objects are forced into a single PG using a fixed locator key, are there performance

Re: RadosGW problems with copy in s3

2012-02-28 Thread Yehuda Sadeh Weinraub
(resending to list) On Tue, Feb 28, 2012 at 11:53 AM, Sławomir Skowron slawomir.skow...@gmail.com wrote: 2012/2/28 Yehuda Sadeh Weinraub yehuda.sa...@dreamhost.com: On Tue, Feb 28, 2012 at 3:43 AM, Sławomir Skowron slawomir.skow...@gmail.com wrote: After some parallel copy command via

Patch Bomb!

2012-02-28 Thread Alex Elder
Over the next few hours I plan to send out a lot of patches for review. Most of these have been around and under test for a month or more, and at the time I originally committed them I wasn't sure how best to get them reviewed. We decided that sending them all out to the list was a good way to

Re: Patch Bomb!

2012-02-28 Thread Alex Elder
On 02/28/2012 06:14 PM, Alex Elder wrote: Over the next few hours I plan to send out a lot of patches for review. Most of these have been around and under test for a month or more, and at the time I originally committed them I wasn't sure how best to get them reviewed. We decided that sending

[PATCH] ceph: don't reset s_cap_ttl to zero

2012-02-28 Thread Alex Elder
Avoid the need to check for a special zero s_cap_ttl value by just using (jiffies - 1) as the value assigned to indicate sometime in the past. Signed-off-by: Alex Elder el...@dreamhost.com Reviewed-by: Sage Weil s...@newdream.net --- fs/ceph/mds_client.c |7 +++ 1 files changed, 3

[PATCH 3/4] ceph: eliminate some needless casts

2012-02-28 Thread Alex Elder
This eliminates type casts in some places where they are not required. Signed-off-by: Alex Elder el...@newdream.net --- net/ceph/messenger.c | 21 ++--- 1 files changed, 10 insertions(+), 11 deletions(-) diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index

[PATCH 4/4] ceph: eliminate some abusive casts

2012-02-28 Thread Alex Elder
This fixes some spots where a type cast to (void *) was used as a universal type hiding mechanism. Instead, properly cast the type to the intended target type. Signed-off-by: Alex Elder el...@newdream.net --- net/ceph/messenger.c |8 1 files changed, 4 insertions(+), 4

[PATCH 1/4] ceph: make use of else where appropriate

2012-02-28 Thread Alex Elder
Rearrange ceph_tcp_connect() a bit, making use of else rather than re-testing a value with consecutive if statements. Don't record a connection's socket pointer unless the connect operation is successful. Signed-off-by: Alex Elder el...@dreamhost.com --- net/ceph/messenger.c | 11 ---

[PATCH 2/4] ceph: kill addr_str_lock spinlock; use atomic instead

2012-02-28 Thread Alex Elder
A spinlock is used to protect a value used for selecting an array index for a string used for formatting a socket address for human consumption. The index is reset to 0 if it ever reaches the maximum index value. Instead, use an ever-increasing atomic variable as a sequence number, and compute
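A minimal userspace sketch of the idea described here, using C11 atomics in place of the kernel's atomic_t. The names and sizes (ADDR_STR_COUNT, MAX_ADDR_STR_LEN, next_addr_str_slot) are illustrative assumptions, and masking the sequence number to obtain the index is only one plausible reading of the truncated description, not a quote from the patch:

```c
#include <stdatomic.h>

/* Hypothetical sizes; a power-of-two count lets a mask replace a modulo. */
#define ADDR_STR_COUNT      32
#define ADDR_STR_COUNT_MASK (ADDR_STR_COUNT - 1)
#define MAX_ADDR_STR_LEN    64

static char addr_str[ADDR_STR_COUNT][MAX_ADDR_STR_LEN];
static atomic_uint addr_str_seq;   /* ever-increasing sequence number */

/*
 * Pick the next formatting buffer without taking a lock: the fetch-and-add
 * hands every caller a distinct sequence value, and the mask folds that
 * value into a valid array index.
 */
static char *next_addr_str_slot(void)
{
    unsigned int seq = atomic_fetch_add(&addr_str_seq, 1);

    return addr_str[seq & ADDR_STR_COUNT_MASK];
}
```

Because the counter is never reset, no "reached the maximum, go back to 0" step has to be protected; unsigned wraparound is harmless as long as the buffer count is a power of two.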

[PATCH] ceph: don't null-terminate xattr values

2012-02-28 Thread Alex Elder
For some reason, ceph_setxattr() allocates an extra byte in which a '\0' is stored past the end of an extended attribute value. This is not needed, and is potentially misleading, so get rid of it. Signed-off-by: Alex Elder el...@dreamhost.com --- fs/ceph/xattr.c |4 +--- 1 files changed, 1

[PATCH] ceph: pass inode rather than table to ceph_match_vxattr()

2012-02-28 Thread Alex Elder
All callers of ceph_match_vxattr() determine what to pass as the first argument by calling ceph_inode_vxattrs(inode). Just do that inside ceph_match_vxattr() itself, changing it to take an inode rather than the vxattr pointer as its first argument. Also ensure the function works correctly for

[PATCH 0/6] ceph: virtual extended attribute cleanup

2012-02-28 Thread Alex Elder
This series cleans up some code involving ceph's virtual extended attributes. Three of them define some simple macros that help ensure the attributes are defined in a consistent way. One makes the size of certain constant values get defined at startup time rather than repeatedly, and

[PATCH 1/6] ceph: use a symbolic name for ceph. extended attribute namespace

2012-02-28 Thread Alex Elder
Use symbolic constants to define the top-level prefix for ceph. extended attribute names. Signed-off-by: Alex Elder el...@dreamhost.com --- fs/ceph/xattr.c | 25 ++--- 1 files changed, 14 insertions(+), 11 deletions(-) diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index

[PATCH 2/6] ceph: use macros to normalize vxattr table definitions

2012-02-28 Thread Alex Elder
Entries in the ceph virtual extended attribute tables all follow a distinct pattern in their definition. Enforce this pattern through the use of a macro. Also, a null name field signals the end of the table, so make that be the first field in the ceph_vxattr_cb structure. Signed-off-by: Alex

[PATCH 3/6] ceph: drop _cb from name of struct ceph_vxattr_cb

2012-02-28 Thread Alex Elder
A struct ceph_vxattr_cb does not represent a callback at all, but rather a virtual extended attribute itself. Drop the _cb suffix from its name to reflect that. Signed-off-by: Alex Elder el...@dreamhost.com --- fs/ceph/xattr.c | 20 ++-- 1 files changed, 10 insertions(+), 10

[PATCH 4/6] ceph: encode type in vxattr callback routines

2012-02-28 Thread Alex Elder
The names of the callback functions used for virtual extended attributes are based only on the last component of the attribute name. Because of the way these are defined, this precludes allowing a single (lowest) attribute name for different callbacks, dependent on the type of file being

[PATCH 6/6] ceph: make ceph_setxattr() and ceph_removexattr() more alike

2012-02-28 Thread Alex Elder
This patch just rearranges a few bits of code to make more portions of ceph_setxattr() and ceph_removexattr() identical. Signed-off-by: Alex Elder el...@dreamhost.com --- fs/ceph/xattr.c | 15 --- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/ceph/xattr.c

[PATCH 0/4] rbd: miscellaneous cleanups

2012-02-28 Thread Alex Elder
This series makes a few small unrelated changes to the rbd code. The first is a set of simple cleanups. The second makes ceph_parse_options() return a pointer to make the interface a little more obvious. The third gets rid of a duplicate copy of the pointer to a ceph_client held in an

[PATCH 3/4] rbd: do not duplicate ceph_client pointer in rbd_device

2012-02-28 Thread Alex Elder
The rbd_device structure maintains a duplicate copy of the ceph_client pointer maintained in its rbd_client structure. There appears to be no good reason for this, and its presence presents a risk of them getting out of synch or otherwise misused. So kill it off, and use the rbd_client copy

[PATCH 4/4] rbd: use a single value of snap_name to mean no snap

2012-02-28 Thread Alex Elder
From Josh Durgin josh.dur...@dreamhost.com There's already a constant for this anyway. (I changed Josh's code to use memcmp() and memcpy() instead. -Alex) Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c |8 +++- 1 files changed, 3 insertions(+), 5 deletions(-)

[PATCH 0/5] rbd: improve how rbd ids are selected

2012-02-28 Thread Alex Elder
New rbd devices are granted unique identifiers based on how many devices are already in existence. This series rearranges how that is done, switching from using a spinlock to using an atomic variable to select the next rbd id to use. In the process a bit of the code got a bit more isolated.

[PATCH 2/5] rbd: encapsulate new rbd id selection

2012-02-28 Thread Alex Elder
Move the loop that finds a new unique rbd id to use into its own helper function. Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c | 30 +++--- 1 files changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/block/rbd.c

[PATCH 2/5] rbd: rework calculation of new rbd id's

2012-02-28 Thread Alex Elder
In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for

[PATCH 3/5] rbd: protect the rbd_dev_list with a spinlock

2012-02-28 Thread Alex Elder
The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder

[PATCH 4/5] rbd: tie rbd_dev_list changes to rbd_id operations

2012-02-28 Thread Alex Elder
The only time entries are added to or removed from the global rbd_dev_list is exactly when a put or get operation is being performed on a rbd_dev's id. So just move the list management code into get/put routines. Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c | 47

[PATCH 5/5] rbd: restore previous rbd id sequence behavior

2012-02-28 Thread Alex Elder
It used to be that selecting a new unique identifier for an added rbd device required searching all existing ones to find the highest id used. A recent change made that unnecessary, but made it so that id's used were monotonically non-decreasing. It's a bit more pleasant to have smaller rbd

[PATCH 0/4] rbd: client list locking improvements

2012-02-28 Thread Alex Elder
This series reduces the window during which the client list lock is held, gives it a more meaningful name, and moves the locking calls closer to the places they're really needed. -Alex -- To unsubscribe from this list: send the line unsubscribe ceph-devel

[PATCH 1/4] rbd: release client list lock sooner

2012-02-28 Thread Alex Elder
In rbd_get_client(), if a client is reused, a number of things get done while still holding the list lock unnecessarily. This just moves a few things that need no lock protection outside the lock. Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c | 10 ++ 1 files

[PATCH 2/4] rbd: move ctl_mutex lock inside rbd_get_client()

2012-02-28 Thread Alex Elder
Since rbd_get_client() is only called in one place, move the acquisition of the mutex around that call inside that function. Furthermore, within rbd_get_client(), it appears the mutex only needs to be held while calling rbd_client_create(). (Moving the lock inside that function will wait for

[PATCH 4/4] rbd: rename node_lock

2012-02-28 Thread Alex Elder
The spinlock used to protect rbd_client_list is named node_lock. Rename it to rbd_client_list_lock to make it more obvious what it's for. Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c | 20 ++-- 1 files changed, 10 insertions(+), 10 deletions(-) diff

[PATCH] rbd: a few simple changes

2012-02-28 Thread Alex Elder
Here are a few very simple cleanups: - Add a RBD_ prefix to the two driver name string definitions. - Move the definition of struct rbd_request below struct rbd_req_coll to avoid the need for an empty declaration of the latter. - Move and group the definitions of

[PATCH 0/3] rbd: minor cleanups in rbd_add()

2012-02-28 Thread Alex Elder
This series affects rbd_add(). It calls rbd_get_client(), and by making that function return a client pointer it makes it more obvious that the client pointer is getting assigned. It also reduces the amount of memory allocated to hold the monitor address and options passed from the user. And

[PATCH 1/3] rbd: have rbd_get_client() return a rbd_client

2012-02-28 Thread Alex Elder
rbd_get_client() currently returns an error code, and it assigns the rbd_client field of the rbd_device structure it is passed if successful. Instead, have it return the created rbd_client structure and return a pointer-coded error if there is an error. This makes the assignment of the client
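The "pointer-coded error" convention can be illustrated in plain userspace C. This is only a sketch modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers; the lowercase helpers, struct client, and get_client() below are stand-ins written for this example, not the rbd code:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the (never valid) top page of addresses. */
static inline void *err_ptr(long error)
{
    return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct client {
    int id;
};

/* Hypothetical constructor: returns the object itself, or an encoded error. */
static struct client *get_client(int simulate_failure)
{
    struct client *c;

    if (simulate_failure)
        return err_ptr(-ENOMEM);

    c = malloc(sizeof(*c));
    if (!c)
        return err_ptr(-ENOMEM);
    c->id = 1;
    return c;
}
```

A caller then writes `c = get_client(...); if (is_err(c)) return ptr_err(c);`, so the assignment of the result is visible at the call site instead of being hidden behind an output parameter.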

[PATCH 2/3] rbd: reduce memory used for rbd_dev fields

2012-02-28 Thread Alex Elder
The length of the string containing the monitor address specification(s) will never exceed the length of the string passed in to rbd_add(). The same holds true for the ceph + rbd options string. So reduce the amount of memory allocated for these to that length rather than the maximum (1024

[PATCH 3/3] rbd: simplify error handling in rbd_add()

2012-02-28 Thread Alex Elder
If a couple pointers are initialized to NULL then a single out_nomem label can be used for all of the memory allocation failure cases in rbd_add(). Also, get rid of the irc local variable there. There is no real need for rc to be type ssize_t, and it can be used in the spot irc was.
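The single-label pattern mentioned here can be sketched as follows. This is a generic userspace illustration, assuming two buffers and a hypothetical setup_buffers() function; it relies on free(NULL) being a no-op, just as kfree(NULL) is in the kernel:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical setup routine: both pointers start out NULL, so one error
 * label can free "whatever was allocated so far" without tracking which
 * allocation failed.
 */
static int setup_buffers(size_t len, int fail_second)
{
    char *mon_addrs = NULL;
    char *options = NULL;

    mon_addrs = malloc(len);
    if (!mon_addrs)
        goto out_nomem;

    /* fail_second simulates the second allocation failing. */
    options = fail_second ? NULL : malloc(len);
    if (!options)
        goto out_nomem;

    /* ... the buffers would be used here ... */

    free(options);
    free(mon_addrs);
    return 0;

out_nomem:
    free(options);      /* freeing NULL is harmless */
    free(mon_addrs);
    return -ENOMEM;
}
```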

[PATCH 0/5] rbd: cleanups related to argument parsing

2012-02-28 Thread Alex Elder
This series affects the way arguments are parsed in rbd_add(). It first encapsulates the code into its own helper function. Then it uses a few simple tokenization functions instead of sscanf() to parse the string provided, which makes it possible to do a better job of error checking the input.

[PATCH 1/5] rbd: encapsulate argument parsing for rbd_add()

2012-02-28 Thread Alex Elder
Move the code that parses the arguments provided to rbd_add() (which are supplied via /sys/bus/rbd/add) into a separate function. Also rename the mon_dev_name variable in rbd_add() to be mon_addrs. The variable represents a list of one or more comma-separated monitor IP addresses, each with an

[PATCH 2/5] rbd: don't use sscanf() in rbd_add_parse_args()

2012-02-28 Thread Alex Elder
Make use of a few simple helper routines to parse the arguments rather than sscanf(). This will treat both missing and too-long arguments as invalid input (rather than silently truncating the input in the too-long case). In time this can also be used by rbd_add() to use the passed-in buffer in
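As a rough userspace illustration of parsing with small tokenizing helpers instead of sscanf(), the sketch below uses strspn()/strcspn(). The helper names and their exact behavior are assumptions made for this example and may differ from the routines in the patch:

```c
#include <stddef.h>
#include <string.h>

/* Skip leading whitespace; return the length of the token that follows. */
static size_t next_token(const char **buf)
{
    const char *spaces = " \f\n\r\t\v";

    *buf += strspn(*buf, spaces);
    return strcspn(*buf, spaces);
}

/*
 * Copy the next token into token[] only if it fits (including the NUL),
 * then advance past it. The token length is always returned, so the caller
 * can tell "missing" (0) and "too long" (>= token_size) apart from success.
 */
static size_t copy_token(const char **buf, char *token, size_t token_size)
{
    size_t len = next_token(buf);

    if (len < token_size) {
        memcpy(token, *buf, len);
        token[len] = '\0';
    }
    *buf += len;
    return len;
}
```

With this shape, an over-long argument is reported to the caller rather than silently truncated, which is the failure mode a fixed-width sscanf() conversion cannot signal.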

[PATCH 3/5] rbd: do a few checks at build time

2012-02-28 Thread Alex Elder
This is a bit gratuitous, but there are a few things that can be verified at build time rather than run time, so do that. Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git

[PATCH 4/5] rbd: have rbd_parse_args() report found mon_addrs size

2012-02-28 Thread Alex Elder
The argument parsing routine already computes the size of the mon_addrs buffer it extracts from the command. Pass it to the caller so it can use it to provide the length to rbd_get_client(). Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c | 20 +--- 1

[PATCH 5/5] rbd: don't allocate mon_addrs buffer in rbd_add()

2012-02-28 Thread Alex Elder
The mon_addrs buffer in rbd_add is used to hold a copy of the monitor IP addresses supplied via /sys/bus/rbd/add. That is passed to rbd_get_client(), which never modifies it (nor do any of the functions it gets passed to thereafter)--the mon_addr parameter to rbd_get_client() is a pointer to

[PATCH] ceph: avoid panic with mismatched symlink sizes in fill_inode()

2012-02-28 Thread Alex Elder
Return -EINVAL rather than panic if iinfo->symlink_len and inode->i_size do not match. Also use kstrndup rather than kmalloc/memcpy. Signed-off-by: Xi Wang xi.w...@gmail.com Reviewed-by: Alex Elder el...@dreamhost.com --- fs/ceph/inode.c | 11 ++- 1 files changed, 6 insertions(+), 5

[PATCH] rbd: fix module sysfs setup/teardown code

2012-02-28 Thread Alex Elder
Once rbd_bus_type is registered, it allows an add operation via the /sys/bus/rbd/add bus attribute, and adding a new rbd device that way establishes a connection between the device and rbd_root_dev. But rbd_root_dev is not registered until after the rbd_bus_type registration is complete. This

[PATCH 0/2] rbd: more miscellaneous cleanups

2012-02-28 Thread Alex Elder
The patch messages explain the changes in detail. -Alex

Re: [PATCH] ceph: use a shared zero page rather than one per messenger

2012-02-28 Thread Christoph Hellwig
On Tue, Feb 28, 2012 at 07:06:22PM -0800, Alex Elder wrote: Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Just allocate one at initialization time and have them all share it. Any reason you don't simply use the

Re: [PATCH 0/6] ceph: virtual extended attribute cleanup

2012-02-28 Thread Christoph Hellwig
On Tue, Feb 28, 2012 at 07:17:41PM -0800, Alex Elder wrote: This series cleans up some code involving ceph's virtual extended attributes. Three of them define some simple macros that help ensure the attributes are defined in a consistent way. One makes the size of certain constant

Re: [PATCH] ceph: pass inode rather than table to ceph_match_vxattr()

2012-02-28 Thread Yehuda Sadeh Weinraub
On Tue, Feb 28, 2012 at 7:13 PM, Alex Elder el...@dreamhost.com wrote: All callers of ceph_match_vxattr() determine what to pass as the first argument by calling ceph_inode_vxattrs(inode).  Just do that inside ceph_match_vxattr() itself, changing it to take an inode rather than the vxattr

[PATCH 2/2] rbd: small changes

2012-02-28 Thread Alex Elder
Here is another set of small code tidy-ups: - Define SECTOR_SHIFT and SECTOR_SIZE, and use these symbolic names throughout. Tell the blk_queue system our physical block size, in the (unlikely) event we want to use something other than the default. - Delete the

[PATCH 1/2] rbd: do some refactoring

2012-02-28 Thread Alex Elder
A few blocks of code are rearranged a bit here: - In rbd_header_from_disk(): - Don't bother computing snap_count until we're sure the on-disk header starts with a good signature. - Move a few independent lines of code so they are *after* a check for a

[PATCH 0/2] rbd: support additional Boolean rbd_dev flags

2012-02-28 Thread Alex Elder
This series actually added another flag to the rbd_dev, but the need for that went away. Still, I thought these two changes added value so I kept them. -Alex

[PATCH 2/2] rbd: convert to using flags field

2012-02-28 Thread Alex Elder
To allow having more than just read-only as Boolean state associated with an rbd_device, replace the read_only field with a more general flags field. The non-atomic versions of bit operations are fine for our purposes. Signed-off-by: Alex Elder el...@dreamhost.com --- drivers/block/rbd.c |
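A small sketch of what replacing a dedicated Boolean field with a general flags word can look like. The flag names, struct dev, and the helpers are hypothetical, and plain (non-atomic) bit operations are used, matching the remark that those are sufficient here:

```c
#include <stdbool.h>

/* Hypothetical bit numbers within the flags word. */
enum {
    DEV_FLAG_READ_ONLY = 0,
    DEV_FLAG_EXAMPLE   = 1,   /* placeholder for some later Boolean state */
};

struct dev {
    unsigned long flags;      /* one bit per Boolean property */
};

static void dev_set_flag(struct dev *d, int bit)
{
    d->flags |= 1UL << bit;
}

static void dev_clear_flag(struct dev *d, int bit)
{
    d->flags &= ~(1UL << bit);
}

static bool dev_test_flag(const struct dev *d, int bit)
{
    return (d->flags >> bit) & 1UL;
}
```

Adding another Boolean property then only needs a new enum entry, not a new structure member.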

[PATCH] rbd: don't drop the rbd_id too early

2012-02-28 Thread Alex Elder
Currently an rbd device's id is released when it is removed, but it is done before the code is run to clean up sysfs-related files (such as /sys/bus/rbd/devices/1). It's possible that an rbd is still in use after the rbd_remove() call has been made. It's essentially the same as an active inode

[PATCH] libceph: move prepare_write_banner()

2012-02-28 Thread Alex Elder
One of the arguments to prepare_write_connect() indicates whether it is being called immediately after a call to prepare_write_banner(). Move the prepare_write_banner() call inside prepare_write_connect(), and reinterpret (and rename) the after_banner argument so it indicates that

[PATCH] libceph: encapsulate connection kvec operations

2012-02-28 Thread Alex Elder
Encapsulate the operation of adding a new chunk of data to the next open slot in a ceph_connection's out_kvec array. Also add a reset operation to make subsequent add operations start at the beginning of the array again. Use these routines throughout, avoiding duplicate code and ensuring all

[PATCH 0/4] libceph: miscellaneous cleanups

2012-02-28 Thread Alex Elder
More cleanups, this time in the messenger code. -Alex

[PATCH 2/4] libceph: encapsulate some messenger cleanup code

2012-02-28 Thread Alex Elder
Define a helper function to perform various cleanup operations. Use it both in the exit routine and in the init routine in the event of an error. Signed-off-by: Alex Elder el...@dreamhost.com --- net/ceph/messenger.c | 38 -- 1 files changed, 20

[PATCH 3/4] libceph: make ceph_tcp_connect() return int

2012-02-28 Thread Alex Elder
There is no real need for ceph_tcp_connect() to return the socket pointer it creates, since it already assigns it to con->sock, which is visible to the caller. Instead, have it return an error code, which tidies things up a bit. Signed-off-by: Alex Elder el...@dreamhost.com ---

[PATCH 4/4] libceph: a few small changes

2012-02-28 Thread Alex Elder
This gathers a number of very minor changes: - use %hu when formatting a socket address's address family - null out the ceph_msgr_wq pointer after the queue has been destroyed - drop a needless cast in ceph_write_space() - add a WARN() call in ceph_state_change() in the

[PATCH 1/4] libceph: make ceph_msgr_wq private

2012-02-28 Thread Alex Elder
The messenger workqueue has no need to be public. So give it static scope. Signed-off-by: Alex Elder el...@dreamhost.com --- include/linux/ceph/messenger.h |2 -- net/ceph/messenger.c |2 +- 2 files changed, 1 insertions(+), 3 deletions(-) diff --git

Re: [PATCH 5/6] ceph: avoid repeatedly computing the size of constant vxattr names

2012-02-28 Thread Yehuda Sadeh Weinraub
On Tue, Feb 28, 2012 at 7:21 PM, Alex Elder el...@dreamhost.com wrote: All names defined in the directory and file virtual extended attribute tables are constant, and the size of each is known at compile time.  So there's no need to compute their length every time any file's attribute is

[PATCH 0/3] libceph: clean up some code involving CRCs

2012-02-28 Thread Alex Elder
Rename a few variables so it's clearer which indicate that a CRC should be computed and which hold an actual CRC value. Separate CRC calculation from byte swapping to improve readability. And move some code that runs only on the last pass through a couple of loops outside the loop body.

[PATCH 1/3] libceph: use do in CRC-related Boolean variables

2012-02-28 Thread Alex Elder
Change the name (and type) of a few CRC-related Boolean local variables so they contain the word do, to distinguish their purpose from variables used for holding an actual CRC value. Note that in the process of doing this I identified a fairly serious logic error in write_partial_msg_pages():

[PATCH 2/3] libceph: separate CRC calculation from byte swapping

2012-02-28 Thread Alex Elder
Calculate CRC in a separate step from rearranging the byte order of the result, to improve clarity and readability. Use offsetof() to determine the number of bytes to include in the CRC calculation. In read_partial_message(), switch which value gets byte-swapped, since the just-computed CRC is

[PATCH 3/3] libceph: do crc calculations outside loop

2012-02-28 Thread Alex Elder
Move blocks of code out of loops in read_partial_message_section() and read_partial_message(). They were only getting called at the end of the last iteration of the loop anyway. Signed-off-by: Alex Elder el...@dreamhost.com --- net/ceph/messenger.c | 26 -- 1

[PATCH 0/2] libceph: more miscellaneous cleanups

2012-02-28 Thread Alex Elder
Messaging code again. The biggest improvement here is encapsulating the code that puts data into the write vectors. The second patch just adds more simple cleanups. -Alex

Re: [PATCH 4/4] rbd: use a single value of snap_name to mean no snap

2012-02-28 Thread Yehuda Sadeh Weinraub
On Tue, Feb 28, 2012 at 7:35 PM, Alex Elder el...@dreamhost.com wrote: From Josh Durgin josh.dur...@dreamhost.com There's already a constant for this anyway. (I changed Josh's code to use memcmp() and memcpy() instead. -Alex) Signed-off-by: Alex Elder el...@dreamhost.com ---  

Re: [PATCH 0/2] libceph: more miscellaneous cleanups

2012-02-28 Thread Alex Elder
On 02/28/2012 08:52 PM, Alex Elder wrote: Messaging code again. The biggest improvement here is encapsulating the code that puts data into the write vectors. The second patch just adds more simple cleanups. What I meant to say was... These are two simple patches, the first is just a simple

[PATCH 1/2] libceph: small refactor in write_partial_kvec()

2012-02-28 Thread Alex Elder
Make a small change in the code that counts down kvecs consumed by a ceph_tcp_sendmsg() call. Same functionality, just blocked out a little differently. Signed-off-by: Alex Elder el...@dreamhost.com --- net/ceph/messenger.c | 23 --- 1 files changed, 12 insertions(+), 11

[PATCH 2/2] libceph: some simple changes

2012-02-28 Thread Alex Elder
Nothing too big here. - define the size of the buffer used for consuming ignored incoming data using a symbolic constant - simplify the condition determining whether to unmap the page in write_partial_msg_pages(): do it for crc but not if the page is the zero page

[PATCH] libceph: fix overflow check in crush_decode()

2012-02-28 Thread Alex Elder
The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by: Xi Wang xi.w...@gmail.com Signed-off-by: Sage Weil
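The difference between the two checks can be seen with a small userspace sketch. The function names are made up for this example, b is assumed to be nonzero, and a is assumed to be a small fixed header size:

```c
#include <limits.h>
#include <stdbool.h>

/* Guards only n * b; the later addition of a can still wrap around. */
static bool alloc_size_overflows_broken(unsigned long a, unsigned long n,
                                        unsigned long b)
{
    (void)a;                       /* the broken check ignores a entirely */
    return n > ULONG_MAX / b;
}

/* Guards the whole expression a + n * b. */
static bool alloc_size_overflows(unsigned long a, unsigned long n,
                                 unsigned long b)
{
    return n > (ULONG_MAX - a) / b;
}
```

For example, with a = 16, b = 8 and n = ULONG_MAX / 8, the product n * b still fits in an unsigned long, so the first check passes, but adding 16 wraps; only the second check rejects that n.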

[PATCH] ceph: fix overflow check in build_snap_context()

2012-02-28 Thread Alex Elder
The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b), rather than (n > ULONG_MAX / b - a). Signed-off-by: Xi Wang xi.w...@gmail.com Signed-off-by: Sage Weil s...@newdream.net --- fs/ceph/snap.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git
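One way the older form can go wrong is that the subtraction itself may wrap. The sketch below is a userspace illustration with made-up function names, assuming b is nonzero; it is not taken from fs/ceph/snap.c:

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Older form: subtracts a after dividing. If a exceeds ULONG_MAX / b the
 * subtraction wraps to a huge value and the check can never trigger.
 */
static bool snap_size_overflows_old(unsigned long a, unsigned long n,
                                    unsigned long b)
{
    return n > ULONG_MAX / b - a;
}

/* Corrected form: subtracts a first, so nothing in the check can wrap. */
static bool snap_size_overflows_new(unsigned long a, unsigned long n,
                                    unsigned long b)
{
    return n > (ULONG_MAX - a) / b;
}
```

With a = 16, n = 3 and b = ULONG_MAX / 2, the sum a + n * b clearly does not fit, yet the old form computes 2 - 16, wraps around, and reports no overflow; the corrected form rejects it.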

Re: Patch Bomb!

2012-02-28 Thread Alex Elder
On 02/28/2012 06:14 PM, Alex Elder wrote: Over the next few hours I plan to send out a lot of patches for review. Most of these have been around and under test for a month or more, and at the time I originally committed them I wasn't sure how best to get them reviewed. We decided that sending

Fwd: Re: [PATCH] ceph: use a shared zero page rather than one per messenger

2012-02-28 Thread Alex Elder
Neglected to copy the list on my response. -Alex Original Message Subject: Re: [PATCH] ceph: use a shared zero page rather than one per messenger Date: Tue, 28 Feb 2012 21:04:07 -0800 From: Alex Elder el...@dreamhost.com To: Christoph Hellwig h...@infradead.org On

Re: [PATCH 0/6] ceph: virtual extended attribute cleanup

2012-02-28 Thread Alex Elder
On 02/28/2012 08:20 PM, Christoph Hellwig wrote: On Tue, Feb 28, 2012 at 07:17:41PM -0800, Alex Elder wrote: This series cleans up some code involving ceph's virtual extended attributes. Three of them define some simple macros that help ensure the attributes are defined in a

Re: [PATCH 4/4] rbd: use a single value of snap_name to mean no snap

2012-02-28 Thread Alex Elder
On 02/28/2012 08:53 PM, Yehuda Sadeh Weinraub wrote: On Tue, Feb 28, 2012 at 7:35 PM, Alex Elder el...@dreamhost.com wrote: From Josh Durgin josh.dur...@dreamhost.com There's already a constant for this anyway. (I changed Josh's code to use memcmp() and memcpy() instead. -Alex)