Hi *,
Well, there was once a comment on our layout regarding too many pools.
Our setup is to have a pool per customer, to simplify the view of used
storage capacity.
So if we have (in a couple of months, we hope) more than a few hundred
customers, this setup was not recommended, because the
Hi,
On 02/28/2012 10:35 AM, Oliver Francke wrote:
Hi *,
Well, there was once a comment on our layout regarding too many pools.
Our setup is to have a pool per customer, to simplify the view of used
storage capacity.
So if we have (in a couple of months, we hope) more than a few hundred
Well,
On 02/28/2012 10:42 AM, Wido den Hollander wrote:
Hi,
On 02/28/2012 10:35 AM, Oliver Francke wrote:
Hi *,
Well, there was once a comment on our layout regarding too many
pools.
Our setup is to have a pool per customer, to simplify the view of used
storage capacity.
So, if we have -
A bug in my previous patch prevented any daemon with auto_start set to false
from starting.
This patch allows:
* /etc/init.d/ceph start osd|mds|mon
* service ceph start osd|mds|mon
However, it does not start daemons with auto_start disabled when you invoke:
* /etc/init.d/ceph start
* service
After running parallel copy commands via boto for many files, everything
started to slow down, and eventually I got a timeout from nginx@radosgw.
# ceph -s
2012-02-28 12:16:57.818566 pg v20743: 8516 pgs: 8516 active+clean;
2154 MB data, 53807 MB used, 20240 GB / 21379 GB avail
2012-02-28 12:16:57.845274
Hi,
On 02/27/2012 07:01 AM, Laszlo Boszormenyi wrote:
On Sun, 2012-02-26 at 21:23 -0800, Sage Weil wrote:
v0.42.2 for now. v0.43 should be out either Friday or the following
Monday.
OK, I will get v0.42.2 from git, as the homepage only offers v0.42 for
download.
I've just submitted a patch
On 2012. February 27. 09:03:11 Sage Weil wrote:
On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
Whenever I restart osd.0, I see a pair of messages like:
2012-02-27 17:26:00.132666 mon.0 osd_1_ip:6789/0 106 : [INF] osd.0
osd_0_ip:6801/29931 failed (by osd.1 osd_1_ip:6806/20125)
2012-02-27
Hi,
On 02/24/2012 06:18 AM, Gregory Farnum wrote:
On Thu, Feb 23, 2012 at 2:45 AM, Wido den Hollander w...@widodh.nl wrote:
Hi,
On 02/22/2012 07:08 PM, Gregory Farnum wrote:
Wido,
Sorry we lost track of this last week; we were all distracted by FAST 12! :)
No problem!
So it looks
Hi,
On 02/28/2012 10:50 AM, Oliver Francke wrote:
Well,
On 02/28/2012 10:42 AM, Wido den Hollander wrote:
Hi,
On 02/28/2012 10:35 AM, Oliver Francke wrote:
Hi *,
Well, there was once a comment on our layout regarding too many
pools.
Our setup is to have a pool per customer, to simplify
2012/2/28 Székelyi Szabolcs szeke...@niif.hu:
On 2012. February 27. 09:03:11 Sage Weil wrote:
On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
Whenever I restart osd.0, I see a pair of messages like:
2012-02-27 17:26:00.132666 mon.0 osd_1_ip:6789/0 106 : [INF] osd.0
osd_0_ip:6801/29931 failed
On Tue, 28 Feb 2012, Wido den Hollander wrote:
Hi,
On 02/28/2012 10:50 AM, Oliver Francke wrote:
Well,
On 02/28/2012 10:42 AM, Wido den Hollander wrote:
Hi,
On 02/28/2012 10:35 AM, Oliver Francke wrote:
Hi *,
Well, there was once a comment on our layout regarding
On Tue, Feb 28, 2012 at 3:43 AM, Sławomir Skowron
slawomir.skow...@gmail.com wrote:
After running parallel copy commands via boto for many files, everything
started to slow down, and eventually I got a timeout from nginx@radosgw.
# ceph -s
2012-02-28 12:16:57.818566 pg v20743: 8516 pgs: 8516
On Mon, 27 Feb 2012, Tommi Virtanen wrote:
On Mon, Feb 27, 2012 at 15:29, Gregory Farnum
gregory.far...@dreamhost.com wrote:
On Mon, Feb 27, 2012 at 3:18 PM, Matt Weil mw...@genome.wustl.edu wrote:
How do you do this type of config (specifying multiple devices per server)
with XFS rather than btrfs?
On 2012. February 28. 08:16:34 Gregory Farnum wrote:
2012/2/28 Székelyi Szabolcs szeke...@niif.hu:
On 2012. February 27. 09:03:11 Sage Weil wrote:
On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
Whenever I restart osd.0, I see a pair of messages like:
2012-02-27 17:26:00.132666 mon.0
I'm curious about what performance implications there may be when using
rados_ioctx_locator_set_key.
If a large number of objects are forced into a single PG using a fixed locator
key, are there performance implications for subsequent look-ups by object name?
Is some type of index structure
On Tue, Feb 28, 2012 at 10:07 AM, Noah Watkins jayh...@cs.ucsc.edu wrote:
I'm curious about what performance implications there may be when using
rados_ioctx_locator_set_key.
If a large number of objects are forced into a single PG using a fixed
locator key, are there performance
(resending to list)
On Tue, Feb 28, 2012 at 11:53 AM, Sławomir Skowron
slawomir.skow...@gmail.com wrote:
2012/2/28 Yehuda Sadeh Weinraub yehuda.sa...@dreamhost.com:
On Tue, Feb 28, 2012 at 3:43 AM, Sławomir Skowron
slawomir.skow...@gmail.com wrote:
After some parallel copy command via
Over the next few hours I plan to send out a lot of patches
for review. Most of these have been around and under test
for a month or more, and at the time I originally committed
them I wasn't sure how best to get them reviewed.
We decided that sending them all out to the list was a
good way to
On 02/28/2012 06:14 PM, Alex Elder wrote:
Over the next few hours I plan to send out a lot of patches
for review. Most of these have been around and under test
for a month or more, and at the time I originally committed
them I wasn't sure how best to get them reviewed.
We decided that sending
Avoid the need to check for a special zero s_cap_ttl value by just
using (jiffies - 1) as the value assigned to indicate a time in
the past.
Signed-off-by: Alex Elder el...@dreamhost.com
Reviewed-by: Sage Weil s...@newdream.net
---
fs/ceph/mds_client.c |7 +++
1 files changed, 3
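The sketch below shows why a value one tick in the past removes the need for a zero special case. It is a minimal user-space illustration: time_after_eq() is reimplemented in the style of the kernel macro, and struct session, mark_caps_expired() and caps_expired() are hypothetical names, not the mds_client.c code.

```c
#include <stdbool.h>

/* Wraparound-safe "a is at or after b", modeled on the kernel's
 * time_after_eq(); reimplemented here for user space. */
#define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Hypothetical stand-in for a session's capability TTL. */
struct session {
	unsigned long cap_ttl;
};

/* Mark caps as already expired by pointing the TTL one tick into the
 * past, instead of storing a special 0 that every reader must test. */
static void mark_caps_expired(struct session *s, unsigned long now)
{
	s->cap_ttl = now - 1;
}

/* One comparison covers every case; there is no "cap_ttl == 0" branch. */
static bool caps_expired(const struct session *s, unsigned long now)
{
	return time_after_eq(now, s->cap_ttl);
}
```

As with the kernel macros, the comparison only holds while the two values are less than LONG_MAX ticks apart. Even when now is 0, so that now - 1 wraps to ULONG_MAX, the result is still correct.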
This eliminates type casts in some places where they are not
required.
Signed-off-by: Alex Elder el...@newdream.net
---
net/ceph/messenger.c | 21 ++---
1 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index
This fixes some spots where a type cast to (void *) was used as
a universal type-hiding mechanism. Instead, properly cast the
type to the intended target type.
Signed-off-by: Alex Elder el...@newdream.net
---
net/ceph/messenger.c |8
1 files changed, 4 insertions(+), 4
Rearrange ceph_tcp_connect() a bit, making use of else rather than
re-testing a value with consecutive if statements. Don't record a
connection's socket pointer unless the connect operation is
successful.
Signed-off-by: Alex Elder el...@dreamhost.com
---
net/ceph/messenger.c | 11 ---
A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption. The index is reset to 0 if it ever reaches the maximum
index value.
Instead, use an ever-increasing atomic variable as a sequence
number, and compute
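The message is cut off before it says how the index is computed. The sketch below shows one plausible lock-free approach: an atomic counter reduced with a mask over a power-of-two ring of buffers. All names (addr_str, addr_str_seq, format_addr()) and sizes are assumptions for illustration, using C11 <stdatomic.h> rather than kernel atomics.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ring of formatting buffers. A power-of-two count lets a
 * mask replace "reset to 0 at the maximum", and it stays correct when
 * the unsigned counter itself wraps around. */
#define ADDR_STR_COUNT      8u
#define ADDR_STR_COUNT_MASK (ADDR_STR_COUNT - 1)
#define ADDR_STR_LEN        64

static char addr_str[ADDR_STR_COUNT][ADDR_STR_LEN];
static atomic_uint addr_str_seq;   /* ever-increasing sequence number */

/* Claim a slot without a spinlock: every caller gets a distinct
 * sequence number, and the mask maps it onto the ring. */
static char *next_addr_buf(void)
{
	unsigned int seq = atomic_fetch_add(&addr_str_seq, 1u);

	return addr_str[seq & ADDR_STR_COUNT_MASK];
}

/* Format "ip:port" into the next ring slot for human consumption. */
static const char *format_addr(const char *ip, unsigned short port)
{
	char *s = next_addr_buf();

	snprintf(s, ADDR_STR_LEN, "%s:%hu", ip, port);
	return s;
}
```

As with any such ring, a returned string stays valid only until ADDR_STR_COUNT further calls reuse its slot.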
For some reason, ceph_setxattr() allocates an extra byte in which a
'\0' is stored past the end of an extended attribute value. This is
not needed, and is potentially misleading, so get rid of it.
Signed-off-by: Alex Elder el...@dreamhost.com
---
fs/ceph/xattr.c |4 +---
1 files changed, 1
All callers of ceph_match_vxattr() determine what to pass as the
first argument by calling ceph_inode_vxattrs(inode). Just do that
inside ceph_match_vxattr() itself, changing it to take an inode
rather than the vxattr pointer as its first argument.
Also ensure the function works correctly for
This series cleans up some code involving ceph's virtual extended
attributes. Three of them define some simple macros that are set up to
help ensure the attributes are defined in a consistent way. One
makes the size of certain constant values get defined at startup
time rather than repeatedly, and
Use symbolic constants to define the top-level prefix for "ceph."
extended attribute names.
Signed-off-by: Alex Elder el...@dreamhost.com
---
fs/ceph/xattr.c | 25 ++---
1 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index
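The sketch below shows the general idea of a symbolic prefix constant whose length the compiler derives. XATTR_CEPH_PREFIX, XATTR_CEPH_PREFIX_LEN and is_ceph_xattr() are illustrative names; this is a user-space sketch, not the patch itself.

```c
#include <stdbool.h>
#include <string.h>

/* One definition of the prefix; the length comes from the literal
 * (sizeof counts the trailing NUL, hence the - 1). */
#define XATTR_CEPH_PREFIX     "ceph."
#define XATTR_CEPH_PREFIX_LEN (sizeof(XATTR_CEPH_PREFIX) - 1)

/* True if the attribute name lives in the "ceph." namespace. */
static bool is_ceph_xattr(const char *name)
{
	return strncmp(name, XATTR_CEPH_PREFIX, XATTR_CEPH_PREFIX_LEN) == 0;
}
```

Changing the prefix then means editing a single line, and no hand-counted length can fall out of sync with it.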
Entries in the ceph virtual extended attribute tables all follow a
distinct pattern in their definition. Enforce this pattern through
the use of a macro.
Also, a null name field signals the end of the table, so make it
the first field in the ceph_vxattr_cb structure.
Signed-off-by: Alex
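A rough user-space sketch of a macro-built vxattr table, with the name as the first field and a NULL name terminating the table. The struct, macro, callbacks and their constant return values are hypothetical stand-ins, not the fs/ceph/xattr.c definitions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry: name first, so { NULL } ends a table. */
struct vxattr {
	const char *name;
	size_t (*getxattr_cb)(char *val, size_t size);
};

/* Dummy callbacks returning fixed values, for illustration only. */
static size_t vxattrcb_dir_entries(char *val, size_t size)
{
	return (size_t)snprintf(val, size, "%d", 3);
}

static size_t vxattrcb_dir_files(char *val, size_t size)
{
	return (size_t)snprintf(val, size, "%d", 2);
}

/* Every entry is built the same way: the string name and the callback
 * name both derive from the same two tokens, so they cannot drift. */
#define VXATTR_ENTRY(_type, _name) \
	{ "ceph." #_type "." #_name, vxattrcb_ ## _type ## _ ## _name }

static const struct vxattr dir_vxattrs[] = {
	VXATTR_ENTRY(dir, entries),
	VXATTR_ENTRY(dir, files),
	{ NULL, NULL }   /* NULL name terminates the table */
};

/* Walk the table until the terminating NULL name. */
static const struct vxattr *match_vxattr(const struct vxattr *table,
					 const char *name)
{
	for (; table->name; table++)
		if (strcmp(table->name, name) == 0)
			return table;
	return NULL;
}
```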
A struct ceph_vxattr_cb does not represent a callback at all, but
rather a virtual extended attribute itself. Drop the _cb suffix
from its name to reflect that.
Signed-off-by: Alex Elder el...@dreamhost.com
---
fs/ceph/xattr.c | 20 ++--
1 files changed, 10 insertions(+), 10
The names of the callback functions used for virtual extended
attributes are based only on the last component of the attribute
name. Because of the way these are defined, this precludes allowing
a single (lowest) attribute name for different callbacks, dependent
on the type of file being
This patch just rearranges a few bits of code to make more
portions of ceph_setxattr() and ceph_removexattr() identical.
Signed-off-by: Alex Elder el...@dreamhost.com
---
fs/ceph/xattr.c | 15 ---
1 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/xattr.c
This series makes a few small unrelated changes to the rbd code.
The first is a set of simple cleanups. The second makes
ceph_parse_options() return a pointer to make the interface
a little more obvious. The third gets rid of a duplicate
copy of the pointer to a ceph_client held in an
The rbd_device structure maintains a duplicate copy of the
ceph_client pointer maintained in its rbd_client structure. There
appears to be no good reason for this, and its presence presents a
risk of them getting out of synch or otherwise misused. So kill it
off, and use the rbd_client copy
From Josh Durgin josh.dur...@dreamhost.com
There's already a constant for this anyway.
(I changed Josh's code to use memcmp() and memcpy() instead. -Alex)
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c |8 +++-
1 files changed, 3 insertions(+), 5 deletions(-)
New rbd devices are granted unique identifiers based on how many
devices are already in existence. This series rearranges how that
is done, switching from using a spinlock to using an atomic variable
to select the next rbd id to use. In the process, some of the code
became more isolated.
Move the loop that finds a new unique rbd id to use into
its own helper function.
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c | 30 +++---
1 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/drivers/block/rbd.c
In order to select a new unique identifier for an added rbd device,
the list of all existing ones is searched and a value one greater
than the highest id is used.
The list search can be avoided by using an atomic variable that
keeps track of the current highest id. Using a get/put model for
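The sketch below contrasts the two id-selection strategies described above: a list walk that returns one more than the highest id, and an atomic counter that tracks the highest id directly. It is a user-space sketch with hypothetical names (struct rbd_dev here is a minimal stand-in, rbd_id_max, next_id_by_search(), next_id_atomic()). The truncated get/put model is not reproduced.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal stand-in for a device record; only the id matters here. */
struct rbd_dev {
	unsigned long id;
	struct rbd_dev *next;
};

/* Old approach: walk every device and use one more than the highest
 * id found. The caller must hold the list lock for the walk. */
static unsigned long next_id_by_search(const struct rbd_dev *list)
{
	unsigned long max = 0;

	for (; list; list = list->next)
		if (list->id > max)
			max = list->id;
	return max + 1;
}

/* New approach: an atomic counter holds the highest id handed out,
 * so choosing an id needs neither a list walk nor a lock. */
static atomic_ulong rbd_id_max;

static unsigned long next_id_atomic(void)
{
	return atomic_fetch_add(&rbd_id_max, 1) + 1;
}
```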
The rbd_dev_list is just a simple list of all the current
rbd_devices. Using the ctl_mutex as a concurrency guard is
overkill. Instead, use a spinlock for that specific purpose.
This also reduces the window that the ctl_mutex needs to be held in
rbd_add().
Signed-off-by: Alex Elder
The only time entries are added to or removed from the global
rbd_dev_list is exactly when a put or get operation is being
performed on an rbd_dev's id. So just move the list management code
into get/put routines.
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c | 47
It used to be that selecting a new unique identifier for an added
rbd device required searching all existing ones to find the highest
id in use. A recent change made that unnecessary, but made it
so that the ids used were monotonically non-decreasing. It's a bit
more pleasant to have smaller rbd
This series reduces the window during which the client list
lock is held, gives it a more meaningful name, and moves the
locking calls closer to the places they're really needed.
-Alex
--
To unsubscribe from this list: send the line unsubscribe ceph-devel
In rbd_get_client(), if a client is reused, a number of things
get done while still holding the list lock unnecessarily.
This just moves a few things that need no lock protection outside
the lock.
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c | 10 ++
1 files
Since rbd_get_client() is only called in one place, move the
acquisition of the mutex around that call inside that function.
Furthermore, within rbd_get_client(), it appears the mutex only
needs to be held while calling rbd_client_create(). (Moving
the lock inside that function will wait for
The spinlock used to protect rbd_client_list is named node_lock.
Rename it to rbd_client_list_lock to make it more obvious what
it's for.
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c | 20 ++--
1 files changed, 10 insertions(+), 10 deletions(-)
diff
Here are a few very simple cleanups:
- Add an RBD_ prefix to the two driver name string definitions.
- Move the definition of struct rbd_request below struct rbd_req_coll
to avoid the need for an empty declaration of the latter.
- Move and group the definitions of
This series affects rbd_add(). It calls rbd_get_client(), and by
making that function return a client pointer it makes it more
obvious that the client pointer is getting assigned. It also
reduces the amount of memory allocated to hold the monitor
address and options passed from the user. And
Currently rbd_get_client() returns an error code, and it assigns
the rbd_client field of the rbd_device structure it is passed if
successful. Instead, have it return the created rbd_client
structure and return a pointer-coded error if there is an error.
This makes the assignment of the client
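Below is a user-space re-creation of the kernel's pointer-coded error convention (ERR_PTR(), IS_ERR(), PTR_ERR() from linux/err.h), showing how a constructor can return either an object or an error through one pointer. struct rbd_client and get_client() here are hypothetical stand-ins, not the rbd.c functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The top MAX_ERRNO addresses are never valid objects, so a negative
 * errno can travel inside the pointer value itself. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical client type and constructor. */
struct rbd_client {
	int refcount;
};

static struct rbd_client *get_client(bool fail)
{
	struct rbd_client *c;

	if (fail)
		return ERR_PTR(-EINVAL);
	c = malloc(sizeof(*c));
	if (!c)
		return ERR_PTR(-ENOMEM);
	c->refcount = 1;
	return c;
}
```

The caller then reads naturally as `client = get_client(...); if (IS_ERR(client)) return PTR_ERR(client);`, which makes the assignment of the client obvious at the call site.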
The length of the string containing the monitor address
specification(s) will never exceed the length of the string passed
in to rbd_add(). The same holds true for the ceph + rbd options
string. So reduce the amount of memory allocated for these to
that length rather than the maximum (1024
If a couple of pointers are initialized to NULL then a single
out_nomem label can be used for all of the memory allocation
failure cases in rbd_add().
Also, get rid of the irc local variable there. There is no
real need for rc to be type ssize_t, and it can be used in
the spot irc was.
This series affects the way arguments are parsed in rbd_add().
It first encapsulates the code into its own helper function.
Then it uses a few simple tokenization functions instead of
sscanf() to parse the string provided, which makes it possible
to do a better job of error checking the input.
Move the code that parses the arguments provided to rbd_add() (which
are supplied via /sys/bus/rbd/add) into a separate function.
Also rename the mon_dev_name variable in rbd_add() to be
mon_addrs. The variable represents a list of one or more
comma-separated monitor IP addresses, each with an
Make use of a few simple helper routines to parse the arguments
rather than sscanf(). This will treat both missing and too-long
arguments as invalid input (rather than silently truncating the
input in the too-long case). In time this can also be used by
rbd_add() to use the passed-in buffer in
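The sketch below shows whitespace tokenizing helpers that reject missing and too-long tokens instead of silently truncating them, as a width-limited sscanf("%s") would. next_token() and copy_token() are illustrative names and return conventions, not the patch's actual functions.

```c
#include <stddef.h>
#include <string.h>

static const char spaces[] = " \f\n\r\t\v";

/* Skip leading whitespace and return the length of the next token,
 * leaving *buf pointing at its first character. */
static size_t next_token(const char **buf)
{
	*buf += strspn(*buf, spaces);
	return strcspn(*buf, spaces);
}

/* Copy the next token into token[] (token_size includes the NUL).
 * Returns 0 if no token is left and (size_t)-1 if the token does not
 * fit; in both cases the caller can reject the input as invalid. */
static size_t copy_token(const char **buf, char *token, size_t token_size)
{
	size_t len = next_token(buf);

	if (len == 0)
		return 0;
	if (len >= token_size)
		return (size_t)-1;
	memcpy(token, *buf, len);
	token[len] = '\0';
	*buf += len;
	return len;
}
```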
This is a bit gratuitous, but there are a few things that can be
verified at build time rather than run time, so do that.
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c | 15 ---
1 files changed, 12 insertions(+), 3 deletions(-)
diff --git
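The sketch below shows checks moved from run time to build time. It uses C11 _Static_assert, the user-space analogue of the kernel's BUILD_BUG_ON(). The header struct and the checked properties are hypothetical examples; they are not the checks in the rbd.c patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header fragment whose layout must not drift. */
struct disk_hdr {
	char     signature[8];
	uint64_t image_size;
	uint8_t  obj_order;
	uint8_t  reserved[7];
};

/* A false condition stops compilation, so these facts never need a
 * run-time test or an error path. */
_Static_assert(sizeof(struct disk_hdr) == 24, "on-disk header changed size");
_Static_assert(offsetof(struct disk_hdr, image_size) == 8, "image_size moved");

/* Object size implied by obj_order (for example 22 -> 4 MiB). */
static uint64_t obj_size(const struct disk_hdr *h)
{
	return (uint64_t)1 << h->obj_order;
}
```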
The argument parsing routine already computes the size of the
mon_addrs buffer it extracts from the command. Pass it to the
caller so it can use it to provide the length to rbd_get_client().
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c | 20 +---
1
The mon_addrs buffer in rbd_add is used to hold a copy of the
monitor IP addresses supplied via /sys/bus/rbd/add. That is
passed to rbd_get_client(), which never modifies it (nor do
any of the functions it gets passed to thereafter)--the mon_addr
parameter to rbd_get_client() is a pointer to
Return -EINVAL rather than panic if iinfo->symlink_len and inode->i_size
do not match.
Also use kstrndup rather than kmalloc/memcpy.
Signed-off-by: Xi Wang xi.w...@gmail.com
Reviewed-by: Alex Elder el...@dreamhost.com
---
fs/ceph/inode.c | 11 ++-
1 files changed, 6 insertions(+), 5
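Here is a user-space sketch of the two changes: a length mismatch is reported as -EINVAL instead of crashing, and duplication uses one helper rather than an open-coded allocate-and-copy. It uses POSIX strndup() as the counterpart of kstrndup(). copy_symlink() and its parameters are hypothetical, not the fs/ceph/inode.c code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Validate the two lengths, then duplicate the target in one call.
 * Returns 0 and sets *out on success, or a negative errno. */
static int copy_symlink(const char *target, size_t symlink_len,
			size_t i_size, char **out)
{
	char *s;

	if (symlink_len != i_size)
		return -EINVAL;   /* inconsistent metadata: fail, don't panic */
	s = strndup(target, symlink_len);
	if (!s)
		return -ENOMEM;
	*out = s;
	return 0;
}
```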
Once rbd_bus_type is registered, it allows an add operation via
the /sys/bus/rbd/add bus attribute, and adding a new rbd device that
way establishes a connection between the device and rbd_root_dev.
But rbd_root_dev is not registered until after the rbd_bus_type
registration is complete. This
The patch messages explain the changes in detail. -Alex
On Tue, Feb 28, 2012 at 07:06:22PM -0800, Alex Elder wrote:
Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition. Just
allocate one at initialization time and have them all share it.
Any reason you don't simply use the
On Tue, Feb 28, 2012 at 07:17:41PM -0800, Alex Elder wrote:
This series cleans up some code involving ceph's virtual extended
attributes. Three of them define some simple macros that are set up to
help ensure the attributes are defined in a consistent way. One
makes the size of certain constant
On Tue, Feb 28, 2012 at 7:13 PM, Alex Elder el...@dreamhost.com wrote:
All callers of ceph_match_vxattr() determine what to pass as the
first argument by calling ceph_inode_vxattrs(inode). Just do that
inside ceph_match_vxattr() itself, changing it to take an inode
rather than the vxattr
Here is another set of small code tidy-ups:
- Define SECTOR_SHIFT and SECTOR_SIZE, and use these symbolic
names throughout. Tell the blk_queue system our physical
block size, in the (unlikely) event we want to use something
other than the default.
- Delete the
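The sketch below shows symbolic sector constants replacing bare 9 and 512 literals in byte/sector conversions. It is a minimal user-space illustration; the helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1ULL << SECTOR_SHIFT)   /* 512 bytes */

/* Byte count -> whole sectors, and sectors -> bytes. */
static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}

static inline uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors << SECTOR_SHIFT;
}
```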
A few blocks of code are rearranged a bit here:
- In rbd_header_from_disk():
- Don't bother computing snap_count until we're sure the
on-disk header starts with a good signature.
- Move a few independent lines of code so they are *after* a
check for a
This series actually added another flag to the rbd_dev, but the
need for that went away. Still, I thought these two changes
added value so I kept them.
-Alex
To allow having more than just read-only as Boolean state associated
with an rbd_device, replace the read_only field with a more
general flags field. The non-atomic versions of bit operations
are fine for our purposes.
Signed-off-by: Alex Elder el...@dreamhost.com
---
drivers/block/rbd.c |
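The sketch below shows a general flags word replacing a lone read-only Boolean, together with non-atomic bit helpers. It is a user-space sketch: the flag names, struct and helpers are hypothetical, loosely analogous to the kernel's __set_bit(), __clear_bit() and test_bit().

```c
#include <stdbool.h>

/* Hypothetical flag bits; read-only becomes one bit among many. */
enum rbd_dev_flag {
	RBD_DEV_FLAG_READ_ONLY = 0,
	RBD_DEV_FLAG_EXISTS    = 1,   /* illustrative second flag */
};

struct rbd_device_sketch {
	unsigned long flags;   /* replaces "bool read_only" */
};

/* Non-atomic updates: fine only when the caller already serializes
 * changes to the flags word. */
static inline void set_flag(unsigned int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

static inline void clear_flag(unsigned int nr, unsigned long *flags)
{
	*flags &= ~(1UL << nr);
}

static inline bool test_flag(unsigned int nr, const unsigned long *flags)
{
	return (*flags >> nr) & 1UL;
}
```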
Currently an rbd device's id is released when it is removed, but it
is done before the code is run to clean up sysfs-related files (such
as /sys/bus/rbd/devices/1).
It's possible that an rbd is still in use after the rbd_remove()
call has been made. It's essentially the same as an active inode
One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the after_banner argument so it
indicates that
Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array. Also add a reset
operation to make subsequent add operations start at the beginning
of the array again.
Use these routines throughout, avoiding duplicate code and ensuring
all
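The sketch below shows the add/reset pair described above. It is a user-space sketch that uses POSIX struct iovec in place of the kernel's struct kvec. The connection struct, the fixed capacity and the function names are assumptions, not the messenger.c definitions.

```c
#include <stddef.h>
#include <sys/uio.h>

#define OUT_KVEC_MAX 8

/* Hypothetical slice of a connection: only the outgoing vector state. */
struct conn_sketch {
	struct iovec out_kvec[OUT_KVEC_MAX];
	int out_kvec_left;            /* slots filled and not yet sent */
	size_t out_kvec_bytes;        /* total bytes queued */
	struct iovec *out_kvec_cur;   /* next slot to send from */
};

/* Make subsequent adds start at the beginning of the array again. */
static void out_kvec_reset(struct conn_sketch *con)
{
	con->out_kvec_left = 0;
	con->out_kvec_bytes = 0;
	con->out_kvec_cur = &con->out_kvec[0];
}

/* Put one chunk in the next open slot; returns -1 if the array is full. */
static int out_kvec_add(struct conn_sketch *con, size_t size, void *data)
{
	int index = con->out_kvec_left;

	if (index >= OUT_KVEC_MAX)
		return -1;
	con->out_kvec[index].iov_base = data;
	con->out_kvec[index].iov_len = size;
	con->out_kvec_left++;
	con->out_kvec_bytes += size;
	return 0;
}
```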
More cleanups, this time in the messenger code. -Alex
Define a helper function to perform various cleanup operations. Use
it both in the exit routine and in the init routine in the event of
an error.
Signed-off-by: Alex Elder el...@dreamhost.com
---
net/ceph/messenger.c | 38 --
1 files changed, 20
There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller. Instead, have it return an error code,
which tidies things up a bit.
Signed-off-by: Alex Elder el...@dreamhost.com
---
This gathers a number of very minor changes:
- use %hu when formatting a socket address's address family
- null out the ceph_msgr_wq pointer after the queue has been
destroyed
- drop a needless cast in ceph_write_space()
- add a WARN() call in ceph_state_change() in the
The messenger workqueue has no need to be public. So give it static
scope.
Signed-off-by: Alex Elder el...@dreamhost.com
---
include/linux/ceph/messenger.h |2 --
net/ceph/messenger.c |2 +-
2 files changed, 1 insertions(+), 3 deletions(-)
diff --git
On Tue, Feb 28, 2012 at 7:21 PM, Alex Elder el...@dreamhost.com wrote:
All names defined in the directory and file virtual extended
attribute tables are constant, and the size of each is known at
compile time. So there's no need to compute their length every
time any file's attribute is
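The sketch below shows name lengths fixed at compile time, so lookups can compare a stored size before touching the bytes. It is a user-space sketch; the struct, macro and table are hypothetical, not the fs/ceph/xattr.c definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry carrying its name length, computed once by the
 * compiler instead of calling strlen() on every lookup. */
struct vxattr_sized {
	const char *name;
	size_t name_size;   /* strlen(name), known at compile time */
};

#define VXATTR_SIZED(_lit) { _lit, sizeof(_lit) - 1 }

static const struct vxattr_sized file_vxattrs[] = {
	VXATTR_SIZED("ceph.file.layout"),
	VXATTR_SIZED("ceph.layout"),
	{ NULL, 0 }
};

/* Compare the cheap lengths first; memcmp only when they agree. */
static const struct vxattr_sized *match_sized(const struct vxattr_sized *t,
					      const char *name)
{
	size_t len = strlen(name);   /* the caller's name is not constant */

	for (; t->name; t++)
		if (t->name_size == len && memcmp(t->name, name, len) == 0)
			return t;
	return NULL;
}
```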
Rename a few variables so it's clearer which indicate that a
CRC should be computed and which hold an actual CRC value.
Separate CRC calculation from byte swapping to improve
readability. And move some code that runs only on
the last pass through a couple of loops outside the loop
body.
Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distinguish their purpose
from variables used for holding an actual CRC value.
Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages():
Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.
Use offsetof() to determine the number of bytes to include in the
CRC calculation.
In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is
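The sketch below shows the separation of computing a CRC from fixing its byte order, and the use of offsetof() to size the covered region. It assumes a standard bitwise CRC-32C (Castagnoli) with ~0 initial value and final inversion. The wire header is a hypothetical layout, and ceph's own crc32c seed conventions may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C, reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical wire header: little-endian byte fields, CRC last. */
struct wire_hdr {
	uint8_t seq[8];
	uint8_t front_len[4];
	uint8_t crc[4];       /* covers every byte before this field */
};

static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Step 1 computes the CRC in host order; step 2 stores it little-endian.
 * offsetof() keeps the covered length right if fields are added. */
static void hdr_set_crc(struct wire_hdr *h)
{
	uint32_t crc = crc32c(h, offsetof(struct wire_hdr, crc));

	put_le32(h->crc, crc);
}

/* On receive, convert the stored CRC to host order and compare. */
static int hdr_crc_ok(const struct wire_hdr *h)
{
	return get_le32(h->crc) == crc32c(h, offsetof(struct wire_hdr, crc));
}
```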
Move blocks of code out of loops in read_partial_message_section()
and read_partial_message(). They were only getting called at
the end of the last iteration of the loop anyway.
Signed-off-by: Alex Elder el...@dreamhost.com
---
net/ceph/messenger.c | 26 --
1
Messaging code again. The biggest improvement here is encapsulating
the code that puts data into the write vectors. The second patch just
adds more simple cleanups.
-Alex
On Tue, Feb 28, 2012 at 7:35 PM, Alex Elder el...@dreamhost.com wrote:
From Josh Durgin josh.dur...@dreamhost.com
There's already a constant for this anyway.
(I changed Josh's code to use memcmp() and memcpy() instead. -Alex)
Signed-off-by: Alex Elder el...@dreamhost.com
---
On 02/28/2012 08:52 PM, Alex Elder wrote:
Messaging code again. The biggest improvement here is encapsulating
the code that puts data into the write vectors. The second patch just
adds more simple cleanups.
What I meant to say was...
These are two simple patches, the first is just a simple
Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call. Same functionality, just blocked out
a little differently.
Signed-off-by: Alex Elder el...@dreamhost.com
---
net/ceph/messenger.c | 23 ---
1 files changed, 12 insertions(+), 11
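The sketch below shows the kind of bookkeeping involved in counting down vectors consumed by a partial send. Fully sent vectors are dropped, and the first partially sent one is trimmed. It is a user-space sketch using POSIX struct iovec; consume_kvecs() and its parameters are hypothetical, not the messenger.c code.

```c
#include <stddef.h>
#include <sys/uio.h>

/* After a send wrote `sent` bytes, advance *cur past fully consumed
 * vectors and trim a partially consumed one. *left counts the vectors
 * that still hold unsent data. */
static void consume_kvecs(struct iovec **cur, int *left, size_t sent)
{
	while (*left > 0 && sent > 0) {
		struct iovec *v = *cur;

		if (sent >= v->iov_len) {
			/* the whole vector went out: move past it */
			sent -= v->iov_len;
			(*cur)++;
			(*left)--;
		} else {
			/* partial: advance within this vector */
			v->iov_base = (char *)v->iov_base + sent;
			v->iov_len -= sent;
			sent = 0;
		}
	}
}
```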
Nothing too big here.
- define the size of the buffer used for consuming ignored
incoming data using a symbolic constant
- simplify the condition determining whether to unmap the page
in write_partial_msg_pages(): do it for crc but not if the
page is the zero page
The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.
The correct check should be (n > (ULONG_MAX - a) / b).
Signed-off-by: Xi Wang xi.w...@gmail.com
Signed-off-by: Sage Weil
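The sketch below works through the overflow checks for an allocation of a + n * b bytes, where n comes from untrusted input. It compares the check described as broken with the corrected one. The function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* The check described as broken: it only guards n * b, not the + a. */
static bool old_check_rejects(unsigned long a, unsigned long n,
			      unsigned long b)
{
	(void)a;
	return n > ULONG_MAX / b;
}

/* The corrected check: reserve room for a before dividing by b. */
static bool new_check_rejects(unsigned long a, unsigned long n,
			      unsigned long b)
{
	return n > (ULONG_MAX - a) / b;
}
```

With a = 16, b = 8 and n = ULONG_MAX / 8, the old check passes, yet a + n * b wraps around to 8. The new check rejects that n. The other rejected form, n > ULONG_MAX / b - a, subtracts a byte count from an element count, so it turns away some sizes that would fit.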
The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b),
rather than (n > ULONG_MAX / b - a).
Signed-off-by: Xi Wang xi.w...@gmail.com
Signed-off-by: Sage Weil s...@newdream.net
---
fs/ceph/snap.c |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git
On 02/28/2012 06:14 PM, Alex Elder wrote:
Over the next few hours I plan to send out a lot of patches
for review. Most of these have been around and under test
for a month or more, and at the time I originally committed
them I wasn't sure how best to get them reviewed.
We decided that sending
Neglected to copy the list on my response. -Alex
Original Message
Subject: Re: [PATCH] ceph: use a shared zero page rather than one per
messenger
Date: Tue, 28 Feb 2012 21:04:07 -0800
From: Alex Elder el...@dreamhost.com
To: Christoph Hellwig h...@infradead.org
On
On 02/28/2012 08:20 PM, Christoph Hellwig wrote:
On Tue, Feb 28, 2012 at 07:17:41PM -0800, Alex Elder wrote:
This series cleans up some code involving ceph's virtual extended
attributes. Three of them define some simple macros that are set up to
help ensure the attributes are defined in a
On 02/28/2012 08:53 PM, Yehuda Sadeh Weinraub wrote:
On Tue, Feb 28, 2012 at 7:35 PM, Alex Elder el...@dreamhost.com wrote:
From Josh Durgin josh.dur...@dreamhost.com
There's already a constant for this anyway.
(I changed Josh's code to use memcmp() and memcpy() instead. -Alex)