Hi Sage,
On 05/31/2013 06:00 PM, Sage Weil wrote:
On Fri, 31 May 2013, Jim Schutt wrote:
Hi Sage,
On 05/29/2013 03:07 PM, Sage Weil wrote:
Hi all-
I have a couple of branches (wip-5176 and wip-5176-cuttlefish) that try to
make the leveldb compaction on the monitor less expensive by doing
On 06/05/2013 01:05 PM, Mark Nelson wrote:
FWIW, I've been fighting with some mon/leveldb issues on a 24-node test
cluster causing high CPU utilization, constant reads, laggy osdmap
updates, and mons dropping out of quorum. Work is going on in
wip-mon. Should have some more testing done
Hi Sage,
On 05/13/2013 06:35 PM, Sage Weil wrote:
Hi Jim-
You mentioned the other day your concerns about the uniformity of the PG
and data distribution. There are several ways to attack it (including
increasing the number of PGs), but one that we haven't tested much yet is
the
Hi Sage,
On 05/29/2013 03:07 PM, Sage Weil wrote:
Hi all-
I have a couple of branches (wip-5176 and wip-5176-cuttlefish) that try to
make the leveldb compaction on the monitor less expensive by doing it in
an async thread and compacting only the trimmed range. If anyone who is
On 05/14/2013 09:23 PM, Chen, Xiaoxi wrote:
How responsive generally is the machine under load? Is there available CPU?
The machine works well, and the affected OSDs tend to be the same ones; it
seems to be because they have relatively slower disks (the disk type is the
same, but the latency is a bit higher
where a mon
drops out of quorum, then comes back in on the next election,
I've found that to be a sign that my mons are too busy.
-- Jim
Sent from my iPhone
On 2013-5-15, at 23:07, Jim Schutt jasc...@sandia.gov wrote:
On 05/14/2013 09:23 PM, Chen, Xiaoxi wrote:
How responsive generally is the machine
/include/ceph_fs.h.
Jim Schutt (3):
ceph: fix up comment for ceph_count_locks() as to which lock to hold
ceph: add missing cpu_to_le32() calls when encoding a reconnect capability
ceph: ceph_pagelist_append might sleep while atomic
fs/ceph/locks.c | 75
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
fs/ceph/locks.c |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 202dd3d..ffc86cb 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -169,7 +169,7 @@ int ceph_flock(struct file
and
src/include/encoding.h in the Ceph server code (git://github.com/ceph/ceph).
I also checked the server side for flock_len decoding, and I believe that
also happens correctly, by virtue of having been declared __le32 in
struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.
Signed-off-by: Jim
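The patch above is about wire byte order: fields declared __le32 (such as flock_len in struct ceph_mds_cap_reconnect) must be encoded little-endian no matter what the host CPU's native order is, which is what the missing cpu_to_le32() calls provide. As a hedged userspace sketch of that idea (the kernel uses cpu_to_le32()/le32_to_cpu(); put_le32/get_le32 here are illustrative names, not the kernel helpers):

```c
#include <stdint.h>

/* Illustrative userspace stand-ins for the kernel's cpu_to_le32() /
 * le32_to_cpu(): serialize a 32-bit value in little-endian byte order
 * regardless of the host's native byte order. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Because both sides agree on little-endian layout, a big-endian client and a little-endian server decode the same bytes to the same value.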
success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed
Fix it up by encoding locks into a buffer first, and when the
number of encoded locks is stable, copy that into a ceph_pagelist.
Signed-off-by: Jim Schutt jasc...@sandia.gov
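The fix described above avoids sleeping while atomic by splitting the work into two phases: encode everything into a plain buffer first, then copy the finished buffer out in one step once the lock count is stable. A hedged userspace sketch of the same pattern (struct lock_rec and encode_locks() are hypothetical stand-ins, not the actual fs/ceph/locks.c code, and the real encoding also byte-swaps fields with cpu_to_le32()):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-lock record such as struct ceph_filelock. */
struct lock_rec {
    uint64_t start;
    uint64_t length;
    uint32_t pid;
};

/*
 * Phase 1 of the two-phase pattern: encode the lock count and all
 * records into a preallocated buffer, making no calls that can sleep.
 * Phase 2 (not shown) copies the finished buffer into the pagelist.
 * Returns bytes encoded, or -1 if the buffer is too small.
 */
static long encode_locks(const struct lock_rec *locks, uint32_t nlocks,
                         unsigned char *buf, size_t buflen)
{
    size_t need = sizeof(uint32_t) + (size_t)nlocks * sizeof(*locks);

    if (need > buflen)
        return -1;
    memcpy(buf, &nlocks, sizeof(nlocks));       /* lock count header */
    memcpy(buf + sizeof(nlocks), locks, (size_t)nlocks * sizeof(*locks));
    return (long)need;
}
```

The point of the split is that any allocation or kmap() that might sleep can happen before or after this function, never while the lock count has to stay stable.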
On 05/15/2013 10:49 AM, Alex Elder wrote:
On 05/15/2013 11:38 AM, Jim Schutt wrote:
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc() while
holding a lock, but it's spoiled because ceph_pagelist_addpage() always
calls kmap(), which might sleep. Here's the result
[resent to list because I missed that Cc:]
Hi Sage,
On 05/13/2013 06:35 PM, Sage Weil wrote:
Hi Jim-
You mentioned the other day your concerns about the uniformity of the PG
and data distribution. There are several ways to attack it (including
increasing the number of PGs), but one that
On 05/14/2013 10:44 AM, Alex Elder wrote:
On 05/09/2013 09:42 AM, Jim Schutt wrote:
Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc while
holding a lock, but it's spoiled because ceph_pagelist_addpage() always
calls kmap(), which might sleep. Here's the result:
I finally
success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed
Fix it up by encoding locks into a buffer first, and when the
number of encoded locks is stable, copy that into a ceph_pagelist.
Signed-off-by: Jim Schutt jasc...@sandia.gov
Hi Greg,
On 04/10/2013 06:39 PM, Gregory Farnum wrote:
Jim,
I took this patch as a base for setting up config options which people
can tune manually and have pushed those changes to wip-leveldb-config.
I was out of the office unexpectedly for a few days,
so I'm just now taking a look.
at this block size.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/common/config_opts.h |4
src/os/LevelDBStore.cc |9 +
src/os/LevelDBStore.h|3 +++
3 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/src/common/config_opts.h b/src/common/config_opts.h
index
On 04/03/2013 04:51 PM, Gregory Farnum wrote:
On Wed, Apr 3, 2013 at 3:40 PM, Jim Schutt jasc...@sandia.gov wrote:
On 04/03/2013 12:25 PM, Sage Weil wrote:
Sorry, guess I forgot some of the history since this piece at least is
resolved now. I'm surprised if 30-second timeouts are causing
On 04/04/2013 08:15 AM, Jim Schutt wrote:
On 04/03/2013 04:51 PM, Gregory Farnum wrote:
On Wed, Apr 3, 2013 at 3:40 PM, Jim Schutt jasc...@sandia.gov wrote:
On 04/03/2013 12:25 PM, Sage Weil wrote:
Sorry, guess I forgot some of the history since this piece at least is
resolved now. I'm
On 04/03/2013 04:40 PM, Jim Schutt wrote:
On 04/03/2013 12:25 PM, Sage Weil wrote:
Sorry, guess I forgot some of the history since this piece at
least is
resolved now. I'm surprised if 30-second timeouts are causing
issues
without those overloads you were seeing; have you
.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/os/LevelDBStore.cc |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/src/os/LevelDBStore.cc b/src/os/LevelDBStore.cc
index 3d94096..1b6ae7d 100644
--- a/src/os/LevelDBStore.cc
+++ b/src/os/LevelDBStore.cc
@@ -16,6 +16,9
Hi Joao,
I alluded in an earlier thread about an issue I've been recently
having with starting a new filesystem, which I thought I had
tracked into the paxos subsystem. I believe I started having
this trouble when I started testing v0.59, and it's still there
in v0.60.
The basic configuration
Hi Sage,
On 04/03/2013 09:58 AM, Sage Weil wrote:
Hi Jim,
What happens if you change 'osd mon ack timeout = 300' (from the
default of 30)? I suspect part of the problem is that the mons are just
slow enough that the osd's resend the same thing again and it snowballs
into more work for
On 04/03/2013 11:49 AM, Gregory Farnum wrote:
On Wed, Apr 3, 2013 at 10:14 AM, Gregory Farnum g...@inktank.com wrote:
On Wed, Apr 3, 2013 at 10:09 AM, Jim Schutt jasc...@sandia.gov wrote:
Hi Sage,
On 04/03/2013 09:58 AM, Sage Weil wrote:
Hi Jim,
What happens if you change 'osd mon ack
On 04/03/2013 12:25 PM, Sage Weil wrote:
Sorry, guess I forgot some of the history since this piece at least is
resolved now. I'm surprised if 30-second timeouts are causing issues
without those overloads you were seeing; have you seen this issue
without your high debugging levels and
On 04/02/2013 09:42 AM, Joao Eduardo Luis wrote:
On 04/01/2013 10:14 PM, Jim Schutt wrote:
Hi,
I've been having trouble starting a new file system
created using the current next branch (most recently,
commit 3b5f663f11).
I believe the trouble is related to how long it takes paxos
On 04/02/2013 12:28 PM, Joao Luis wrote:
Right. I'll push a patch to bump that sort of output to 30 when I get home.
Thanks - but FWIW, I don't think it's the root cause of my
issue -- more below
If you're willing, try reducing the paxos debug level to 0 and let us know
if those delays
On 04/02/2013 01:16 PM, Jim Schutt wrote:
On 04/02/2013 12:28 PM, Joao Luis wrote:
Right. I'll push a patch to bump that sort of output to 30 when I get home.
Thanks - but FWIW, I don't think it's the root cause of my
issue -- more below
OK, I see now that you're talking about
Hi,
I've been having trouble starting a new file system
created using the current next branch (most recently,
commit 3b5f663f11).
I believe the trouble is related to how long it takes paxos to
process a pgmap proposal.
For a configuration with 1 mon, 1 mds, and 576 osds, using
pg_bits = 3 and
On 03/15/2013 05:17 PM, Greg Farnum wrote:
[Putting list back on cc]
On Friday, March 15, 2013 at 4:11 PM, Jim Schutt wrote:
On 03/15/2013 04:23 PM, Greg Farnum wrote:
As I come back and look at these again, I'm not sure what the context
for these logs is. Which test did they come from
On 03/11/2013 02:40 PM, Jim Schutt wrote:
If you want I can attempt to duplicate my memory of the first
test I reported, writing the files today and doing the strace
tomorrow (with timestamps, this time).
Also, would it be helpful to write the files with minimal logging, in
hopes
On 03/08/2013 07:05 PM, Greg Farnum wrote:
On Friday, March 8, 2013 at 2:45 PM, Jim Schutt wrote:
On 03/07/2013 08:15 AM, Jim Schutt wrote:
On 03/06/2013 05:18 PM, Greg Farnum wrote:
On Wednesday, March 6, 2013 at 3:14 PM, Jim Schutt wrote:
[snip]
Do you want the MDS log at 10
On 03/11/2013 09:48 AM, Greg Farnum wrote:
On Monday, March 11, 2013 at 7:47 AM, Jim Schutt wrote:
On 03/08/2013 07:05 PM, Greg Farnum wrote:
On Friday, March 8, 2013 at 2:45 PM, Jim Schutt wrote:
On 03/07/2013 08:15 AM, Jim Schutt wrote:
On 03/06/2013 05:18 PM, Greg Farnum wrote
Hi Bryan,
On 03/11/2013 09:10 AM, Bryan K. Wright wrote:
s...@inktank.com said:
On Thu, 7 Mar 2013, Bryan K. Wright wrote:
s...@inktank.com said:
- pg log trimming (probably a conservative subset) to avoid memory bloat
Anything that reduces the size of OSD processes would be appreciated.
On 03/11/2013 10:57 AM, Greg Farnum wrote:
On Monday, March 11, 2013 at 9:48 AM, Jim Schutt wrote:
On 03/11/2013 09:48 AM, Greg Farnum wrote:
On Monday, March 11, 2013 at 7:47 AM, Jim Schutt wrote:
For this run, the MDS logging slowed it down enough to cause the
client caps to occasionally
On 03/07/2013 08:15 AM, Jim Schutt wrote:
On 03/06/2013 05:18 PM, Greg Farnum wrote:
On Wednesday, March 6, 2013 at 3:14 PM, Jim Schutt wrote:
[snip]
Do you want the MDS log at 10 or 20?
More is better. ;)
OK, thanks.
I've sent some mds logs via private email...
-- Jim
On 03/06/2013 05:18 PM, Greg Farnum wrote:
On Wednesday, March 6, 2013 at 3:14 PM, Jim Schutt wrote:
When I'm doing these stat operations the file system is otherwise
idle.
What's the cluster look like? This is just one active MDS and a couple
hundred clients?
1 mds, 1 mon, 576 osds, 198
On 03/05/2013 12:33 PM, Sage Weil wrote:
Running 'du' on each directory would be much faster with Ceph since it
tracks the subdirectories and shows their total size with an 'ls
-al'.
Environments with 100k users also tend to be very dynamic with adding and
removing users all
On 03/06/2013 12:13 PM, Greg Farnum wrote:
On Wednesday, March 6, 2013 at 11:07 AM, Jim Schutt wrote:
On 03/05/2013 12:33 PM, Sage Weil wrote:
Running 'du' on each directory would be much faster with Ceph since it
tracks the subdirectories and shows their total size with an 'ls
-al
On 03/06/2013 01:21 PM, Greg Farnum wrote:
Also, this issue of stat on files created on other clients seems
like it's going to be problematic for many interactions our users
will have with the files created by their parallel compute jobs -
any suggestion on how to avoid or fix it?
Hi Sage,
On 02/26/2013 12:36 PM, Sage Weil wrote:
On Tue, 26 Feb 2013, Jim Schutt wrote:
I think the right solution is to make an option that will setsockopt on
SO_RCVBUF to some value (say, 256KB). I pushed a branch that does this,
wip-tcp. Do you mind checking to see if this addresses
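A minimal sketch of the kind of option being discussed, assuming a plain BSD-socket setting (the actual wip-tcp branch wires this into the Ceph messenger as a config option; set_recv_buf() is an illustrative helper, not the branch's code):

```c
#include <sys/socket.h>

/* Request a kernel receive buffer size for a socket.  The kernel may
 * round the value (Linux doubles it and bounds it by net.core.rmem_max).
 * Returns 0 on success, -1 on error with errno set. */
static int set_recv_buf(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}
```

Capping the receive buffer this way bounds how much data TCP will buffer per connection, which is the knob at issue in this thread.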
Hi Sage,
On 02/20/2013 05:12 PM, Sage Weil wrote:
Hi Jim,
I'm resurrecting an ancient thread here, but: we've just observed this on
another big cluster and remembered that this hasn't actually been fixed.
Sorry for the delayed reply - I missed this in a backlog
of unread email...
I
On 01/31/2013 05:43 AM, Sage Weil wrote:
Hi-
Can you reproduce this with logs? It looks like there are a few ops that
are hanging for a very long time, but there isn't enough information here
except to point to osds 610, 612, 615, and 68...
FWIW, I have a small pile of disks with bad
On 01/31/2013 01:00 PM, Sage Weil wrote:
On Thu, 31 Jan 2013, Jim Schutt wrote:
On 01/31/2013 05:43 AM, Sage Weil wrote:
Hi-
Can you reproduce this with logs? It looks like there are a few ops that
are hanging for a very long time, but there isn't enough information here
except to point
Hi Sage,
On 01/15/2013 07:55 PM, Sage Weil wrote:
Hi Jim-
I just realized this didn't make it into our tree. It's now in testing,
and will get merged in the next window. D'oh!
That's great news - thanks for the update.
-- Jim
sage
--
To unsubscribe from this list: send the line
Hi,
I'm looking at commit e3ed28eb2 in the next branch,
and I have a question.
Shouldn't the limit be pg_num 65536, because
PGs are numbered 0 thru pg_num-1?
If not, what am I missing?
FWIW, up through yesterday I've been using the next branch and this:
ceph osd pool set data pg_num 65536
On 12/14/2012 09:59 AM, Joao Eduardo Luis wrote:
On 12/14/2012 03:41 PM, Jim Schutt wrote:
Hi,
I'm looking at commit e3ed28eb2 in the next branch,
and I have a question.
Shouldn't the limit be pg_num 65536, because
PGs are numbered 0 thru pg_num-1?
If not, what am I missing?
FWIW, up
On 12/11/2012 06:37 PM, Liu Bo wrote:
On Tue, Dec 11, 2012 at 09:33:15AM -0700, Jim Schutt wrote:
On 12/09/2012 07:04 AM, Liu Bo wrote:
On Wed, Dec 05, 2012 at 09:07:05AM -0700, Jim Schutt wrote:
Hi Jim,
Could you please apply the following patch to test if it works?
Hi,
So far
On 12/09/2012 07:04 AM, Liu Bo wrote:
On Wed, Dec 05, 2012 at 09:07:05AM -0700, Jim Schutt wrote:
Hi,
I'm hitting a btrfs locking issue with 3.7.0-rc8.
The btrfs filesystem in question is backing a Ceph OSD
under a heavy write load from many cephfs clients.
I reported
On 12/05/2012 09:07 AM, Jim Schutt wrote:
Hi,
I'm hitting a btrfs locking issue with 3.7.0-rc8.
The btrfs filesystem in question is backing a Ceph OSD
under a heavy write load from many cephfs clients.
I reported this issue a while ago:
http://www.spinics.net/lists/linux-btrfs/msg19370.html
Hi,
I'm hitting a btrfs locking issue with 3.7.0-rc8.
The btrfs filesystem in question is backing a Ceph OSD
under a heavy write load from many cephfs clients.
I reported this issue a while ago:
http://www.spinics.net/lists/linux-btrfs/msg19370.html
when I was testing what I thought might be
be relatively rare.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
include/linux/ceph/ceph_features.h |4 +++-
include/linux/crush/crush.h|2 ++
net/ceph/crush/mapper.c| 13 ++---
net/ceph/osdmap.c |6 ++
4 files changed, 21 insertions
On 11/28/2012 09:11 AM, Caleb Miles wrote:
Hey Jim,
Running the third test with tunable chooseleaf_descend_once 0 with no
devices marked out yields the following result
(999.827397, 0.48667056652539997)
so chi squared value is 999 with a corresponding p value of 0.487 so that
the
Hi Caleb,
On 11/26/2012 07:28 PM, caleb miles wrote:
Hello all,
Here's what I've done to try and validate the new chooseleaf_descend_once
tunable first described in commit f1a53c5e80a48557e63db9c52b83f39391bc69b8 in
the wip-crush branch of ceph.git.
First I set the new tunable to its
On 10/26/2012 02:52 PM, Gregory Farnum wrote:
Wanted to touch base on this patch again. If Sage and Sam agree that
we don't want to play any tricks with memory accounting, we should
pull this patch in. I'm pretty sure we want it for Bobtail!
I've been running with it since I posted it.
I think
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/os/FileJournal.cc |5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/os/FileJournal.cc b/src/os/FileJournal.cc
index d1c92dc..2254720 100644
--- a/src/os/FileJournal.cc
+++ b/src/os/FileJournal.cc
@@ -945,9
ops.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/osd/ReplicatedPG.cc |4
1 files changed, 0 insertions(+), 4 deletions(-)
diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index a64abda..80bec2a 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
On 09/27/2012 04:07 PM, Gregory Farnum wrote:
Have you tested that this does what you want? If it does, I think
we'll want to implement this so that we actually release the memory,
but continue accounting it.
Yes. I have diagnostic patches where I add an advisory option
to Throttle, and apply
On 09/27/2012 04:27 PM, Gregory Farnum wrote:
On Thu, Sep 27, 2012 at 3:23 PM, Jim Schuttjasc...@sandia.gov wrote:
On 09/27/2012 04:07 PM, Gregory Farnum wrote:
Have you tested that this does what you want? If it does, I think
we'll want to implement this so that we actually release the
Hi,
I was testing on 288 OSDs with pg_bits=8, for 73984 PGs/pool,
221952 total PGs.
Writing from CephFS clients generates lots of messages like this:
2012-08-28 14:53:33.772344 osd.235 [WRN] client.4533 172.17.135.45:0/1432642641
misdirected client.4533.1:124 pg 0.8b9d12d4 to osd.235 in e7,
On 08/09/2012 10:26 AM, Tommi Virtanen wrote:
mkcephfs is not a viable route forward. For example, it is unable to
expand a pre-existing cluster.
The new OSD hotplugging style init is much, much nicer. And does
more than just mkfs mount.
I'm embarrassed to admit I haven't been keeping up
On 08/08/2012 07:13 PM, Alex Elder wrote:
On 08/08/2012 11:09 AM, Jim Schutt wrote:
Because the Ceph client messenger uses a non-blocking connect, it is possible
for the sending of the client banner to race with the arrival of the banner
sent by the peer.
This is possible because the server
of prepare_write_connect() to its
callers at all locations except the one where the banner might still need to
be sent.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
net/ceph/messenger.c | 11 +--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph
On 07/30/2012 06:24 PM, Gregory Farnum wrote:
On Mon, Jul 30, 2012 at 3:47 PM, Jim Schuttjasc...@sandia.gov wrote:
Above you mentioned that you are seeing these issues as you scaled
out a storage cluster, but none of the solutions you mentioned
address scaling. Let's assume your preferred
Hi Greg,
Thanks for the write-up. I have a couple questions below.
On 07/30/2012 12:46 PM, Gregory Farnum wrote:
As Ceph gets deployed on larger clusters our most common scaling
issues have related to
1) our heartbeat system, and
2) handling the larger numbers of OSDMaps that get generated by
On 07/17/2012 06:03 PM, Samuel Just wrote:
master should now have a fix for that, let me know how it goes. I opened
bug #2798 for this issue.
Hmmm, it seems handle_osd_ping() now runs into a case
where for the first ping it gets, service.osdmap can be empty?
0 2012-07-18
On 07/18/2012 12:03 PM, Samuel Just wrote:
Sorry, master has a fix now for that also.
76efd9772c60b93bbf632e3ecc3b9117dc081427
-Sam
That got things running for me.
Thanks for the quick reply.
-- Jim
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a
Hi,
Recent master branch is asserting for me like this:
ceph version 0.48argonaut-404-gabe05a3
(commit:abe05a3fbbb120d8d354623258d9104584db66f7)
1: (OSDMap::get_cluster_inst(int) const+0xc9) [0x58cde9]
2: (OSD::handle_osd_ping(MOSDPing*)+0x8cf) [0x5d4b4f]
3:
On 07/17/2012 03:44 PM, Samuel Just wrote:
Not quite. OSDService::get_osdmap() returns the most recently
published osdmap. Generally, OSD::osdmap is safe to use when you are
holding the osd lock. Otherwise, OSDService::get_osdmap() should be
used. There are a few other things that should be
On 07/01/2012 01:57 PM, Stefan Priebe wrote:
thanks for sharing. Which btrfs mount options did you use?
-o noatime
is all I use.
-- Jim
Am 29.06.2012 00:37, schrieb Jim Schutt:
Hi,
Lots of trouble reports go by on the list - I thought
it would be useful to report a success.
Using
On 07/02/2012 08:07 AM, Stefan Priebe - Profihost AG wrote:
Am 02.07.2012 16:04, schrieb Jim Schutt:
On 07/01/2012 01:57 PM, Stefan Priebe wrote:
thanks for sharing. Which btrfs mount options did you use?
-o noatime
is all I use.
Thanks. Have you ever measured random I/O performance
On 06/28/2012 04:53 PM, Mark Nelson wrote:
On 06/28/2012 05:37 PM, Jim Schutt wrote:
Hi,
Lots of trouble reports go by on the list - I thought
it would be useful to report a success.
Using a patch (https://lkml.org/lkml/2012/6/28/446)
on top of 3.5-rc4 for my OSD servers, the same kernel
On 06/28/2012 05:36 AM, Mel Gorman wrote:
On Wed, Jun 27, 2012 at 03:59:19PM -0600, Jim Schutt wrote:
Hi,
I'm running into trouble with systems going unresponsive,
and perf suggests it's excessive CPU usage by isolate_freepages().
I'm currently testing 3.5-rc4, but I think this problem may
On 06/28/2012 09:45 AM, Alexandre DERUMIER wrote:
Definitely. Seeing perf/oprofile/whatever results for the osd under that
workload would be very interesting! We need to get perf going in our
testing environment...
I'm not an expert, but if you give me command line, I'll do it ;)
Thanks to
Hi,
Lots of trouble reports go by on the list - I thought
it would be useful to report a success.
Using a patch (https://lkml.org/lkml/2012/6/28/446)
on top of 3.5-rc4 for my OSD servers, the same kernel
for my Linux clients, and a recent master branch
tip (git://github.com/ceph/ceph commit
Hi Mark,
On 06/27/2012 07:55 AM, Mark Nelson wrote:
For what it's worth, I've got a pair of Dell R515s set up with a single 2.8GHz
6-core 4184 Opteron, 16GB of RAM, and 10 SSDs that are capable of about 200MB/s
each. Currently I'm topping out at about 600MB/s with rados bench using half
of
On 06/27/2012 09:19 AM, Stefan Priebe wrote:
Am 27.06.2012 16:55, schrieb Jim Schutt:
This is my current best tuning for my hardware, which uses
24 SAS drives/server, and 1 OSD/drive with a journal partition
on the outer tracks and btrfs for the data store.
Which raid level do you use
On 06/27/2012 11:54 AM, Stefan Priebe wrote:
Am 27.06.2012 um 19:23 schrieb Jim Schuttjasc...@sandia.gov:
On 06/27/2012 09:19 AM, Stefan Priebe wrote:
Am 27.06.2012 16:55, schrieb Jim Schutt:
This is my current best tuning for my hardware, which uses
24 SAS drives/server, and 1 OSD/drive
On 06/27/2012 12:48 PM, Stefan Priebe wrote:
Am 27.06.2012 20:38, schrieb Jim Schutt:
Actually, when my 166-client test is running,
ps -o pid,nlwp,args -C ceph-osd
tells me that I typically have ~1200 threads/OSD.
huh i see only 124 threads per OSD even with your settings.
FWIW:
2 threads
Hi,
I'm running into trouble with systems going unresponsive,
and perf suggests it's excessive CPU usage by isolate_freepages().
I'm currently testing 3.5-rc4, but I think this problem may have
first shown up in 3.4. I'm only just learning how to use perf,
so I only currently have results to
Hi,
In my testing I make repeated use of the manual mkcephfs
sequence described in the man page:
master# mkdir /tmp/foo
master# mkcephfs -c /etc/ceph/ceph.conf --prepare-monmap -d /tmp/foo
osdnode# mkcephfs --init-local-daemons osd -d /tmp/foo
mdsnode# mkcephfs --init-local-daemons
On 05/24/2012 03:13 PM, Sage Weil wrote:
Hi Jim,
On Thu, 24 May 2012, Jim Schutt wrote:
Hi,
In my testing I make repeated use of the manual mkcephfs
sequence described in the man page:
master# mkdir /tmp/foo
master# mkcephfs -c /etc/ceph/ceph.conf --prepare-monmap -d /tmp/foo
to find an 'in' device, spreading re-replication around. This makes
it less likely we'll give up when the storage cluster has many
failed devices.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/crush/mapper.c |4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
include/trace/events/ceph.h | 67 +++
net/ceph/messenger.c|9 +-
net/ceph/osd_client.c |1 +
3 files changed, 76 insertions(+), 1 deletions(-)
diff --git a/include/trace/events
Trace callers of ceph_osdc_start_request, so that call locations
are identified implicitly.
Put the tracepoints after calls to ceph_osdc_start_request,
since it fills in the request transaction ID and request OSD.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
fs/ceph/addr.c
Hi Alex,
I ran across tracker #2374 today - I've been carrying these two
tracepoint patches for a while. Perhaps you'll find them useful.
Jim Schutt (2):
ceph: add tracepoints for message submission on read/write requests
ceph: add tracepoints for message send queueing and completion, reply
.
So, this is looking fairly solid to me so far. What do you think?
Thanks -- Jim
Jim Schutt (2):
ceph: retry CRUSH map descent before retrying bucket
ceph: retry CRUSH map descent from root if leaf is failed
src/crush/mapper.c | 30 ++
1 files changed, 22
.
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/crush/mapper.c | 20 ++--
1 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/src/crush/mapper.c b/src/crush/mapper.c
index 8857577..e5dc950 100644
--- a/src/crush/mapper.c
+++ b/src/crush/mapper.c
@@ -350,8 +350,7
, if the primary OSD
in a placement group has failed, choosing a replacement may result in
one of the other OSDs in the PG colliding with the new primary. This
means that OSD's data for that PG must move as well. This
seems unavoidable but should be relatively rare.
Signed-off-by: Jim Schutt jasc
On 04/30/2012 11:12 AM, Samuel Just wrote:
There is a (unfortunately non-optional at the moment) feature in crush
where we retry in the same bucket a few times before restarting the
descent when hitting an out leaf. The result of this is to localise
recovery at the expense of inadequately
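A toy model of the behavior described above, hedged heavily: the real CRUSH code hashes with crush_hash32_*() over weighted bucket types, while here the probe is a trivial index scan. It only shows the shape of "retry locally N times, then give up so the caller can restart the descent" (e.g. from the root, as in the wip-crush change):

```c
/* Toy model of CRUSH's local retry: probe within one bucket up to
 * local_tries times, skipping failed ("out") devices.  Returns the
 * chosen device id, or -1 to tell the caller to restart the whole
 * descent.  The (x + r) % ndevs probe is a stand-in for the real
 * CRUSH hash; 'failed' is indexed by device id. */
static int choose_local(const int *devs, int ndevs, const int *failed,
                        int x, int local_tries)
{
    for (int r = 0; r < local_tries; r++) {
        int i = (x + r) % ndevs;
        if (!failed[devs[i]])
            return devs[i];
    }
    return -1;
}
```

The trade-off the thread discusses falls out of this shape: retrying inside the bucket keeps replacement choices local (less data movement), while restarting from the root spreads re-replication across the whole cluster.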
On 04/26/2012 06:09 PM, Tommi Virtanen wrote:
Now, here are my actual questions:
1. What should the relative names of the branches be? stable vs
latest etc. I especially don't like integration, but I do see a
time where it is not ready for stable but still needs to branch off
of latest.
2. Do
Hi,
I've been experimenting with failure scenarios to make sure
I understand what happens when an OSD drops out. In particular,
I've been using ceph osd out n and watching my all my OSD
servers to see where the data from the removed OSD ends up
after recovery. I've been doing this testing with
On 03/09/2012 04:21 PM, Jim Schutt wrote:
On 03/09/2012 12:39 PM, Jim Schutt wrote:
On 03/08/2012 05:26 PM, Sage Weil wrote:
On Thu, 8 Mar 2012, Jim Schutt wrote:
Hi,
I've been trying to scale up a Ceph filesystem to as big
as I have hardware for - up to 288 OSDs right now.
(I'm using
On 04/10/2012 10:39 AM, Sage Weil wrote:
On Tue, 10 Apr 2012, Jim Schutt wrote:
On 03/09/2012 04:21 PM, Jim Schutt wrote:
On 03/09/2012 12:39 PM, Jim Schutt wrote:
On 03/08/2012 05:26 PM, Sage Weil wrote:
On Thu, 8 Mar 2012, Jim Schutt wrote:
Hi,
I've been trying to scale up a Ceph
Signed-off-by: Jim Schutt jasc...@sandia.gov
---
src/Makefile.am |5 -
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/src/Makefile.am b/src/Makefile.am
index cdfb43d..2062d1c 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -48,7 +48,7 @@ if LINUX
ceph_osd_LDADD
On 03/21/2012 05:16 AM, Plaetinck, Dieter wrote:
Hello,
Ceph/Rados looks very well designed and engineered.
I would like to build a cluster to test the rados distributed object storage
(not the distributed FS or block devices)
I've seen the list of dependencies on the wiki, but it doesn't
On 03/08/2012 05:26 PM, Sage Weil wrote:
On Thu, 8 Mar 2012, Jim Schutt wrote:
Hi,
I've been trying to scale up a Ceph filesystem to as big
as I have hardware for - up to 288 OSDs right now.
(I'm using commit ed0f605365e - tip of master branch from
a few days ago.)
My problem is that I
On 03/09/2012 12:39 PM, Jim Schutt wrote:
On 03/08/2012 05:26 PM, Sage Weil wrote:
On Thu, 8 Mar 2012, Jim Schutt wrote:
Hi,
I've been trying to scale up a Ceph filesystem to as big
as I have hardware for - up to 288 OSDs right now.
(I'm using commit ed0f605365e - tip of master branch from
Hi,
I've been trying to scale up a Ceph filesystem to as big
as I have hardware for - up to 288 OSDs right now.
(I'm using commit ed0f605365e - tip of master branch from
a few days ago.)
My problem is that I cannot get a 288 OSD filesystem to go active
(that's with 1 mon and 1 MDS). Pretty
Hi Alex,
On 02/02/2012 07:07 PM, Alex Elder wrote:
On Wed, 2012-02-01 at 08:59 -0700, Jim Schutt wrote:
The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.
Fix this problem by making ceph_write_space() use SOCK_NOSPACE
On 02/29/2012 08:47 AM, Alex Elder wrote:
On 02/29/2012 07:30 AM, Jim Schutt wrote:
Hi Alex,
On 02/02/2012 07:07 PM, Alex Elder wrote:
On Wed, 2012-02-01 at 08:59 -0700, Jim Schutt wrote:
The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket
1 - 100 of 240 matches