Re: slow performance even when using SSDs

2012-05-10 Thread Calvin Morrow
I was getting roughly the same results as your tmpfs test using spinning disks for OSDs, with a 160GB Intel 320 SSD used for the journal. Theoretically the 520 SSD should give better performance than my 320s. Keep in mind that even with balance-alb, multiple GigE connections will only be use

Re: Ceph on btrfs 3.4rc

2012-05-10 Thread Josef Bacik
On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote: > On 24 April 2012 18:26, Sage Weil wrote: > > On Tue, 24 Apr 2012, Josef Bacik wrote: > >> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote: > >> > After running ceph on XFS for some time, I decided to try btrfs

Re: Can I use btrfs-restore to restore ceph osds?

2012-05-10 Thread Tommi Virtanen
On Wed, May 9, 2012 at 10:15 AM, Guido Winkelmann wrote: > I'm currently trying to re-enable my experimental ceph cluster that has been > offline for a few months. Unfortunately, it appears that, out of the six btrfs > volumes involved, only one can still be mounted; the other five are broken > so

Re: Compile error in rgw/rgw_xml.h in 0.46

2012-05-10 Thread Tommi Virtanen
On Wed, May 9, 2012 at 6:46 AM, Guido Winkelmann wrote: > Compiling Ceph 0.46 fails at rgw/rgw_dencoder.cc with the following errors: > > In file included from rgw/rgw_dencoder.cc:7: > rgw/rgw_acl_s3.h:9:19: error: expat.h: No such file or directory > In file included from rgw/rgw_acl_s3.h:11, >  

Re: Compile error in rgw/rgw_xml.h in 0.46

2012-05-10 Thread Yehuda Sadeh
Oops, missed posting it to the list (seeing Tommi's comment). On Wed, May 9, 2012 at 11:04 AM, Yehuda Sadeh wrote: > > On Wed, May 9, 2012 at 6:46 AM, Guido Winkelmann > wrote: > > Compiling Ceph 0.46 fails at rgw/rgw_dencoder.cc with the following > > errors: > > > > In file included from rgw/r

Re: Always creating PGs

2012-05-10 Thread Tommi Virtanen
On Thu, May 10, 2012 at 2:17 AM, Tomoki BENIYA wrote: > I found that PGs in 'creating' status never finish. > > # ceph pg stat > v17596: 1204 pgs: 8 creating, 1196 active+clean; 25521 MB data, 77209 MB > used, 2223 GB / 2318 GB avail > ~~ > always c

[RFC PATCH 0/2] Distribute re-replicated objects evenly after OSD failure

2012-05-10 Thread Jim Schutt
Hi Sage, I've been trying to solve the issue mentioned in tracker #2047, which I think is the same as I described in http://www.spinics.net/lists/ceph-devel/msg05824.html The attached patches seem to fix it for me. I also attempted to address the local search issue you mentioned in #2047. I

[RFC PATCH 1/2] ceph: retry CRUSH map descent before retrying bucket

2012-05-10 Thread Jim Schutt
For the first few rejections or collisions, we'll retry the descent to keep objects spread across the cluster. After that, we'll fall back to an exhaustive search of the bucket, to avoid retrying forever in the event a bucket has only a few 'in' items and the hash doesn't do a good job of finding them. S
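The retry policy described above can be illustrated with a small, self-contained C toy. The names, the hash, and the retry threshold below are assumptions for illustration only; this is not the CRUSH implementation.

/*
 * Toy sketch of "retry the descent a few times, then exhaustively
 * scan the bucket".  Not the actual CRUSH code.
 */
#include <stdio.h>
#include <stdbool.h>

#define BUCKET_SIZE     8
#define DESCENT_RETRIES 3   /* assumed: descent retries before the fallback */

/* most items in this toy bucket are rejected (e.g. marked out) */
static const bool rejected[BUCKET_SIZE] = {
	false, true, false, true, true, true, true, true
};

/* toy stand-in for the placement hash */
static unsigned toy_hash(unsigned obj, unsigned replica, unsigned attempt)
{
	return (obj * 2654435761u + replica * 40503u + attempt * 97u) % BUCKET_SIZE;
}

static int pick_item(unsigned obj, unsigned replica)
{
	unsigned attempt, i;

	/* first, retry the descent to keep objects well spread */
	for (attempt = 0; attempt <= DESCENT_RETRIES; attempt++) {
		unsigned item = toy_hash(obj, replica, attempt);
		if (!rejected[item])
			return (int)item;
	}

	/* then fall back to scanning the bucket so we never loop forever */
	for (i = 0; i < BUCKET_SIZE; i++)
		if (!rejected[i])
			return (int)i;

	return -1;   /* nothing usable in this bucket */
}

int main(void)
{
	unsigned obj;

	for (obj = 0; obj < 5; obj++)
		printf("object %u -> item %d\n", obj, pick_item(obj, 0));
	return 0;
}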

[RFC PATCH 2/2] ceph: retry CRUSH map descent from root if leaf is failed

2012-05-10 Thread Jim Schutt
When an object is re-replicated after a leaf failure, the remapped replica ends up under the bucket that held the failed leaf. This causes uneven data distribution across the storage cluster, to the point that when all the leaves of a bucket but one fail, that remaining leaf holds all the data fro
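A similarly hedged toy sketch of the idea follows; the two-level hierarchy, hash, and names are assumptions, not the actual patch. Retrying only inside the host bucket keeps remapped replicas under the failed leaf's parent, while retrying from the root lets them spread over the whole cluster.

/* Toy comparison of "retry in bucket" vs. "retry from root" after a
 * leaf (OSD) failure.  All names and the hash are illustrative only. */
#include <stdio.h>
#include <stdbool.h>

#define NHOSTS          4
#define NOSDS_PER_HOST  2
#define RETRIES         8

static bool osd_down[NHOSTS][NOSDS_PER_HOST];

static unsigned toy_hash(unsigned x, unsigned attempt, unsigned mod)
{
	return (x * 2654435761u + attempt * 40503u) % mod;
}

/* Re-descend from the root: both host and OSD are re-chosen. */
static int choose_retry_root(unsigned obj)
{
	unsigned a;

	for (a = 0; a < RETRIES; a++) {
		unsigned host = toy_hash(obj, a, NHOSTS);
		unsigned osd = toy_hash(obj ^ 0x9e3779b9u, a, NOSDS_PER_HOST);

		if (!osd_down[host][osd])
			return (int)(host * NOSDS_PER_HOST + osd);
	}
	return -1;
}

/* Retry only inside the originally chosen host bucket. */
static int choose_retry_bucket(unsigned obj)
{
	unsigned host = toy_hash(obj, 0, NHOSTS);
	unsigned a;

	for (a = 0; a < RETRIES; a++) {
		unsigned osd = toy_hash(obj ^ 0x9e3779b9u, a, NOSDS_PER_HOST);

		if (!osd_down[host][osd])
			return (int)(host * NOSDS_PER_HOST + osd);
	}
	return -1;
}

int main(void)
{
	unsigned obj;

	osd_down[1][0] = true;   /* fail one OSD under host 1 */

	for (obj = 0; obj < 8; obj++)
		printf("object %u: retry-bucket -> osd %2d, retry-root -> osd %2d\n",
		       obj, choose_retry_bucket(obj), choose_retry_root(obj));
	return 0;
}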

Re: "converting" btrfs osds to xfs?

2012-05-10 Thread Nick Bartos
After I run the 'ceph osd out 123' command, is there a specific ceph command I can poll so I know when it's OK to kill the OSD daemon and begin the reformat process? On Tue, May 8, 2012 at 12:38 PM, Sage Weil wrote: > On Tue, 8 May 2012, Tommi Virtanen wrote: >> On Tue, May 8, 2012 at 8:39 AM, Ni

[PATCH 0/3] ceph: messenger: read_partial() cleanups

2012-05-10 Thread Alex Elder
This short series adds the use of read_partial() in a few places where it is not already used. It also gets rid of the in/out "to" argument (which continues to cause confusion every time I see it), using an in-only "end" argument in its place. -Alex -- To unsubs

Re: "converting" btrfs osds to xfs?

2012-05-10 Thread Tommi Virtanen
On Thu, May 10, 2012 at 3:44 PM, Nick Bartos wrote: > After I run the 'ceph osd out 123' command, is there a specific ceph > command I can poll so I know when it's OK to kill the OSD daemon and > begin the reformat process? Good question! "ceph -s" will show you that. This is from a run where I r

Re: "converting" btrfs osds to xfs?

2012-05-10 Thread Tommi Virtanen
On Thu, May 10, 2012 at 5:23 PM, Tommi Virtanen wrote: > Good question! "ceph -s" will show you that. This is from a run where > I ran "ceph osd out 1" on a cluster of 3 osds. See the active+clean > counts going up and active+recovering counts going down, and the > "degraded" percentage dropping.

[PATCH 1/3] ceph: messenger: use read_partial() in read_partial_message()

2012-05-10 Thread Alex Elder
There are two blocks of code in read_partial_message()--those that read the header and footer of the message--that can be replaced by a call to read_partial(). Do that. Signed-off-by: Alex Elder --- net/ceph/messenger.c | 30 ++ 1 file changed, 10 insertions(+), 2

[PATCH 2/3] ceph: messenger: update "to" in read_partial() caller

2012-05-10 Thread Alex Elder
read_partial() always increases whatever "to" value is supplied by adding the requested size to it. That's the only thing it does with that pointed-to value. Do that pointer advance in the caller (and then only when the updated value will be subsequently used), and change the "to" parameter to b

[PATCH 3/3] ceph: messenger: change read_partial() to take "end" arg

2012-05-10 Thread Alex Elder
Make the second argument to read_partial() be the ending input byte position rather than the beginning offset it now represents. This amounts to moving the addition "to + size" into the caller. Signed-off-by: Alex Elder --- net/ceph/messenger.c | 59 -
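A toy userspace sketch of the resulting calling convention is below; the helper name and struct follow the pattern described in this series but are simplified assumptions, not the kernel messenger code.

/* Illustrative only: read_partial() takes an ending position, and the
 * caller advances "end" by each section's size (the "to + size" move). */
#include <stdio.h>

struct toy_con {
	const char *in;     /* pretend socket data */
	int in_base_pos;    /* how far into the stream we have read */
};

/* Fill 'object' (of 'size' bytes) until the stream position reaches 'end'. */
static int read_partial(struct toy_con *con, int end, int size, void *object)
{
	while (con->in_base_pos < end) {
		int left = end - con->in_base_pos;
		int have = size - left;

		/* a real implementation would do a (possibly short) socket
		 * read here; the toy copies one byte per pass */
		((char *)object)[have] = con->in[con->in_base_pos];
		con->in_base_pos++;
	}
	return 1;
}

int main(void)
{
	struct toy_con con = { "HDRPAYLOADFTR", 0 };
	char hdr[4] = "", ftr[4] = "";
	int end = 0;

	end += 3;                       /* header section */
	read_partial(&con, end, 3, hdr);

	con.in_base_pos += 7;           /* skip the payload in this toy */
	end += 7;

	end += 3;                       /* footer section */
	read_partial(&con, end, 3, ftr);

	printf("header=%.3s footer=%.3s\n", hdr, ftr);
	return 0;
}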

Always creating PGs

2012-05-10 Thread Tomoki BENIYA
Hi all, I created a Ceph file system using 64-bit Debian 7. I found that PGs in 'creating' status never finish. # ceph pg stat v17596: 1204 pgs: 8 creating, 1196 active+clean; 25521 MB data, 77209 MB used, 2223 GB / 2318 GB avail ~~ always creati

slow performance even when using SSDs

2012-05-10 Thread Stefan Priebe - Profihost AG
Dear List, I'm doing a test setup with ceph v0.46 and wanted to know how fast ceph is. My test setup: 3 servers with Intel Xeon X3440, a 180GB Intel 520 Series SSD, 4GB RAM, and 2x 1Gbit/s LAN each. All 3 are running as mon a-c and osd 0-2. Two of them are also running as mds.2 and mds.3 (has 8GB RAM ins

Re: slow performance even when using SSDs

2012-05-10 Thread Stefan Priebe - Profihost AG
OK, here are some retests. I had the SSDs connected to an old RAID controller even though I used them as JBODs (oops). Here are two new tests (using kernel 3.4-rc6); it would be great if someone could tell me whether they're fine or bad. New tests with all 3 SSDs connected to the mainboard. #~ rados -p rbd be

Designing a cluster guide

2012-05-10 Thread Stefan Priebe - Profihost AG
Hi, the "Designing a cluster guide" http://wiki.ceph.com/wiki/Designing_a_cluster is pretty good but it still leaves some questions unanswered. It mentions for example "Fast CPU" for the mds system. What does fast mean? Just the speed of one core? Or is ceph designed to use multi core? Is multi c

[PATCH 2/2] ceph: add tracepoints for message send queueing and completion, reply handling

2012-05-10 Thread Jim Schutt
Signed-off-by: Jim Schutt --- include/trace/events/ceph.h | 67 +++ net/ceph/messenger.c | 9 +- net/ceph/osd_client.c | 1 + 3 files changed, 76 insertions(+), 1 deletions(-) diff --git a/include/trace/events/ceph.h b/include/tra

[PATCH 1/2] ceph: add tracepoints for message submission on read/write requests

2012-05-10 Thread Jim Schutt
Trace callers of ceph_osdc_start_request, so that call locations are identified implicitly. Put the tracepoints after calls to ceph_osdc_start_request, since it fills in the request transaction ID and request OSD. Signed-off-by: Jim Schutt --- fs/ceph/addr.c | 8 fs/ceph/fi
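For readers unfamiliar with kernel tracepoints, a minimal sketch of what such an event definition could look like is below; the event name, fields, and format string are assumptions for illustration and are not taken from the patch itself.

/* Hypothetical include/trace/events/ceph.h fragment -- illustrative only. */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM ceph

#if !defined(_TRACE_CEPH_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_CEPH_H

#include <linux/tracepoint.h>

TRACE_EVENT(ceph_osdc_request_queued,
	TP_PROTO(u64 tid, int osd),
	TP_ARGS(tid, osd),
	TP_STRUCT__entry(
		__field(u64, tid)
		__field(int, osd)
	),
	TP_fast_assign(
		__entry->tid = tid;
		__entry->osd = osd;
	),
	/* report the transaction id and target OSD, which are only known
	 * after ceph_osdc_start_request() has filled them in */
	TP_printk("tid %llu osd%d",
		  (unsigned long long)__entry->tid, __entry->osd)
);

#endif /* _TRACE_CEPH_H */

/* This part must be outside the include guard */
#include <trace/define_trace.h>

A caller would then invoke trace_ceph_osdc_request_queued(...) right after the ceph_osdc_start_request() call, which is why the patch places the tracepoints after that call.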

[PATCH 0/2] Ceph tracepoints

2012-05-10 Thread Jim Schutt
Hi Alex, I ran across tracker #2374 today - I've been carrying these two tracepoint patches for a while. Perhaps you'll find them useful. Jim Schutt (2): ceph: add tracepoints for message submission on read/write requests ceph: add tracepoints for message send queueing and completion, reply

Re: [PATCH 0/2] Ceph tracepoints

2012-05-10 Thread Alex Elder
On 05/10/2012 09:35 AM, Jim Schutt wrote: Hi Alex, I ran across tracker #2374 today - I've been carrying these two tracepoint patches for a while. Perhaps you'll find them useful. GREAT! I haven't looked at them but I will as soon as I get the chance. I don't expect there's any reason not to

Re: Ceph kernel client - kernel crashes

2012-05-10 Thread Giorgos Kappes
Sorry for my late response. I reproduced the above bug with the Linux kernel 3.3.4 and without using XEN: uname -a Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux The trace is shown below: [  763.984023] kernel tried to