RE: Slow requests

2013-01-15 Thread Chen, Xiaoxi
Hi, I have also seen the same warning even when I use v0.56.1 (both kernel rbd and OSD side) when write stress is high enough (say I have 3 OSDs but 4~5 clients doing dd on top of the rbd). 2013-01-15 15:54:05.990052 7ff97dd0c700 0 log [WRN] : slow request 32.545624 seconds old,

Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-15 Thread James Page
On 12/01/13 16:36, Noah Watkins wrote: On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell gary.low...@inktank.com wrote: Thanks Danny. Installing sharutils solved that minor issue. We now get through the build just fine on opensuse 12, but sles

Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-15 Thread Danny Al-Gaaf
On 15.01.2013 10:04, James Page wrote: On 12/01/13 16:36, Noah Watkins wrote: On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell gary.low...@inktank.com wrote: Thanks Danny. Installing sharutils solved that minor issue. We now get through the build just fine on opensuse 12, but sles 11sp2 gives

Re: CephFS issue

2013-01-15 Thread Joshua J. Kugler
On Monday, January 14, 2013 08:51:57 Alexis GÜNST HORN wrote: At the end, the client mountpoint becomes unresponsive, and the only way is to force reboot. I am going to throw this out there as I've seen something similar, but not with ceph. Back in 2005-ish I was experimenting with ATA over

Re: mon down

2013-01-15 Thread jie sun
Hi all, Forgot to say, my ceph version is 0.48.2, and os is opensuse 11.4 (kernel 2.6.37). Thank you. 2013/1/15 jie sun 0maid...@gmail.com: Hi all, I have done some tests recently. I made a ceph cluster with 1 mon, 1 mds and 10 osds. Then I made a block device (using rbd create/rbd
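
For reference, the rbd create / rbd map steps mentioned here usually look something like the following sketch; the image name and size are placeholders, not values from the original message:

    rbd create --size 10240 testimg   # create a 10 GB image in the default 'rbd' pool (size is in MB)
    rbd map testimg                   # map it through the kernel client, e.g. as /dev/rbd0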

Re: code coverage and teuthology

2013-01-15 Thread Loic Dachary
On 01/14/2013 06:26 PM, Josh Durgin wrote: Looking at how it's run automatically might help: https://github.com/ceph/teuthology/blob/master/teuthology/coverage.py#L88 You should also add 'coverage: true' for the ceph task overrides. This way daemons are killed with SIGTERM, and the atexit

Re: Rack Awareness

2013-01-15 Thread Wido den Hollander
Hi, On 01/15/2013 11:17 AM, Gandalf Corvotempesta wrote: Hi all, is ceph able to distribute data across multiple racks like MooseFS does with its Rack Awareness feature? For example, let's imagine a ceph cluster made with multiple OSDs distributed across multiple servers in multiple racks. Is

Re: Rack Awareness

2013-01-15 Thread Wido den Hollander
On 01/15/2013 11:34 AM, Gandalf Corvotempesta wrote: 2013/1/15 Wido den Hollander w...@widodh.nl: Yes, no problem at all! That's what the crushmap is for. This way you can tell Ceph exactly how to distribute your data. Cool. If I understood properly, I have to configure ceph with OSD in a
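
For readers following the thread, a minimal sketch of the crushmap workflow Wido is pointing at, assuming the CRUSH map already declares rack buckets; the rule name, ruleset id, and pool name below are placeholders rather than anything from the original messages:

    # dump and decompile the current CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # in crush.txt, a rule that puts each replica in a different rack looks roughly like:
    #   rule rack-aware {
    #       ruleset 1
    #       type replicated
    #       min_size 1
    #       max_size 10
    #       step take default
    #       step chooseleaf firstn 0 type rack
    #       step emit
    #   }

    # recompile, inject the new map, and point a pool at the new rule
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new
    ceph osd pool set rbd crush_ruleset 1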

Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-15 Thread Noah Watkins
On Tue, Jan 15, 2013 at 1:32 AM, Danny Al-Gaaf danny.al-g...@bisect.de wrote: On 15.01.2013 10:04, James Page wrote: On 12/01/13 16:36, Noah Watkins wrote: On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell gary.low...@inktank.com wrote: I would also prefer to not add another huge build

Re: code coverage and teuthology

2013-01-15 Thread Josh Durgin
On 01/15/2013 02:10 AM, Loic Dachary wrote: On 01/14/2013 06:26 PM, Josh Durgin wrote: Looking at how it's run automatically might help: https://github.com/ceph/teuthology/blob/master/teuthology/coverage.py#L88 You should also add 'coverage: true' for the ceph task overrides. This way

Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-15 Thread Gregory Farnum
On Tue, Jan 15, 2013 at 6:55 AM, Noah Watkins jayh...@cs.ucsc.edu wrote: On Tue, Jan 15, 2013 at 1:32 AM, Danny Al-Gaaf danny.al-g...@bisect.de wrote: On 15.01.2013 10:04, James Page wrote: On 12/01/13 16:36, Noah Watkins wrote: On Thu, Jan 10, 2013 at 9:13 PM, Gary Lowell

Grid data placement

2013-01-15 Thread Dimitri Maziuk
Hi everyone, quick question: can I get ceph to replicate a bunch of files to every host in the compute cluster and then have those hosts read those files from local disk? From TFM it looks like a custom crush map should get the files to [an osd on] every host, but I'm not clear on the read step: do I need an

Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact

2013-01-15 Thread Mark Nelson
On 01/15/2013 03:31 AM, Gandalf Corvotempesta wrote: 2013/1/14 Mark Nelson mark.nel...@inktank.com: The advice that I usually give people is that if performance is a big concern, try to make sure filestore disk and journal performance are nearly matched. In my test setup, I use 1 intel 520 SSD to

Patch "rbd: kill create_snap sysfs entry" has been added to the 3.4-stable tree

2013-01-15 Thread gregkh
This is a note to let you know that I've just added the patch titled "rbd: kill create_snap sysfs entry" to the 3.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is:

Re: Grid data placement

2013-01-15 Thread Dimitri Maziuk
On 01/15/2013 12:36 PM, Gregory Farnum wrote: On Tue, Jan 15, 2013 at 10:33 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: At the start of the batch #cores-in-the-cluster processes try to mmap the same 2GB and start reading it from SEEK_SET at the same time. I won't know until I try but I

Re: Grid data placement

2013-01-15 Thread Gregory Farnum
On Tue, Jan 15, 2013 at 11:00 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 01/15/2013 12:36 PM, Gregory Farnum wrote: On Tue, Jan 15, 2013 at 10:33 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: At the start of the batch #cores-in-the-cluster processes try to mmap the same 2GB and

Re: Another rbd compatibility issue between 0.48.2argonaut-2 and 0.56.1 ?

2013-01-15 Thread Josh Durgin
On 01/13/2013 11:16 PM, Simon Frerichs | Fremaks GmbH wrote: Hi, we've updated ceph on one node from argonaut to 0.56.1. The osds are working fine but I see the following error: rbd info kvm1395 rbd: error opening image kvm1395: (5) Input/output error 2013-01-14 06:09:10.162206 7efff3aed760 -1

Re: code coverage and teuthology

2013-01-15 Thread Dan Mick
It would not surprise me at all if gcov files are *highly* version dependent. I don't know one way or the other, but it seems very possible. On 01/15/2013 09:21 AM, Josh Durgin wrote: On 01/15/2013 02:10 AM, Loic Dachary wrote: On 01/14/2013 06:26 PM, Josh Durgin wrote: Looking at how it's

test

2013-01-15 Thread Dan Mick

Re: [PATCH 1/2] rbd: define flags field, use it for exists flag

2013-01-15 Thread Dan Mick
Reviewed-by: Dan Mick dan.m...@inktank.com On 01/14/2013 10:50 AM, Alex Elder wrote: Define a new rbd device flags field, manipulated using atomic bit operations. Replace the use of the current exists flag with a bit in this new flags field. Signed-off-by: Alex Elder el...@inktank.com ---

Re: OSD nodes with >=8 spinners, SSD-backed journals, and their performance impact

2013-01-15 Thread Mark Nelson
On 01/15/2013 03:24 PM, Gandalf Corvotempesta wrote: 2013/1/15 Mark Nelson mark.nel...@inktank.com: I think the 12 bay supermicro 2U A chassis with 12 spinning disks, 10GbE, and two controllers is potentially a really nice balanced combination. Which chassis are you referring to? I don't

REMINDER: all argonaut users should upgrade to v0.48.3argonaut

2013-01-15 Thread Sage Weil
There are some critical bugs fixed in v0.48.3, including one that can lead to data loss in power loss or kernel panic situations. Please upgrade if you have not already done so! sage

Re: Test infrastructure: 2 or more servers?

2013-01-15 Thread Xing Lin
You can change the number of replicas at runtime with the following command: $ ceph osd pool set {poolname} size {num-replicas} Xing On 01/15/2013 03:00 PM, Gandalf Corvotempesta wrote: Is it possible to change the number of replicas in real time?
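
As a concrete (hypothetical) instance of the command above, raising a pool named rbd to 3 replicas and then watching the cluster react:

    ceph osd pool set rbd size 3   # change the replica count at runtime
    ceph -w                        # watch the recovery/backfill activity this triggers

This ties into the rebalancing behaviour described in the next message.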

Re: Test infrastructure: 2 or more servers?

2013-01-15 Thread Xing Lin
It seems to be: Ceph will shuffle data to rebalance in situations such as when we change the replica num or when some nodes or disks are down. Xing On 01/15/2013 03:26 PM, Gandalf Corvotempesta wrote: 2013/1/15 Xing Lin xing...@cs.utah.edu: You can change the number of replicas at runtime with

Re: [PATCH REPOST] libceph: reformat __reset_osd()

2013-01-15 Thread Josh Durgin
On 01/03/2013 11:02 AM, Alex Elder wrote: Reformat __reset_osd() into three distinct blocks of code handling the three return cases. Signed-off-by: Alex Elder el...@inktank.com --- Looks good. With one small duplication removed, Reviewed-by: Josh Durgin josh.dur...@inktank.com

mds: first stab at lookup-by-ino problem/soln description

2013-01-15 Thread Sage Weil
One of the first things we need to fix in the MDS is how we support lookup-by-ino. It's important for fsck, NFS reexport, and (insofar as there are limitations to the current anchor table design) hard links and snapshots. Below is a description of the problem and a rough sketch of my proposed

Re: [PATCH REPOST 1/2] rbd: encapsulate handling for a single request

2013-01-15 Thread Josh Durgin
On 01/03/2013 02:43 PM, Alex Elder wrote: In rbd_rq_fn(), requests are fetched from the block layer and each request is processed, looping through the request's list of bio's until they've all been consumed. Separate the handling for a single request into its own function to make it a bit

Re: [PATCH REPOST 2/2] rbd: a little more cleanup of rbd_rq_fn()

2013-01-15 Thread Josh Durgin
On 01/03/2013 02:43 PM, Alex Elder wrote: Now that a big hunk in the middle of rbd_rq_fn() has been moved into its own routine we can simplify it a little more. Signed-off-by: Alex Elder el...@inktank.com --- Reviewed-by: Josh Durgin josh.dur...@inktank.com drivers/block/rbd.c | 50

Re: [PATCH REPOST] rbd: end request on error in rbd_do_request() caller

2013-01-15 Thread Josh Durgin
On 01/03/2013 02:51 PM, Alex Elder wrote: Only one of the three callers of rbd_do_request() provide a collection structure to aggregate status. If an error occurs in rbd_do_request(), have the caller take care of calling rbd_coll_end_req() if necessary in that one spot. Signed-off-by: Alex

Re: Test infrastructure: 2 or more servers?

2013-01-15 Thread Xing Lin
It seems to be: Ceph will shuffle data to rebalance in situations such as when we change the replica num or when some nodes or disks are down. Xing On 01/15/2013 03:26 PM, Gandalf Corvotempesta wrote: So it's absolutely safe to start with just 2 servers, run all the necessary tests and when

Re: [PATCH REPOST 1/2] rbd: make exists flag atomic

2013-01-15 Thread Josh Durgin
On 01/03/2013 02:53 PM, Alex Elder wrote: The rbd_device->exists field can be updated asynchronously, changing from set to clear if a mapped snapshot disappears from the base image's snapshot context. Currently, the value of the exists flag is only read and modified under protection of the header

Ceph slow request unstable issue

2013-01-15 Thread Chen, Xiaoxi
Hi list, We are suffering from OSDs or the OS going down when there is continuing high pressure on the Ceph rack. Basically we are on Ubuntu 12.04 + Ceph 0.56.1, 6 nodes, each node with 20 spindles + 4 SSDs as journals (120 spindles in total). We create a lot of RBD volumes

Re: [PATCH 1/2] rbd: define flags field, use it for exists flag

2013-01-15 Thread Josh Durgin
On 01/14/2013 01:23 PM, Alex Elder wrote: On 01/14/2013 02:32 PM, Dan Mick wrote: I see that set_bit is atomic, but I don't see that test_bit is. Am I missing a subtlety? That's an interesting observation. I'm certain it's safe, but I needed to research it a bit, and I still haven't

Re: [PATCH REPOST 2/2] rbd: only get snap context for write requests

2013-01-15 Thread Josh Durgin
On 01/03/2013 02:54 PM, Alex Elder wrote: Right now we get the snapshot context for an rbd image (under protection of the header semaphore) for every request processed. There's no need to get the snap context if we're doing a read, so avoid doing so in that case. Note that we no longer need to

Re: [PATCH REPOST 0/4] rbd: four minor patches

2013-01-15 Thread Josh Durgin
On 01/03/2013 11:04 AM, Alex Elder wrote: I'm re-posting my patch backlog, in chunks that may or may not match how they got posted before. This series contains some pretty fairly straightforward changes. -Alex [PATCH REPOST 1/4] rbd: document rbd_spec

Re: [PATCH REPOST 0/2] rbd: standardize some variable names

2013-01-15 Thread Josh Durgin
On 01/03/2013 02:37 PM, Alex Elder wrote: This series just makes the names of variables for certain objects follow a consistent naming convention. -Alex [PATCH REPOST 1/2] rbd: standardize rbd_request variable names [PATCH REPOST 2/2] rbd: standardize

Re: [PATCH REPOST 0/5] rbd: drop some unneeded parameters

2013-01-15 Thread Josh Durgin
On 01/03/2013 03:17 PM, Alex Elder wrote: This series cleans up some parameter lists, eliminating parameters that don't need to be used. -Alex [PATCH REPOST 1/5] rbd: drop oid parameters from ceph_osdc_build_request() [PATCH REPOST 2/5] rbd: drop snapid

Re: [PATCH] libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed

2013-01-15 Thread Sage Weil
Hi Jim- I just realized this didn't make it into our tree. It's now in testing, and will get merged in the next window. D'oh! sage On Fri, 30 Nov 2012, Jim Schutt wrote: Add libceph support for a new CRUSH tunable recently added to Ceph servers. Consider the CRUSH rule step

Adding flashcache for data disk to cache Ceph metadata writes

2013-01-15 Thread Chen, Xiaoxi
Hi List, I have introduced flashcache (https://github.com/facebook/flashcache) aiming to reduce Ceph metadata IOs to the OSD's disk. Basically, for every data write, Ceph needs to write 3 things: the PG log, the PG info, and the actual data. The first 2 requests are small, but for non-btrfs filesystems, the