Re: RGW in Bobtail

2012-10-31 Thread Yehuda Sadeh
Following up on my own message: On Tue, Oct 30, 2012 at 10:36 AM, Yehuda Sadeh yeh...@inktank.com wrote: - Keystone This is not completely implemented yet, but it is likely that it will make it to Bobtail. We'll make it so that Swift authentication (and user management) will be able to go

slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Hello, I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none): randread with a 4K block size on a small size of 1G (so it can be handled by the buffer cache on the ceph cluster). fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 --group_reporting
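For readers reproducing this, the command line above translates into an equivalent fio job file (a sketch only; fio defaults are assumed for anything not listed):

```ini
; Equivalent of: fio --filename=/dev/vdb --rw=randread --bs=4K
;                    --size=1000M --iodepth=40 --group_reporting
[randread-4k]
filename=/dev/vdb
rw=randread
bs=4k
size=1000m
iodepth=40
group_reporting
```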

Re: [PATCH 5/6] rbd: get additional info in parent spec

2012-10-31 Thread Alex Elder
On 10/30/2012 08:49 PM, Alex Elder wrote: When a layered rbd image has a parent, that parent is identified only by its pool id, image id, and snapshot id. Images that have been mapped also record *names* for those three id's. Add code to look up these names for parent images so they match

Re: slow fio random read benchmark, need help

2012-10-31 Thread Sage Weil
On Wed, 31 Oct 2012, Alexandre DERUMIER wrote: Hello, I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none): randread with a 4K block size on a small size of 1G (so it can be handled by the buffer cache on the ceph cluster). fio --filename=/dev/vdb --rw=randread --bs=4K

new build dep on master

2012-10-31 Thread Sage Weil
apt-get install libboost-program-options-dev on debian-based distros; not sure what the rpm equivalent is yet. sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at

Re: slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Have you tried increasing the iodepth? Yes, I have tried with 100 and 200, same results. I have also tried directly from the host, with /dev/rbd1, and I have the same result. I have also tried with 3 different hosts, with different CPU models. (note: I can reach around 40.000 iops with the same fio config

Re: slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Also, I have the same results with 8K or 16K block sizes. Don't know if it helps; here is an extract of the perf dump of 1 mon and 1 osd: ceph --admin-daemon ceph-mon.a.asok perf dump
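For readers of the archive: ceph's perf dump reports latency counters as an avgcount/sum pair, so the mean latency comes from dividing them. A minimal sketch of reading such an extract (the JSON excerpt and its numbers are made up for illustration, not taken from Alexandre's dump):

```python
import json

# Hypothetical excerpt of `ceph --admin-daemon ... perf dump` output;
# the avgcount/sum pair is how latency counters are reported.
sample = json.loads('{"osd": {"op_r_latency": {"avgcount": 5000, "sum": 40.0}}}')

lat = sample["osd"]["op_r_latency"]
avg_ms = 1000.0 * lat["sum"] / lat["avgcount"]  # mean read latency in ms
print(avg_ms)
```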

Re: slow fio random read benchmark, need help

2012-10-31 Thread Marcus Sorensen
5000 is actually really good, if you ask me. Assuming everything is connected via gigabit. If you get 40k iops locally, you add the latency of tcp, as well as that of the ceph services and VM layer, and that's what you get. On my network I get about a .1ms round trip on gigabit over the same
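Marcus's point can be checked with back-of-envelope math: with requests effectively serialized, the IOPS ceiling is queue depth divided by per-operation latency (Little's law). The latency figures below are illustrative, not measured values from this thread:

```python
# IOPS ceiling for a given per-operation latency and queue depth
# (Little's law). Figures below are illustrative round numbers.
def iops_ceiling(latency_s, queue_depth=1):
    return queue_depth / latency_s

# ~0.2 ms per serialized network round trip caps a single stream near
# 5k IOPS; ~0.025 ms per local op would allow roughly 40k IOPS.
remote = iops_ceiling(0.0002)
local = iops_ceiling(0.000025)
print(remote, local)
```

This is consistent with seeing ~40k IOPS locally collapse to ~5k once TCP, the ceph services, and the VM layer add their latency.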

active MDS and disk write cache

2012-10-31 Thread Matt Weil
I have a system with a bunch of RAM that I want to remain the active MDS, but still have a backup. This config doesn't seem to be working. I can make linuscs92 the active by stopping and starting the mds on linuscs95. It would be nice for linuscs92 to be the active from the start.

Re: slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Hi, I use a small file size (1G) to be sure it can be handled in the buffer cache. (I don't see any read access on the disks with iostat during the test.) But I think the problem is not the disk hardware IOs, but a bottleneck somewhere in the ceph protocol. (All benchmarks I have seen on the ceph mailing list never reach

Re: new build dep on master

2012-10-31 Thread Gary Lowell
Hi Sage - Sam may have the build machines updated. I'll double check that, and take care of any packaging changes. Cheers, Gary On Oct 31, 2012, at 9:03 AM, Sage Weil wrote: apt-get install libboost-program-options-dev on debian-based distros; not sure what the rpm equivalent is yet.

Re: slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Thanks Marcus; indeed, gigabit ethernet. Note that my iscsi results (40k) were with multipath, so multiple gigabit links. I have also done tests with a NetApp array, with NFS, single link; I'm around 13000 iops. I will do more tests with multiple VMs, from different hosts, and with

Re: slow fio random read benchmark, need help

2012-10-31 Thread Marcus Sorensen
Yes, I was going to say that the most I've ever seen out of gigabit is about 15k iops, with parallel tests and NFS (or iSCSI). Multipathing may not really parallelize the io for you. It can send an io down one path, then move to the next path and send the next io without necessarily waiting for

Re: [PATCH 1/4] rbd: don't pass rbd_dev to rbd_get_client()

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 02:14 PM, Alex Elder wrote: The only reason rbd_dev is passed to rbd_get_client() is so its rbd_client field can get assigned. Instead, just return the rbd_client pointer as a result and have the caller do the assignment. Change

Re: slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Yes, I think you are right; the round trip with the mon must cut the performance by half. I have just done a test with 2 parallel fio benchmarks, from 2 different hosts, and I get 2 x 5000 iops, so it must be related to network latency. I have also done tests with --numjob 1000; it doesn't help, same results.

Re: slow fio random read benchmark, need help

2012-10-31 Thread Marcus Sorensen
Come to think of it that 15k iops I mentioned was on 10G ethernet with NFS. I have tried infiniband with ipoib and tcp, it's similar to 10G ethernet. You will need to get creative. What you're asking for really is to have local latencies with remote storage. Just off of the top of my head you may

Re: active MDS and disk write cache

2012-10-31 Thread Sam Lang
On 10/31/2012 12:02 PM, Matt Weil wrote: I have a system with a bunch of RAM that I want to remain the active MDS, but still have a backup. This config doesn't seem to be working. I can make linuscs92 the active by stopping and starting the mds on linuscs95. It would be nice for linuscs92 to be

Re: active MDS and disk write cache

2012-10-31 Thread Matt Weil
Hi Matt, Can you post your ceph config? Once you start up your ceph cluster, do you see that linuscs92 is the standby and linuscs95 is the active? How are you starting your cluster? service ceph -a start, and yes, linuscs95 comes out as active. [global] ; enable secure authentication
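For archive readers: the kind of ceph.conf stanza that pins the standby relationship in this era of ceph looks roughly like the sketch below (hostnames taken from the thread; treat the exact stanza as illustrative, not Matt's actual config):

```ini
[mds.linuscs92]
    host = linuscs92

[mds.linuscs95]
    host = linuscs95
    ; keep linuscs95 as a standby(-replay) following linuscs92:
    mds standby for name = linuscs92
    mds standby replay = true
```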

Re: slow fio random read benchmark, need help

2012-10-31 Thread Josh Durgin
On 10/31/2012 11:56 AM, Alexandre DERUMIER wrote: Yes, I think you are right; the round trip with the mon must cut the performance by half. I just want to note that the monitors aren't in the data path. The client knows how to reach the osds and which osds to talk to based on the osdmap. This is
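Josh's point can be illustrated with a toy sketch (this is NOT the real CRUSH algorithm, and the function names are made up): because placement is a pure function of the object name and the map, every client computes the target OSDs locally and talks to them directly, so no monitor round trip sits on the data path.

```python
import hashlib

# Toy stand-in for CRUSH: deterministic placement computed client-side.
def pg_for_object(obj_name, pg_num):
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % pg_num

def osds_for_pg(pg, osd_ids, replicas=2):
    # Every client derives the same OSD list from the same map.
    return [osd_ids[(pg + i) % len(osd_ids)] for i in range(replicas)]

pg = pg_for_object("rbd_data.1", 64)
print(pg, osds_for_pg(pg, list(range(6))))
```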

bobtail timing

2012-10-31 Thread Sage Weil
I would like to freeze v0.55, the bobtail stable release, at the end of next week. If there is any functionality you are working on that should be included, we need to get it into master, preferably well before that. There will be several weeks of testing in the 'next' branch after that

Re: [PATCH 2/4] rbd: consolidate rbd_dev init in rbd_add()

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 02:14 PM, Alex Elder wrote: Group the allocation and initialization of fields of the rbd device structure created in rbd_add(). Move the grouped code down later in the function, just prior to the call to rbd_dev_probe(). This is

Re: [PATCH 4/4] rbd: encapsulate last part of probe

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 02:14 PM, Alex Elder wrote: Group the activities that now take place after an rbd_dev_probe() call into a single function, and move the call to that function into rbd_dev_probe() itself. Signed-off-by: Alex Elder el...@inktank.com

Re: [PATCH 1/6] rbd: skip getting image id if known

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 06:49 PM, Alex Elder wrote: We will know the image id for format 2 parent images, but won't initially know its image name. Avoid making the query for an image id in rbd_dev_image_id() if it's already known. Signed-off-by: Alex

Re: [PATCH 2/6] rbd: allow null image name

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 06:49 PM, Alex Elder wrote: Format 2 parent images are partially identified by their image id, but it may not be possible to determine their image name. The name is not strictly needed for correct operation, so we won't be treating

Re: Ceph journal

2012-10-31 Thread Tren Blackburn
On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: In a multi-replica cluster (for example, replica = 3), is it safe to set the journal on a tmpfs? As far as I understood, with the journal enabled all writes are written to the journal and then to disk afterwards.

Re: bobtail timing

2012-10-31 Thread Noah Watkins
Which branch is the freeze taken against? master? On Wed, Oct 31, 2012 at 1:46 PM, Sage Weil s...@inktank.com wrote: I would like to freeze v0.55, the bobtail stable release, at the end of next week. If there is any functionality you are working on that should be included, we need to get it

Re: bobtail timing

2012-10-31 Thread Sage Weil
On Wed, 31 Oct 2012, Noah Watkins wrote: Which branch is the freeze taken against? master? Right. Basically, every 3-4 weeks: - next is tagged as v0.XX - and is merged back into master - next branch is reset to current master - testing branch is reset to just-tagged v0.XX sage On

Re: Ceph journal

2012-10-31 Thread Stefan Kleijkers
Hello, On 10/31/2012 10:24 PM, Tren Blackburn wrote: On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: In a multi-replica cluster (for example, replica = 3), is it safe to set the journal on a tmpfs? As far as I understood, with the journal enabled all writes are

Re: Ceph journal

2012-10-31 Thread Sage Weil
On Wed, 31 Oct 2012, Tren Blackburn wrote: On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: In a multi-replica cluster (for example, replica = 3), is it safe to set the journal on a tmpfs? As far as I understood, with the journal enabled all writes are written

Re: Ceph journal

2012-10-31 Thread Stefan Kleijkers
Hello, On 10/31/2012 10:58 PM, Gandalf Corvotempesta wrote: 2012/10/31 Tren Blackburn t...@eotnetworks.com: Unless you're using btrfs which writes to the journal and osd fs concurrently, if you lose the journal device (such as due to a reboot), you've lost the osd device, requiring it to be
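For reference, putting the journal on tmpfs is just a path setting like the sketch below (paths and size illustrative); as the thread warns, on non-btrfs filestores losing that journal, for example on a reboot, means the OSD must be rebuilt:

```ini
[osd]
    osd journal = /dev/shm/osd.$id.journal   ; tmpfs-backed, lost on reboot
    osd journal size = 1024                  ; MB
```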

Re: new build dep on master

2012-10-31 Thread Dan Mick
Gary, were you also going to update README? (I know, it's imperfect, but...) On 10/31/2012 10:25 AM, Gary Lowell wrote: Hi Sage - Sam may have the build machines updated. I'll double check that, and take care of any packaging changes. Cheers, Gary On Oct 31, 2012, at 9:03 AM, Sage Weil

Re: Unable to Start Ceph service

2012-10-31 Thread Dan Mick
I've had a long private thread with Hemant, and I believe he's past this problem (in case anyone scans archives looking for open questions). Hemant, it would be best to keep the thread on ceph-devel; you get more people looking and answering. It's a mystery, still, how /usr/bin/ceph-osd ended

Re: new build dep on master

2012-10-31 Thread Gary Lowell
I didn't have that on my todo list, but I'll add it. Cheers, Gary On Oct 31, 2012, at 3:33 PM, Dan Mick wrote: Gary, were you also going to update README? (I know, it's imperfect, but...) On 10/31/2012 10:25 AM, Gary Lowell wrote: Hi Sage - Sam may have the build machines updated.

Rados Performance help needed...

2012-10-31 Thread Ryan Nicholson
Guys: I have some tuning questions. I'm not getting the write speeds I'm expecting, and am open to suggestions. I'm using Rados on Ceph 0.48.0. I have 12 OSDs split up (using crush/rados pools) into 2 pools this way: 4 Servers - Dell 2850's, 12GB RAM - 64-bit

RE: Rados Performance help needed...

2012-10-31 Thread Ryan Nicholson
Correction: Missed a carriage return when I copy/pasted at first, sorry... Ryan -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ryan Nicholson Sent: Wednesday, October 31, 2012 5:50 PM To: ceph-devel@vger.kernel.org

Re: Ceph journal

2012-10-31 Thread Sébastien Han
Hi, Personally I won't take the risk of losing transactions. If a client writes into a journal (assuming it's the first write) and the server crashes for whatever reason, you have a high risk of inconsistent data, because you just lost what was in the journal. Tmpfs is the cheapest solution for

Re: OSD sizes

2012-10-31 Thread Sébastien Han
Hi, As far as I'm concerned, I think that 12 disks per server is way too much. -- Bien cordialement. Sébastien HAN. On Wed, Oct 31, 2012 at 11:13 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: I'll run a testbed environment made with 3 or 5 DELL R515 servers (12 disks each). I

Need CRYPTO_CXXFLAGS with latest master?

2012-10-31 Thread Noah Watkins
Pushed changes to Makefile.am in branch: wip-make-crypto-flags Several changes to Makefile.am that add CRYPTO_CXXFLAGS to various test targets. I needed these to build after updating to master this afternoon. Not sure if there is something else going on in my environment.. Thanks, Noah --

Re: [PATCH 3/6] rbd: get parent spec for version 2 images

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 06:49 PM, Alex Elder wrote: Add support for getting the information identifying the parent image for rbd images that have them. The child image holds a reference to its parent image specification structure. Create a new entry

Re: [PATCH 4/6] libceph: define ceph_pg_pool_name_by_id()

2012-10-31 Thread Josh Durgin
Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 06:49 PM, Alex Elder wrote: Define and export function ceph_pg_pool_name_by_id() to supply the name of a pg pool whose id is given. This will be used by the next patch. Signed-off-by: Alex Elder el...@inktank.com ---

Re: [PATCH 5/6] rbd: get additional info in parent spec

2012-10-31 Thread Josh Durgin
I know you've got a queue of these already, but here's another: rbd_dev_probe_update_spec() could definitely use some warnings to distinguish its error cases. Reviewed-by: Josh Durgin josh.dur...@inktank.com On 10/30/2012 06:49 PM, Alex Elder wrote: When a layered rbd image has a parent, that

Re: [PATCH 6/6] rbd: probe the parent of an image if present

2012-10-31 Thread Josh Durgin
This all makes sense, but it reminds me of another issue we'll need to address: http://www.tracker.newdream.net/issues/2533 We don't need to watch the header of a parent snapshot, since it's immutable and guaranteed not to be deleted out from under us. This avoids the bug referenced above. So I

Re: new build dep on master

2012-10-31 Thread Gary Lowell
Looks like Sam fixed that this morning. Cheers, Gary On Oct 31, 2012, at 3:33 PM, Dan Mick wrote: Gary, were you also going to update README? (I know, it's imperfect, but...) On 10/31/2012 10:25 AM, Gary Lowell wrote: Hi Sage - Sam may have the build machines updated. I'll double

Re: bobtail timing

2012-10-31 Thread Mike Ryan
On Thu, Nov 01, 2012 at 03:12:46AM +0000, Cláudio Martins wrote: On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil s...@inktank.com wrote: On Wed, 31 Oct 2012, Noah Watkins wrote: Which branch is the freeze taken against? master? Right. Basically, every 3-4 weeks: - next is

Re: bobtail timing

2012-10-31 Thread Sage Weil
On Thu, 1 Nov 2012, Cláudio Martins wrote: On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil s...@inktank.com wrote: On Wed, 31 Oct 2012, Noah Watkins wrote: Which branch is the freeze taken against? master? Right. Basically, every 3-4 weeks: - next is tagged as v0.XX - and

Re: Need CRYPTO_CXXFLAGS with latest master?

2012-10-31 Thread Noah Watkins
Whoops, here is the original error: CXX test_idempotent_sequence.o In file included from ./os/LFNIndex.h:27:0, from ./os/HashIndex.h:20, from ./os/IndexManager.h:26, from ./os/ObjectMap.h:18, from ./os/ObjectStore.h:22,
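For context, the wip-make-crypto-flags changes are per-target flag additions of roughly this shape (the target name is inferred from the error output above; treat this Makefile.am line as a sketch, not the actual patch):

```makefile
# add the crypto include path to the failing test target's CXXFLAGS
test_idempotent_sequence_CXXFLAGS = ${CRYPTO_CXXFLAGS} ${AM_CXXFLAGS}
```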

Re: bobtail timing

2012-10-31 Thread Cláudio Martins
On Wed, 31 Oct 2012 20:17:49 -0700 (PDT) Sage Weil s...@inktank.com wrote: On Thu, 1 Nov 2012, Cláudio Martins wrote: On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil s...@inktank.com wrote: On Wed, 31 Oct 2012, Noah Watkins wrote: Which branch is the freeze taken against? master?

Re: slow fio random read benchmark, need help

2012-10-31 Thread Alexandre DERUMIER
Come to think of it, that 15k iops I mentioned was on 10G ethernet with NFS. I have tried infiniband with ipoib and tcp; it's similar to 10G ethernet. I have seen new Arista 10GbE switches with latency around 1 microsecond, which seems pretty good for the job. You will need to get creative. What

Re: slow fio random read benchmark, need help

2012-10-31 Thread Stefan Priebe - Profihost AG
On 01.11.2012 at 06:11, Alexandre DERUMIER aderum...@odiso.com wrote: Come to think of it, that 15k iops I mentioned was on 10G ethernet with NFS. I have tried infiniband with ipoib and tcp; it's similar to 10G ethernet. I have seen new Arista 10GbE switches with latency around 1 microsecond,