On 01.11.2012 at 06:11, Alexandre DERUMIER wrote:
>>> Come to think of it that 15k iops I mentioned was on 10G ethernet with
>>> NFS. I have tried infiniband with ipoib and tcp, it's similar to 10G
>>> ethernet.
>
> I have seen a new Arista 10GbE switch with latency around 1 microsecond, that
> se
>>Come to think of it that 15k iops I mentioned was on 10G ethernet with
>>NFS. I have tried infiniband with ipoib and tcp, it's similar to 10G
>>ethernet.
I have seen a new Arista 10GbE switch with latency around 1 microsecond, that seems
pretty good for the job.
>>You will need to get creative.
On Wed, 31 Oct 2012 20:17:49 -0700 (PDT) Sage Weil wrote:
> On Thu, 1 Nov 2012, Cláudio Martins wrote:
> > On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil wrote:
> > > On Wed, 31 Oct 2012, Noah Watkins wrote:
> > > > Which branch is the freeze taken against? master?
> > >
> > > Right. Basic
Whoops, here is the original error:
CXX test_idempotent_sequence.o
In file included from ./os/LFNIndex.h:27:0,
from ./os/HashIndex.h:20,
from ./os/IndexManager.h:26,
from ./os/ObjectMap.h:18,
from ./os/ObjectStore.h:22,
On Thu, 1 Nov 2012, Cláudio Martins wrote:
> On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil wrote:
> > On Wed, 31 Oct 2012, Noah Watkins wrote:
> > > Which branch is the freeze taken against? master?
> >
> > Right. Basically, every 3-4 weeks:
> >
> > - next is tagged as v0.XX
> > - and
On Thu, Nov 01, 2012 at 03:12:46AM +, Cláudio Martins wrote:
>
> On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil wrote:
> > On Wed, 31 Oct 2012, Noah Watkins wrote:
> > > Which branch is the freeze taken against? master?
> >
> > Right. Basically, every 3-4 weeks:
> >
> > - next is tag
On Wed, 31 Oct 2012 14:38:28 -0700 (PDT) Sage Weil wrote:
> On Wed, 31 Oct 2012, Noah Watkins wrote:
> > Which branch is the freeze taken against? master?
>
> Right. Basically, every 3-4 weeks:
>
> - next is tagged as v0.XX
> - and is merged back into master
> - next branch is reset to cu
Looks like Sam fixed that this morning.
Cheers,
Gary
On Oct 31, 2012, at 3:33 PM, Dan Mick wrote:
> Gary, were you also going to update README? (I know, it's imperfect, but...)
>
>
> On 10/31/2012 10:25 AM, Gary Lowell wrote:
>> Hi Sage -
>>
>> Sam may have the build machines updated. I'll
This all makes sense, but it reminds me of another issue we'll need to
address:
http://www.tracker.newdream.net/issues/2533
We don't need to watch the header of a parent snapshot, since it's
immutable and guaranteed not to be deleted out from under us.
This avoids the bug referenced above. So I
I know you've got a queue of these already, but here's another:
rbd_dev_probe_update_spec() could definitely use some warnings
to distinguish its error cases.
Reviewed-by: Josh Durgin
On 10/30/2012 06:49 PM, Alex Elder wrote:
When a layered rbd image has a parent, that parent is identified
onl
Reviewed-by: Josh Durgin
On 10/30/2012 06:49 PM, Alex Elder wrote:
Define and export function ceph_pg_pool_name_by_id() to supply
the name of a pg pool whose id is given. This will be used by
the next patch.
Signed-off-by: Alex Elder
---
include/linux/ceph/osdmap.h |1 +
net/ceph/osdm
Reviewed-by: Josh Durgin
On 10/30/2012 06:49 PM, Alex Elder wrote:
Add support for getting the information identifying the parent
image for rbd images that have them. The child image holds a
reference to its parent image specification structure. Create a new
entry "parent" in /sys/bus/rbd
Pushed changes to Makefile.am in branch:
wip-make-crypto-flags
Several changes to Makefile.am that add CRYPTO_CXXFLAGS to various
test targets. I needed these to build after updating to master this
afternoon.
Not sure if there is something else going on in my environment..
Thanks,
Noah
Hi,
As far as I'm concerned, I think that 12 disks per server is way too much.
--
Best regards.
Sébastien HAN.
On Wed, Oct 31, 2012 at 11:13 PM, Gandalf Corvotempesta
wrote:
> I'll run a testbed environment made with 3 or 5 DELL R515 servers (12
> disks each).
> I have no 10gb ethernet but on
Hi,
Personally, I wouldn't take the risk of losing transactions. If a client
writes into the journal (say it's the first write) and the server
crashes for whatever reason, you have a high risk of inconsistent data,
because you have just lost what was in the journal.
Tmpfs is the cheapest solution for ach
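For reference, the setup being debated would look something like this in
ceph.conf (the section name, host and paths are just illustrative; the tmpfs
line is the one I would avoid, for exactly the reason above):

cat >> /etc/ceph/ceph.conf <<'EOF'
[osd.0]
    host = node1
    ; persistent journal on a dedicated partition -- survives a crash or reboot
    osd journal = /dev/sdb1
    osd journal size = 1000
    ; the tmpfs variant under discussion -- fast, but gone after any crash:
    ;osd journal = /dev/shm/osd.0.journal
EOF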
Correction:
Missed a carriage return when I copy/pasted at first, sorry...
Ryan
-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ryan Nicholson
Sent: Wednesday, October 31, 2012 5:50 PM
To: ceph-devel@vger.kernel.org
Subjec
Guys:
I have some tuning questions. I'm not getting the write speeds I'm expecting,
and am open to suggestions.
I'm using Rados, on Ceph 0.48.0. I have 12 OSDs split up (using crush/rados
pools) into 2 pools this way:
4 Servers
- Dell 2850's, 12GB ram
- 64-bit Cent
I didn't have that on my todo list, but I'll add it.
Cheers,
Gary
On Oct 31, 2012, at 3:33 PM, Dan Mick wrote:
> Gary, were you also going to update README? (I know, it's imperfect, but...)
>
>
> On 10/31/2012 10:25 AM, Gary Lowell wrote:
>> Hi Sage -
>>
>> Sam may have the build machines up
I've had a long private thread with Hemant, and I believe he's past this
problem (in case anyone scans archives looking for open questions).
Hemant, it would be best to keep the thread on ceph-devel; you get more
people looking and answering.
It's a mystery, still, how /usr/bin/ceph-osd ended u
Gary, were you also going to update README? (I know, it's imperfect,
but...)
On 10/31/2012 10:25 AM, Gary Lowell wrote:
Hi Sage -
Sam may have the build machines updated. I'll double check that, and take care
of any packaging changes.
Cheers,
Gary
On Oct 31, 2012, at 9:03 AM, Sage Weil w
Hello,
On 10/31/2012 10:58 PM, Gandalf Corvotempesta wrote:
2012/10/31 Tren Blackburn :
Unless you're using btrfs, which writes to the journal and the OSD fs
concurrently, if you lose the journal device (such as due to a
reboot), you've lost the OSD device, requiring it to be remade and
re-added.
I
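As far as I know (commands from memory, and osd.3 is just an example id, so
double-check before using), the journal can only be moved or rebuilt safely
after a clean stop, roughly:

# clean stop, flush the journal into the object store, recreate it elsewhere
service ceph stop osd.3
ceph-osd -i 3 --flush-journal
# point 'osd journal' in ceph.conf at the new device, then:
ceph-osd -i 3 --mkjournal
service ceph start osd.3

If the journal device is lost uncleanly, the OSD's data can no longer be
trusted and the OSD has to be recreated and backfilled from its replicas, as
described above.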
On Wed, 31 Oct 2012, Tren Blackburn wrote:
> On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta
> wrote:
> > In a multi-replica cluster (for example, replica = 3), is it safe to put the
> > journal on a tmpfs?
> > As far as I understood, with the journal enabled all writes are written to the
> > journal and then
Hello,
On 10/31/2012 10:24 PM, Tren Blackburn wrote:
On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta
wrote:
In a multi-replica cluster (for example, replica = 3), is it safe to put the
journal on a tmpfs?
As far as I understood, with the journal enabled all writes are written to the
journal and then to dis
On Wed, 31 Oct 2012, Noah Watkins wrote:
> Which branch is the freeze taken against? master?
Right. Basically, every 3-4 weeks:
- next is tagged as v0.XX
- and is merged back into master
- next branch is reset to current master
- testing branch is reset to just-tagged v0.XX
sage
>
> On
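In git terms I read that as roughly the following (illustrative only, not the
actual release procedure or script; v0.XX stands in for the real version):

git tag v0.XX next                # tag the release point on next
git checkout master
git merge next                    # fold the release back into master
git branch -f next master         # next restarts from current master
git branch -f testing v0.XX       # testing points at the just-tagged release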
Which branch is the freeze taken against? master?
On Wed, Oct 31, 2012 at 1:46 PM, Sage Weil wrote:
> I would like to freeze v0.55, the "bobtail" stable release, at the end of
> next week. If there is any functionality you are working on that should
> be included, we need to get it into master (
On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta
wrote:
> In a multi-replica cluster (for example, replica = 3), is it safe to put the
> journal on a tmpfs?
> As far as I understood, with the journal enabled all writes are written to the
> journal and then to disk afterwards.
> If a node hangs when data ar
Reviewed-by: Josh Durgin
On 10/30/2012 06:49 PM, Alex Elder wrote:
Format 2 parent images are partially identified by their image id,
but it may not be possible to determine their image name. The name
is not strictly needed for correct operation, so we won't be
treating it as an error if we do
Reviewed-by: Josh Durgin
On 10/30/2012 06:49 PM, Alex Elder wrote:
We will know the image id for format 2 parent images, but won't
initially know its image name. Avoid making the query for an image
id in rbd_dev_image_id() if it's already known.
Signed-off-by: Alex Elder
---
drivers/block/
Reviewed-by: Josh Durgin
On 10/30/2012 02:14 PM, Alex Elder wrote:
Group the activities that now take place after an rbd_dev_probe()
call into a single function, and move the call to that function
into rbd_dev_probe() itself.
Signed-off-by: Alex Elder
---
drivers/block/rbd.c | 161
Reviewed-by: Josh Durgin
On 10/30/2012 02:14 PM, Alex Elder wrote:
Encapsulate the creation/initialization and destruction of rbd
device structures. The rbd_client and the rbd_spec structures
provided on creation hold references whose ownership is transferred
to the new rbd_device structure.
Reviewed-by: Josh Durgin
On 10/30/2012 02:14 PM, Alex Elder wrote:
Group the allocation and initialization of fields of the rbd device
structure created in rbd_add(). Move the grouped code down later in
the function, just prior to the call to rbd_dev_probe(). This is
for the most part simple
I would like to freeze v0.55, the "bobtail" stable release, at the end of
next week. If there is any functionality you are working on that should
be included, we need to get it into master (preferably well) before that.
There will be several weeks of testing in the 'next' branch after that
(p
On 10/31/2012 11:56 AM, Alexandre DERUMIER wrote:
Yes, I think you are right, the round trip with the mon must cut the
performance by half.
I just want to note that the monitors aren't in the data path.
The client knows how to reach the osds and which osds to talk to based
on the osdmap. This is updat
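A quick way to see this from a client (assuming a pool named 'rbd' and an
arbitrary object name) is to ask where an object maps; the client works this
out from the osdmap and then talks to those OSDs directly, so the monitors
never touch the data:

ceph osd map rbd some-object      # prints the pg and the acting set of OSDs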
Hi Matt,
Can you post your ceph config? Once you start up your ceph cluster, do you
see that linuscs92 is the standby and linuscs95 is the active? How are
you starting your cluster?
service ceph -a start
and yes linuscs95 comes out as active.
[global]
; enable secure authentication
On 10/31/2012 12:02 PM, Matt Weil wrote:
I have a system with a bunch of RAM that I want to remain the active MDS
but still have a backup.
This config doesn't seem to be working. I can make linuscs92 the active
by stopping and starting the mds on linuscs95. It would be nice for
linuscs92 to be
Come to think of it that 15k iops I mentioned was on 10G ethernet with
NFS. I have tried infiniband with ipoib and tcp, it's similar to 10G
ethernet.
You will need to get creative. What you're asking for really is to
have local latencies with remote storage. Just off of the top of my
head you may
Yes, I think you are right, the round trip with the mon must cut the
performance by half.
I have just done a test with 2 parallel fio benchmarks, from 2 different hosts;
I get 2 x 5000 iops,
so it must be related to network latency.
I have also done tests with --numjobs 1000; it doesn't help, same results.
Do
Reviewed-by: Josh Durgin
On 10/30/2012 02:14 PM, Alex Elder wrote:
The only reason rbd_dev is passed to rbd_get_client() is so its
rbd_client field can get assigned. Instead, just return the
rbd_client pointer as a result and have the caller do the
assignment.
Change rbd_put_client() so it ta
Yes, I was going to say that the most I've ever seen out of gigabit is
about 15k iops, with parallel tests and NFS (or iSCSI). Multipathing
may not really parallelize the io for you. It can send an io down one
path, then move to the next path and send the next io without
necessarily waiting for the
Thanks Marcus,
indeed gigabit ethernet.
Note that my iSCSI results (40k) were with multipath, so multiple gigabit links.
I have also done tests with a NetApp array, with NFS, single link; I'm around
13000 iops.
I will do more tests with multiple VMs, from different hosts, and with
--numjobs.
Hi Sage -
Sam may have the build machines updated. I'll double check that, and take care
of any packaging changes.
Cheers,
Gary
On Oct 31, 2012, at 9:03 AM, Sage Weil wrote:
> apt-get install libboost-program-options-dev
>
> on debian-based distros; not sure what the rpm equivalent is yet.
>
Hi,
I use a small file size (1G), to be sure it can be handled in the buffer cache. (I don't
see any read access on the disks with iostat during the test.)
But I think the problem is not the disk hardware IOs, but a bottleneck
somewhere in the ceph protocol.
(All the benchmarks I have seen on the ceph mailing list never reach
I have a system with a bunch of RAM that I want to remain the active MDS
but still have a backup.
This config doesn't seem to be working. I can make linuscs92 the active
by stopping and starting the mds on linuscs95. It would be nice for
linuscs92 to be the active from the start.
[mds.linus
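Something along these lines is what I would try (option names from memory, so
please verify them against the docs; hostnames as in your setup):

cat >> /etc/ceph/ceph.conf <<'EOF'
[mds.linuscs92]
    host = linuscs92
[mds.linuscs95]
    host = linuscs95
    ; make linuscs95 follow linuscs92 as a (replay) standby
    mds standby for name = linuscs92
    mds standby replay = true
EOF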
5000 is actually really good, if you ask me. Assuming everything is
connected via gigabit. If you get 40k iops locally, you add the
latency of tcp, as well as that of the ceph services and VM layer, and
that's what you get. On my network I get about a .1ms round trip on
gigabit over the same switch
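Back-of-the-envelope, with my numbers (roughly 0.2 ms total per op, and
requests that end up effectively serialized on one client connection):

echo $(( 1000000 / 200 ))   # 1,000,000 us per second / ~200 us per op = 5000 ops/s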
Also, I have the same results with 8K or 16K block size.
Don't know if it helps; here is an extract of the perf dump of 1 mon and 1 osd:
ceph --admin-daemon ceph-mon.a.asok perf dump
{"cluster":{"num_mon":3,"num_mon_quorum":3,"num_osd":15,"num_osd_up":15,"num_osd_in":15,"osd_epoch":54,"osd_kb":2140015680,
>>Have you tried increasing the iodepth?
Yes, I have tried with 100 and 200, same results.
I have also tried directly from the host, with /dev/rbd1, and I have the same result.
I have also tried with 3 different hosts, with different CPU models.
(note: I can reach around 40,000 iops with the same fio confi
apt-get install libboost-program-options-dev
on debian-based distros; not sure what the rpm equivalent is yet.
sage
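If I remember right, the RPM-side equivalent should be the boost-devel
package (it bundles program_options), but that's worth double-checking:

yum install boost-devel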
On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
> Hello,
>
> I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none),
> randread, with 4K block size on a small size of 1G (so it can be handled by
> the buffer cache on the ceph cluster)
>
>
> fio --filename=/dev/vdb -rw=randread
On 10/30/2012 08:49 PM, Alex Elder wrote:
> When a layered rbd image has a parent, that parent is identified
> only by its pool id, image id, and snapshot id. Images that have
> been mapped also record *names* for those three id's.
>
> Add code to look up these names for parent images so they mat
Hello,
I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none),
randread, with 4K block size on a small size of 1G (so it can be handled by the
buffer cache on the ceph cluster)
fio --filename=/dev/vdb -rw=randread --bs=4K --size=1000M --iodepth=40
--group_reporting --name=fi
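For anyone who wants to reproduce this, a complete invocation along these
lines should work (the job name and the direct/ioengine flags are my
additions; everything else is as above):

fio --name=randread-test --filename=/dev/vdb --rw=randread --bs=4k \
    --size=1000M --iodepth=40 --ioengine=libaio --direct=1 --group_reporting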
Following up on my own message:
On Tue, Oct 30, 2012 at 10:36 AM, Yehuda Sadeh wrote:
> - Keystone
>
> This is not completely implemented yet, but it is likely that it will
> make it to Bobtail. We'll make it so that Swift authentication (and
> user management) will be able to go through Keystone.