From: Sébastien Han
Sent: November 22, 2012, 5:47
To: Mark Nelson
Cc: Alexandre DERUMIER; ceph-devel; Mark Kampe
Subject: Re: RBD fio Performance concerns

Hi Mark,

Well, the most concerning thing is that I have 2 Ceph clusters and both
of them show better random than sequential performance. I don't have
enough background to argue against your assumptions, but I could try.

Hi,

When I switch the journal to a separate partition on each OSD disk
(/dev/sdX1 for a 1 GB journal and /dev/sdX2 for the OSD), I go down
from 23,000 IOPS to 200 IOPS for random 4K.

Greets,
Stefan

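[For reference, a minimal sketch of the layout Stefan describes, using
the standard ceph.conf journal options; the device names and the OSD id
are illustrative assumptions, not taken from his setup:]

# Hypothetical disk /dev/sdb: sdb1 = 1 GB raw journal, sdb2 = data.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd.0]
    osd journal = /dev/sdb1
    osd journal size = 1024
    osd data = /srv/osd.0
EOF
mkfs.xfs -f /dev/sdb2
mkdir -p /srv/osd.0
mount /dev/sdb2 /srv/osd.0
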
On 23.11.2012 11:47, Alexandre DERUMIER wrote:
>> when I switch the journal to a separate partition on each OSD disk
>> (/dev/sdX1 for a 1 GB journal and /dev/sdX2 for the OSD), I go down
>> from 23,000 IOPS to 200 IOPS for random 4K.
> O_o, that seems crazy... Are you sure that your partitions are
> correctly aligned?

On 23.11.2012 12:03, Alexandre DERUMIER wrote:
> Maybe try to use the journal directly on the full partition,
> without xfs?

The same - just 200 IOPS for random 4K.

Stefan

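[Alignment is quick to check; a sketch with hypothetical device names -
parted's align-check subcommand and the sysfs start offsets are
standard, but the partition layout here is assumed:]

# Does each partition start on an "optimal" boundary for the disk?
parted /dev/sdb align-check optimal 1
parted /dev/sdb align-check optimal 2

# Raw check: on 512-byte-sector disks the start sector should be a
# multiple of 2048 (i.e. 1 MiB aligned).
cat /sys/block/sdb/sdb1/start
cat /sys/block/sdb/sdb2/start
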
On 23.11.2012 14:18, Mark Nelson wrote:
> Agreed with Alexandre, try putting the journal on a raw partition.
> That's pretty insane!

On 22.11.2012 16:49, Sébastien Han wrote:
>> But who cares? It's also on the 2nd node, or even on the 3rd if you
>> have replica 3.
> Yes, but you could also suffer a crash while writing the first
> replica. If the journal is in tmpfs, there is nothing to replay.

I thought the client would then write to the 2nd copy - is this wrong?
Otherwise you would have the same problem with disk crashes.

Stefan

On Thu, Nov 22, 2012 at 4:54 PM, Stefan Priebe - Profihost AG wrote:
> I thought the client would then write to the 2nd copy - is this
> wrong?

Hum, sorry, you're right. Forget about what I said :)

On Thu, Nov 22, 2012 at 4:35 PM, Alexandre DERUMIER wrote:
>> journal is running on tmpfs for me, but that changes nothing.
> I don't think it works then. According to the docs, journal aio
> enables libaio for asynchronous writes to the journal, and requires
> journal dio to be set to true.

Ah, might be, but the SSDs are pretty fast...

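[For anyone wanting to test Alexandre's point, a sketch of the two
journal settings he is quoting from the docs; placing them under [osd]
and the restart command are assumptions:]

# Explicitly enable direct + asynchronous journal writes.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
    journal dio = true
    journal aio = true
EOF

# Restart the local OSDs so the settings take effect (sysvinit-era).
service ceph restart osd
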
On 22.11.2012 14:22, Sébastien Han wrote:
>> And RAMDISK devices are too expensive.
> It would make sense in your infra, but yes, they are really
> expensive.

We need something like tmpfs - running in local memory, but with dio
support.

Stefan

On 22.11.2012 15:37, Mark Nelson wrote:
> I don't think we recommend tmpfs at all for anything other than
> playing around. :)

On 22.11.2012 11:49, Sébastien Han wrote:
> @Alexandre: cool!
> @Stefan: full SSD cluster and 10G switches?

Yes.

> A couple of weeks ago I saw that you use journal aio - did you notice
> a performance improvement with it?

The journal is running on tmpfs for me, but that changes nothing.

Stefan

On 19.11.2012 19:03, Sébastien Han wrote:

@Sage: thanks for the info :)

@Mark:
> If you want to do sequential I/O, you should do it buffered (so that
> the writes can be aggregated) or with a 4M block size (very efficient
> and avoiding object serialization).

The original benchmark was performed with a 4M block size.

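[To make the two suggestions concrete, a sketch of fio jobs for a
mapped RBD device; /dev/rbd0 is a placeholder and these jobs are not
the original benchmark:]

# Buffered sequential 4K writes: the page cache can aggregate them
# before RBD sees the I/O.
cat > seq-4k-buffered.fio <<'EOF'
[seq-4k-buffered]
filename=/dev/rbd0
rw=write
bs=4k
direct=0
runtime=60
EOF

# Direct sequential 4M writes: one write per 4M RADOS object, so no
# serialization penalty.
cat > seq-4m-direct.fio <<'EOF'
[seq-4m-direct]
filename=/dev/rbd0
rw=write
bs=4M
direct=1
ioengine=libaio
iodepth=32
runtime=60
EOF

fio seq-4k-buffered.fio seq-4m-direct.fio
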
On 22.11.2012 15:46, Mark Nelson wrote:
> I haven't played a whole lot with SSD-only OSDs yet (other than
> noting last summer that IOP performance wasn't as high as I wanted
> it). Is a second partition on the SSD for the journal not an option
> for you?

Mark Kampe wrote:
Sequential I/O is faster than random I/O on a disk, but we are not
doing I/O to a disk - we are doing it to a distributed storage cluster:
small random operations are striped over multiple objects and servers,
and so can proceed in parallel and take advantage of more nodes and
disks. This parallelism can make random I/O faster than serialized
sequential I/O.

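[A back-of-the-envelope illustration of the striping argument; the
image size and the object spread are hypothetical numbers, not from
this thread:]

    40 GB image / 4 MB per object   = 10,240 objects
    256 random 4K writes in flight  -> ~256 distinct objects, spread
                                       across all OSDs in parallel
    256 sequential 4K writes        -> all hit the current 4 MB object,
                                       serialized on a single OSD
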
On 22.11.2012 16:26, Alexandre DERUMIER wrote:
> Haven't tested that. But does this make sense? I mean, data goes to
> the journal on the disk, and the same disk then has to copy it to the
> data partition.

On 19.11.2012 21:57, Sébastien Han wrote:

Which iodepth did you use for those benchmarks?

I really don't understand why I can't get more random read IOPS with a
4K block size...

Hello Mark,

First of all, thank you again for another accurate answer :-).

> I would have expected write aggregation and cylinder affinity to
> have eliminated some seeks and improved rotational latency, resulting
> in better-than-theoretical random write throughput. Against those
> expectations...

Recall:

1. RBD volumes are striped (4M wide) across RADOS objects.
2. Distinct writes to a single RADOS object are serialized.

Your sequential 4K writes are direct, with iodepth=256, so there are
(at all times) 256 writes queued for the same object. All of your
writes are waiting through a very long queue.

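[A sketch of the pathological job being described - /dev/rbd0 is a
placeholder for a mapped RBD device; the fio options (direct, bs,
iodepth) are standard:]

# 256 direct 4K writes in flight, all targeting the same 4M object at
# any given moment.
cat > seq-4k-direct.fio <<'EOF'
[seq-4k-direct]
filename=/dev/rbd0
rw=write
bs=4k
direct=1
ioengine=libaio
iodepth=256
runtime=60
EOF

fio seq-4k-direct.fio
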
Hello Mark,

See below my benchmark results.

RADOS bench with 4M block size, write:

# rados -p bench bench 300 write -t 32 --no-cleanup
Maintaining 32 concurrent writes of 4194304 bytes for at least 300 seconds.
2012-11-19 21:35:01.722143 min lat: 0.255396 max lat: 8.40212 avg lat: 1.14076

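[For the read side, the matching command would be something like the
sketch below; the pool name comes from the write run above, and
--no-cleanup is what leaves the objects in place to be read back:]

# Sequential read-back of the objects written by the bench above.
rados -p bench bench 300 seq -t 32
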
> Which iodepth did you use for those benchmarks?
> I really don't understand why I can't get more random read IOPS with
> a 4K block size...

Me neither - I hope to get some clarification from the Inktank guys. It
doesn't make any sense to me...

--
Best regards

On 11/15/2012 12:23 PM, Sébastien Han wrote:
> First of all, I would like to thank you for this well explained,
> structured and clear answer. I guess I got better IOPS thanks to the
> 10K disks.

10K RPM would bring your per-drive throughput (for 4K random writes) up
to 142 IOPS, and your aggregate throughput up with it.

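[The 142 IOPS figure is consistent with simple service-time arithmetic;
the ~4 ms average seek is an assumed figure for a typical 10K drive,
not a number from this thread:]

    10,000 RPM -> 60 s / 10,000   = 6 ms per revolution
               -> average rotational latency ~3 ms
    + average seek ~4 ms          = ~7 ms per 4K random write
    1000 ms / 7 ms                = ~142 IOPS per drive
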