2013-7-31, 2:01, Sage Weil wrote:
> Hi Haomai,
>
> On Wed, 31 Jul 2013, Haomai Wang wrote:
>> Every node of a ceph cluster has a backend filesystem such as btrfs,
>> xfs and ext4 that provides storage for data objects, whose locations
>> are determined by the CRUSH algorithm. There should exist an abstract
A better format result:
1KB Block
LevelDB with Compress: 1.77MB/s
LevelDB without Compress: 1.12MB/s
Btrfs: 13.84MB/s
4KB Block
LevelDB with Compress: 5.15MB/s
LevelDB without Compress: 3.21MB/s
Btrfs: 12.96MB/s
8KB Block
LevelDB with Compress: 6.44MB/s
LevelDB without Compress: 4.57MB/s
Btrfs:
We have the same idea and already tested LevelDB performance vs.
Btrfs. The result is negative, especially for big-block IO.
                        1KB Block   4KB Block   8KB Block   128KB Block   1MB Block
LevelDB with Compress:  1.77MB/s    5.15MB/s
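A rough way to reproduce the btrfs column of such a comparison, assuming a
scratch btrfs mount at /mnt/btrfs (paths and counts are placeholders, not the
harness behind the numbers above):
# hypothetical sequential-write sweep over the quoted block sizes on btrfs
for bs in 1K 4K 8K 128K 1M; do
    dd if=/dev/zero of=/mnt/btrfs/bench.$bs bs=$bs count=1000 conv=fsync 2>&1 | tail -n 1
done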
Hi Haomai,
On Wed, 31 Jul 2013, Haomai Wang wrote:
> Every node of a ceph cluster has a backend filesystem such as btrfs,
> xfs and ext4 that provides storage for data objects, whose locations
> are determined by the CRUSH algorithm. There should exist an abstract
> interface sitting between osd and backend store, allowing different
> backend store implementations.
On Tue, Jul 30, 2013 at 3:54 PM, Alex Elsayed wrote:
> I posted this as a comment on the blueprint, but I figured I'd say it here:
>
> The thing I'd worry about here is that LevelDB's performance (along with
> that of various other K/V stores) falls off a cliff for large values.
>
> Symas (who mak
I posted this as a comment on the blueprint, but I figured I'd say it here:
The thing I'd worry about here is that LevelDB's performance (along with
that of various other K/V stores) falls off a cliff for large values.
Symas (who make LMDB, used by OpenLDAP) did some benchmarking that shows
dra
>On Wed, Jul 31, 2013 at 9:36 AM, majianpeng wrote:
>> [snip]
>> I think this patch can do the work:
>> The cases I tested:
>> A: filesize=0, buffer=1M
>> B: data[2M] | hole | data[2M], bs=6M/7M
>
>I don't think your zero buffer change is correct for this test case.
>
dd if=/dev/urandom of=fil
My 0.02: we have done some readahead tuning tests on the server (ceph osd)
side; the results show that when readahead = 0.5 * object_size (4M by default),
we can get max read throughput. Readahead values larger than this generally
will not help, but also will not harm performance.
For your case, seems
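For reference, setting such a readahead value on an OSD data disk could be done
roughly like this (device name is a placeholder; 2M is half of the default 4M
object size):
# readahead in KB via sysfs
echo 2048 > /sys/block/sdb/queue/read_ahead_kb
# or the same value expressed in 512-byte sectors via blockdev
blockdev --setra 4096 /dev/sdb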
Every node of a ceph cluster has a backend filesystem such as btrfs,
xfs and ext4 that provides storage for data objects, whose locations
are determined by the CRUSH algorithm. There should exist an abstract
interface sitting between osd and backend store, allowing different
backend store implementations.
[snip]
I think this patch can do the work:
The cases I tested:
A: filesize=0, buffer=1M
B: data[2M] | hole | data[2M], bs=6M/7M
C: data[4M] | hole | hole | data[2M], bs=16M/18M
Are there any cases I have missed?
Thanks!
Jianpeng Ma
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..96ce893
On Wed, 31 Jul 2013, majianpeng wrote:
> >On Wed, 31 Jul 2013, majianpeng wrote:
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng wrote:
> [snip]
> >
> >For ceph_osdc_readpages(),
> >
> >> A: ret = ENOENT
> >
> From the original code, for this case we should zero the area.
> Why?
If an object is
>On Wed, 31 Jul 2013, majianpeng wrote:
>> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng wrote:
[snip]
>
>For ceph_osdc_readpages(),
>
>> A: ret = ENOENT
>
From the original code, for this case we should zero the area.
Why?
Thanks!
Jianpeng Ma
>The object does not exist.
>
>> B: ret = 0
>
>The obj
On Wed, 31 Jul 2013, majianpeng wrote:
> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng wrote:
>
> >dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >dd if=file_with_holes bs=8M >/dev/null
> >
>
>On Tue, Jul 30, 2013 at 7:41 PM, majianpeng wrote:
>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>dd if=file_with_holes bs=8M >/dev/null
>
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
in
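A quick way to confirm the sparse layout produced by the dd commands above
(filenames as in the test case; filefrag output depends on the filesystem's
FIEMAP support):
ls -ls file_with_holes          # allocated blocks vs. the 6M apparent size
du -h --apparent-size file_with_holes
filefrag -v file_with_holes     # extent map; the 2M hole shows up as a gap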
Hi,
It would be nice to have URLs to the current implementation and the benchmark
results you got in the blueprint.
http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Inline_data_support_%28Step_2%29
Cheers
On 31/07/2013 02:10, Li Wang wrote:
> We have worked out a preliminary implementation
We have worked out a preliminary implementation for inline data support,
and observed an obvious speed-up for small file access.
Step 2 will focus on (1) trying to make things simpler, to eliminate the
state of a file being half-inlined; (2) efficiently dealing with shared
write or read/write; (3) P
Hi,
I submitted a blueprint about the current status of the erasure coded pool
implementation. As of now there still is a lot of work to be done to refactor
PG and ReplicatedPG but Samuel Just found the right way to do it, not too long
ago. In a nutshell a PGBackend base class from which Replic
From: Alexey Khoroshilov
Date: Mon, 29 Jul 2013 06:58:08 +0400
> ceph_build_auth() locks ac->mutex and then calls ceph_auth_build_hello()
> that locks the same mutex, i.e. it brings itself to a deadlock.
>
> The patch moves actual code from ceph_auth_build_hello() to
> ceph_build_hello_auth_request()
Hi,
I will report the issue there as well. Please note that Ceph seems to
support Fedora 17, even though that release is considered end-of-life by
Fedora. This issue with the leveldb package cannot be fixed for Fedora
17, only for 18 and 19.
So if Ceph wants to continue supporting Fedora 17, addin
I posted a blueprint with an approach to tiered storage that is an
alternative to the redirects I mentioned yesterday. Instead of demoting data out of an
existing pool to a colder pool, we could put a faster pool logically in
front of an existing pool as a cache. Think SSD or fusionio or similar.
I t
We measured Cephfs read performance using iozone on a 32-node HPC
cluster. The Ceph cluster configuration: 24 OSDs (one per node), 1 MDS,
1-4 clients (one thread per client per node). The hardware of a
node: CPU and network are both powerful enough not to be a bottleneck during
the test, me
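A typical iozone invocation for a streaming write/read test of this kind might
look like the following (record size, file size, thread count and mount point
are hypothetical, not the exact parameters used in the measurement above):
iozone -i 0 -i 1 -r 4m -s 1g -t 4 \
    -F /mnt/cephfs/f1 /mnt/cephfs/f2 /mnt/cephfs/f3 /mnt/cephfs/f4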
Hi Laurent,
Your patch can be applied to 3.8 as described here:
http://dachary.org/?p=2179
Thanks again
On 30/07/2013 12:07, Laurent Barbe wrote:
> Live resize has been added in 3.6.10 for krbd client.
>
> There is a need to do revalidate_disk() on rbd resize :
> https://git.kernel.org/cgit/li
On 30/07/13 09:20, Roald van Loon wrote:
> Came across it this morning when booting my development environment,
> has anyone seen this before?
>
> It's with 0.67-rc2;
>
> 2013-07-30 08:09:09.230349 mon.0 [INF] pgmap v4172: 1216 pgs: 123
> active, 219 active+clean, 161 active+clean+replay, 713 pee
Hello,
Please route this through the subsystem tree. As written in the
description, this shouldn't make any functional difference and just
prepares for the removal of WQ_NON_REENTRANT, which is already a noop.
Thanks.
-- 8< ---
dbf2576e37 ("workqueue: make all workqueues non-reentrant") ma
On Tue, Jul 30, 2013 at 7:41 PM, majianpeng wrote:
>>>
dd if=/dev/urandom bs=1M count=2 of=file_with_holes
dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
dd if=file_with_holes bs=8M >/dev/null
>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index 2ddf
>On Tue, Jul 30, 2013 at 7:01 PM, majianpeng wrote:
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng wrote:
[snip]
>I don't think the latter was_short can handle the hole case. For the hole
>case,
>we should try reading the next strip object instead of returning. How about
>
>On Tue, Jul 30, 2013 at 7:01 PM, majianpeng wrote:
>>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng wrote:
[snip]
>I don't think the latter was_short can handle the hole case. For the hole
>case,
>we should try reading the next strip object instead of returning. How about
>
On Tue, Jul 30, 2013 at 7:01 PM, majianpeng wrote:
>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng wrote:
>>>
>>> [snip]
>>> >I don't think the latter was_short can handle the hole case. For the hole
>>> >case,
>>> >we should try reading the next strip object instead of returning. How about
>>> >below pa
>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng wrote:
>>
>> [snip]
>> >I don't think the latter was_short can handle the hole case. For the hole
>> >case,
>> >we should try reading the next strip object instead of returning. How about
>> >below patch.
>> >
>> Hi Yan,
>> I used this demo to test h
Hi Laurent,
Thanks for the solution !
On 30/07/2013 12:07, Laurent Barbe wrote:
> Live resize has been added in 3.6.10 for krbd client.
>
> There is a need to do revalidate_disk() on rbd resize :
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=d98df63ea7e87d5df4
Live resize has been added in 3.6.10 for krbd client.
There is a need to do revalidate_disk() on rbd resize :
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=d98df63ea7e87d5df4dce0cece0210e2a777ac00
Cheers
Laurent
On 30/07/2013 11:57, Loic Dachary wrote:
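A minimal sketch of the flow being discussed, assuming a mapped image
(pool/image names and size are placeholders):
rbd resize --size 20480 rbd/myimage      # grow the image to 20 GB (size in MB)
blockdev --getsize64 /dev/rbd0           # on kernels >= 3.6.10 this reflects the new size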
Hi,
Tried on another machine running
3.8.0-25-generic #37~precise1-Ubuntu SMP
and the behavior is the same.
Cheers
On 30/07/2013 11:57, Loic Dachary wrote:
>
>
> On 30/07/2013 11:55, Laurent Barbe wrote:
>> Hello Loic,
>>
>> which version of kernel do you use for krbd ?
>
> Linux i-csnces-
On 30/07/2013 11:55, Laurent Barbe wrote:
> Hello Loic,
>
> which version of kernel do you use for krbd ?
Linux i-csnces- 3.2.0-41-generic #66-Ubuntu SMP
That may explain a few things ... :-)
>
> Laurent
>
>
> On 29/07/2013 23:50, Loic Dachary wrote:
>> Hi,
>>
>> This works:
>>
>> l
Hello Loic,
which version of kernel do you use for krbd ?
Laurent
On 29/07/2013 23:50, Loic Dachary wrote:
Hi,
This works:
lvcreate --name tmp --size 10G all
Logical volume "tmp" created
mkfs.ext4 /dev/all/tmp
mount /dev/all/tmp /mnt
blockdev --getsize64 /dev/all/tmp
10737418240
lvext
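The truncated command is presumably lvextend; the remaining steps of such an
online resize would typically be (a sketch, size is a placeholder):
lvextend --size +10G /dev/all/tmp
resize2fs /dev/all/tmp            # online grow of the mounted ext4
blockdev --getsize64 /dev/all/tmp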
On Mon, Jul 29, 2013 at 08:47:00AM -0700, Sage Weil wrote:
> Hi Andreas,
>
> Can you reproduce this (from mkcephfs onward) with debug mds = 20 and
> debug ms = 1? I've seen this crash several times but never been able to
> get to the bottom of it.
... done.
The mds.0 logging file is appended.
Came across it this morning when booting my development environment,
has anyone seen this before?
It's with 0.67-rc2;
2013-07-30 08:09:09.230349 mon.0 [INF] pgmap v4172: 1216 pgs: 123
active, 219 active+clean, 161 active+clean+replay, 713 peering; 241 MB
data, 630 MB used, 5483 MB / 6114 MB avail
Hi,
then the Fedora package is broken. If you check the spec file of:
http://dl.fedoraproject.org/pub/fedora/linux/updates/19/SRPMS/leveldb-1.12.0-3.fc19.src.rpm
You can see the spec-file sets a:
BuildRequires: snappy-devel
But not the corresponding "Requires: snappy-devel" for the devel pac
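Once the leveldb spec carries the missing Requires, a quick check on a Fedora
box would be (a sketch):
rpm -q --requires leveldb-devel | grep -i snappy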
Hi,
This patch adds two buildrequires to the ceph.spec file, that are needed
to build the rpms under Fedora. Danny Al-Gaaf commented that the
snappy-devel dependency should actually be added to the leveldb-devel
package. I will try to get that fixed too, in the mean time, this patch
does make sure
Hi,
Fedora, in this case Fedora 19, x86_64.
Kind regards,
Erik.
On 07/30/2013 09:29 AM, Danny Al-Gaaf wrote:
> Hi,
>
> I think this is a bug in packaging of the leveldb package in this case
> since the spec-file already sets dependencies on on leveldb-devel.
>
> leveldb depends on snappy, th
Hi,
I think this is a bug in packaging of the leveldb package in this case,
since the spec file already sets dependencies on leveldb-devel.
leveldb depends on snappy, therefore the leveldb package should set a
dependency on snappy-devel for leveldb-devel (check the SUSE spec file
for leveldb:
h