> Sent: Friday, November 20, 2015 7:28 PM
> To: Chen, Xiaoxi
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Aggregate failure report in ceph -s
>
> On Fri, 20 Nov 2015, Chen, Xiaoxi wrote:
> >
> > Hi Sage,
> >
> > As we are looking at the failure detection part of
PM
> To: Chen, Xiaoxi
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: Cannot start osd due to permission of journal raw device
>
> On Mon, 9 Nov 2015, Chen, Xiaoxi wrote:
> > There is no such rule (only 70-persistent-net.rules) in my
> > /etc/udev/rules.d/
> >
>
> To: Chen, Xiaoxi
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: Cannot start osd due to permission of journal raw device
>
> On Fri, 6 Nov 2015, Chen, Xiaoxi wrote:
> > Hi,
> > I tried infernalis (version 9.1.0
> (3be81ae6cf17fcf689cd6f187c4615249fea4f61)) but
Hi,
I tried infernalis (version 9.1.0
(3be81ae6cf17fcf689cd6f187c4615249fea4f61)) but failed due to the permissions of the
journal; the OSD was upgraded from hammer (also true for a newly created OSD).
I am using a raw device as the journal, because the default privilege of
the raw block device is
Hi Ning,
Yes, we don't save any IO, and may even need more IO due to read amplification in
LevelDB. But the tradeoff is using SSD IOPS instead of HDD IOPS; IOPS per dollar on an
SSD (10K+ IOPS per $100) is two orders of magnitude better than on an HDD (~100 IOPS
per $100).
Some use cases:
1. When we have enough
Since we use submit_transaction (instead of submit_transaction_sync) in
DBObjectMap, and we also don't use a kv_sync_thread for the DB, it seems we need to
rely on syncfs(2) at commit time to persist everything?
If that is the case, moving the DB out of the same FS as the data may cause issues?
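For context, a minimal sketch of the two KeyValueDB calls being contrasted (the prefix, key and value below are illustrative, not the actual DBObjectMap code):
    KeyValueDB::Transaction t = db->get_transaction();
    bufferlist bl;
    bl.append("some value");
    t->set("header", "seq", bl);       // stage the update in the transaction
    db->submit_transaction(t);         // queued; durable only after a later sync such as syncfs(2)
    // vs. db->submit_transaction_sync(t), which does not return until the update is durable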
>
5 4:13 PM
> To: Chen, Xiaoxi
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: chooseleaf may cause some unnecessary pg migrations
>
> I just realized the measurement I mentioned last time is not precise. It
> should be 'number of changed mappings' instead of 'number of remapped
>
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: Wednesday, October 21, 2015 9:36 PM
> To: Allen Samuels; Sage Weil; Chen, Xiaoxi
> Cc: James (Fei) Liu-SSI; Somnath Roy; ceph-devel@vger.kernel.org
> Subject: Re: newstore direction
>
> Thanks Allen! The devil is always in the details
> To: Xusangdi
> Cc: Chen, Xiaoxi; ceph-devel@vger.kernel.org
> Subject: RE: chooseleaf may cause some unnecessary pg migrations
>
> On Mon, 19 Oct 2015, Xusangdi wrote:
> >
> > > -Original Message-
> > > From: ceph-devel-ow...@vger.kernel.org
> > > [m
+1. Nowadays K-V DBs care more about very small key-value pairs, say several
bytes to a few KB, but in the SSD case we only care about 4KB or 8KB. In this way,
NVMKV is a good design, and it seems some SSD vendors are also trying to
build this kind of interface; we had an NVM-L library but still
There is something like http://pmem.io/nvml/libpmemobj/ to adapt NVMe to
transactional object storage.
But it definitely needs some more work.
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent:
> Sent: Friday, October 16, 2015 2:44 PM
> To: Chen, Xiaoxi
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: chooseleaf may cause some unnecessary pg migrations
>
> Sorry if I didn't state that clearly.
>
> Like you did, the performance is measured by the number of PGs remapped
> betw
.com]
> Sent: Friday, October 16, 2015 2:12 PM
> To: Chen, Xiaoxi
> Cc: ceph-devel@vger.kernel.org
> Subject: RE: chooseleaf may cause some unnecessary pg migrations
>
>
>
> > -Original Message-
> > From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com]
> > Sent:
I did some tests using crushtool --test (together with this PR and PR #6004).
It doesn't help in a quantitative way.
In a 40-OSD, 10-OSDs-per-node demo crush map, tested with rep = 2 and 4096 PGs, in
each run I randomly kick out an OSD (reweight to 0) and compare the PG
mapping. If any OSD in
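For reference, the same kind of run can be reproduced with crushtool alone; the map file name, rule number and OSD id below are illustrative:
    crushtool -i demo.map --test --rule 0 --num-rep 2 --min-x 0 --max-x 4095 --show-mappings > before.txt
    crushtool -i demo.map --test --rule 0 --num-rep 2 --min-x 0 --max-x 4095 --weight 7 0 --show-mappings > after.txt
    diff before.txt after.txt | grep -c '^>'   # rough count of changed mappings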
How many OSDs do you have? I wonder if the overlay layer is that large, to keep
160K objects (which is 64K * 32 by default, per OSD)?
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Tomy Cheru
> Sent: Thursday, October
Hi Mark,
The Async result at 128K drops quickly after some point; is that because
of the testing methodology?
The other conclusion, as it looks to me, is that simple messenger + jemalloc is the best
practice so far, as it has the same performance as async but uses much less
memory?
Hi Casey,
Would it be better if we create an integration branch on
ceph/ceph/wip-fio-objstore to allow more people to try and improve it? It seems
James has some patches.
-Xiaoxi
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org]
FWIW, blkid works well with both GPT (created by parted) and MSDOS (created by
fdisk) in my environment.
But blkid doesn't show the information for disks in the external bay (which is
connected by a JBOD controller) in my setup.
See below: SDB and SDH are SSDs attached to the front panel, but the rest
This is kind of an unsolvable problem; in CAP terms, we choose Consistency and
Availability, thus we have to lose Partition tolerance.
There are three networks here: mon <-> osd, osd <-public-> osd, and osd <-cluster->
osd. If some of the networks are reachable but some are not, likely the
flipping will
That requires some kind of driver in Ceph (see XFSFileStoreBackend.cc/h and
BTRFSFileStoreBackend.cc/h); you would need to implement a GPFSFileStoreBackend in
Ceph.
But why do you want OSD on top of GPFS?
-Original Message-
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
FWIW, I often see performance increase when favoring inode/dentry cache, but
probably with far fewer inodes than the setup you just saw. It sounds like
there
needs to be some maximum limit on the inode/dentry cache to prevent this
kind of behavior but still favor it up until that point.
Hi Mark,
Really good test :) I only played a bit on SSD; the parallel WAL threads
really help, but we still have a long way to go, especially in the all-SSD case.
I tried this
https://github.com/facebook/rocksdb/blob/master/util/env_posix.cc#L1515 by
hacking rocksdb, but the performance
Hi Mark,
You may have missed this tunable: newstore_sync_wal_apply, which defaults
to true, but it would be better to make it false.
If sync_wal_apply is true, the WAL apply will be done synchronously (in
kv_sync_thread) instead of in the WAL thread. See
if (g_conf->newstore_sync_wal_apply) {
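For anyone who wants to flip it, an illustrative ceph.conf snippet (option name as above; section placement assumed):
    [osd]
        newstore sync wal apply = false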
I think this is great, since when we were trying to optimize the WAL we set the
write_buffer and memtable very aggressively, which will cause read amplification.
I was worrying about it, but now we can have separate column families:
write-optimized for the big stuff (WAL and overlay), trying to
work.
What do you think?
Xiaoxi
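As a rough illustration of that idea (names and sizes here are made up, not the NewStore code): RocksDB can open one DB with differently tuned column families, so the WAL/overlay writes get an aggressive memtable without inflating read amplification for the small metadata keys.
    #include <rocksdb/db.h>
    #include <vector>

    // Open a DB with a default (metadata) family and a write-optimized family
    // for WAL/overlay blobs; handles[0] and handles[1] address them respectively.
    rocksdb::DB* open_with_cfs(std::vector<rocksdb::ColumnFamilyHandle*>& handles) {
      rocksdb::DBOptions db_opts;
      db_opts.create_if_missing = true;
      db_opts.create_missing_column_families = true;

      rocksdb::ColumnFamilyOptions meta_opts;           // small keys, default tuning
      rocksdb::ColumnFamilyOptions wal_opts;            // big values, write-optimized
      wal_opts.write_buffer_size = 256 * 1024 * 1024;   // aggressive memtable only here
      wal_opts.max_write_buffer_number = 4;

      std::vector<rocksdb::ColumnFamilyDescriptor> cfs = {
        {rocksdb::kDefaultColumnFamilyName, meta_opts},
        {"wal_overlay", wal_opts},
      };
      rocksdb::DB* db = nullptr;
      rocksdb::DB::Open(db_opts, "/tmp/cf-demo", cfs, &handles, &db);
      return db;
    }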
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Tuesday, April 21, 2015 12:48 AM
To: Chen, Xiaoxi
Cc: Mark
-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Wednesday, April 22, 2015 7:59 AM
To: Sage Weil; Chen, Xiaoxi
Cc: Haomai Wang; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel
Subject: Re: Re: Re: Re: Re: Re: Re: NewStore performance analysis
On 04/21/2015 06:57 PM, Sage Weil wrote
Sage Weil wrote:
On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
Haomai is right in theory, but I am not sure whether all
users (mon, filestore, kvstore) of the submit_transaction API clearly hold
the expectation that their data is not persistent and may be lost on
failure. So in rocksdb now
[Resend in plain text]
Hi,
I have played with some tunables on RocksDB these days, trying to optimize the
performance of NewStore. From the data so far, it seems the WA of RocksDB is not
the issue blocking performance, and it also seems not to be the fragment
part (aio/dio, etc.). The issue might be
Sage Weil wrote:
On Mon, 20 Apr 2015, Chen, Xiaoxi wrote:
[Resend in plain text]
Hi,
I have played with some tunables on RocksDB these days, trying to optimize
the performance of NewStore. From the data so far, it seems the WA of RocksDB
is not the issue that is blocking
;
Xiaoxi
-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Friday, April 17, 2015 8:11 PM
To: Sage Weil
Cc: Somnath Roy; Chen, Xiaoxi; Haomai Wang; ceph-devel
Subject: Re: Regarding newstore performance
On 04/16/2015 07:38 PM, Sage Weil wrote:
On Thu
10519532 145653264 7% /var/lib/ceph/osd/ceph-0
-Original Message-
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Friday, April 17, 2015 8:11 PM
To: Sage Weil
Cc: Somnath Roy; Chen, Xiaoxi; Haomai Wang; ceph-devel
Subject: Re: Regarding newstore performance
On 04/16/2015 07:38 PM
batch,
29.4 MB user ingest, stall time: 0 us
Interval WAL: 15180 writes, 15179 syncs, 1.00 writes per sync, 0.03 MB written
-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Friday, April 17, 2015 10:20 PM
To: Chen, Xiaoxi
Cc: Mark Nelson; Sage Weil; Somnath Roy; ceph
...@gregs42.com]
Sent: Friday, April 17, 2015 8:48 AM
To: Sage Weil
Cc: Mark Nelson; Somnath Roy; Chen, Xiaoxi; Haomai Wang; ceph-devel
Subject: Re: Regarding newstore performance
On Thu, Apr 16, 2015 at 5:38 PM, Sage Weil s...@newdream.net wrote:
On Thu, 16 Apr 2015, Mark Nelson wrote:
On 04/16
Hi Somnath,
You could try applying this one :)
https://github.com/ceph/ceph/pull/4356
BTW, the previous RocksDB configuration has a bug that sets
rocksdb_disableDataSync to true by default, which may cause data loss on
failure. So please update newstore to the latest or manually set it
Hi Mark,
Really, thanks for the data.
Not sure if this PR will be merged soon (https://github.com/ceph/ceph/pull/4266).
Some known bugs around:
`rados ls` will cause an assert fault (which was fixed by the PR)
`rbd list` will also cause an assert failure (because omap_iter hasn't
We can always use a structured database in an unstructured way; I think it's
workable in theory, but why choose MySQL?
As discussed a while ago, any LSM-structured database design will suffer in
performance due to write amplification; is the reason for going to MySQL only
about preventing
This is due to an implicit type cast by the compiler: in
st->f_blocks - (used_bytes / st->f_bsize),
the result of the subtraction should be negative, but the compiler treats it as an unsigned value.
A fix is proposed in https://github.com/ceph/ceph/pull/3451
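A minimal standalone sketch of the wraparound (variable names are illustrative, not the actual Ceph code):
    #include <cstdint>
    #include <cstdio>

    int main() {
      // f_blocks in struct statfs is an unsigned type, so the subtraction below is
      // done in unsigned arithmetic and wraps to a huge value instead of going negative.
      uint64_t f_blocks = 100;      // filesystem size in blocks (pretend)
      uint64_t used = 120;          // used blocks computed elsewhere (pretend)
      uint64_t wrapped = f_blocks - used;                   // 18446744073709551596, not -20
      int64_t fixed = (int64_t)f_blocks - (int64_t)used;    // -20 as intended
      printf("wrapped=%llu fixed=%lld\n", (unsigned long long)wrapped, (long long)fixed);
      return 0;
    }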
4GB if you're going to be writing for 60 seconds with 4K objects at 20K IOPS.
Thanks,
Stephen
-Original Message-
From: Chen, Xiaoxi
Sent: Thursday, January 22, 2015 1:39 AM
To: mnel...@redhat.com; Blinick, Stephen L; Ceph Development
Subject: RE: Memstore issue on v0.91
This is due
2527428193 169
std dev 12.5935
vs 12.5983 (expected).
Xiaoxi
-Original Message-
From: Sage Weil [mailto:sw...@redhat.com]
Sent: Friday, January 16, 2015 10:22 AM
To: Chen
) ;
}
Xiaoxi
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: Friday, November 21, 2014 9:33 AM
To: Chen, Xiaoxi
Cc: jianpeng; ceph-devel@vger.kernel.org; Cook
/621c2a7dc2bc9724e9d2106b52aa9eedd2c793e8
xiaoxi
-Original Message-
From: Sage Weil [mailto:s...@newdream.net]
Sent: Friday, November 21, 2014 1:30 AM
To: Chen, Xiaoxi
Cc: jianpeng; ceph-devel@vger.kernel.org; Cook, Nigel
Subject: RE
From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
Sent: Thursday, October 16, 2014 2:25 PM
To: Chen, Xiaoxi
Cc: ceph-devel@vger.kernel.org; Mark Nelson
Subject: Re: 10/14/2014 Weekly Ceph Performance Meeting
We also found this before; it seems it's because QEMU uses a single thread for IO. I
tried to enable the debug
We also found this before; it seems it's because QEMU uses a single thread for IO. I
tried to enable the debug log in librbd and found that the thread is always
the same. Assuming the backend is powerful enough, how many IOs can be sent out
by qemu == how many IOPS we can get.
The upper bound may
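As a rough, purely illustrative calculation (numbers assumed, not measured): if the single QEMU IO thread spends about 100 us of CPU per request, it can dispatch at most about 1 s / 100 us = 10,000 IOPS from that VM, no matter how fast the OSDs are.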
Have you ever seen a large readahead_kb hurt random performance?
We usually set it very large (2M); the random read performance keeps steady,
even in an all-SSD setup. Maybe with your optimization code for OP_QUEUE,
things may be different?
-Original Message-
From:
Same question as Somnath: some customers of ours do not feel that comfortable
with the cache; they still have some consistency concerns.
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
Sent: Thursday, September 18,
Hi Nicheal,
1. The main purpose of the journal is to provide transaction semantics (preventing
partial updates). Peering is not enough for this need because ceph writes all
replicas at the same time, so on a crash you have no idea which replica
has the right data. For example, say we have 2 replicas,
...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Alexandre DERUMIER
Sent: Thursday, September 18, 2014 5:13 AM
To: Mark Nelson
Cc: Somnath Roy; ??; ceph-devel@vger.kernel.org; Chen, Xiaoxi
Subject: Re: puzzled with the design pattern of ceph journal, really ruining
performance
The rule has max_size; can we just use that value?
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Johnu George (johnugeo)
Sent: Thursday, September 18, 2014 6:41 AM
To: Loic Dachary; ceph-devel
Subject: Re: [ceph-users]
1. 12% wa is quite normal; with more disks and more load you could even see
30%+ in the random write case.
2. Our BKM is to set osd_op_threads to 20,
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of yue longguang
Sent: Thursday,
Can we set cache_min_evict_age to a reasonably larger number (say 5 min?
10 min?) to work around the window? If a request cannot finish within
minutes, that indicates there is some issue in the cluster.
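For reference, this is a per-pool setting on the cache tier, e.g. (pool name is a placeholder):
    ceph osd pool set hot-pool cache_min_evict_age 600    # value in seconds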
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
Hi list,
I tried to understand the output of ceph df; I finally got it, but it's
really confusing in the POOLS section, so I am sending out this mail to see if there is
any good suggestion to make it better.
Here is an example from my cluster
GLOBAL:
SIZE AVAIL RAW
Hi List,
The
CrushWrapper (https://github.com/ceph/ceph/blob/master/src/crush/CrushWrapper.cc)
is a wrapper for crush so that we can use this C++ wrapper instead of playing
directly with the crush C API.
But currently, in CrushWrapper, the member struct crush_map *crush is
a public
On 2014-8-15, 17:26, Loic Dachary l...@dachary.org wrote:
On 15/08/2014 11:20, Loic Dachary wrote:
Hi,
I've added a few comments inline at
https://github.com/xiaoxichen/ceph/commit/354c09131a64ac1e1a67c71794d1a3bab8334ca8
. Could you explain in pseudo code, in the commit message, what
I think before we start bug fixing or try to get rid of the ruleset concept, we can
start by defining a reasonable use case: how we expect the user to work with rules
and pools. There is no CLI to create/modify a ruleset; even worse, you are
not able to get the ruleset id without dumping a rule.
currently
, the ID for myrule1 is 3. So they simply
type in ceph osd pool set mypool1 crush_ruleset 3
In most cases this works, but actually this is not the right way to do it.
-Original Message-
From: Loic Dachary [mailto:l...@dachary.org]
Sent: Friday, August 8, 2014 11:11 PM
To: Chen, Xiaoxi
Makes sense; would you mind if I take this job? I will start with the
conversion function in the monitor.
On 2014-8-9, 0:08, Sage Weil sw...@redhat.com wrote:
On Fri, 8 Aug 2014, Chen, Xiaoxi wrote:
For my side, I have seen some guys (actually more than 80% of the users I have
seen in universities
estimated date or plan for when we will introduce this stuff?
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: Friday, August 09, 2013 1:06 PM
To: Chen, Xiaoxi
Cc: ceph-devel@vger.kernel.org
Subject: Re: Could we introduce launchpad/gerrit for ceph
Hi,
Now it's a bit hard
Hi,
Now it's a bit hard for us to track the bugs, review the submissions, and track
the blueprints. We do have a bug tracking system, but most of the time it
doesn't connect with a github submit link. We have email review, pull
requests, and also some internal mechanism inside Inktank; we do
My 0.02: we have done some readahead tuning tests on the server (ceph-osd) side, and
the results show that with readahead = 0.5 * object_size (4M by default), we can
get the maximum read throughput. A readahead value larger than this generally will
not help, but it also does not harm performance.
For your case,
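For reference, the knob in question is the block-layer readahead; for the default 4M object size, 0.5 * 4M = 2M, which can be set per data disk with something like (device name is illustrative):
    echo 2048 > /sys/block/sdb/queue/read_ahead_kb    # 2048 KB = 2 MB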
PM
To: Chen, Xiaoxi
Cc: ceph-devel@vger.kernel.org; ceph-us...@ceph.com
Subject: Re: Any concern about Ceph on CentOS
Hi Xiaoxi,
we are really running Ceph on CentOS-6.4
(6 server nodes, 3 client nodes, 160 OSDs).
We put a 3.8.13 Kernel on top and installed the ceph-0.61.4 cluster with
mkcephfs
Hi list,
I would like to ask if anyone really runs Ceph on CentOS/RHEL? Since the
kernel version for CentOS/RHEL is much older than that of Ubuntu, I am wondering
whether we have any known performance/functionality issues?
Thanks to everyone who can share their insight on Ceph+CentOS.
Hi,
From the code, each pipe (containing a TCP socket) will fork 2 threads,
a reader and a writer. We really do observe 100+ threads per OSD daemon with 30
instances of rados bench as clients.
But this number seems a bit crazy; if I have a 40-disk node, then I
will have 40 OSDs,
threads. This is still too high for 8-core or 16-core CPUs and will waste a
lot of cycles in context switching.
Sent from my iPhone
On 2013-6-7, 0:21, Gregory Farnum g...@inktank.com wrote:
On Thu, Jun 6, 2013 at 12:25 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:
Hi,
From the code, each pipe
-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: June 4, 2013 0:37
To: Chen, Xiaoxi
Cc: ceph-devel@vger.kernel.org; Mark Nelson (mark.nel...@inktank.com);
ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph killed by OS because of OOM under high load
On Mon, Jun 3, 2013 at 8:47 AM
Hi,
As my previous mail reported some weeks ago, we are suffering from OSD
crashes / OSD flipping / system reboots etc.; all these instability issues really
stop us from digging further into ceph characterization.
The good news is that we seem to have found the cause; I will explain our
.
Xiaoxi
-Original Message-
From: Chen, Xiaoxi
Sent: May 16, 2013 6:38
To: 'Sage Weil'
Subject: RE: [ceph-users] OSD state flipping when cluster-network in high
utilization
Uploaded to /home/cephdrop/xiaoxi_flip_osd/osdlog.tar.gz
Thanks
-Original Message-
From
4103'5330 (3853'4329,4103'5330] local-les=4092 n=154 ec=100 les/c 4092/4093
4091/4091/4034) [319,46] r=0 lpr=4091 mlcod 4103'5329
active+clean] do_op mode now rmw(wr=0)
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: May 15, 2013 11:40
To: Chen, Xiaoxi
Cc: Mark Nelson
Thanks, but I don't quite understand how to determine whether the monitor is
overloaded? And if yes, will starting several monitors help?
Sent from my iPhone
On 2013-5-15, 23:07, Jim Schutt jasc...@sandia.gov wrote:
On 05/14/2013 09:23 PM, Chen, Xiaoxi wrote:
How responsive generally is the machine under load
From which release can we get this?
Sent from my iPhone
On 2013-5-14, 8:36, Sage Weil s...@inktank.com wrote:
Hi Jim-
You mentioned the other day your concerns about the uniformity of the PG
and data distribution. There are several ways to attack it (including
increasing the number of PGs), but one
% io wait). Enabling jumbo frames **seems** to
make things worse (just a feeling, no data supports it).
Sent from my iPhone
On 2013-5-14, 23:36, Mark Nelson mark.nel...@inktank.com wrote:
On 05/14/2013 10:30 AM, Sage Weil wrote:
On Tue, 14 May 2013, Chen, Xiaoxi wrote:
Hi
We are suffering from OSD flipping
related with the CPU scheduler? The
heartbeat thread (in a busy OSD) failed to get enough CPU cycles.
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: May 15, 2013 7:23
To: Chen, Xiaoxi
Cc: Mark Nelson; ceph-devel
, I
believe, though I'm not sure how much detail they include there versus
in the QAs).
On Fri, Apr 12, 2013 at 7:32 PM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:
We are also discussing this internally, and came up with an idea to work
around it (only for the RBD case, haven't thought about the object store
We are also discussing this internally, and came up with an idea to work
around it (only for the RBD case, haven't thought about the object store), but not yet tested.
If Mark and Greg can provide some feedback, that would be great.
We are trying to write a script to generate some pools; for rack A, there is a
If this feature works, I suppose we can have incremental backups from RBD (2
copies, SSD based) to RADOS (3 copies, HDD based) to achieve higher HA at
only a little extra cost :)
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of
Rephrase it to make it more clear
From: ceph-users-boun...@lists.ceph.com
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chen, Xiaoxi
Sent: March 25, 2013 17:02
To: 'ceph-us...@lists.ceph.com' (ceph-us...@lists.ceph.com)
Cc: ceph-devel@vger.kernel.org
Subject: [ceph-users] Ceph Crach
[mailto:s...@inktank.com]
Sent: March 25, 2013 23:35
To: Chen, Xiaoxi
Cc: 'ceph-us...@lists.ceph.com' (ceph-us...@lists.ceph.com);
ceph-devel@vger.kernel.org
Subject: Re: [ceph-users] Ceph Crach at sync_thread_timeout after heavy random
writes.
Hi Xiaoxi,
On Mon, 25 Mar 2013, Chen, Xiaoxi wrote
Can we have a review system like review.openstack.com?
Sent from my iPhone
On 2013-3-20, 7:10, Guilhem Lettron guil...@lettron.fr wrote:
Glad to see this openness! Not everyone is like you.
And I hope to see fewer [PATCH] mails on the mailing list, but maybe it's only a dream.
Just my two cents.
On Tue, Mar 19,
But can we change the pg_num of a pool when the pool contains data? If yes, how
do we do this?
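For reference, pg_num can be raised on an existing pool, with pgp_num raised afterwards so the data actually rebalances (pool name and count are placeholders):
    ceph osd pool set mypool pg_num 256
    ceph osd pool set mypool pgp_num 256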
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: February 6, 2013 9:50
To: Mandell Degerness
Cc:
I doubt your data is correct, even the ext4 data; did you use O_DIRECT when
doing the test? It's unusual to have 2X the random write IOPS of random read.
The CephFS kernel client seems not stable enough; think twice before you use it.
From your previous mail I guess you would like to do some caching
] 1.523 deep-scrub ok
2013-02-01 16:38:12.301511 osd.117 [INF] 1.442 deep-scrub ok
2013-02-01 16:38:12.390220 osd.214 [INF] 2.26c deep-scrub ok
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: February 1, 2013 4:01
To: Jim Schutt
Cc: Chen, Xiaoxi; ceph-devel@vger.kernel.org
Subject
Hi list,
I just rebuilt my ceph setup with 6 nodes (20 SATA + 4 SSD as journal
+ 10GbE per node); the software stack is Ubuntu 12.10 + kernel 3.6.3 + XFS + ceph
0.56.2. Before building up the ceph cluster, I checked that all my disks can reach
90MB+/s for sequential write and 100MB+/s for sequential
[The following views are only on behalf of myself, not related to Intel.]
Looking forward to the performance data on Atom.
Atom performs badly with Swift, but since Ceph is slightly more efficient than Swift,
it must be better.
I have some concern about whether Atom can support such high throughput (you
Hi list,
I got the following log when running a test on top of Ceph. It seems this
part of the code is quite fresh (it does not yet appear in 0.56.1); any idea about
what happened?
pgs=714 cs=11 l=0).reader got old message 1 <= 6 0x4552800 osd_map(363..375 src
has 1..375) v3, discarding
2013-01-25
Is there any known connection with the previous discussions "Hit suicide
timeout after adding new osd" or "Ceph unstable on XFS"?
-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: January 22, 2013 14:06
To:
Hi list,
The first time I start my ceph cluster, it takes more than 15 minutes
to get all the PGs active+clean. It's fast at first (say 100 PG/s) but quite slow
when only hundreds of PGs are left peering.
Is this a common situation? Since there is quite a bit of disk IO and
network IO
Hi List,
Here is part of the /etc/init.d/ceph script:
case $command in
    start)
        # Increase max_open_files, if the configuration calls for it.
        get_conf max_open_files "8192" "max open files"
        if [ $max_open_files != "0" ]; then
            # Note: Don't try
)
Xiaoxi
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: January 17, 2013 10:35
To: Chen, Xiaoxi
Subject: RE: Ceph slow request unstable issue
On Thu, 17 Jan 2013, Chen, Xiaoxi wrote:
No, on the OSD node, not the same node. OSD node with 3.2 kernel while
client node with 3.6
Hi,
I have also seen the same warning even when I use v0.56.1 (on both the kernel
rbd and OSD side) when the write stress is high enough (say I have 3 OSDs but
4~5 clients doing dd on top of the rbd).
2013-01-15 15:54:05.990052 7ff97dd0c700 0 log [WRN] : slow request 32.545624
seconds old,
Hi list,
We are suffering from the OSD or OS going down when there is continuous high
pressure on the Ceph rack.
Basically we are on Ubuntu 12.04 + Ceph 0.56.1, 6 nodes, each node
with 20 spindles + 4 SSDs as journal (120 spindles in total).
We create a lot of RBD volumes
...@inktank.com]
Sent: January 16, 2013 5:43
To: Chen, Xiaoxi
Cc: Mark Nelson; Yan, Zheng
Subject: RE: Seperate metadata disk for OSD
On Tue, 15 Jan 2013, Chen, Xiaoxi wrote:
Hi Sage,
FlashCache works well for this scenario; I created a hybrid disk with
1 SSD partition (sharing the same SSD
!
Xiaoxi
-Original Message-
From: Sage Weil [mailto:s...@inktank.com]
Sent: January 13, 2013 0:57
To: Chen, Xiaoxi
Cc: Mark Nelson; Yan, Zheng ; ceph-devel@vger.kernel.org
Subject: RE: Seperate metadata disk for OSD
On Sat, 12 Jan 2013, Chen, Xiaoxi wrote
: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: January 12, 2013 21:36
To: Yan, Zheng
Cc: Chen, Xiaoxi; ceph-devel@vger.kernel.org
Subject: Re: Seperate metadata disk for OSD
Hi Xiaoxi and Zheng,
We've played with both of these some internally, but not for a production
deployment. Mostly just
Hi list,
For an rbd write request, Ceph needs to do 3 writes:
2013-01-10 13:10:15.539967 7f52f516c700 10 filestore(/data/osd.21)
_do_transaction on 0x327d790
2013-01-10 13:10:15.539979 7f52f516c700 15 filestore(/data/osd.21) write
meta/516b801c/pglog_2.1a/0//-1 36015~147
2013-01-10
Hi,
Setting rep size to 3 only makes the data triple-replicated; that means
when you fail all OSDs in 2 out of 3 DCs, the data is still accessible.
But the monitor is another story: for monitor clusters with 2N+1 nodes, it
requires at least N+1 nodes alive, and indeed this is why you
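As a worked example of that arithmetic (hypothetical layout): with 5 monitors (2N+1, N=2) placed 2/2/1 across three DCs, losing the DC that holds one monitor leaves 4 >= 3 alive and quorum holds, but losing any two DCs leaves at most 2 < 3, so the monitors stop forming quorum even though a full copy of the data may survive.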
Hi Han,
I have a cluster with 8 nodes (each node with 1 SSD as journal and 3 7200
rpm SATA disks as data disks); each OSD consists of 1 SATA disk together with one
30G partition from the SSD. So in total I have 24 OSDs.
My test method is to start 24 VMs and 24 RBD volumes, make the VM and
Hi list,
I am thinking about the possibility of adding some primitives in CRUSH to meet
the following user stories:
A. Same host, same rack
To balance between availability and performance, one may like such a
rule: 3 replicas, where replica 1 and replica 2 should be in the same rack while