Re: [ceph-users] dealing with the full osd / help reweight

2016-03-25 Thread lin zhou
Yeah, I think the main reason is the pg_num and pgp_num setting of some key pool. This site will tell you the correct value: http://ceph.com/pgcalc/ Before you adjust pg_num and pgp_num, if this is a production environment, you should first set, as Christian Balzer said: ---  osd_max_backfills = 1  osd_ba…
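
For reference, a minimal sketch of what that looks like on a live cluster; the pool name and pg counts below are placeholders, and the second throttle option is assumed, since the quoted advice is cut off above:

  # throttle backfill/recovery first, as suggested
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
  # then raise pg_num, let the cluster settle, and only then raise pgp_num
  ceph osd pool set <pool> pg_num 512
  ceph osd pool set <pool> pgp_num 512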

Re: [ceph-users] Ceph-fuse huge performance gap between different block sizes

2016-03-25 Thread Christian Balzer
On Fri, 25 Mar 2016 09:17:08 +0000 Zhang Qiang wrote:
> Hi Christian, Thanks for your reply, here're the test specs:
> [global]
> ioengine=libaio
> runtime=90
> direct=1
There it is. You do understand what that flag does and what latencies are, right? You're basically telling the I/O stack…

Re: [ceph-users] Question about cache tier and backfill/recover

2016-03-25 Thread Christian Balzer
On Fri, 25 Mar 2016 14:14:37 -0700 Bob R wrote: > Mike, > > Recovery would be based on placement groups and those degraded groups > would only exist on the storage pool(s) rather than the cache tier in > this scenario. > Precisely. They are entirely different entities. There may be partially id

Re: [ceph-users] Losing data in healthy cluster

2016-03-25 Thread Christian Balzer
Hello, this was of course discussed here in the very recent thread "data corruption with hammer". Read it; it contains fixes and a workaround as well. Also from that thread: http://tracker.ceph.com/issues/12814 You don't need to remove the cache tier to fix things. And as also discussed here,…
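
If memory serves, the workaround referenced there revolves around the cache pool's min_read_recency_for_promote setting; this is purely a sketch to illustrate (verify against the thread and the tracker issue before touching anything), with hot-pool as a placeholder cache pool name:

  # inspect the current value on the cache pool
  ceph osd pool get hot-pool min_read_recency_for_promote
  # the workaround, as I recall it, was to keep this at 1 on affected 0.94.6 clusters
  ceph osd pool set hot-pool min_read_recency_for_promote 1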

[ceph-users] Losing data in healthy cluster

2016-03-25 Thread Blade Doyle
Help, my Ceph cluster is losing data slowly over time. I keep finding files that are the same length as they should be, but all of the content has been lost and replaced by nulls. Here is an example (I have the original file from a backup): [root@blotter docker]# ls -lart /backup/space/docker/ceph-m…
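
A quick way to sweep for other files that have been zero-filled the same way, before diffing against the backup; the path is a placeholder and this is only a sketch:

  # list regular, non-empty files whose entire content is NUL bytes
  find /var/lib/docker -type f -size +0c -print0 |
  while IFS= read -r -d '' f; do
      [ "$(tr -d '\0' < "$f" | wc -c)" -eq 0 ] && echo "all zeroes: $f"
  done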

Re: [ceph-users] pg incomplete second osd in acting set still available

2016-03-25 Thread John-Paul Robinson
So one more update. I suspect I may need to do more than force the secondary osd to become the primary, given the reported state of the pg. The pg reports what it believes to be correct state, but that state is inaccurate. In the dump for one of the pgs below, the version tim…

Re: [ceph-users] pg incomplete second osd in acting set still available

2016-03-25 Thread John-Paul Robinson
So I think I know what might have gone wrong. When I took my osds out of the cluster and shut them down, the first set of osds likely came back up and into the cluster before the 300 seconds expired. This would have prevented the cluster from triggering recovery of the pg from the replica osd. So the quest…
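
The 300-second window here is presumably mon_osd_down_out_interval (default 300 s). A sketch of how to check it, and how to sidestep it entirely during planned maintenance, assuming the usual admin socket path and that this release supports the noout flag:

  # check the current value on a monitor
  ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname).asok config show | grep mon_osd_down_out_interval
  # for planned work, stop the cluster from marking anything out at all
  ceph osd set noout
  # ... do the maintenance, then:
  ceph osd unset noout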

Re: [ceph-users] Question about cache tier and backfill/recover

2016-03-25 Thread Bob R
Mike, Recovery would be based on placement groups and those degraded groups would only exist on the storage pool(s) rather than the cache tier in this scenario. Bob On Fri, Mar 25, 2016 at 8:30 AM, Mike Miller wrote: > Hi, > > in case of a failure in the storage tier, say single OSD disk failu

[ceph-users] pg incomplete second osd in acting set still available

2016-03-25 Thread John-Paul Robinson
Hi Folks, One last dip into my old bobtail cluster (new hardware is on order). I have three pgs in an incomplete state. The cluster was previously stable, but in a health warn state due to a few near-full osds. I started resizing drives on one host to expand space after taking the osds that se…
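
For context, inspecting pgs in this state generally looks like the following; the pg id is just an example:

  ceph health detail
  ceph pg dump_stuck inactive
  ceph pg 3.5a query   # per-pg detail: acting set, last update, recovery state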

[ceph-users] Question about cache tier and backfill/recover

2016-03-25 Thread Mike Miller
Hi, in case of a failure in the storage tier, say single OSD disk failure or complete system failure with several OSD disks, will the remaining cache tier (on other nodes) be used for rapid backfilling/recovering first until it is full? Or is backfill/recovery done directly to the storage tier

Re: [ceph-users] Ceph-fuse huge performance gap between different block sizes

2016-03-25 Thread Jan Schermer
FYI, when I performed testing on our cluster I saw the same thing. A fio randwrite 4k test over a large volume was a lot faster with a larger RBD object size (8 MB was marginally better than the default 4 MB). It makes no sense to me unless there is a huge overhead with an increasing number of objects. Or…
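
For anyone wanting to reproduce the comparison: the object size is fixed at image creation time via the order (log2 of the object size in bytes). A sketch with placeholder pool/image names:

  # default 4 MiB objects (order 22)
  rbd create --size 102400 --order 22 rbd/test-4m
  # 8 MiB objects (order 23)
  rbd create --size 102400 --order 23 rbd/test-8m
  rbd info rbd/test-8m   # confirms the order / object size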

Re: [ceph-users] xfs: v4 or v5?

2016-03-25 Thread Jan Schermer
V5 is supposedly stable, but that only means it will be just as bad as any other XFS. I recommend avoiding XFS whenever possible. Ext4 works perfectly and I never lost any data with it, even when it got corrupted, while XFS still likes to eat the data when something goes wrong (and it will, lik

[ceph-users] xfs: v4 or v5?

2016-03-25 Thread Dzianis Kahanovich
Before adding/replacing new OSDs: which version of xfs is preferred by ceph developers/testers now? Some time ago I moved everything to v5 (crc=1,finobt=1); it works, except that I had to drop "logbsize=256k,logbufs=8" on 4.4. Now I see that v5 is the default (xfsprogs & kernel 4.5 at least). I'm in doubt: make new OSDs old-style v…
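
For concreteness, the two on-disk formats being compared are chosen at mkfs time; the device path is a placeholder:

  # v5 format: metadata CRCs plus the free inode btree, as described above
  mkfs.xfs -m crc=1,finobt=1 /dev/sdX1
  # old-style v4 format
  mkfs.xfs -m crc=0 /dev/sdX1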

Re: [ceph-users] after upgrade from 0.80.11 to 0.94.6, rbd cmd core dump

2016-03-25 Thread archer.wudong
Sorry, it's my fault. I have found the problem: on that host there was a wrong version of librados.so, built a long time ago; I had forgotten about it, so it misled me. I have removed it and linked to the right one. 2016-03-25 archer.wudong From: Dong Wu Sent: 2016-03-25 16:02 Subject: af…
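
For anyone hitting a similar mismatch, a quick way to see which librados the rbd binary actually resolves at runtime; the library paths below are only examples:

  # which librados is actually being loaded
  ldd "$(which rbd)" | grep librados
  # and which package, if any, owns that file
  dpkg -S /usr/lib/librados.so.2   # or: rpm -qf /usr/lib64/librados.so.2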

Re: [ceph-users] Ceph-fuse huge performance gap between different block sizes

2016-03-25 Thread Zhang Qiang
Hi Christian, Thanks for your reply, here're the test specs:

[global]
ioengine=libaio
runtime=90
direct=1
group_reporting
iodepth=16
ramp_time=5
size=1G

[seq_w_4k_20]
bs=4k
filename=seq_w_4k_20
rw=write
numjobs=20

[seq_w_1m_20]
bs=1m
filename=seq_w_1m_20
rw=write
numjobs=20

Test results…

Re: [ceph-users] Ceph-fuse huge performance gap between different block sizes

2016-03-25 Thread Christian Balzer
Hello, On Fri, 25 Mar 2016 08:11:27 +0000 Zhang Qiang wrote:
> Hi all,
> According to fio,
Exact fio command please.
> with 4k block size, the sequence write performance of my ceph-fuse mount
Exact mount options, ceph config (RBD cache) please.
> is just about 20+ M/s, only 200 Mb of 1 G…

[ceph-users] Ceph-fuse huge performance gap between different block sizes

2016-03-25 Thread Zhang Qiang
Hi all, According to fio, with a 4k block size the sequential write performance of my ceph-fuse mount is just about 20+ M/s; only about 200 Mb of the 1 Gb full-duplex NIC's outgoing bandwidth was used at most. But with a 1M block size the performance can reach as high as 1000 M/s, approaching the limit of t…
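
A rough back-of-the-envelope on those numbers (my arithmetic, not from the original post): 20 MB/s at a 4k block size is roughly 20 * 1024 / 4 ≈ 5,120 write operations per second, whereas 1000 M/s at a 1M block size is only about 1,000 operations per second. So the 4k case appears to be bounded by per-operation latency and overhead rather than by raw bandwidth.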

[ceph-users] after upgrade from 0.80.11 to 0.94.6, rbd cmd core dump

2016-03-25 Thread Dong Wu
Hi all, I upgraded my cluster from 0.80.11 to 0.94.6, and everything is OK except that the rbd command core dumps on one host and succeeds on the others. I have disabled auth in ceph.conf: auth_cluster_required = none auth_service_required = none auth_client_required = none Here is the core message. $ sudo rbd ls