Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Van Leeuwen, Robert
On 3/27/16, 9:59 AM, "Ric Wheeler" wrote: >On 03/16/2016 12:15 PM, Van Leeuwen, Robert wrote: >>> My understanding of how a writeback cache should work is that it should >>> only take a few seconds for writes to be streamed onto the network and is >>> focussed on resolving the speed issue

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Brian ::
Hi Dan, Various Proxmox daemons don't look happy on startup also. Are you using a single Samsung SSD for your OSD journals on this host? Is that SSD ok? Brian On Tue, Mar 29, 2016 at 5:22 AM, Dan Moses wrote: > Any suggestions to fix this issue? We are using Ceph with Proxmox and vms > won’t

[ceph-users] unsubscribe ceph-users

2016-03-29 Thread zengyijie
unsubscribe ceph-users

Re: [ceph-users] dealing with the full osd / help reweight

2016-03-29 Thread Jacek Jarosiewicz
On 03/25/2016 04:39 AM, Christian Balzer wrote: Hello,
ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR
 0 1.0     1.0      5585G 2653G 2931G 47.51 0.85
 1 1.0     1.0      5585G 2960G 2624G 53.02 0.94
 2 1.0     1.0      5585G 3193G 2391G 57.18 1.02
10 1.0     1.0      3723

Re: [ceph-users] dealing with the full osd / help reweight

2016-03-29 Thread Jacek Jarosiewicz
Thanks! I've set the parameters to the lower values and now the recovery process doesn't disrupt the rados gateway! Regards, J On 03/26/2016 04:09 AM, lin zhou wrote: Yeah, I think the main reason is the setting of pg_num and pgp_num of some key pool. This site will tell you the correct value:http
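For anyone sizing pools after this thread, the rule of thumb the pgcalc site applies is roughly pg_num ≈ (number of OSDs × 100) / replica count, rounded up to the next power of two, with pgp_num kept equal to pg_num; with 12 OSDs and 3 replicas that works out to 400, rounded up to 512. A minimal sketch with those purely illustrative numbers and an assumed pool name (note pg_num can only be raised, not lowered, on an existing pool):

  ceph osd pool set rbd pg_num 512
  ceph osd pool set rbd pgp_num 512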

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Ric Wheeler
On 03/29/2016 10:06 AM, Van Leeuwen, Robert wrote: On 3/27/16, 9:59 AM, "Ric Wheeler" wrote: On 03/16/2016 12:15 PM, Van Leeuwen, Robert wrote: My understanding of how a writeback cache should work is that it should only take a few seconds for writes to be streamed onto the network and i

Re: [ceph-users] dealing with the full osd / help reweight

2016-03-29 Thread Christian Balzer
Hello, On Tue, 29 Mar 2016 10:32:35 +0200 Jacek Jarosiewicz wrote:
> On 03/25/2016 04:39 AM, Christian Balzer wrote:
> > Hello,
> >>
> >> ID WEIGHT REWEIGHT SIZE  USE   AVAIL %USE  VAR
> >>  0 1.0    1.0      5585G 2653G 2931G 47.51 0.85
> >>  1 1.0    1.0      5585G 2960G

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Dan Moses
Our setup matches this one exactly for Proxmox and Ceph https://pve.proxmox.com/wiki/Ceph_Server. The brand of SSDs may not be the same but they are the same sizes or larger and are Enterprise quality.
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0 1

Re: [ceph-users] dealing with the full osd / help reweight

2016-03-29 Thread Jacek Jarosiewicz
On 03/29/2016 11:35 AM, Christian Balzer wrote: Hello, On Tue, 29 Mar 2016 10:32:35 +0200 Jacek Jarosiewicz wrote: I very specifically and intentionally wrote "ceph osd crush reweight" in my reply above. While your current state of affairs is better, it is not permanent ("ceph osd reweight" s
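For readers following along, the two commands being contrasted look like this (OSD id and weights are made up): the first permanently changes the weight stored in the CRUSH map, while the second is a 0.0-1.0 override that can be reset when the OSD is marked out and back in.

  ceph osd crush reweight osd.0 5.458
  ceph osd reweight 0 0.85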

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Dan Moses
Also I see these 2 errors but not sure if they are preventing our host from starting any VMs. Any suggestions for action to correct this?
root@pm3:~# fdisk -l
Disk /dev/ram0: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Van Leeuwen, Robert
>>> If you try to look at the rbd device under dm-cache from another host, of >>> course >>> any data that was cached on the dm-cache layer will be missing since the >>> dm-cache device itself is local to the host you wrote the data from >>> originally. >> And here it can (and probably will) go

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Oliver Dzombic
Hi Dan, the full root partition is the very first thing you have to solve. This >can< be responsible for the misbehaviour, but it is for >sure< a general problem you >need< to solve. So:
1. Clean /
2. Restart the server
3. Check if it's working, and if not, what are the exact error messages
If
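A quick way to see what is actually filling the root filesystem before deciding what to clean (paths here are only examples; on Proxmox/Ceph nodes old logs under /var/log are a common culprit):

  df -h /
  du -xh --max-depth=2 / 2>/dev/null | sort -h | tail -20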

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Ric Wheeler
On 03/29/2016 01:35 PM, Van Leeuwen, Robert wrote: If you try to look at the rbd device under dm-cache from another host, of course any data that was cached on the dm-cache layer will be missing since the dm-cache device itself is local to the host you wrote the data from originally. And here it

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Dan Moses
Cleaned up the old logs. All host nodes are happy now in a quorum. However we cannot start any VMs.
trying to aquire lock...TASK ERROR: can't lock file '/var/lock/qemu-server/lock-106.conf' - got timeout
-- Hi Dan, the full root partition i
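If the lock file was simply left behind by a task that died while the root filesystem was full, Proxmox can clear it; this is only safe if no other task really holds the lock (VM id 106 taken from the error above):

  qm unlock 106
  qm start 106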

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Oliver Dzombic
Hi Dan, good. --- Please run the command manually. For now this is a Proxmox-specific problem: something is in the way that Proxmox does not like, but why, we don't know. You need to provide more info. So run the command manually, or search the logs for more info around this task error.

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Dan Moses
The error I pasted is what we got when we ran the qm start command. Here is something that could be the cause. I can tell from the error that we need some tweaking, but is there anything I can do to just allow us to start VMs for now?
root@pm3:~# ceph health
HEALTH_WARN 380 pgs backfill; 32 pg

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Sage Weil
On Tue, 29 Mar 2016, Ric Wheeler wrote: > > However, if the write cache would be "flushed in-order" to Ceph > > you would just lose x seconds of data and, hopefully, not have a > > corrupted disk. That could be acceptable for some people. I was just > > stressing that that isn’t the case.

Re: [ceph-users] HELP Ceph Errors won't allow vm to start

2016-03-29 Thread Oliver Dzombic
Hi Dan, please try to access the rbd volume via the rados tools. If it's working (you can list images) then the problem is not Ceph. If it's not working, then you should take care of the Ceph cluster first and make it healthy(er). At first you should correct the mistake with the pg numbers. -- Mi
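A minimal version of that check, assuming the default cluster name and that the Proxmox images live in a pool called rbd (the pool name is an assumption):

  ceph -s
  rados lspools
  rbd ls -l rbd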

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Ric Wheeler
On 03/29/2016 03:42 PM, Sage Weil wrote: On Tue, 29 Mar 2016, Ric Wheeler wrote: However, if the write cache would be "flushed in-order" to Ceph you would just lose x seconds of data and, hopefully, not have a corrupted disk. That could be acceptable for some people. I was just stressing t

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Ric Wheeler > Sent: 29 March 2016 14:07 > To: Sage Weil > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Local SSD cache for ceph on each compute node. > > On 03/29/2016 03:42 PM,

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Ric Wheeler
On 03/29/2016 04:35 PM, Nick Fisk wrote: One thing I picked up on when looking at dm-cache for doing caching with RBD's is that it wasn't really designed to be used as a writeback cache for new writes, as in how you would expect a traditional writeback cache to work. It seems all the policies are

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Ric Wheeler > Sent: 29 March 2016 14:40 > To: Nick Fisk ; 'Sage Weil' > Cc: ceph-users@lists.ceph.com; device-mapper development de...@redhat.com> > Subject: Re: [ceph-users] Local SSD cache

Re: [ceph-users] radosgw_agent sync issues

2016-03-29 Thread ceph new
Resend. Can anyone help? On Thu, Mar 17, 2016 at 5:44 PM, ceph new wrote: > HI > I set up 2 clusters and I'm using radosgw_agent to sync them. Last week the > sync stopped working. When running the agent from the command line I see it's stuck > on 2 files. In the console I'm getting: > 2016-03-17 21:11:57,3

Re: [ceph-users] Local SSD cache for ceph on each compute node.

2016-03-29 Thread Ric Wheeler
On 03/29/2016 04:53 PM, Nick Fisk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ric Wheeler Sent: 29 March 2016 14:40 To: Nick Fisk ; 'Sage Weil' Cc: ceph-users@lists.ceph.com; device-mapper development Subject: Re: [ceph-users] Local

[ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
Hi All, maybe a simple question: I've set up a new cluster with the Infernalis release, there's no IO going on at the cluster level, and I'm receiving a lot of these messages:
2016-03-29 12:22:07.462818 mon.0 [INF] pgmap v158062: 8192 pgs: 8192 active+clean; 20617 MB data, 46164 MB used, 52484 GB

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Samuel Just
That seems to be scrubbing pretty often. Can you attach a config diff from osd.4 (ceph daemon osd.4 config diff)? -Sam On Tue, Mar 29, 2016 at 9:30 AM, German Anders wrote: > Hi All, > > I've maybe a simple question, I've setup a new cluster with Infernalis > release, there's no IO going on at t
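If the full diff is too noisy, the scrub-related values can be pulled out directly; osd_scrub_min_interval, osd_scrub_max_interval and osd_deep_scrub_interval are the settings that usually explain frequent scrubbing:

  ceph --cluster cephIB daemon osd.4 config show | grep scrub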

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
Sure, also the scrubbing is happening on all the osds :S
# ceph --cluster cephIB daemon osd.4 config diff
{
    "diff": {
        "current": {
            "admin_socket": "\/var\/run\/ceph\/cephIB-osd.4.asok",
            "auth_client_required": "cephx",
            "filestore_fd_cache_size": "102

[ceph-users] Ceph upgrade questions

2016-03-29 Thread Shain Miley
Hello all, We are currently running ceph version 0.80.11 on Ubuntu 12.04.5 LTS. I would like to upgrade the cluster so that we can stay a little more current on the releases and hopefully take advantage of some of the exciting new features coming out soon. My questions are: 1)Which version

Re: [ceph-users] librbd on opensolaris/illumos

2016-03-29 Thread Gregory Farnum
On Mon, Mar 28, 2016 at 9:55 PM, Sumit Gaur wrote: > Hello, > Can anybody let me know if the Ceph team is working on porting librbd to > OpenSolaris like it did for librados? Nope, this isn't on anybody's roadmap. -Greg

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Gregory Farnum
On Sat, Mar 26, 2016 at 3:13 PM, Nick Fisk wrote: > > Evening All, > > I’ve been testing the RBD snapshot functionality and one thing that I have > seen is that once you take a snapshot of a RBD and perform small random IO on > the original RBD, performance is really bad due to the amount of wri

[ceph-users] Latest ceph branch for using Infiniband/RoCE

2016-03-29 Thread Wenda Ni
Dear all, We are trying to leverage RDMA as the underlying data transfer protocol to run Ceph. A quick survey leads us to XioMessenger. When cloning code from https://github.com/linuxbox2/linuxbox-ceph , we observe multiple branches associated with it. Can we know which one is working and which we can use for

[ceph-users] PG Stuck active+undersized+degraded+inconsistent

2016-03-29 Thread Calvin Morrow
Ceph cluster with 60 OSDs, Giant 0.87.2. One of the OSDs failed due to a hardware error, however after normal recovery it seems stuck with one active+undersized+degraded+inconsistent pg. I haven't been able to get repair to happen using "ceph pg repair 12.28a"; I can see the activity logged in t
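For reference, the usual sequence for a single inconsistent PG looks roughly like this (PG id taken from the message above); the repair is queued and only runs once the primary OSD schedules it:

  ceph health detail | grep 12.28a
  ceph pg deep-scrub 12.28a
  ceph pg repair 12.28a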

Re: [ceph-users] Radosgw (civetweb) hangs once around 850 established connections

2016-03-29 Thread seapasu...@uchicago.edu
So an update for anyone else having this issue. It looks like radosgw either has a memory leak or it spools the whole object into ram or something.
root@kh11-9:/etc/apt/sources.list.d# free -m
             total       used       free     shared    buffers     cached
Mem:         64397      63775

Re: [ceph-users] Dump Historic Ops Breakdown

2016-03-29 Thread Gregory Farnum
Been a while, but... On Thu, Feb 25, 2016 at 9:50 AM, Nick Fisk wrote: > I'm just trying to understand the steps each IO goes through and have been > looking at the output of the dump historic ops command from the admin socket. > There are a couple of steps I'm not quite sure what they mean and also > sli

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Gregory Farnum > Sent: 29 March 2016 18:52 > To: Nick Fisk > Cc: Ceph Users > Subject: Re: [ceph-users] Redirect snapshot COW to alternative pool > > On Sat, Mar 26, 2016 at 3:13 PM, Nick Fi

Re: [ceph-users] Dump Historic Ops Breakdown

2016-03-29 Thread Nick Fisk
> Been a while, but... Brilliant, just what I needed to know. Thanks for the confirmation/answers. > > On Thu, Feb 25, 2016 at 9:50 AM, Nick Fisk wrote: > > I'm just trying to understand the steps each IO goes through and have > > been looking at the output dump historic ops command from the ad

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Jason Dillaman
> I think this is where I see slow performance. If you are doing large IO, then > copying 4MB objects (assuming defaults) is maybe only 2x times the original > IO to the disk. However if you are doing smaller IO from what I can see a > single 4kb write would lead to a 4MB object being copied to the

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
I've just upgraded to *jewel*, and the scrubbing seems to have been corrected... but now I'm not able to map an rbd on a host (before I was able to); basically I'm getting this error msg:
*rbd: sysfs write failed
rbd: map failed: (5) Input/output error*
# rbd --cluster cephIB create host01 --size 10240

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Samuel Just
Sounds like a version/compatibility thing. Are your rbd clients really old? -Sam On Tue, Mar 29, 2016 at 1:19 PM, German Anders wrote: > I've just upgrade to jewel, and the scrubbing seems to been corrected... but > now I'm not able to map an rbd on a host (before I was able to), basically > I'm

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Samuel Just
Or you needed to run it as root? -Sam On Tue, Mar 29, 2016 at 1:24 PM, Samuel Just wrote: > Sounds like a version/compatibility thing. Are your rbd clients really old? > -Sam > > On Tue, Mar 29, 2016 at 1:19 PM, German Anders wrote: >> I've just upgrade to jewel, and the scrubbing seems to been

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
On the host:
# ceph --cluster cephIB --version
*ceph version 10.1.0* (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
# rbd --version
*ceph version 10.1.0* (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
If I run the command without root or sudo, the command fails with a Permission denied *German* 2016-0

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Samuel Just
What's the kernel version? -Sam On Tue, Mar 29, 2016 at 1:33 PM, German Anders wrote: > On the host: > > # ceph --cluster cephIB --version > ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6) > > # rbd --version > ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6) > > If I

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Jason Dillaman
Under Jewel, newly created images default to features that are not currently compatible with krbd. If you run 'rbd --cluster cephIB info host01 --pool cinder-volumes', what features do you see? If you see more than layering, you need to disable them via the 'rbd feature disable' command. [1]

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
# rbd --cluster cephIB info e60host02 --pool cinder-volumes
rbd image 'e60host02':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.5ef1238e1f29
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Stefan Lissmats
I agree. I ran into the same issue and the error message is not that clear. Mapping with the kernel rbd client (rbd map) needs a quite new kernel to handle the new image format. The work-around is to use --image-format 1 when creating the image. Original message From:

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Jason Dillaman
Running the following should fix your image up for krbd usage: # rbd --cluster cephIB feature disable e60host02 exclusive-lock,object-map,fast-diff,deep-flatten --pool cinder-volumes In the future, you can create krbd-compatible images by adding "--image-feature layering" to the "rbd create" co
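A sketch of that, reusing the cluster and pool names from this thread (the image name is made up):

  rbd --cluster cephIB create e60host03 --size 100G --image-feature layering --pool cinder-volumes
  rbd --cluster cephIB map e60host03 --pool cinder-volumes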

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Nick Fisk
> > I think this is where I see slow performance. If you are doing large > > IO, then copying 4MB objects (assuming defaults) is maybe only 2x > > times the original IO to the disk. However if you are doing smaller IO > > from what I can see a single 4kb write would lead to a 4MB object > > being

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
it seems that the image-format option is deprecated:
# rbd --id cinder --cluster cephIB create e60host01v2 --size 100G --image-format 1 --pool cinder-volumes -k /etc/ceph/cephIB.client.cinder.keyring
rbd: image format 1 is deprecated
# rbd --cluster cephIB info e60host01v2 --pool cinder-volu

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread German Anders
Jason, I tried that but the mapping is not working anyway:
# rbd --cluster cephIB map e60host02 --pool cinder-volumes -k /etc/ceph/cephIB.client.cinder.keyring
rbd: sysfs write failed
rbd: map failed: (5) Input/output error
*German* 2016-03-29 17:46 GMT-03:00 Jason Dillaman : > Running the follo

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Stefan Lissmats
Ok, I also got the warning but was able to use it anyway. Could be blocked in the new release of Jewel. Probably the more correct answer is to use the other suggestion (to use --image-feature layering) but I haven't tried that myself. Sent from my Samsung device Original message

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Jason Dillaman
Any Ceph krbd logs in dmesg? Given that this is a Jewel cluster on an older kernel, my blind guess is that your CRUSH map is using newer features not supported by your kernel. If that is the case, there should be a log message. -- Jason Dillaman - Original Message - > From: "Ger
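Two quick read-only checks along those lines; the second shows which tunables profile the cluster is using (downgrading tunables, e.g. with ceph osd crush tunables hammer, would trigger data movement, so it is not something to do casually):

  dmesg | tail -n 20
  ceph --cluster cephIB osd crush show-tunables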

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Jason Dillaman
Image format 1 is still supported -- just trying to slowly move users off of it and onto image format 2 through lots of log message nagging. -- Jason Dillaman - Original Message - > From: "Stefan Lissmats" > To: "German Anders" > Cc: "ceph-users" > Sent: Tuesday, March 29, 2016

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Gregory Farnum
On Tue, Mar 29, 2016 at 1:07 PM, Jason Dillaman wrote: >> I think this is where I see slow performance. If you are doing large IO, then >> copying 4MB objects (assuming defaults) is maybe only 2x times the original >> IO to the disk. However if you are doing smaller IO from what I can see a >> sin

[ceph-users] Image format support (Was: Re: Scrubbing a lot)

2016-03-29 Thread Christian Balzer
Hello, On Tue, 29 Mar 2016 18:15:00 -0400 (EDT) Jason Dillaman wrote: > Image format 1 is still supported -- just trying to slowly move users > off of it and onto image format 2 through lots of log message nagging. > Until there is a "live" migration or something as close to it as possible, do

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Somnath Roy
We faced this issue too and figured out that in Jewel the default image creation is with format 2. Not sure if it is a good idea to change the default though, as almost all the LTS releases are on older kernels and will face incompatibility issues. Thanks & Regards Somnath -Original Message

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Jason Dillaman
> > > I think this is where I see slow performance. If you are doing large > > > IO, then copying 4MB objects (assuming defaults) is maybe only 2x > > > times the original IO to the disk. However if you are doing smaller IO > > > from what I can see a single 4kb write would lead to a 4MB object > >

Re: [ceph-users] Image format support (Was: Re: Scrubbing a lot)

2016-03-29 Thread Jason Dillaman
Very good points w.r.t. easing migration. There is no plan to remove support for image format 1 -- it's more about being able to concentrate our attention on format 2 and to nudge users towards creating new images in format 2. -- Jason Dillaman - Original Message - > From: "Christia

Re: [ceph-users] Scrubbing a lot

2016-03-29 Thread Jason Dillaman
Understood -- format 2 was promoted to the default image format starting with Infernalis (which not all users would have played with since it isn't LTS). The defaults can be overridden via the command-line when creating new images or via the Ceph configuration file. I'll let Ilya provide input
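For completeness, the configuration-file route looks like this; the numeric value 1 corresponds to the layering feature bit and only affects newly created images on clients that read this file:

  [client]
  rbd default features = 1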

Re: [ceph-users] Redirect snapshot COW to alternative pool

2016-03-29 Thread Jason Dillaman
> >> From reading the RBD layering docs it looked like you could also specify a > >> different object size for the target. If there was some way that the > >> snapshot could have a different object size or some sort of dirty bitmap, > >> then this would reduce the amount of data that would have to

Re: [ceph-users] Image format support (Was: Re: Scrubbing a lot)

2016-03-29 Thread Christian Balzer
On Tue, 29 Mar 2016 20:45:38 -0400 (EDT) Jason Dillaman wrote: > Very good points w.r.t. easing migration. There is no plan to remove > support for image format 1 -- it's more about being able to concentrate > our attention to format 2 and to nudge users towards creating new images > in format 2.

Re: [ceph-users] Ceph upgrade questions

2016-03-29 Thread Christian Balzer
Hello, On Tue, 29 Mar 2016 13:47:04 -0400 Shain Miley wrote: > Hello all, > > We are currently running ceph version 0.80.11 on Ubuntu 12.04.5 LTS. > Used to be 0.80.10/11 on Debian Jessie here. > I would like to upgrade the cluster so that we can stay a little more > current on the releases

[ceph-users] an osd which reweight is 0.0 in crushmap has high latency in osd perf

2016-03-29 Thread lin zhou
Hi, cephers. Some OSDs have high latency in the output of ceph osd perf, but I have set the reweight of these OSDs in the crushmap to 0.0, and I used iostat to check these disks: no load. So how does the command `ceph osd perf` work?
root@node-67:~# ceph osd perf
osdid fs_commit_latency(ms) fs_apply_latency(ms

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread lin zhou
2016-03-29 14:50 GMT+08:00 Christian Balzer : > > Hello, > > On Tue, 29 Mar 2016 14:00:44 +0800 lin zhou wrote: > >> Hi, Christian. >> When I re-add these OSDs (0,3,9,12,15), the high latency occurs again. The >> default reweight of these OSDs is 0.0 >> > That makes no sense, at a crush weight (not reweig

Re: [ceph-users] an osd which reweight is 0.0 in crushmap has high latency in osd perf

2016-03-29 Thread lin zhou
Some info update: the disks are 3TB SATA, Model Number: WDC WD3000FYYZ-01UL1B1, and today I tried to set the osd.0 reweight to 0.1 and then checked; some useful data found.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.63    0.00    0.48   16.15    0.00   81.75
Device: rrqm/

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread lin zhou
Maybe I found the problem:
smartctl -a /dev/sda | grep Media_Wearout_Indicator
233 Media_Wearout_Indicator 0x0032   001   001   000    Old_age   Always
root@node-65:~# fio -direct=1 -bs=4k -ramp_time=40 -runtime=100 -size=20g -filename=./testfio.file -ioengine=libaio -iodepth=8 -norandommap -ran

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread Christian Balzer
Hello, On Wed, 30 Mar 2016 12:19:57 +0800 lin zhou wrote: > 2016-03-29 14:50 GMT+08:00 Christian Balzer : > > > > Hello, > > > > On Tue, 29 Mar 2016 14:00:44 +0800 lin zhou wrote: > > > >> Hi,Christian. > >> When I re-add these OSD(0,3,9,12,15),the high latency occur again.the > >> default rewei

Re: [ceph-users] how to re-add a deleted osd device as a osd with data

2016-03-29 Thread Christian Balzer
Hello, On Wed, 30 Mar 2016 13:50:17 +0800 lin zhou wrote: > Maybe I found the problem: > > smartctl -a /dev/sda | grep Media_Wearout_Indicator > 233 Media_Wearout_Indicator 0x0032 001 001 000 Old_age Always > Exactly what I thought it would be. See my previous mail. Christian >

Re: [ceph-users] Ceph stopped self repair.

2016-03-29 Thread Dan Moses
Any suggestions on how to clean up Ceph errors that don't autocorrect? All these counters haven't moved in 2 hours now.
HEALTH_WARN 93 pgs degraded; 93 pgs stuck degraded; 113 pgs stuck unclean; 93 pgs stuck undersized; 93 pgs undersized; too many PGs per OSD (472 > max 300); mon.0 low disk space
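A few read-only commands that usually show why recovery has stalled (for example an OSD that is down/out, or a pool whose size cannot be satisfied by the remaining hosts); none of these change cluster state:

  ceph health detail
  ceph pg dump_stuck unclean
  ceph osd tree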