[ceph-users] Erasure Pool OSD fail

2017-10-24 Thread Jorge Pinilla López
I am testing erasure code pools and doing a rados test write to try fault tolerance. I have 3 nodes with 1 OSD each, K=2 M=1. While performing the write (rados bench -p replicate 100 write), I stop one of the OSD daemons (for example osd.0), simulating a node failure, and then the whole write stops and I

Re: [ceph-users] Erasure Pool OSD fail

2017-10-24 Thread Jorge Pinilla López
Okay, I think I can answer myself: the pool is created with a default min_size of 3, so when one of the OSDs goes down the pool doesn't perform any IO; manually changing the pool min_size to 2 worked great. On 24/10/2017 at 10:13, Jorge Pinilla López wrote: > I am testing erasure code p
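For reference, a minimal sketch of the commands involved on Luminous; the profile and pool names (ec21, ecpool) and the PG count are placeholders, not values taken from this thread:

    # ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
    # ceph osd pool create ecpool 64 64 erasure ec21
    # ceph osd pool get ecpool min_size
    # ceph osd pool set ecpool min_size 2

With k=2 m=1 there are 3 chunks in total, so a min_size of 3 only allows IO while every chunk is available; lowering it to 2 lets the pool keep serving IO with one OSD down.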

[ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-24 Thread shadow_lin
Hi All, The cluster has 24 OSDs with 24 8TB HDDs. Each OSD server has 2GB RAM and runs 2 OSDs with 2 8TB HDDs. I know the memory is below the recommended value, but this OSD server is an ARM server so I can't do anything to add more RAM. I created a replicated (2 rep) pool and a 20TB image and mounted
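For what it's worth, on Luminous/BlueStore the main memory knob is the cache size; a hedged ceph.conf sketch for a RAM-starved node (the values are illustrative, not recommendations from this thread):

    [osd]
    # cap the BlueStore cache per HDD-backed OSD (bytes)
    bluestore cache size hdd = 536870912
    # keep recovery/backfill memory pressure down
    osd max backfills = 1
    osd recovery max active = 1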

Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-24 Thread Denes Dolhay
Hi, There was a thread about this not long ago, please check: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021676.html Denes. On 10/24/2017 11:48 AM, shadow_lin wrote: Hi All, The cluster has 24 osd with 24 8TB hdd. Each osd server has 2GB ram and runs 2OSD with 2 8TBHDD

[ceph-users] Lots of reads on default.rgw.usage pool

2017-10-24 Thread Mark Schouten
Hi, Since I upgraded to Luminous last week, I see a lot of read activity on the default.rgw.usage pool (see attached image). I think it has something to do with the rgw daemons, since restarting them slows the reads down for a while. It might also have to do with tenants and the fact that dynami
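If the usage log itself turns out to be the source, a hedged way to inspect and trim it (the dates are placeholders):

    # radosgw-admin usage show --show-log-entries=false
    # radosgw-admin usage trim --start-date=2017-01-01 --end-date=2017-10-01

The usage log can also be switched off entirely with "rgw enable usage log = false" in ceph.conf if it is not needed.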

Re: [ceph-users] Erasure code profile

2017-10-24 Thread Ronny Aasen
Yes, you can. But just like a RAID5 array with a lost disk, it is not a comfortable way to run your cluster for any significant time. You also get performance degradation. Having a warning active all the time makes it harder to detect new issues, and such. One becomes numb to the warning allwa

Re: [ceph-users] Lots of reads on default.rgw.usage pool

2017-10-24 Thread Mark Schouten
Stracing the radosgw-process, I see a lot of the following: [pid 12364] sendmsg(23, {msg_name(0)=NULL, msg_iov(5)=[{"\7{\340\r\0\0\0\0\0P\200\16\0\0\0\0\0*\0?\0\10\0\331\0\0\0\0\0\0\0M"..., 54}, {"\1\1\22\0\0\0\1\10\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\20\226\206\351\v3\0\0"..., 217}, {
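To confirm whether those ops really land on the usage pool while stracing, pool-level stats can help; a small sketch, assuming the pool name from the subject:

    # ceph osd pool stats default.rgw.usage
    # rados df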

Re: [ceph-users] Erasure Pool OSD fail

2017-10-24 Thread Eino Tuominen
Hello, Correct me if I'm wrong, but isn't your configuration just twice as bad as running with replication size=2? With replication size=2, when you lose a disk you lose data if there is even one defective block found when ceph is reconstructing the pgs that had a replica on the failed disk. No, w

Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-24 Thread Sage Weil
On Tue, 24 Oct 2017, shadow_lin wrote: > Hi All, > The cluster has 24 osd with 24 8TB hdd. > Each osd server has 2GB ram and runs 2OSD with 2 8TBHDD. I know the memor

[ceph-users] Reply: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-10-24 Thread shadow_lin
Hi Sage, When will 12.2.2 be released? 2017-10-24 lin.yunfan From: Sage Weil Sent: 2017-10-24 20:03 Subject: Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot of data to cluster To: "shadow_lin" Cc: "ceph-users" On Tue, 24 Oct 2017, shadow_lin wrote: >

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-24 Thread David Turner
As I'm looking into this more and more, I'm realizing how big of a problem garbage collection has been in our clusters. The biggest cluster has over 1 billion objects in its gc list (the command is still running, it just recently passed the 1B mark). Does anyone have any guidance on what to do
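For anyone following along, a hedged sketch of the commands involved (listing a gc queue this large can itself take a very long time):

    # radosgw-admin gc list --include-all | grep -c '"oid"'
    # radosgw-admin gc process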

[ceph-users] MDS damaged

2017-10-24 Thread Daniel Davidson
Our ceph system is having a problem. A few days ago we had a pg that was marked as inconsistent, and today I fixed it with a: #ceph pg repair 1.37c then a file was stuck as missing so I did a: #ceph pg 1.37c mark_unfound_lost delete pg has 1 objects unfound and apparently lost marking That
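When an MDS rank is reported damaged, a hedged first step is to list what it recorded via the admin socket on the MDS host (the daemon name is a placeholder):

    # ceph daemon mds.<name> damage ls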

Re: [ceph-users] luminous ubuntu 16.04 HWE (4.10 kernel). ceph-disk can't prepare a disk

2017-10-24 Thread Webert de Souza Lima
When you umount the device, is the raised error still the same? Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* On Mon, Oct 23, 2017 at 4:46 AM, Wido den Hollander wrote: > > > On 22 October 2017 at 18:45, Sean Sullivan wrote: > > > > > > On freshly insta
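If a stale mount or partition table is in the way, a hedged sequence might look like this (the device name is a placeholder and zap is destructive):

    # umount /dev/sdX1
    # ceph-disk zap /dev/sdX
    # ceph-disk prepare --bluestore /dev/sdX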

Re: [ceph-users] Erasure code profile

2017-10-24 Thread Karun Josy
Thank you for your reply. I am finding it confusing to understand the profile structure. Consider a cluster of 8 OSD servers with 3 disks on each server. If I use a profile setting of k=5, m=3 and ruleset-failure-domain=host: encoding rate r = k / n, where n = k + m, so r = 5/8 = 0.625. Storage

Re: [ceph-users] Erasure code profile

2017-10-24 Thread Oliver Humpage
> Consider a cluster of 8 OSD servers with 3 disks on each server. > If I use a profile setting of k=5, m=3 and ruleset-failure-domain=host; > As far as I understand it can tolerate failure of 3 OSDs and 1 host, am I right? When setting up your pool, you specify a crush map which say
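In concrete terms, a hedged sketch of the profile under discussion (names are placeholders; on Luminous the option is crush-failure-domain, the older spelling was ruleset-failure-domain):

    # ceph osd erasure-code-profile set ec53 k=5 m=3 crush-failure-domain=host
    # ceph osd erasure-code-profile get ec53
    # ceph osd pool create ecpool53 128 128 erasure ec53

With one chunk per host, a PG stays readable as long as at least k=5 of its 8 chunks survive, so up to 3 of the 8 hosts (or the OSDs holding those chunks) can be lost.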

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-24 Thread Ben Hines
I agree the settings are rather confusing. We also have many millions of objects and had this trouble, so I set these rather aggressive gc settings on our cluster, which result in gc almost always running. We also use lifecycles to expire objects. rgw lifecycle work time = 00:01-23:59 rgw gc max ob
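For context, these are roughly the knobs being discussed; the values below are only an illustration of an aggressive setup, not the exact ones from this message:

    [global]
    # number of gc shard objects worked through per cycle
    rgw gc max objs = 647
    # seconds a deleted object must wait before it is eligible for gc
    rgw gc obj min wait = 300
    # how often a gc cycle starts (seconds)
    rgw gc processor period = 600
    # maximum run time of a single gc cycle (seconds)
    rgw gc processor max time = 600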

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-24 Thread David Turner
Thank you so much for chiming in, Ben. Can you explain what each setting value means? I believe I understand min wait, that's just how long to wait before allowing the object to be cleaned up. gc max objs is how many will be cleaned up during each period? gc processor period is how often it will

Re: [ceph-users] Reported bucket size incorrect (Luminous)

2017-10-24 Thread Christian Wuerdig
What version of Ceph are you using? There were a few bugs leaving behind orphaned objects (e.g. http://tracker.ceph.com/issues/18331 and http://tracker.ceph.com/issues/10295). If that's your problem then there is a tool for finding these objects so you can then manually delete them - have a google
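The tool referred to is radosgw-admin's orphan scan; a hedged sketch (the pool and job names are placeholders, and the scan can run for a long time):

    # radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans1
    # radosgw-admin orphans list-jobs
    # radosgw-admin orphans finish --job-id=orphans1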

Re: [ceph-users] Erasure Pool OSD fail

2017-10-24 Thread Jorge Pinilla López
Well, you should use M > 1; the more you have, the less risk and the more performance. You don't read twice as much data, you read it from different sources; furthermore, you can even read less data and rebuild the rest, because on erasure pools you don't replicate the data. On the other hand, the con

Re: [ceph-users] Infinite degraded objects

2017-10-24 Thread Christian Wuerdig
From which version of ceph to which other version of ceph did you upgrade? Can you provide logs from the crashing OSDs? The degraded object percentage being larger than 100% has been reported before (https://www.spinics.net/lists/ceph-users/msg39519.html) and it looks like it's been fixed a week or so ag

Re: [ceph-users] Erasure code profile

2017-10-24 Thread jorpilo
That's a pretty hard question. I don't think it would speed up writes so much, because you end up writing the same amount of data, but I think on a 4+4 setup rebuilding or serving data while a node is down will go faster and use less resources, because it has to rebuild smaller chunks of data. A

Re: [ceph-users] MDS damaged

2017-10-24 Thread Daniel Davidson
Out of desperation, I started with the disaster recovery guide: http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/ After exporting the journal, I started doing: cephfs-journal-tool event recover_dentries summary And that was about 7 hours ago, and it is still running.  I am getting a l
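For reference, the sequence from the disaster-recovery document linked above looks roughly like this; exporting the journal first gives you something to fall back on:

    # cephfs-journal-tool journal export backup.bin
    # cephfs-journal-tool event recover_dentries summary
    # cephfs-journal-tool journal reset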

[ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-24 Thread Christian Sarrasin
I'm planning to migrate an existing Filestore cluster with (SATA) SSD-based journals fronting multiple HDD-hosted OSDs - should be a common enough setup. So I've been trying to parse various contributions here and Ceph devs' blog posts (for which, thanks!) Seems the best way to repurpose that har
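For the rebuild itself, a hedged example of creating a BlueStore OSD with its DB on the SSD using ceph-volume on Luminous (device names are placeholders and the SSD partition must already exist):

    # ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1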

[ceph-users] pg inconsistent and repair doesn't work

2017-10-24 Thread Wei Jin
Hi list, We ran into a pg deep scrub error, and we tried to repair it with `ceph pg repair pgid`, but it didn't work. We also verified the object files and found that all 3 replicas were zero size. What's the problem; is it a bug? And how do we fix the inconsistency? I haven't restarted the osds so far
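Before repairing, it usually helps to see what scrub actually recorded; a hedged sketch with a placeholder pgid:

    # rados list-inconsistent-obj 1.23 --format=json-pretty
    # ceph pg deep-scrub 1.23
    # ceph pg repair 1.23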

Re: [ceph-users] MDS damaged

2017-10-24 Thread Daniel Davidson
This finally finished: 2017-10-24 22:50:11.766519 7f775e539bc0  1 scavenge_dentries: frag 607. is corrupt, overwriting Events by type:   OPEN: 5640344   SESSION: 10   SUBTREEMAP: 8070   UPDATE: 1384964 Errors: 0 I truncated: #cephfs-journal-tool journal reset old journal was 6255163020
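If the journal reset completes, the remaining steps from the same disaster-recovery document are roughly these (a hedged sketch, not advice specific to this cluster):

    # cephfs-table-tool all reset session
    # ceph mds repaired 0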

Re: [ceph-users] Erasure Pool OSD fail

2017-10-24 Thread Eino Tuominen
Hi, Yes, I realised that you are correct in that it's not twice as bad, it's just as bad. I made a trivial error when doing the math in my head, which made this case of erasure coding look worse than it is. But I still hold on to my previous statement: with m=1 you will lose data, it mus