Re: [ceph-users] Tip for erasure code profile?

2019-05-03 Thread Maged Mokhtar
On 03/05/2019 17:45, Robert Sander wrote: Hi, I would be glad if anybody could give me a tip for an erasure code profile and an associated crush ruleset. The cluster spans 2 rooms, each room containing 6 hosts and each host having 12 to 16 OSDs. The failure domain would be the room level,

Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-05-03 Thread Dietmar Rieder
Hi, to answer my own question and for the record: it turned out that the "device_health_metrics" pool was using PG 7.0, which had no objects after the pool was removed, but the PG itself was somehow not deleted. [root@cephosd-05 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 --pgid 7.0
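
For the record, a minimal sketch of how such a leftover, empty PG can be removed from the OSD's local store with ceph-objectstore-tool (OSD path and pgid taken from the message above; the --op remove step is an assumption about the fix, not quoted from the truncated mail):

  # the OSD must be stopped before ceph-objectstore-tool can open its store
  systemctl stop ceph-osd@98
  # inspect the stale PG, then remove it and restart the OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 --pgid 7.0 --op info
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 --pgid 7.0 --op remove --force
  systemctl start ceph-osd@98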

Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread solarflow99
How is this better than using a single public network, routing through an L3 switch? If I understand the scenario right, this way would require the switch port to be a trunk carrying all the public VLANs, and you can bridge directly through the switch so L3 wouldn't be necessary? On Fri, May

[ceph-users] RGW BEAST mimic backport doesn't show customer IP

2019-05-03 Thread EDH - Manuel Rios Fernandez
Hi Folks, We migrated our RGW frontend from Civetweb to the Beast backport in Mimic; the performance is impressive compared with the old one. But the Ceph logs don't show the client peer IP, checked with debug rgw = 1 and 2. The Ceph documentation doesn't tell us much more. How
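
One possible workaround (not taken from this thread, and assuming the Mimic-era Beast frontend simply doesn't log the peer address itself) is to enable the RGW ops log, which records the remote address per request; a sketch:

  [client.rgw.hostname]
  # per-request operations log, includes the remote address of each request
  rgw enable ops log = true
  # keep the ops log in RADOS (the default) rather than sending it to a unix socket
  rgw ops log rados = true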

Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-05-03 Thread Dietmar Rieder
Hi, I think I just hit the same problem on Nautilus 14.2.1. I tested the ceph device monitoring, which created a new pool (device_health_metrics); after looking into the monitoring feature, I turned it off again and removed the pool. This resulted in 3 OSDs down which cannot be started again

Re: [ceph-users] Tip for erasure code profile?

2019-05-03 Thread Feng Zhang
Will m=6 cause huge CPU usage? Best, Feng On Fri, May 3, 2019 at 11:57 AM Ashley Merrick wrote: > > I may be wrong, but you're correct with your m=6 statement. > > You need at least K shards available. If you had k=8 and m=2 > equally across 2 rooms (5 each), a failure in either room

Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread EDH - Manuel Rios Fernandez
You can put multiple networks in ceph.conf, separated by commas: public network = 172.16.2.0/24, 192.168.0/22 But remember your servers must be able to reach them; L3 routing / firewall rules needed. Regards Manuel From: ceph-users On behalf of Martin Verges Sent: Friday, 3 May 2019 11:36 To:
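
A minimal ceph.conf sketch of that idea (both subnets are examples; the second one is written out in full CIDR form here, and clients on the extra VLAN still need an L3 route to the MON/OSD addresses):

  [global]
  # clients may come from either subnet; the daemons keep their existing addresses
  public network = 172.16.2.0/24, 192.168.0.0/22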

Re: [ceph-users] Tip for erasure code profile?

2019-05-03 Thread Igor Podlesny
On Fri, 3 May 2019 at 22:46, Robert Sander wrote: > The cluster spans 2 rooms ... > The failure domain would be the room level ... > Is that even possible with erasure coding? Sure, but you'd need slightly more rooms then. E.g., a minimal EC(2, 1) means (2 + 1) rooms. -- End of message.

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2019-05-03 Thread Igor Podlesny
On Fri, 3 May 2019 at 21:39, Mark Nelson wrote: [...] > > [osd] > > ... > > bluestore_allocator = bitmap > > bluefs_allocator = bitmap > > > > I would restart the nodes one by one and see, what happens. > > If you are using 12.2.11 you likely still have the old bitmap allocator Would those
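
A quick way to check what a running OSD is actually using (osd.0 is just an example; this assumes access to the admin socket on the node):

  # which allocators the OSD was started with
  ceph daemon osd.0 config get bluestore_allocator
  ceph daemon osd.0 config get bluefs_allocator
  # per-subsystem memory breakdown, useful when chasing high RSS
  ceph daemon osd.0 dump_mempools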

[ceph-users] radosgw daemons constantly reading default.rgw.log pool

2019-05-03 Thread Vladimir Brik
Hello, I have set up the rados gateway using "ceph-deploy rgw create" (default pools, 3 machines acting as gateways) on Ceph 13.2.5. For over 2 weeks now, the three rados gateways have been generating a constant ~30 MB/s and ~4k ops/s of read I/O on default.rgw.log even though nothing is using the rados
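
A hedged diagnostic sketch for narrowing this down (the commands only inspect; the pool name comes from the message above):

  # confirm default.rgw.log really is the hot pool
  ceph osd pool stats default.rgw.log
  # list the log objects the gateways keep polling (mdlog/datalog/usage entries live here)
  rados -p default.rgw.log ls | head -50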

[ceph-users] CRUSH rule device classes mystery

2019-05-03 Thread Stefan Kooman
Hi List, I'm playing around with CRUSH rules and device classes and I'm puzzled if it's working correctly. Platform specifics: Ubuntu Bionic with Ceph 14.2.1 I created two new device classes "cheaphdd" and "fasthdd". I made sure these device classes are applied to the right OSDs and that the
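
For context, a minimal sketch of the kind of setup being described (class names from the message; rule and pool names are made up for illustration):

  # replicated rules restricted to one device class each
  ceph osd crush rule create-replicated rule-cheaphdd default host cheaphdd
  ceph osd crush rule create-replicated rule-fasthdd default host fasthdd
  # verify via the shadow hierarchy that each class sees the expected OSDs
  ceph osd crush tree --show-shadow
  # point a pool at one of the rules
  ceph osd pool set somepool crush_rule rule-fasthdd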

Re: [ceph-users] Tip for erasure code profile?

2019-05-03 Thread Ashley Merrick
I may be wrong, but you're correct with your m=6 statement. You need at least K shards available. If you had k=8 and m=2 equally across 2 rooms (5 each), a failure in either room would cause an outage. With m=6 you're at least getting better disk space efficiency than 3x replication. But

[ceph-users] Tip for erasure code profile?

2019-05-03 Thread Robert Sander
Hi, I would be glad if anybody could give me a tip for an erasure code profile and an associated crush ruleset. The cluster spans 2 rooms, each room containing 6 hosts and each host having 12 to 16 OSDs. The failure domain would be the room level, i.e. data should survive if one of the rooms
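
For reference, the profile syntax being asked about looks roughly like this (k/m are placeholders, not a recommendation); note that with crush-failure-domain=room the resulting rule wants k+m distinct rooms, which is exactly the sticking point raised in the replies above:

  # example only: this particular profile would need 4 rooms, not 2
  ceph osd erasure-code-profile set ec-room-example k=2 m=2 crush-failure-domain=room
  ceph osd erasure-code-profile get ec-room-example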

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2019-05-03 Thread Mark Nelson
On 5/3/19 1:38 AM, Denny Fuchs wrote: hi, I never noticed the Debian /etc/default/ceph :-) = # Increase tcmalloc cache size TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 that is what is active now. Yep, if you profile the OSD under a small write workload you can see

Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread Matt Benjamin
I think I would not override the default value for "rgw list buckets max chunk", I have no experience doing that, though I can see why it might be plausible. Matt On Fri, May 3, 2019 at 9:39 AM EDH - Manuel Rios Fernandez wrote: > > From changes right know we got some other errors... > >

Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread EDH - Manuel Rios Fernandez
After the changes, right now we get some other errors... 2019-05-03 15:37:28.604 7f499a2e8700 1 == starting new request req=0x55f326692970 = 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s::GET

Re: [ceph-users] obj_size_info_mismatch error handling

2019-05-03 Thread Reed Dier
Just to follow up for the sake of the mailing list, I had not had a chance to attempt your steps yet, but things appear to have worked themselves out on their own. Both scrub errors cleared without intervention, and I'm not sure if it is the result of that object getting touched in CephFS

Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread EDH - Manuel Rios Fernandez
Hi Matt, Thanks for your help. We have made the changes plus a restart of the MONs and RGWs (they looked strangely stuck); now we're able to list 250 directories. time s3cmd ls s3://datos101 --no-ssl --limit 150 real 2m50.854s user 0m0.147s sys 0m0.042s Is there any recommendation of

Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread Matt Benjamin
Hi Folks, Thanks for sharing your ceph.conf along with the behavior. There are some odd things there. 1. rgw_num_rados_handles is deprecated--it should be 1 (the default), but changing it may require you to check and retune the values for objecter_inflight_ops and objecter_inflight_op_bytes to
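
For reference, a sketch of the settings being discussed (only rgw_num_rados_handles comes from the message; the objecter values shown are believed to be the upstream defaults and are added here as context):

  [client.rgw.hostname]
  # deprecated knob mentioned above; leave at the default
  rgw num rados handles = 1
  # throttles on in-flight RADOS ops issued by the RGW's objecter
  objecter inflight ops = 1024
  # 104857600 bytes = 100 MiB
  objecter inflight op bytes = 104857600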

Re: [ceph-users] ceph-volume activate runs infinitely

2019-05-03 Thread Robert Sander
Hi, On 02.05.19 15:20, Alfredo Deza wrote: > stderr: Job for ceph-osd@21.service canceled. > > Do you have output on the osd12 logs at /var/log/ceph ? Unfortunately the customer has setup a central logging without local fallback. Rsyslogd was not running yet and the Ceph OSDs where

Re: [ceph-users] Restricting access to RadosGW/S3 buckets

2019-05-03 Thread Janne Johansson
On Thu, 2 May 2019 at 23:41, Vladimir Brik < vladimir.b...@icecube.wisc.edu> wrote: > Hello > I am trying to figure out a way to restrict access to S3 buckets. Is it > possible to create a RadosGW user that can only access specific bucket(s)? > You can have a user with very small bucket/bytes

Re: [ceph-users] rbd ssd pool for (windows) vms

2019-05-03 Thread Janne Johansson
On Wed, 1 May 2019 at 23:00, Marc Roos wrote: > Do you need to tell the VMs that they are on an SSD rbd pool? Or do > Ceph and the libvirt drivers do this automatically for you? > When testing a Nutanix Acropolis virtual install, I had to 'cheat' it by > adding this > > To make the installer
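
If the goal is just to make the guest see the disk as an SSD, one way (an assumption, not the trick referred to in the truncated mail) is to set the rotation rate on the emulated disk, e.g. with QEMU's scsi-hd device:

  # rotation_rate=1 marks the virtual disk as non-rotational (SSD) inside the guest
  -device scsi-hd,drive=drive-rbd0,rotation_rate=1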

Re: [ceph-users] getting pg inconsistent periodly

2019-05-03 Thread Hervé Ballans
On 24/04/2019 at 10:06, Janne Johansson wrote: On Wed, 24 Apr 2019 at 08:46, Zhenshi Zhou wrote: Hi, I've been running a cluster for a period of time. I find the cluster often runs into an unhealthy state recently. With 'ceph health detail', one or
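
A hedged sketch of the usual first steps for a recurring inconsistent PG (the pgid is a placeholder; this only inspects and requests a repair, it does not address the root cause discussed here):

  # find the affected PG(s)
  ceph health detail
  # show which object copies disagree and why
  rados list-inconsistent-obj <pgid> --format=json-pretty
  # ask the primary OSD to repair the PG
  ceph pg repair <pgid>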

Re: [ceph-users] Ceph Multi Mds Trim Log Slow

2019-05-03 Thread Lars Täuber
Hi, I'm still new to ceph. Here are similar problems with CephFS. ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable) on Debian GNU/Linux buster/sid # ceph health detail HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming MDS_SLOW_REQUEST 1 MDSs
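
A sketch of how such a warning can be inspected further (the MDS name is a placeholder; this assumes access to the MDS admin socket on the node):

  # list the requests the MDS currently considers slow / in flight
  ceph daemon mds.<name> dump_ops_in_flight
  # journal/trimming counters, e.g. how many log segments are outstanding
  ceph daemon mds.<name> perf dump mds_log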

Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread Martin Verges
Hello, configure a gateway on your router or use a good rack switch that can provide such features and use layer3 routing to connect different vlans / ip zones. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH,

[ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread EDH - Manuel Rios Fernandez
Hi, We have a Ceph deployment on version 13.2.5, with several buckets containing millions of files. services: mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003 mgr: CEPH001(active) osd: 106 osds: 106 up, 106 in rgw: 2 daemons active data: pools: 17 pools, 7120 pgs
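
For buckets of that size the usual first check is the bucket index sharding; a sketch (the bucket name is taken from a later message in this thread, the shard count is a placeholder):

  # warn about buckets whose index shards exceed the recommended object count
  radosgw-admin bucket limit check
  # current shard count and object count for the bucket
  radosgw-admin bucket stats --bucket=datos101
  # queue a reshard to more index shards (dynamic resharding may also kick in on its own)
  radosgw-admin reshard add --bucket=datos101 --num-shards=128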

Re: [ceph-users] RGW Beast frontend and ipv6 options

2019-05-03 Thread Wido den Hollander
On 5/2/19 4:08 PM, Daniel Gryniewicz wrote: > Based on past experience with this issue in other projects, I would > propose this: > > 1. By default (rgw frontends=beast), we should bind to both IPv4 and > IPv6, if available. > > 2. Just specifying port (rgw frontends=beast port=8000) should
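
For comparison, the explicit forms that already work look roughly like this (a sketch of the option syntax under discussion, not the proposed new defaults):

  # IPv4 only on port 8000
  rgw frontends = beast endpoint=0.0.0.0:8000
  # IPv6 on port 8000 (v4-mapped behaviour depends on the OS bindv6only setting)
  rgw frontends = beast endpoint=[::]:8000
  # both address families bound explicitly
  rgw frontends = beast endpoint=0.0.0.0:8000 endpoint=[::]:8000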

[ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread Hervé Ballans
Hi all, I have a Ceph cluster on Luminous 12.2.10 with 3 mon and 6 osd servers. My current network setup is separate public and cluster (private IP) networks. I would like my cluster to be available to clients on another VLAN than the default one (which is the public network in ceph.conf)

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2019-05-03 Thread Igor Podlesny
On Fri, 3 May 2019 at 13:38, Denny Fuchs wrote: [...] > If I understand correctly: I should try to set the bitmap allocator That's one of the options I mentioned. Another was to try using jemalloc (re-read my emails). > [osd] > ... > bluestore_allocator = bitmap > bluefs_allocator = bitmap

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2019-05-03 Thread Denny Fuchs
hi, I never noticed the Debian /etc/default/ceph :-) = # Increase tcmalloc cache size TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 that is what is active now. Huge pages: # cat /sys/kernel/mm/transparent_hugepage/enabled always [madvise] never # dpkg -S