Re: [ceph-users] Bluestore increased disk usage

2019-02-10 Thread Jakub Jaszewski
Hi Yenya, I guess Ceph adds the size of all your data.db devices to the cluster total used space. Regards, Jakub Fri, 8 Feb 2019, 10:11 Jan Kasprzak wrote: > Hello, ceph users, > > I moved my cluster to bluestore (Ceph Mimic), and now I see the increased > disk usage. From
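A quick way to confirm that on one OSD (a sketch; osd.0 is just an example id, and the admin socket must be reachable on that OSD's host) is to compare the per-OSD totals with the BlueFS DB counters:

  # per-OSD usage that feeds the cluster-wide RAW USED figure
  ceph osd df
  # BlueFS view of the DB device for one OSD (total vs. actually used bytes)
  ceph daemon osd.0 perf dump | grep -E 'db_total_bytes|db_used_bytes'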

Re: [ceph-users] Migrating to a dedicated cluster network

2019-01-23 Thread Jakub Jaszewski
Hi Yenya, Can I ask how your cluster looks and why you want to do the network splitting? We used to set up clusters of 9-12 OSD nodes (12-16 HDDs each) using 2x10Gb for access and 2x10Gb for the cluster network; however, I don't see a reason not to use just one network for the next cluster setup.
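For reference, the split is just two options in ceph.conf; a minimal sketch with example subnets (when cluster network is omitted, replication and recovery traffic simply share the public network):

  [global]
  public network  = 192.168.10.0/24
  # optional - only set this if you really want a dedicated replication/recovery network
  cluster network = 192.168.20.0/24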

Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Jakub Jaszewski
Hi Dan, Did you configure block.wal/block.db as separate devices/partitions (osd_scenario: non-collocated, or lvm for clusters installed using ceph-ansible playbooks)? I run Ceph version 13.2.1 with non-collocated data.db and have the same situation - the sum of the block.db partitions' size is
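To see how the block.db devices are laid out per OSD (a sketch; ceph-volume only covers lvm-based OSDs, ceph-disk deployments need ceph-disk list instead):

  # lists data, block.db and block.wal devices for every OSD on this host
  ceph-volume lvm list
  # raw sizes of the underlying partitions/LVs, to sum up the block.db space
  lsblk -b -o NAME,SIZE,TYPE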

Re: [ceph-users] understanding % used in ceph df

2018-10-19 Thread Jakub Jaszewski
Hi, your question is more about the MAX AVAIL value, I think; see how Ceph calculates it: http://docs.ceph.com/docs/luminous/rados/operations/monitoring/#checking-a-cluster-s-usage-stats One OSD getting full makes the pool full as well, so keep reweighting nearfull OSDs. Jakub 19 Oct 2018 16:34
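A rough way to spot and handle the limiting OSDs (a sketch; the sort column and the reweight value are examples, adjust to your output and cluster):

  # per-OSD utilisation - the fullest OSD in the pool's CRUSH rule drives MAX AVAIL
  ceph osd df | sort -nk8 | tail
  # gently lower the weight of an overfull OSD, or use ceph osd reweight-by-utilization
  ceph osd reweight 42 0.95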

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Jakub Jaszewski
Hi Kevin, Have you tried ceph osd metadata OSDid? Jakub Mon, 8 Oct 2018, 19:32 Alfredo Deza wrote: > On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich wrote: > > > > Hi! > > > > Yes, thank you. At least on one node this works, the other node just > freezes but this might be caused
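For completeness, a sketch of that approach (osd.12 is an example id; the lvs step applies to lvm-based OSDs):

  # device paths as reported by the OSD itself
  ceph osd metadata 12 | grep -E '"devices"|bluestore_bdev_dev_node|bluefs_db_dev_node'
  # resolve an LVM LV back to the physical volume / raw disk behind it
  lvs -o lv_name,vg_name,devices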

[ceph-users] commit_latency equals apply_latency on bluestore

2018-10-02 Thread Jakub Jaszewski
Hi Cephers, Hi Gregory, I'm considering the same case as here, commit_latency==apply_latency in ceph osd perf: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024317.html What's the meaning of commit_latency and apply_latency in bluestore OSD setups? How useful is it when
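For reference, the command in question (the JSON form is handy for feeding a monitoring system):

  # cluster-wide per-OSD commit/apply latency as reported to the monitors
  ceph osd perf
  # same data, machine-readable
  ceph osd perf -f json-pretty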

[ceph-users] Testing cluster throughput - one OSD is always 100% utilized during rados bench write

2018-10-02 Thread Jakub Jaszewski
Hi Cephers, I'm testing cluster throughput before moving to production. Ceph version 13.2.1 (I'll update to 13.2.2). I run rados bench from 10 cluster nodes and 10 clients in parallel. Just after I call the rados command, the HDDs behind three OSDs are 100% utilized while others are < 40%. After the
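A sketch of this kind of test (pool name, duration and thread count are made up); correlating disk utilisation with the PG distribution usually shows whether a few OSDs simply carry more PGs of the benchmark pool:

  # write benchmark, leave objects in place for a later read test
  rados bench -p testpool 60 write -t 16 --no-cleanup
  # on the OSD hosts, watch which disks saturate while the benchmark runs
  iostat -xm 5
  # PGs per OSD - an uneven distribution often explains a few hot disks
  ceph osd df tree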

[ceph-users] Ceph Mimic packages not available for Ubuntu Trusty

2018-09-19 Thread Jakub Jaszewski
Hi Cephers, Any plans for Ceph Mimic packages for Ubuntu Trusty? I found only ceph-deploy. https://download.ceph.com/debian-mimic/dists/trusty/main/binary-amd64/ Thanks Jakub

Re: [ceph-users] total_used statistic incorrect

2018-09-19 Thread Jakub Jaszewski
Hi, I've recently deployed a fresh cluster via ceph-ansible. I've not yet created pools, but storage is used anyway. [root@ceph01 ~]# ceph version ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable) [root@ceph01 ~]# ceph df GLOBAL: SIZE AVAIL RAW USED
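A rough sanity check, with purely hypothetical numbers: on a fresh BlueStore cluster, RAW USED is roughly the per-OSD BlueFS/metadata overhead times the number of OSDs, plus any separate block.db devices that are counted into the total.

  # hypothetical: 60 OSDs, ~1 GiB BlueFS overhead each, plus a 30 GiB block.db per OSD
  echo $(( 60 * (1 + 30) ))   # ~1860 GiB shown as used before any pool exists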

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread Jakub Jaszewski
Issue tracker: http://tracker.ceph.com/issues/23801. I still don't know why only particular OSDs write this information to the log files. Jakub On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski wrote: > Hi All, exactly the same story today, same 8 OSDs and a lot of garbage > collection o
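One thing worth checking (a sketch; the pool holding the gc.* objects and the shard count depend on the zone configuration and rgw_gc_max_objs) is whether those 8 OSDs are exactly the ones hosting the RGW garbage-collection shard objects:

  # map each gc shard object to its acting OSDs and compare with the OSDs doing the logging
  for i in $(seq 0 31); do ceph osd map default.rgw.log "gc.$i"; done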

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-08 Thread Jakub Jaszewski
radosgw-admin gc list --include-all | grep oid | wc -l 302357 # Can anyone please explain what is going on? Thanks! Jakub On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski wrote: > Hi, > > 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records > like "cls_rgw.cc:3284:

[ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-07 Thread Jakub Jaszewski
Hi, 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the corresponding log files, e.g. 2018-08-07 04:34:06.000585 7fdd8f012700 0 /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries

[ceph-users] Luminous 12.2.5 - crushable RGW

2018-07-16 Thread Jakub Jaszewski
Hi, We run 5 RADOS Gateways on Luminous 12.2.5 as upstream servers in an nginx active-active setup based on keepalived. The cluster is 12 Ceph nodes (16x 10TB bluestore OSDs per node, 2x 10Gb network links shared by the access and cluster networks); the RGW pool is EC 9+3. We recently noticed the below entries in
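For context, a minimal sketch of the kind of nginx upstream block meant here (hostnames, port and header handling are made up; the keepalived VIP setup is not shown):

  upstream rgw {
      server rgw01:8080 max_fails=3 fail_timeout=10s;
      server rgw02:8080 max_fails=3 fail_timeout=10s;
      # ... remaining gateways
  }
  server {
      listen 80;
      location / {
          proxy_pass http://rgw;
          proxy_set_header Host $host;
      }
  }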

Re: [ceph-users] Replicated pool with an even size - has min_size to be bigger than half the size?

2018-03-29 Thread Jakub Jaszewski
On Thu, Mar 29, 2018 at 12:25 PM, Janne Johansson wrote: > > > 2018-03-29 11:50 GMT+02:00 David Rabel : > >> On 29.03.2018 11:43, Janne Johansson wrote: >> > 2018-03-29 11:39 GMT+02:00 David Rabel : >> > >> >> For example a

Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-29 Thread Jakub Jaszewski
Hi Jon, can you reweight one OSD to the default value and share the outcome of "ceph osd df tree; ceph -s; ceph health detail"? Recently I was adding a new node, 12x 4TB, one disk at a time, and faced the activating+remapped state for a few hours. Not sure, but maybe that was caused by "osd_max_backfills"
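For reference, a sketch of checking and temporarily raising that setting (osd.0 and the value 2 are examples; injected values do not survive an OSD restart):

  # current value on a running OSD (run on the OSD host)
  ceph daemon osd.0 config get osd_max_backfills
  # raise it cluster-wide while the rebalance is running
  ceph tell osd.* injectargs '--osd_max_backfills 2'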

Re: [ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-05 Thread Jakub Jaszewski
One full OSD has caused all pools to become full. Can anyone help me understand this? During ongoing PG backfilling I see that the MAX AVAIL values are changing while the USED values stay constant. GLOBAL: SIZE AVAIL RAW USED %RAW USED 425T 145T 279T 65.70 POOLS:
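That behaviour matches how MAX AVAIL is derived: it is projected from the fullest OSD the pool's CRUSH rule maps to, so it moves while backfill shifts data around even though the pool's USED bytes stay constant. A toy calculation with invented numbers:

  # free space on the fullest OSD (30% of 2 TiB), scaled to 100 equal-weight OSDs, divided by 3 replicas
  echo $(( (2048 * 30 / 100) * 100 / 3 ))   # ~20466 GiB projected MAX AVAIL for the pool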

[ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-03 Thread Jakub Jaszewski
Hi Ceph Admins, Last night all pools in our ceph cluster went 100% full. This happened after osd.56 (95% used) reached the OSD_FULL state. ceph versions 12.2.2 Logs 2018-03-03 17:15:22.560710 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224452 : cluster [ERR] overall HEALTH_ERR noscrub,nodeep-scrub
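One common way out of this state, sketched below (the threshold and weights are examples; lower the full ratio back to its default once the cluster is balanced again):

  # temporarily raise the full threshold so I/O can resume (Luminous and later)
  ceph osd set-full-ratio 0.96
  # push data off the full OSD, or rebalance by utilisation
  ceph osd reweight 56 0.90
  ceph osd reweight-by-utilization 110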

Re: [ceph-users] High apply latency

2018-02-06 Thread Jakub Jaszewski
Hi Frederic, I've not enabled debug-level logging on all OSDs, just on one for the test; I need to double-check that. But it looks like merging is ongoing on a few OSDs, or the OSDs are faulty; I will dig into that tomorrow. Write bandwidth is very random # rados bench -p default.rgw.buckets.data 120 write

Re: [ceph-users] High apply latency

2018-02-02 Thread Jakub Jaszewski
4304 Bandwidth (MB/sec): 2033.82 Average IOPS: 508 Stddev IOPS: 20 Max IOPS: 544 Min IOPS: 484 Average Latency(s): 0.0307879 Max latency(s): 1.3466 Min latency(s): 0.00688148 # Regards Jakub On Thu, Feb 1, 2018 at 3:33 PM, Jakub Jasze

Re: [ceph-users] High apply latency

2018-02-01 Thread Jakub Jaszewski
Regarding split & merge, I have the default values: filestore_merge_threshold = 10, filestore_split_multiple = 2. According to https://bugzilla.redhat.com/show_bug.cgi?id=1219974 the recommended values are filestore_merge_threshold = 40 and filestore_split_multiple = 8. Is it something that I can easily
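If it helps, a sketch of how those values would be set (they are read from ceph.conf at OSD start, and existing PG directories only re-split or merge as objects are written afterwards):

  [osd]
  filestore merge threshold = 40
  filestore split multiple = 8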

Re: [ceph-users] High apply latency

2018-02-01 Thread Jakub Jaszewski
pool set nodeep-scrub". > > On Thursday, February 1, 2018 at 00:10, Jakub Jaszewski wrote: > > 3 active+clean+scrubbing+deep

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
he pg_num and pgp_num won't help, > and short term, will make it worse. > > Metadata pools (like default.rgw.buckets.index) really excel in an SSD > pool, even if small. I carved a small OSD in the journal SSDs for > those kinds of workloads. > > On Wed, Jan 31, 2018 at 2:26
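A sketch of how such a move could look on Luminous with device classes (rule and pool names are examples; it requires OSDs reporting the ssd class and triggers data movement):

  # replicated rule restricted to SSD-class OSDs
  ceph osd crush rule create-replicated ssd-rule default host ssd
  # move the bucket index pool onto it
  ceph osd pool set default.rgw.buckets.index crush_rule ssd-rule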

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
:37 PM, Jakub Jaszewski <jaszewski.ja...@gmail.com> wrote: > > Hi, > > I'm wondering why slow requests are being reported mainly when the request > has been put into the queue for processing by its PG (queued_for_pg , > http://docs.ceph.com/docs/master/rados/troublesho

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Hi, I'm wondering why slow requests are being reported mainly when the request has been put into the queue for processing by its PG (queued_for_pg, http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request). Could it be due to too low pg_num/pgp_num?
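To see where the slow ops actually spend their time, something like the following can be run on the OSDs that report them (osd.0 is an example id):

  # recently completed slow ops with per-stage timestamps (including queued_for_pg)
  ceph daemon osd.0 dump_historic_ops
  # ops currently in flight
  ceph daemon osd.0 dump_ops_in_flight
  # pool settings, to compare pg_num/pgp_num against the usual sizing guidelines
  ceph osd pool ls detail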

[ceph-users] High apply latency

2018-01-30 Thread Jakub Jaszewski
Hi, We observe high apply_latency(ms) and, I believe, poor write performance. In the logs there are repetitive slow request warnings related to different OSDs and servers. ceph versions 12.2.2 Cluster HW description: 9x Dell PowerEdge R730xd 1x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (10C/20T) 256 GB

[ceph-users] Ceph with multiple public networks

2017-12-18 Thread Jakub Jaszewski
Hi, We have a ceph cluster running luminous 12.2.2. It has a public network and a cluster network configured. The cluster provides services for two big groups of clients and some individual clients. One group uses RGW and the other uses RBD. Ceph's public network and the two mentioned groups are located in
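For reference, public network accepts a comma-separated list of subnets, so a sketch of such a ceph.conf could look like this (the subnets are examples; each daemon binds to whichever listed subnet has a matching local address):

  [global]
  public network  = 10.10.1.0/24, 10.10.2.0/24
  cluster network = 10.10.3.0/24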

Re: [ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-30 Thread Jakub Jaszewski
I've just done a ceph upgrade jewel -> luminous and am facing the same case... # EC profile crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=3 m=2 plugin=jerasure technique=reed_sol_van w=8 5 hosts in the cluster and I run systemctl stop ceph.target on one of them

Re: [ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread Jakub Jaszewski
Hi David, thanks for the quick feedback. Then why were some PGs remapped and some not? # IT LOOKS LIKE 338 PGs IN ERASURE CODED POOLS HAVE BEEN REMAPPED # I DON'T GET WHY 540 PGs STILL ENCOUNTER THE active+undersized+degraded STATE root at host01
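A way to narrow this down (pg 7.1a is a made-up id; pick one from the undersized list):

  # PGs that could not find a replacement OSD after the host went down
  ceph pg dump_stuck undersized
  # which OSDs the PG wants (up) vs. which it currently has (acting)
  ceph pg 7.1a query | grep -E -A8 '"up"|"acting"'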

[ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread Jakub Jaszewski
Hi, I'm trying to understand erasure coded pools and why CRUSH rules seem to work for only part of the PGs in EC pools. Basically what I'm trying to do is check erasure coded pool recovery behaviour after a single OSD or single HOST failure. I noticed that in the case of a HOST failure only part of