Hi Yenya,
I guess Ceph adds the size of all your data.db devices to the cluster
total used space.
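A rough way to verify that, assuming the admin socket is reachable on the OSD
host (osd.0 is just an example id):
# ceph daemon osd.0 perf dump | grep -E 'db_total_bytes|db_used_bytes'
db_total_bytes should match the block.db partition size, while db_used_bytes
shows how much of it is actually consumed.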
Regards,
Jakub
Fri, 8 Feb 2019, 10:11, Jan Kasprzak wrote:
> Hello, ceph users,
>
> I moved my cluster to bluestore (Ceph Mimic), and now I see the increased
> disk usage. From
Hi Yenya,
Can I ask what your cluster looks like and why you want to split the
networks?
We used to set up clusters of 9-12 OSD nodes (12-16 HDDs each) using 2x10Gb
for access and 2x10Gb for the cluster network; however, I don't see a reason
not to use just one network for the next cluster setup.
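For reference, the split itself is only two lines in ceph.conf (the subnets
below are made-up examples):
[global]
public network = 10.0.10.0/24
cluster network = 10.0.20.0/24
With a single network you simply omit the cluster network line and let
replication traffic share the access links.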
Hi Dan,
Did you configure block.wal/block.db as separate devices/partitions
(osd_scenario: non-collocated or lvm for clusters installed using
ceph-ansible playbooks)?
I run Ceph version 13.2.1 with non-collocated data.db and have the same
situation: the sum of the block.db partitions' sizes is
Hi, your question is more about the MAX AVAIL value I think; see how Ceph
calculates it:
http://docs.ceph.com/docs/luminous/rados/operations/monitoring/#checking-a-cluster-s-usage-stats
One OSD getting full makes the pool full as well, so keep reweighting
nearfull OSDs.
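For the reweighting itself, something like this works (threshold and weight
are example values only):
# dry run first, then apply for OSDs more than 15% above average utilization
ceph osd test-reweight-by-utilization 115
ceph osd reweight-by-utilization 115
# or lower the weight of a single nearfull OSD by hand
ceph osd reweight 42 0.90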
Jakub
19 Oct 2018 16:34
Hi Kevin,
Have you tried ceph osd metadata OSDid?
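For example (0 is a placeholder OSD id):
# ceph osd metadata 0
It reports, among other things, the hostname, the backing devices and whether
block.db/block.wal live on separate partitions.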
Jakub
Mon, 8 Oct 2018, 19:32, Alfredo Deza wrote:
> On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich wrote:
> >
> > Hi!
> >
> > Yes, thank you. At least on one node this works, the other node just
> freezes but this might be caused
Hi Cephers, Hi Gregory,
I'm looking at the same case as here, commit_latency == apply_latency in ceph
osd perf:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024317.html
What's the meaning of commit_latency and apply_latency in bluestore OSD
setups? How useful is it when
Hi Cephers,
I'm testing cluster throughput before moving to production. Ceph
version 13.2.1 (I'll update to 13.2.2).
I run rados bench from 10 cluster nodes and 10 clients in parallel.
Just after I start the rados command, the HDDs behind three OSDs are 100%
utilized while the others are < 40%. After the
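Two things worth checking while the bench runs (a sketch only, nothing
cluster-specific): whether the busy disks simply carry more PGs of the
benchmark pool, and whether they are saturated at the device level:
ceph osd df tree        # PGS column shows placement groups per OSD
iostat -x 5             # run on the OSD hosts to watch per-device utilization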
Hi Cephers,
Any plans for Ceph Mimic packages for Ubuntu Trusty? I found only
ceph-deploy.
https://download.ceph.com/debian-mimic/dists/trusty/main/binary-amd64/
Thanks
Jakub
Hi, I've recently deployed a fresh cluster via ceph-ansible. I've not yet
created any pools, but storage is used anyway.
[root@ceph01 ~]# ceph version
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
(stable)
[root@ceph01 ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED
Issue tracker: http://tracker.ceph.com/issues/23801.
I still don't know why only particular OSDs write this information to their
log files.
Jakub
On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski
wrote:
> Hi All, exactly the same story today, same 8 OSDs and a lot of garbage
> collection o
# radosgw-admin gc list --include-all | grep oid | wc -l
302357
#
Can anyone please explain what is going on?
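For completeness, the related knobs (defaults below are from the docs and may
differ between versions):
radosgw-admin gc process    # run a garbage collection cycle manually
# ceph.conf, [client.rgw.*] section
rgw_gc_max_objs = 32
rgw_gc_obj_min_wait = 7200
rgw_gc_processor_period = 3600
rgw_gc_processor_max_time = 3600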
Thanks!
Jakub
On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski
wrote:
> Hi,
>
> 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records
> like "cls_rgw.cc:3284:
Hi,
8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records
like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the corresponding
log files, e.g.
2018-08-07 04:34:06.000585 7fdd8f012700 0
/build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
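My working theory is that these 8 OSDs are simply the primaries for the PGs
that hold the RGW gc index objects (gc.0 ... gc.31 with the default
rgw_gc_max_objs), so only they log the gc_iterate_entries calls. A way to
check, assuming the default zone pool name:
ceph osd map default.rgw.log gc.0
ceph osd map default.rgw.log gc.1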
Hi,
We run 5 RADOS Gateways on Luminous 12.2.5 as upstream servers in an nginx
active-active setup based on keepalived.
The cluster is 12 Ceph nodes (16x 10TB bluestore OSDs per node, 2x 10Gb
network links shared by the access and cluster networks); the RGW pool is EC 9+3.
We recently noticed the below entries in
On Thu, Mar 29, 2018 at 12:25 PM, Janne Johansson
wrote:
>
>
> 2018-03-29 11:50 GMT+02:00 David Rabel :
>
>> On 29.03.2018 11:43, Janne Johansson wrote:
>> > 2018-03-29 11:39 GMT+02:00 David Rabel :
>> >
>> >> For example a
Hi Jon, can you reweight one OSD back to its default value and share the output
of "ceph osd df tree; ceph -s; ceph health detail"?
Recently I was adding a new node, 12x 4TB, one disk at a time, and saw the
activating+remapped state for a few hours.
Not sure, but maybe that was caused by "osd_max_backfills"
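If it was backfill pressure, that limit can be inspected and changed at
runtime (the values are examples only):
ceph daemon osd.0 config get osd_max_backfills
ceph tell osd.* injectargs '--osd_max_backfills 2'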
One full OSD has caused all pools to become full. Can anyone help me
understand this?
During ongoing PG backfilling I see that the MAX AVAIL values are changing
while the USED values stay constant.
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
425T 145T 279T 65.70
POOLS:
Hi Ceph Admins,
Last night all pools in our ceph cluster went 100% full. This happened after
osd.56 (95% used) reached the OSD_FULL state.
ceph versions 12.2.2
Logs
2018-03-03 17:15:22.560710 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224452
: cluster [ERR] overall HEALTH_ERR noscrub,nodeep-scrub
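One way to unblock such a situation, for the record (the ratio and weight are
examples, and the full ratio should be set back afterwards):
# Luminous: temporarily raise the full threshold so client IO can resume
ceph osd set-full-ratio 0.96
# push data off the full OSD
ceph osd reweight 56 0.85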
Hi Frederic,
I've not enabled debug-level logging on all OSDs, just on one for the test;
I need to double-check that.
But it looks like merging is ongoing on a few OSDs, or the OSDs are faulty;
I will dig into that tomorrow.
Write bandwidth is very erratic:
# rados bench -p default.rgw.buckets.data 120 write
4304
Bandwidth (MB/sec): 2033.82
Average IOPS: 508
Stddev IOPS: 20
Max IOPS: 544
Min IOPS: 484
Average Latency(s): 0.0307879
Max latency(s): 1.3466
Min latency(s): 0.00688148
#
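For a more repeatable measurement, the bench can be run with fixed parameters
and without the implicit cleanup (flags below are examples; remember to clean
up at the end):
rados bench -p default.rgw.buckets.data 120 write -t 32 -b 4194304 --no-cleanup
rados bench -p default.rgw.buckets.data 120 seq
rados -p default.rgw.buckets.data cleanup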
Regards
Jakub
On Thu, Feb 1, 2018 at 3:33 PM, Jakub Jasze
Regarding split & merge, I have the default values:
filestore_merge_threshold = 10
filestore_split_multiple = 2
According to https://bugzilla.redhat.com/show_bug.cgi?id=1219974 the
recommended values are:
filestore_merge_threshold = 40
filestore_split_multiple = 8
Is it something that I can easily
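The change itself is just two lines in ceph.conf on the OSD nodes (values
taken from that ticket, I have not validated them myself):
[osd]
filestore_merge_threshold = 40
filestore_split_multiple = 8
As far as I understand, new values only apply to future splits; directories
that have already split would need an offline run of ceph-objectstore-tool
(apply-layout-settings) per OSD.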
pool set nodeep-scrub".
>
> On Thursday, February 1, 2018 at 00:10, Jakub Jaszewski wrote:
>
> 3 active+clean+scrubbing+deep
>
>
>
he pg_num and pgp_num won't help,
> and short term, will make it worse.
>
> Metadata pools (like default.rgw.buckets.index) really excel in an SSD
> pool, even if small. I carved a small OSD out of the journal SSDs for
> those kinds of workloads.
>
> On Wed, Jan 31, 2018 at 2:26
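For Luminous with device classes, pinning such a pool to SSDs is roughly (rule
and pool names are examples):
ceph osd crush rule create-replicated on-ssd default host ssd
ceph osd pool set default.rgw.buckets.index crush_rule on-ssd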
:37 PM, Jakub Jaszewski <jaszewski.ja...@gmail.com>
wrote:
>
> Hi,
>
> I'm wondering why slow requests are being reported mainly when the request
> has been put into the queue for processing by its PG (queued_for_pg ,
> http://docs.ceph.com/docs/master/rados/troublesho
Hi,
I'm wondering why slow requests are being reported mainly when the request
has been put into the queue for processing by its PG (queued_for_pg,
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request).
Could it be due to too low pg_num/pgp_num?
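A way to dig further, in case it helps (the OSD id is a placeholder):
ceph daemon osd.12 dump_ops_in_flight     # ops currently queued/processed
ceph daemon osd.12 dump_historic_ops      # event timestamps of recent slow ops
ceph osd df tree                          # PGS column: placement groups per OSD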
Hi,
We observe high apply_latency (ms) and, I believe, poor write performance.
In the logs there are repeated slow request warnings relating to different
OSDs and servers.
ceph versions 12.2.2
Cluster HW description:
9x Dell PowerEdge R730xd
1x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (10C/20T)
256 GB
Hi,
We have a ceph cluster running luminous 12.2.2. It has a public network and a
cluster network configured.
The cluster provides services for two big groups of clients and some
individual clients.
One group uses RGW and another uses RBD.
Ceph's public network and the two mentioned groups are located in
I've just done a ceph upgrade from jewel to luminous and am facing the same case...
# EC profile
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
There are 5 hosts in the cluster and I run systemctl stop ceph.target on one of them.
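If I read the CRUSH docs right, with k=3, m=2 and crush-failure-domain=host
every PG needs 5 distinct hosts, so with exactly 5 hosts a stopped host leaves
no spare host to recover the missing chunk onto; that would explain the
active+undersized+degraded PGs. To check the profile and the affected PGs
(names are examples):
ceph osd erasure-code-profile get myprofile
ceph pg dump pgs_brief | grep undersized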
Hi David, thanks for the quick feedback.
Then why were some PGs remapped and some not?
# LOOKS LIKE 338 PGs IN ERASURE-CODED POOLS HAVE BEEN REMAPPED
# I DON'T GET WHY 540 PGs STILL ENCOUNTER THE active+undersized+degraded STATE
root@host01
Hi, I'm trying to understand erasure-coded pools and why CRUSH rules seem
to work for only part of the PGs in EC pools.
Basically what I'm trying to do is check erasure-coded pool recovery
behaviour after a single OSD or single HOST failure.
I noticed that in the case of a HOST failure only part of