[ceph-users] Re: slow recovery with Quincy

2023-10-10 Thread
Hi Ben, Please see this thread https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/PWHG6QJ6N2TJEYD2U4AXJAJ23CRPJG4E/#7ZMBM23GXYFIGY52ZWJDY5NUSYSDSYL6 for a possible workaround. Sent from my iPad. On Oct 10, 2023, at 22:26, Ben wrote: Dear cephers, with one osd down(200GB/9.1TB data), rebalance

[ceph-users] Re: EC 8+3 Pool PGs stuck in remapped+incomplete

2023-06-17 Thread
Hi Jayanth, Can you post the complete output of “ceph pg query”? So that we can understand the situation better. Can you get OSD 3 or 4 back into the cluster? If you are sure they cannot rejoin, you may try “ceph osd lost <id>” (the doc says this may result in permanent data loss. I didn’t have a
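A minimal sketch of the two commands mentioned above (the PG id and OSD id are hypothetical; only use "osd lost" if you are certain the OSDs can never rejoin, since it may cause permanent data loss):

  # full state of the stuck PG, including peering and recovery details
  ceph pg 2.1f query
  # last resort: declare a permanently lost OSD so peering can proceed
  ceph osd lost 4 --yes-i-really-mean-it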

[ceph-users] Re: degraded objects increasing

2023-06-15 Thread
Hi Angelo, From my experience, I guess objects written to a degraded PG are immediately counted as degraded. As the total number of objects is increasing, I think the increase in degraded objects is normal. Weiwen Hu > On Jun 15, 2023, at 23:40, Angelo Höngens wrote: > > Hey guys, > > I'm trying to understand

[ceph-users] Re: Container image of Pacific latest version

2023-06-11 Thread
It is available at quay.io/ceph/ceph:v16.2.13 > On Jun 11, 2023, at 16:31, mahnoosh shahidi wrote: > > Hi all, > > It seems the latest Pacific image in the registry is 16.2.11. Is there any > plan to push the latest version of Pacific (16.2.13) in the near future? > > Best Regards, > Mahnoosh >

[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2023-05-29 Thread
Hi Dan, We also experienced very high network usage and memory pressure with our machine learning workload. This patch [1] (currently testing, may be merged in 6.5) may fix it. See [2] for more about my experiments with this issue. [1]:

[ceph-users] Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

2023-05-24 Thread
Hi Henning, I think the increasing strays_created is normal. This counter increases monotonically whenever any file is deleted, and it is only reset when the MDS is restarted. num_strays is the actual number of strays in your system, and they do not necessarily reside in memory.
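A quick way to look at these counters yourself, as a sketch (run it on the host of the MDS in question and substitute your own daemon name; the one below is hypothetical):

  # stray-related counters live in the mds_cache section of the perf dump
  ceph daemon mds.myfs.host1.abcdef perf dump mds_cache | grep strays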

[ceph-users] Re: BlueStore fragmentation woes

2023-05-24 Thread
Hi Hector, Not related to fragmentation. But I see you mentioned CephFS, and your OSDs are at high utilization. Is your pool NEAR FULL? CephFS write performance is severely degraded if the pool is NEAR FULL. Buffered write will be disabled, and every single write() system call needs to wait

[ceph-users] Re: Slow recovery on Quincy

2023-05-16 Thread
Hi Sake, We are experiencing the same. I set “osd_mclock_cost_per_byte_usec_hdd” to 0.1 (default is 2.6) and got about 15 times the backfill speed, without significantly affecting client IO. This parameter seems to be calculated wrongly; from the description, 5e-3 should be a reasonable value for HDD
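A sketch of applying the workaround described above (the option name is taken from the thread; it belongs to Quincy's mClock scheduler and may be absent or renamed on other releases, so check with "ceph config help" first):

  # inspect the current value on one OSD (Quincy default is 2.6)
  ceph config show osd.0 osd_mclock_cost_per_byte_usec_hdd
  # lower it for all OSDs to speed up recovery/backfill
  ceph config set osd osd_mclock_cost_per_byte_usec_hdd 0.1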

[ceph-users] Re: OSDs growing beyond full ratio

2022-08-30 Thread
On Aug 30, 2022, at 23:20, Dave Schulz wrote: Is a file in ceph assigned to a specific PG? In my case it seems like a file that's close to the size of a single OSD gets moved from one OSD to the next filling it up and domino-ing around the cluster filling up OSDs. I believe not. Each large file is
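A sketch of how to see this striping for yourself (pool name, mount point and file are hypothetical): CephFS splits a file into many RADOS objects named <inode-hex>.<index>, and each object maps to its own PG.

  # inode of the file, in hex
  printf '%x\n' "$(stat -c %i /mnt/cephfs/bigfile)"
  # where one of its objects lands (PG and OSD set)
  ceph osd map cephfs_data 10000000123.00000000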

[ceph-users] Re: ceph mds dump tree - root inode is not in cache

2022-08-07 Thread
; Sent: Aug 7, 2022, 23:29 To: 胡 玮文 <huw...@outlook.com> Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ceph mds dump tree - root inode is not in cache Hi Weiwen, please see also my previous 2 posts. There seems to be something wrong when trying to

[ceph-users] Re: ceph mds dump tree - root inode is not in cache

2022-08-07 Thread
ay0} (starting...) > 2022-08-07T14:31:19.900+0200 7f785731d700 1 mds.tceph-03 asok_command: dump > tree {prefix=dump tree,root=~mds0/stray0} (starting...) > > Please let me know if/how I can provide more info. > > Thanks and best regards, > = > Frank Schi

[ceph-users] Re: ceph mds dump tree - root inode is not in cache

2022-08-04 Thread
Hi Frank, I have not experienced this before. Maybe mds.tceph-03 is in the standby state? Could you show the output of “ceph fs status”? You can also try “ceph tell mds.0 …” and let ceph find the correct daemon for you. You may also try dumping “~mds0/stray0”. Weiwen Hu > On
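For instance, a sketch of the suggestions above (assuming a single filesystem, so "mds.0" resolves to rank 0):

  ceph fs status
  # dump the stray directory of rank 0 directly
  ceph tell mds.0 dump tree '~mds0/stray0'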

[ceph-users] Re: LibCephFS Python Mount Failure

2022-07-28 Thread
Hi Adam, Have you tried ‘cephfs.LibCephFS(auth_id="monitoring")’? Weiwen Hu > On Jul 27, 2022, at 20:41, Adam Carrgilson (NBI) wrote: > > I’m still persevering with this, if anyone can assist, I would truly > appreciate it. > > As I said previously, I’ve been able to identify that the error is >

[ceph-users] Re: Generation of systemd units after nuking /etc/systemd/system

2022-06-10 Thread
the mon, mgr and mds units? On Fri, 10 Jun 2022 at 09:06, 胡 玮文 <huw...@outlook.com> wrote: I think “ceph-volume lvm activate --all” should do it. Weiwen Hu > On Jun 10, 2022, at 14:34, Flemming Frandsen <dren...@gmail.com> wrote: > > Hi, this is somewhat emb

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread
I believe this is the reason. I mean the number of OSDs in the “up” set should be at least 1 greater than min_size for the upgrade to proceed. Otherwise, once any OSD is stopped, the PG can drop below min_size, which prevents it from becoming active. So just clean up the misplaced objects and the upgrade should

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread
Hi Zach, What about your min_size setting? Have you checked that the number of OSDs in the acting set of every PG is at least 1 greater than the min_size of the corresponding pool? Weiwen Hu > On Feb 10, 2022, at 05:02, Zach Heise (SSCC) wrote: > > Hello, > > ceph health detail says my 5-node cluster is
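A sketch of how to check this (the OSD id below is hypothetical):

  # size/min_size of every pool
  ceph osd pool ls detail
  # acting set of every PG at a glance
  ceph pg dump pgs_brief
  # or simply ask whether stopping a given OSD is safe right now
  ceph osd ok-to-stop 12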

[ceph-users] Re: Ceph 16.2.7 + cephadm, how to reduce logging and trim existing logs?

2022-01-27 Thread
Hi Zakhar, I use this docker config file (/etc/docker/daemon.json) to limit its log size: { "log-driver": "json-file", "log-opts": { "max-size": "100m" } } I only changed the default rocksdb log level with ceph config set global debug_rocksdb 1/5 And ceph has a lot of log level
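The same configuration laid out more readably (values are the ones quoted above; restart dockerd after editing the file):

  # /etc/docker/daemon.json
  {
    "log-driver": "json-file",
    "log-opts": { "max-size": "100m" }
  }

  # reduce rocksdb verbosity as mentioned in the post
  ceph config set global debug_rocksdb 1/5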

[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-01-14 Thread
Hi Frank, I just studied the exact same issue where conda generates a lot of strays. And I created a Python script[1] to trigger reintegration efficiently. This script invokes the cephfs python binding and does not rely on the kernel ceph client, so it should also bypass your sssd. This script works by

[ceph-users] Re: How to troubleshoot monitor node

2022-01-10 Thread
> On Jan 11, 2022, at 00:19, Andre Tann wrote: > > Hi Janne, > >> On 10.01.22 16:49, Janne Johansson wrote: >> >> Well, nc would not tell you if a bad (local or remote) firewall >> configuration prevented nc (and ceph -s) from connecting, it would >> give the same results as if the daemon wasn't

[ceph-users] Re: Filesystem offline after enabling cephadm

2021-12-29 Thread
Hi Javier, It seems the MDS deployed by cephadm is 16.2.5. Please check your “container_image” config (it should be quay.io/ceph/ceph:v16.2.7 if you are not running your own registry). Then redeploy the MDS daemons with “ceph orch redeploy <service>”, where the service name can be found with “ceph orch ls”. I guess it is
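A sketch of those steps (the image tag is the one from the post; the MDS service name below is hypothetical, take the real one from "ceph orch ls"):

  # check which image the cluster is configured to deploy
  ceph config dump | grep container_image
  ceph config set global container_image quay.io/ceph/ceph:v16.2.7
  # find the MDS service and redeploy it on the new image
  ceph orch ls
  ceph orch redeploy mds.cephfs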

[ceph-users] Re: CephFS single file size limit and performance impact

2021-12-11 Thread
Maybe you can use RBD instead of cephfs to bypass the MDS? You can make your applications read from or write to the RBD block devices directly. From: huxia...@horebdata.cn Sent: Saturday, December 11, 2021 9:11:53 PM To: Yan, Zheng Cc: ceph-users Subject:

[ceph-users] Re: 16.2.6 Convert Docker to Podman?

2021-12-10 Thread
On Fri, Dec 10, 2021 at 01:12:56AM +0100, Roman Steinhart wrote: > hi, > > recently I had to switch the other way around (from podman to docker). > I just... > - stopped all daemons on a host with "systemctl stop ceph-{uuid}@*" > - purged podman > - triggered a redeploy for every daemon with

[ceph-users] Find high IOPS client

2021-12-06 Thread
Hi all, Sometimes we see high IOPS in our cluster (from “ceph -s”), and the access latency is increased due to high load. Now how can I tell which client is issuing a lot of requests? I want to find out the offending application so that I can make changes to it. We are primarily using cephfs,

[ceph-users] Re: How data is stored on EC?

2021-12-03 Thread
Hi Istvan, Upper-level applications may chunk data into smaller objects, typically 4M each, e.g. CephFS [1], RBD [2]. However, the max object size enforced by the OSD is configurable via osd_max_object_size, which defaults to 128M. So, as far as I understand, your 100MB file will typically be
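A sketch of how to check both limits mentioned above (the mount path and file are hypothetical; the file layout attribute shows the 4 MiB default object size used by CephFS):

  # object size cap enforced by the OSDs (default 128M)
  ceph config get osd osd_max_object_size
  # per-file layout, including object_size, for a file on CephFS
  getfattr -n ceph.file.layout /mnt/cephfs/somefile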

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-22 Thread
onn...@redhat.com> Sent: Nov 20, 2021, 3:20 To: 胡 玮文 <huw...@outlook.com> Cc: Dan van der Ster <d...@vanderster.com>; ceph-users@ceph.io Subject: Re: [ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning On Fri, Nov 19, 2021 at 2:14 AM 胡 玮文 wrot

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-22 Thread
Hi David, https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon I think this is the reason. Although the page describes an erasure-coded pool, I think it also applies to replicated pools. You may check that page and try the steps described there.

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-19 Thread
> On Nov 19, 2021, at 02:51, Marc wrote: > >> >> We also use containers for ceph and love it. If for some reason we >> couldn't run ceph this way any longer, we would probably migrate >> everything to a different solution. We are absolutely committed to >> containerization. > > I wonder if you are

[ceph-users] Re: The osd-block* file is gone

2021-11-19 Thread
That one should be automatically created on boot. If not, you should check whether your disk is broken or not connected, maybe by checking the kernel logs. You can also share the output of `lsblk` and `lvs`. From: GHui Sent: Nov 19, 2021, 16:43 To: ceph-users

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread
-----Original Message----- > From: Patrick Donnelly > Sent: Nov 19, 2021, 9:37 > To: 胡 玮文 > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] Annoying MDS_CLIENT_RECALL Warning > > On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 wrote: > > > > Hi all, > > > > We are consistently seein

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread
: "", "xattrs": [], "dirfragtree": { "splits": [] }, "old_inodes": [], "oldest_snap": 18446744073709551614, "damage_flags": 0, "is_auth": true, "auth_state": { "

[ceph-users] Annoying MDS_CLIENT_RECALL Warning

2021-11-17 Thread
Hi all, We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it seems harmless, but we cannot get HEALTH_OK, which is annoying. The clients that are reported failing to respond to cache pressure are constantly changing, and most of the time we got 1-5 such clients out of

[ceph-users] Re: how to list ceph file size on ubuntu 20.04

2021-11-16 Thread
There is an rbytes mount option [1]. Besides, you can use “getfattr -n ceph.dir.rbytes /path/in/cephfs” [1]: https://docs.ceph.com/en/latest/man/8/mount.ceph/#advanced Weiwen Hu On Nov 17, 2021, at 10:26, zxcs wrote: Hi, I want to list cephfs directory size on ubuntu 20.04, but when I use ls -alh
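A sketch of both approaches (paths are hypothetical; the mount line uses the classic syntax and assumes mon addresses come from /etc/ceph/ceph.conf):

  # recursive size of a directory, reported by CephFS itself
  getfattr -n ceph.dir.rbytes /mnt/cephfs/somedir
  # or mount with rbytes so directory sizes shown by ls are recursive
  mount -t ceph :/ /mnt/cephfs -o name=admin,rbytes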

[ceph-users] Re: Fwd: pg inactive+remapped

2021-11-16 Thread
> But my log file for osd 7 is empty Is it deployed by cephadm? If so, you can try “sudo cephadm logs --name osd.7”, which is a wrapper around journalctl.

[ceph-users] Re: mons fail as soon as I attempt to mount

2021-11-16 Thread
Hi Jeremy. Since you say the mons fail, could you share the logs of the failing mons? It is hard to diagnose with so little information. From: Jeremy Hansen Sent: Nov 16, 2021, 19:27 To: ceph-users Subject: [ceph-users] Re: mons fail as soon as I

[ceph-users] Re: OSDs not starting up

2021-11-13 Thread
Hi Stephen, I think the output you posted is pretty normal; there is no systemd in the container, hence the error. You still need to find the logs. You may try “sudo cephadm logs --name osd.0”. If that still fails, you should try to run the ceph-osd daemon manually. Weiwen Hu From: Stephen J.

[ceph-users] Re: Pacific: parallel PG reads?

2021-11-11 Thread
Hi Zakhar, If you are using RBD, you may be interested in the striping feature. It works like RAID0 and can read from multiple objects at once for sequential read requests. https://docs.ceph.com/en/latest/man/8/rbd/#striping Weiwen Hu Sent from Mail for Windows
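A sketch of creating an image with a custom striping layout (pool/image names and sizes are hypothetical; stripe_unit × stripe_count spreads each stripe across several objects, RAID0-style):

  rbd create mypool/striped-img --size 100G \
      --object-size 4M --stripe-unit 64K --stripe-count 8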

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-08 Thread
> On Nov 8, 2021, at 19:08, Boris Behrens wrote: > > And does it make a difference to have only a block.db partition or a > block.db and a block.wal partition? I think having only a block.db partition is better if you don’t have 2 separate disks for them. The WAL will be placed in the DB partition if you
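For illustration, a sketch of creating an OSD with only a DB device (device paths are hypothetical); with no --block.wal given, the WAL lives inside the DB volume:

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1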

[ceph-users] Re: Cluster Health error's status

2021-10-29 Thread
Hi Michel, This “Structure needs cleaning” seems to mean that your local file system is not in order; you should try “fsck”. Weiwen Hu From: Michel Niyoyita Sent: Oct 29, 2021, 20:10 To: Etienne Menguy Cc: ceph-users
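A sketch of that suggestion (the device name is hypothetical and the filesystem must be unmounted first; “Structure needs cleaning” typically comes from XFS, where xfs_repair is the tool to use):

  umount /dev/sdX1
  xfs_repair /dev/sdX1      # XFS
  fsck -f /dev/sdX1         # ext4 and friends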

[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-27 Thread
ong with this disk? Maybe also do a self-test. Sent from Mail for Windows From: Marco Pizzolo <marcopizz...@gmail.com> Sent: Oct 28, 2021, 1:17 To: 胡 玮文 <huw...@outlook.com> Cc: ceph-users <ceph-users@ceph.io> Subject: Re: [ceph-

[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-25 Thread
Could you post the logs of the problematic OSDs? E.g.: cephadm logs --name osd.0 From: Marco Pizzolo Sent: Oct 26, 2021, 7:15 To: ceph-users Subject: [ceph-users] 16.2.6 OSD down, out but container running Hello Everyone, I'm seeing an

[ceph-users] Re: deep-scrubs not respecting scrub interval (ceph luminous)

2021-10-23 Thread
Hi Mehmet, I think this is expected, if you read the help: # ceph config help osd_scrub_interval_randomize_ratio osd_scrub_interval_randomize_ratio - Ratio of scrub interval to randomly vary (float, advanced) Default: 0.50 Can update at runtime: true See also:
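If you want deep-scrubs to land exactly on schedule, a sketch of turning the randomisation off (use ceph config set on Mimic or newer; on Luminous put it in ceph.conf or use injectargs):

  ceph config set osd osd_scrub_interval_randomize_ratio 0
  # Luminous-era alternative
  ceph tell 'osd.*' injectargs '--osd_scrub_interval_randomize_ratio=0'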

[ceph-users] Re: clients failing to respond to cache pressure (nfs-ganesha)

2021-10-20 Thread
I don’t know if it is related, but we routinely get warnings about 1-4 clients failing to respond to cache pressure. It seems to be harmless. We are running 16.2.6, 2 active MDSes, over 20 kernel cephfs clients, with the latest 5.11 kernel from Ubuntu. > On Oct 20, 2021, at 16:36, Marc wrote: >

[ceph-users] Re: Cluster inaccessible

2021-10-10 Thread
in the referenced thread https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/ From: Ben Timby <bti...@smartfile.com> Sent: Oct 10, 2021, 21:50 To: 胡 玮文 <huw...@outlook.com> Cc: ceph-users@ceph.io Subject:

[ceph-users] Re: Cluster inaccessible

2021-10-10 Thread
Hi Ben, This looks a lot like the issue I had. Do you know what caused this crash? In my case, I had issued “ceph mds rmfailed”. Did you also issue this command? Here is what I did to prevent the crash, if I recall correctly. In gdb, before running the mon: (gdb) b

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-04 Thread
玮文 <huw...@outlook.com> wrote: Hi Petr, Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS upgrade procedure. In short, when upgrading to 16.2.6, you need to disable standby-replay and reduce the number of ranks to 1. Weiwen Hu Sent from Mail for Win

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-04 Thread
Hi Petr, Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS upgrade procedure. In short, when upgrading to 16.2.6, you need to disable standby-replay and reduce the number of ranks to 1. Weiwen Hu Sent from Mail for Windows From: Petr
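A sketch of the two pre-upgrade steps named above (the filesystem name is hypothetical; see the linked upgrade doc for the full procedure and for restoring the settings afterwards):

  ceph fs set cephfs allow_standby_replay false
  ceph fs set cephfs max_mds 1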

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread
> On Oct 4, 2021, at 04:18, Michael Thomas wrote: > > On 10/3/21 12:08, 胡 玮文 wrote: >>>> On Oct 4, 2021, at 00:53, Michael Thomas wrote: >>> >>> I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph >>> cluster. I was able to determine that

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread
Victor Hooi <victorh...@yahoo.com> Sent: Friday, October 1, 2021 5:30 AM To: Eugen Block <ebl...@nde.ag> Cc: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>; 胡 玮文 <huw...@outlook.com>; ceph-users <ceph-users@ceph.io> Subject: Re: [ceph-

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread
ahoo.com> Sent: Friday, October 1, 2021 5:30 AM To: Eugen Block <ebl...@nde.ag> Cc: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>; 胡 玮文 <huw...@outlook.com>; ceph-users <ceph-users@ceph.io> Subject: Re: [ceph-users] Re: is it possible to remove

[ceph-users] Re: ceph-objectstore-tool core dump

2021-10-03 Thread
> On Oct 4, 2021, at 00:53, Michael Thomas wrote: > > I recently started getting inconsistent PGs in my Octopus (15.2.14) ceph > cluster. I was able to determine that they are all coming from the same OSD: > osd.143. This host recently suffered from an unplanned power loss, so I'm > not surprised

[ceph-users] Re: How to get ceph bug 'non-errors' off the dashboard?

2021-10-02 Thread
Hi Harry, Please try these commands in the CLI: ceph health mute MGR_MODULE_ERROR ceph health mute CEPHADM_CHECK_NETWORK_MISSING Weiwen Hu > On Oct 3, 2021, at 05:37, Harry G. Coin wrote: > > I need help getting two 'non errors' off the ceph dashboard so it stops > falsely scaring people with the

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread
21:40 To: 胡 玮文 <huw...@outlook.com> Cc: Igor Fedotov <ifedo...@suse.de>; Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>; ceph-users@ceph.io Subject: Re: is it possible to remove the db+wal from an external device (nvme) The

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-28 Thread
You may need to use `ceph-volume lvm migrate` [1] instead of ceph-bluestore-tool. If I recall correctly, this is a pretty new feature, and I’m not sure whether it is available in your version. If you use ceph-bluestore-tool, then you need to modify the LVM tags manually. Please refer to the
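A sketch of the newer command mentioned above (the OSD id, fsid and target LV are hypothetical; this moves the DB/WAL contents onto the target volume):

  ceph-volume lvm migrate --osd-id 12 --osd-fsid <osd-fsid> \
      --from db wal --target ceph-block-vg/block-lv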

[ceph-users] Re: Billions of objects upload with bluefs spillover cause osds down?

2021-09-28 Thread
RGW stores a lot of metadata in the DB of the OSDs [1], so I would expect to see extensive usage of the DB device if you store billions of objects through RGW. Anyway, spilling over should not cause OSDs to reboot, and it should still work better than not having a dedicated DB device, unless the db device
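A sketch of how to check DB usage and spillover on one OSD (the OSD id is hypothetical; the daemon command must be run on that OSD's host):

  ceph health detail | grep -i spillover
  ceph daemon osd.0 perf dump bluefs | grep -E 'db_|slow_'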

[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread
> On Sep 23, 2021, at 15:50, Mark Schouten wrote: > > Hi, > > Last night we’ve had downtime on a simple three-node cluster. Here’s > what happened: > 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] > message from mon.2 was stamped 8.401927s in the future, clocks not > synchronized >

[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-18 Thread
On Sep 18, 2021, at 22:50, Eric Dold wrote: Hi Patrick Thanks a lot! After setting ceph fs compat cephfs add_incompat 7 "mds uses inline data" the filesystem is working again. So should I leave this setting as it is now, or do I have to remove it again in a future update? If I understand it[1]

[ceph-users] Re: No active MDS after upgrade to 16.2.6

2021-09-18 Thread
Hi Robert, You may be hitting the same bug as me. You can read this thread for details https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/ In short, ensure you have no active MDS, then run: ceph fs compat add_incompat 7 "mds uses inline data" Weiwen Hu
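Putting the advice above together as a sketch (the filesystem name is hypothetical; failing the fs is one way to ensure no MDS is active before adding the compat flag):

  ceph fs fail cephfs
  ceph fs compat cephfs add_incompat 7 "mds uses inline data"
  ceph fs set cephfs joinable true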

[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread
iled name=role,type=CephString " "name=yes_i_really_mean_it,type=CephBool,req=false", "remove failed rank", "mds", "rw", FLAG(HIDDEN)) COMMAND_WITH_FLAG("mds cluster_down", "take MDS cluster down", "mds", &

[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread
> Did you run the command I suggested before or after you executed `rmfailed` > below? I ran “rmfailed” before reading your mail. Then the MON crashed. I fixed the crash by setting max_mds=2. Then I tried the command you suggested. By reading the code[1], I think I really need to undo the

[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread
[v2:202.38.247.187:6800/94739959,v1:202.38.247.187:6801/94739959] compat {c=[1],r=[1],i=[7ff]}] dumped fsmap epoch 41448 From: Patrick Donnelly <pdonn...@redhat.com> Sent: Sep 18, 2021, 0:24 To: 胡 玮文 <huw...@outlook.com> Cc: Eric Dold <dold.e...@gmail.com>; ceph

[ceph-users] Re: Cephfs - MDS all up:standby, not becoming up:active

2021-09-17 Thread
We are experiencing the same when upgrading to 16.2.6 with cephadm. I tried ceph fs set cephfs max_mds 1 ceph fs set cephfs allow_standby_replay false , but still all MDS go to standby. It seems all ranks are marked failed. Do we have a way to clear this flag? Please help. Our cluster is

[ceph-users] Re: Data loss on appends, prod outage

2021-09-10 Thread
Thanks for sharing this. Following this thread, I realize we are also affected by this bug. We have multiple reports on corrupted tensorboard event file, which I think are caused by this bug. We are using Ubuntu 20.04, the affected kernel version should be HWE kernel > 5.11 and < 5.11.0-34.

[ceph-users] Re: podman daemons in error state - where to find logs?

2021-09-02 Thread
> Really? Under what user do these containers (osd,msd,mgr etc) run then? I > have practically all containers running with specific users. Just to make > sure that if there is some sort of issue with the orchestrator, the issue > will be limited to the used userid. I think with

[ceph-users] Re: podman daemons in error state - where to find logs?

2021-08-31 Thread
With cephadm, you can find the logs with the “journalctl” command outside of the container. Or you can change the config to use traditional log files: ceph config set global log_to_file true > On Sep 1, 2021, at 09:50, Nigel Williams wrote: > > to answer my own question, the logs are meant to be in >

[ceph-users] Re: Ceph as a HDFS alternative?

2021-08-26 Thread
RBD now supports “read_from_replica=localize”, which may reduce some network traffic. But CephFS does not seem to support this. And I think it is not easy to tell Hadoop to schedule jobs taking the data location into account. From: Serkan Çoban Sent: Aug 26, 2021, 17:11
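A sketch of using that option with the kernel RBD client (pool/image names are hypothetical; the read_from_replica map option needs a reasonably recent kernel, roughly 5.8 or later):

  rbd map mypool/myimage -o read_from_replica=localize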

[ceph-users] Re: Re:[ceph-users]

2021-08-06 Thread
According to the release notes at https://github.com/ceph/ceph-csi/releases (specifically, "Update ceph to 15.2.11"), you need: * >= 3.3.1 or * >= 3.2.5 and < 3.3 On 2021/8/7 9:47, ?? wrote: Is it to update the ceph csi version? I suspected that it was a version problem, so I updated from ceph

[ceph-users] Re: ceph csi issues

2021-08-06 Thread
Upgrade the client software. From: 峰 Sent: Aug 6, 2021, 17:05 To: ceph-users Subject: [ceph-users] ceph csi issues hi, When the ceph cluster is set to safe mode (ceph config set mon auth_allow_insecure_global_id_reclaim false), kubernetes deploy ceph

[ceph-users] MDS stop reporting stats

2021-08-06 Thread
Hi all, We are seeing this several times. Some of our MDS stop reporting stats for no obvious reason. And a rolling restart of all MDS in question could resolve this. But restarting active MDS could cause downtime up to several minutes, we don’t want to do this constantly. Client count, MDS

[ceph-users] Re: How to create single OSD with SSD db device with cephadm

2021-08-03 Thread
Hi all, I want to update this old thread. With the latest ceph version, we are able to replace steps 5-8 with a single command “ceph cephadm osd activate <host>”. This makes the process easier. Thanks, ceph developers. From: 胡 玮文 <huw...@outlook.com> Sent: Dec 3, 2020, 15:05 To: Eugen

[ceph-users] Re: PG scaling questions

2021-08-03 Thread
splitting pgs and peering? I didn’t observe any significant downtime the last time I did this. I think it is several seconds at most. I'm sorry for asking too many questions, I'm trying not to break stuff :) On Tue, Aug 3, 2021 at 3:46 PM 胡 玮文 <huw...@outlook.com> wrote: Hi, Each pla

[ceph-users] Re: PG scaling questions

2021-08-03 Thread
Hi, Each placement group will be split into 4 pieces in place, all at nearly the same time; no empty PGs will be created. Normally, you only set pg_num and do not touch pgp_num. Instead, you can set “target_max_misplaced_ratio” (default 5%). Then the mgr will increase pgp_num for you. It will
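A sketch of the workflow described above (pool name and pg_num value are hypothetical):

  # raise pg_num only; the mgr walks pgp_num up in the background
  ceph osd pool set mypool pg_num 256
  # optional: how much misplaced data the mgr may create at a time
  ceph config set mgr target_max_misplaced_ratio 0.05
  # watch pgp_num catch up over time
  ceph osd pool get mypool pgp_num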

[ceph-users] Re: 100.000% pgs unknown

2021-08-03 Thread
Also check whether you have a half-upgraded cluster and have set “auth_allow_insecure_global_id_reclaim” to false. If that is the case, revert this option to true. From: 峰 Sent: Aug 3, 2021, 15:37 To: ceph-users Subject: [ceph-users] 100.000% pgs unknown
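A sketch of reverting that setting (only until all clients and daemons are upgraded, since it re-allows insecure global_id reclaim):

  ceph config set mon auth_allow_insecure_global_id_reclaim true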

[ceph-users] One slow OSD, causing a dozen of warnings

2021-07-17 Thread
Hi all, We have experienced something strange and scary on our ceph cluster yesterday. Now our cluster is back to health. I want to share our experience here and hopefully, someone can help us find the root cause and prevent it from happening again. TL;DR--An OSD became very slow for an

[ceph-users] Re: Cephfs mount not recovering after icmp-not-reachable

2021-06-14 Thread
Hi Simon, If you have a recent enough kernel, you can try the "recover_session" mount option [1]. Read the doc and be aware of what will happen if the client tries to recover after being blacklisted. [1]: https://docs.ceph.com/en/latest/man/8/mount.ceph/#basic Weiwen Hu On 2021/6/14 at 11:07 PM,
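A sketch of mounting with that option (hypothetical mount point, classic mount syntax, mon addresses taken from ceph.conf):

  mount -t ceph :/ /mnt/cephfs -o name=admin,recover_session=clean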

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-04 Thread
> On Jun 4, 2021, at 21:51, Eneko Lacunza wrote: > > Hi, > > We operate a few Ceph hyperconverged clusters with Proxmox, which provides a > custom ceph package repository. They do a great work; and deployment is a > breeze. > > So, even as currently we would rely on Proxmox packages/distribution and

[ceph-users] Re: XFS on RBD on EC painfully slow

2021-05-28 Thread
Hi Reed, Have you tried just starting multiple rsync processes simultaneously to transfer different directories? A distributed system like ceph often benefits from more parallelism. Weiwen Hu > On May 28, 2021, at 03:54, Reed Dier wrote: > > Hoping someone may be able to help point out where my
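For example, one simple way to run several rsyncs at once, one per top-level directory (paths are hypothetical, and this naive form assumes directory names without spaces):

  ls /src | xargs -P 8 -I{} rsync -a /src/{}/ /mnt/cephfs/dst/{}/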

[ceph-users] Re: Remapping OSDs under a PG

2021-05-27 Thread
On May 28, 2021, at 08:18, Jeremy Hansen wrote: I’m very new to Ceph so if this question makes no sense, I apologize. Continuing to study but I thought an answer to this question would help me understand Ceph a bit more. Using cephadm, I set up a cluster. Cephadm automatically creates a pool for

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread
> On May 27, 2021, at 19:11, Mark Schouten wrote: > > On Thu, May 27, 2021 at 12:38:07PM +0200, Mark Schouten wrote: >>> On Thu, May 27, 2021 at 06:25:44AM +, Martin Rasmus Lundquist Hansen >>> wrote: >>> After scaling the number of MDS daemons down, we now have a daemon stuck in >>> the >>>

[ceph-users] Re: MDS stuck in up:stopping state

2021-05-27 Thread
Hi Martin, You may be hitting https://tracker.ceph.com/issues/50112, for which we have not found the root cause yet. I resolved this by restarting rank 0. (I have only 2 active MDSs) Weiwen Hu Sent from the Mail app for Windows 10 From: Martin Rasmus Lundquist
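One way to restart rank 0 is to fail it over to a standby, as sketched below (the filesystem name is hypothetical; make sure a standby MDS is available first):

  ceph mds fail cephfs:0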

[ceph-users] Re: Ceph osd will not start.

2021-05-24 Thread
> On May 25, 2021, at 03:52, Peter Childs wrote: > > I'm attempting to get ceph up and running, and currently feel like I'm > going around in circles. > > I'm attempting to use cephadm and Pacific, currently on debian buster, > mostly because centos7 ain't supported any more and centos8 ain't

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-05-23 Thread
d the third mon was completely removed from config , it > wasn't reported as a failure on 'ceph status'. > > >> On 5/23/2021 7:30 PM, 胡 玮文 wrote: >> Hi Adrian, >> >> I have not tried, but I think it will resolve itself automatically after >> som

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-05-23 Thread
Hi Adrian, I have not tried it, but I think it will resolve itself automatically after some minutes. How long did you wait before you did the manual redeploy? Could you also try “ceph mon dump” to see whether mon.node03 was actually removed from the monmap when it failed to start? > On

[ceph-users] Re: RGW failed to start after upgrade to pacific

2021-05-08 Thread

[ceph-users] Re: Weird PG Acting Set

2021-05-07 Thread
On 2021/5/7 at 6:46 PM, Lazuardi Nasution wrote: Hi, After recreating some related OSDs (3, 71 and 237), now the acting set is normal but the PG is incomplete and there are slow ops on the primary OSD (3). I have tried to make it normal Hi Lazuardi, I assume this pg is in an EC 4+2 pool, so you can

[ceph-users] Re: Where is the MDS journal written to?

2021-05-04 Thread
> On May 4, 2021, at 22:31, mabi wrote: > > Hello, > > I have a small Octopus cluster (3 mon/mgr nodes, 3 osd nodes) installed with > cephadm and hence running inside podman containers on Ubuntu 20.04. > > I want to use CephFS so I created a fs volume and saw that two MDS containers > have been

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread
Hi Boris, Could you check something like ceph daemon osd.23 perf dump | grep numpg to see if there are some stray or removing PGs? Weiwen Hu > On Apr 15, 2021, at 22:53, Boris Behrens wrote: > > Ah you are right. > [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size_hdd > { >

[ceph-users] Re: cephadm:: how to change the image for services

2021-04-05 Thread
> On Apr 5, 2021, at 20:48, Adrian Sevcenco wrote: > > On 4/5/21 3:27 PM, 胡 玮文 wrote: >>>> On Apr 5, 2021, at 19:29, Adrian Sevcenco wrote: >>> >>> Hi! How/where can i change the image configured for a service? >>> I tried to modify /var/lib/ceph///unit.{image,run}

[ceph-users] Re: cephadm:: how to change the image for services

2021-04-05 Thread
On Apr 5, 2021, at 19:29, Adrian Sevcenco wrote: Hi! How/where can i change the image configured for a service? I tried to modify /var/lib/ceph///unit.{image,run} but after restarting, ceph orch ps shows that the service uses the same old image. Hi Adrian, Try “ceph config set container_image ” where

[ceph-users] RGW failed to start after upgrade to pacific

2021-04-04 Thread
Hi all, I upgraded to the pacific version with cephadm. However, all our RGW daemons cannot start anymore. any help is appreciated. Here are the logs when starting RGW. I set debug_rados and debug_rgw to 20/20 systemd[1]: Started Ceph rgw.smil.b7-1.gpu006.twfefs for

[ceph-users] Re: New Issue - Mapping Block Devices

2021-03-23 Thread
> On Mar 23, 2021, at 13:12, duluxoz wrote: > > Hi All, > > I've got a new issue (hopefully this one will be the last). > > I have a working Ceph (Octopus) cluster with a replicated pool (my-pool), an > erasure-coded pool (my-pool-data), and an image (my-image) created - all > *seems* to be working

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-18 Thread
“podman logs ceph-xxx-osd-xxx” may contain additional logs. > On Mar 19, 2021, at 04:29, Philip Brown wrote: > > I've been banging on my ceph octopus test cluster for a few days now. > 8 nodes. each node has 2 SSDs and 8 HDDs. > They were all autoprovisioned so that each HDD gets an LVM slice of an

[ceph-users] Re: Container deployment - Ceph-volume activation

2021-03-11 Thread
Hi, Assuming you are using cephadm? Check out https://docs.ceph.com/en/latest/cephadm/osd/#activate-existing-osds ceph cephadm osd activate ... On Mar 11, 2021, at 23:01, Cloud Guy wrote: Hello, TL;DR Looking for guidance on ceph-volume lvm activate --all as it would apply to a containerized

[ceph-users] Re: Ceph orch syntax to create OSD on a partition?

2021-01-08 Thread
Hi Marc, You can create LVM logical volumes manually, then replace “/dev/sdX” with “vg-name/lv-name”. It should work. Weiwen Hu > On Jan 8, 2021, at 05:26, Marc Spencer wrote: > > All, > > I acknowledge it’s not a fantastic idea, but I wish to create an OSD on a > partition rather than an entire
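A sketch of that approach with the orchestrator (partition, VG/LV and host names are all hypothetical):

  pvcreate /dev/sdX3
  vgcreate osd-vg /dev/sdX3
  lvcreate -l 100%FREE -n osd-lv osd-vg
  ceph orch daemon add osd myhost:osd-vg/osd-lv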

[ceph-users] Re: device management and failure prediction

2020-12-28 Thread
Hi Suresh, I have tried to enable failure prediction but can’t get it to work. Cloud prediction is not yet implemented if I read the source code correctly. And if I enable the “diskprediction_local” mgr module, the mgr stuck initializing for hours. I guess it is downloading some large machine

[ceph-users] Re: changing OSD IP addresses in octopus/docker environment

2020-12-17 Thread
What if you just stop the containers, configure the new IP address for that server, then restart the containers? I think it should just work as long as this server can still reach the MONs. > On Dec 18, 2020, at 03:18, Philip Brown wrote: > > I was wondering how to change the IPs used for the OSD

[ceph-users] Re: ceph stuck removing image from trash

2020-12-15 Thread
Hi Andre, I once faced the same problem. It turns out that ceph needs to scan every object in the image when deleting it, if the object map is not enabled. This would take years on such a huge image. I ended up deleting the whole pool to get rid of the huge image. Maybe you can scan all the objects
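A sketch of enabling the object map on an existing image so future deletions stay fast (pool/image names are hypothetical; exclusive-lock must already be enabled, which is the default for recent images):

  rbd feature enable mypool/myimage object-map fast-diff
  rbd object-map rebuild mypool/myimage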

[ceph-users] Re: Running Mons on msgrv2/3300 only.

2020-12-09 Thread
Out of curiosity, how does DNS lookup [1] work with v2-only mons? We can’t specify whether a port is v1 or v2 in SRV records. Or does it just work if I specify port 3300 in the SRV records? [1]: https://docs.ceph.com/en/latest/rados/configuration/mon-lookup-dns/ On Dec 9, 2020, at 18:10, Wido den

[ceph-users] Re: add server in crush map before osd

2020-12-03 Thread
You can also just use a single command: ceph osd crush add-bucket <host> host room=<room> > On Dec 4, 2020, at 00:00, Francois Legrand wrote: > > Thanks for your advice. > > It was exactly what I needed. > > Indeed, I did: > > ceph osd crush add-bucket <host> host > ceph osd crush move <host> room=<room> > > > But also set
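A sketch with concrete (hypothetical) names, creating the host bucket directly under its room:

  ceph osd crush add-bucket node05 host room=room1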

[ceph-users] Re: Increase number of objects in flight during recovery

2020-12-03 Thread
Hi, There is an “OSD recovery priority” dialog box in the web dashboard. The configuration options it changes are: osd_max_backfills osd_recovery_max_active osd_recovery_max_single_start osd_recovery_sleep Tuning these configs may help. The “High” priority corresponds to 4, 4, 4, 0, respectively. Some of
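The same settings applied from the CLI, as a sketch (values mirror the dashboard's “High” preset quoted above):

  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 4
  ceph config set osd osd_recovery_max_single_start 4
  ceph config set osd osd_recovery_sleep 0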

[ceph-users] Re: How to create single OSD with SSD db device with cephadm

2020-12-02 Thread
be a lot more convenient. From: Eugen Block <ebl...@nde.ag> Sent: Oct 3, 2020, 1:18 To: 胡 玮文 <huw...@outlook.com> Cc: ceph-users@ceph.io Subject: Re: How to create single OSD with SSD db device with cephadm Doing it in the container seems the righ
