[ceph-users] Re: monitor sst files continue growing

2020-11-13 Thread Zhenshi Zhou
Hi Wido, thanks for the explanation. I think the root cause is that the disks are too slow for compaction. I added two new mons on SSDs to the cluster to speed it up, and the issue is resolved. That's good advice, and I plan to migrate my mons to bigger SSD disks. Thanks again. Wido den Hollander

[ceph-users] Re: monitor sst files continue growing

2020-10-30 Thread Wido den Hollander
On 29/10/2020 19:29, Zhenshi Zhou wrote:
> Hi Alex, We found that there were a huge number of keys in the "logm" and "osdmap" table while using ceph-monstore-tool. I think that could be the root cause.
But that is exactly how Ceph works. It might need that very old OSDMap to get all the PGs

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
Hi Alex, We found that there were a huge number of keys in the "logm" and "osdmap" tables while using ceph-monstore-tool. I think that could be the root cause. Some pages also say that disabling the 'insights' module can resolve this issue, but I checked our cluster and we didn't enable this
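The per-table key counts mentioned above can be tallied with a short script. This is a minimal sketch, not the posters' method: it assumes the key dump from ceph-monstore-tool has been saved as text with one key per line and the table prefix (for example "logm" or "osdmap") as the first whitespace-separated field. The function name `count_keys_by_prefix` is hypothetical.

```python
from collections import Counter


def count_keys_by_prefix(dump_lines):
    """Count monitor store keys per table prefix.

    Assumption: each non-empty line starts with the prefix,
    followed by whitespace and the key name.
    """
    counts = Counter()
    for line in dump_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


# Example with hand-written lines standing in for a real key dump.
sample = ["logm full_1", "logm full_2", "osdmap 10", "auth 1"]
for prefix, n in count_keys_by_prefix(sample).most_common():
    print(f"{n:>8}  {prefix}")
```

A prefix that dominates the counts, as "logm" and "osdmap" reportedly did here, points at the table whose history is not being trimmed.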

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
I think you really need to sit down and explain the full story. Dropping one-liners with new information will not work via e-mail. I have never heard of the problem you are facing, so you did something that possibly no-one else has done before. Unless we know the full history from the last

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Alex Gracie
We hit this issue over the weekend on our HDD-backed EC Nautilus cluster while removing a single OSD. We also did not have any luck using compaction. The mon logs filled up our entire root disk on the mon servers and we were running on a single monitor for hours while we tried to finish

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
I then followed someone's guidance: I added 'mon compact on start = true' to the config and restarted one mon. That mon did not rejoin the cluster until I added two mons, deployed on virtual machines with SSDs, to the cluster. Now the cluster is fine except for the PG status. [image: image.png] [image:

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
Hi, I was very anxious a few hours ago because the SST files were growing so fast, and I didn't think the mon servers had enough space for them. Let me explain from the beginning. I have a cluster with OSDs deployed on SATA disks (7200 rpm), 10 TB per OSD, and I used an EC pool for more space. I added new OSDs into

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
This does not explain incomplete and inactive PGs. Are you hitting https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not recover from OSD restart")? In that case, temporarily stopping and restarting all new OSDs might help. Best regards, Frank Schilder AIT Risø

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
Your problem is the overall cluster health. The MONs store cluster history information that will be trimmed once the cluster reaches HEALTH_OK. Restarting the MONs only makes things worse right now. The health status is a mess: no MGR, a bunch of PGs inactive, etc. This is what you need to resolve. How
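Since trimming of the MON store depends on the cluster reaching HEALTH_OK, a quick check of the reported health can tell whether shrinking is to be expected yet. This is a minimal sketch that assumes the JSON status output of the cluster (for example from `ceph status --format json`) carries the overall state under `health.status`; the function name `cluster_is_healthy` is hypothetical.

```python
import json


def cluster_is_healthy(status_json):
    """Return True only if the status document reports HEALTH_OK.

    Assumption: the overall state is stored at health.status.
    A missing field is treated as "not healthy".
    """
    status = json.loads(status_json)
    return status.get("health", {}).get("status") == "HEALTH_OK"


# Example with a hand-written document standing in for real output.
sample = '{"health": {"status": "HEALTH_WARN"}}'
if not cluster_is_healthy(sample):
    print("Cluster is not HEALTH_OK; MON store history will keep accumulating.")
```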

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
I reset pg_num after adding the OSDs, which made some PGs inactive (in the activating state). Frank Schilder wrote on Thu, Oct 29, 2020 at 3:56 PM: > This does not explain incomplete and inactive PGs. Are you hitting > https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not > recover from OSD restart"? In

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
After adding OSDs to the cluster, recovery and backfill have not finished yet. Zhenshi Zhou wrote on Thu, Oct 29, 2020 at 3:29 PM: > I stopped the MGR because it was using too much memory. > As for the PG status, I added some OSDs to this cluster, and it > > Frank Schilder wrote on Thu, Oct 29, 2020 at 3:27 PM: > >>

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
I stopped the MGR because it was using too much memory. As for the PG status, I added some OSDs to this cluster, and it Frank Schilder wrote on Thu, Oct 29, 2020 at 3:27 PM: > Your problem is the overall cluster health. The MONs store cluster history > information that will be trimmed once it reaches HEALTH_OK.

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
Correction: the version is 14.2.12. Zhenshi Zhou wrote on Thu, Oct 29, 2020 at 2:38 PM: > My cluster is 12.2.12, with all SATA disks. > The size of store.db: > [image: image.png] > > How can I deal with it? > > Zhenshi Zhou wrote on Thu, Oct 29, 2020 at 2:37 PM: > >> Hi all, >> >> My cluster is in a bad state. SST files in

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
My cluster is 12.2.12, with all SATA disks. The size of store.db: [image: image.png] How can I deal with it? Zhenshi Zhou wrote on Thu, Oct 29, 2020 at 2:37 PM: > Hi all, > > My cluster is in a bad state. SST files in /var/lib/ceph/mon/xxx/store.db > continue growing. It claims mon are using a lot of disk
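To track how fast the SST files in a store.db directory grow, their count and total size can be sampled periodically. This is a minimal sketch using only the Python standard library; the function name `sst_usage` is hypothetical, and the directory passed in is whatever store.db path applies on the monitor host (the path in the message above is elided and is not filled in here).

```python
import os


def sst_usage(store_dir):
    """Return (file_count, total_bytes) for *.sst files directly in store_dir."""
    count = 0
    total = 0
    with os.scandir(store_dir) as entries:
        for entry in entries:
            # Only regular files ending in .sst are counted; LOG, MANIFEST,
            # and other store files are ignored.
            if entry.is_file() and entry.name.endswith(".sst"):
                count += 1
                total += entry.stat().st_size
    return count, total
```

Calling `sst_usage` on the same directory at intervals and comparing the totals gives a rough growth rate, which helps estimate how long the remaining disk space will last.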