>undergo deep scrub and regular scrub cannot be completed in a timely manner. I
>have noticed that these PGs appear to be concentrated on a single OSD. I am
>seeking your guidance on how to address this issue and would appreciate any
>insights or suggestions you may have.
>
The usual "see if
Hi Emmanuel,
This looks like a known issue, https://tracker.ceph.com/issues/58392,
and there is a fix in https://github.com/ceph/ceph/pull/49652.
Could you stop all the clients first, then set 'max_mds' to 1, and then
restart the MDS daemons?
Thanks
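As a minimal sketch of those steps, assuming the file system is named 'cephfs' and the MDS daemons are managed by cephadm (adjust the names to your deployment):
# reduce the file system to a single active MDS rank
ceph fs set cephfs max_mds 1
# confirm that only rank 0 remains active
ceph fs status cephfs
# restart the MDS daemons for that file system
ceph orch restart mds.cephfs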
On 5/3/23 16:01,
On 5/1/23 17:35, Frank Schilder wrote:
Hi all,
I think we might be hitting a known problem
(https://tracker.ceph.com/issues/57244). I don't want to fail the mds yet,
because we have trouble with older kclients that miss the mds restart and hold
on to cache entries referring to the killed
Several users have complained for some time that our DMARC/DKIM handling
is not correct. I've recently had time to go study DMARC, DKIM, SPF,
SRS, and other tasty morsels of initialisms, and have thus made a change
to how Mailman handles DKIM signatures for the list:
If a domain advertises
Dear all,
I am writing to seek your assistance in resolving an issue with my Ceph cluster.
Currently, the cluster is experiencing a problem where deep scrubs and regular
scrubs of a number of Placement Groups (PGs) cannot be completed in a timely
manner. I have noticed that
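For what it's worth, a minimal sketch of commands that can help narrow this down; osd.12 is a hypothetical ID for the suspect OSD, and the last command only raises the per-OSD scrub concurrency:
# show which PGs are overdue for scrub / deep scrub
ceph health detail | grep -i scrub
# list the PGs that map to the suspect OSD
ceph pg ls-by-osd osd.12
# optionally allow more concurrent scrubs per OSD
ceph config set osd osd_max_scrubs 2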
On Thu, May 4, 2023 at 11:35 AM Chris Palmer wrote:
>
> Hi
>
> Grateful if someone could clarify some things about CephFS Scrubs:
>
> 1) Am I right that a command such as "ceph tell mds.cephfs:0 scrub start
> / recursive" only triggers a forward scrub (not a backward scrub)?
The naming here that
If we get some time, I would like to include:
https://github.com/ceph/ceph/pull/50894.
Regards,
Radek
On Thu, May 4, 2023 at 5:56 PM Venky Shankar wrote:
>
> Hi Yuri,
>
> On Wed, May 3, 2023 at 7:10 PM Venky Shankar wrote:
> >
> > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein wrote:
> > >
In summary:
Release Notes: https://github.com/ceph/ceph/pull/51301
We plan to finish this release next week and we have the following PRs
planned to be added:
https://github.com/ceph/ceph/pull/51232 -- Venky approved
https://github.com/ceph/ceph/pull/51344 -- Venky in progress
On Thu, 4 May 2023 at 17:07, wrote:
>
> The radosgw has been configured like this:
>
> [client.rgw.ceph1]
> host = ceph1
> rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=/root/ssl/ca.crt
> ssl_private_key=/root/ssl/ca.key
> #rgw_frontends = beast port=8080 ssl_port=443
>
Hi Yuri,
On Wed, May 3, 2023 at 7:10 PM Venky Shankar wrote:
>
> On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein wrote:
> >
> > Venky, I did plan to cherry-pick this PR if you approve this (this PR
> > was used for a rerun)
>
> OK. The fs suite failure is being looked into
>
For setting the user, the `ceph cephadm set-user` command should do it. I'm a
bit surprised by the second part of that, though; with passwordless sudo access
I would have expected that to start working.
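A minimal sketch of that, assuming the desired SSH user is called 'cephadm' (a hypothetical name) and the affected host is host1:
# tell the orchestrator which SSH user to use on the hosts
ceph cephadm set-user cephadm
# confirm what is currently configured
ceph cephadm get-user
# re-check connectivity to the host with that user
ceph cephadm check-host host1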
On Thu, May 4, 2023 at 11:27 AM Reza Bakhshayeshi
wrote:
> Thank you.
> I don't see any more errors
Hi
Grateful if someone could clarify some things about CephFS Scrubs:
1) Am I right that a command such as "ceph tell mds.cephfs:0 scrub start
/ recursive" only triggers a forward scrub (not a backward scrub)?
2) I couldn't find any reference to forward scrubs being done
automatically and
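For reference, a minimal sketch of starting a forward scrub and checking on it, assuming the file system is named cephfs:
# start a recursive forward scrub at the root of rank 0
ceph tell mds.cephfs:0 scrub start / recursive
# check progress and results
ceph tell mds.cephfs:0 scrub status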
Thank you.
I don't see any errors other than:
2023-05-04T15:07:38.003+ 7ff96cbe0700 0 log_channel(cephadm) log [DBG]
: Running command: sudo which python3
2023-05-04T15:07:38.025+ 7ff96cbe0700 0 log_channel(cephadm) log [DBG]
: Connection to host1 failed. Process exited with
The radosgw has been configured like this:
[client.rgw.ceph1]
host = ceph1
rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=/root/ssl/ca.crt
ssl_private_key=/root/ssl/ca.key
#rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=/root/ssl/ca.crt
I uploaded the output there:
https://nextcloud.widhalm.or.at/nextcloud/s/FCqPM8zRsix3gss
IP 192.168.23.62 is one of my OSDs that was still booting when the
reconnect attempts happened. What makes me wonder is that it's the only one
listed, when there are a few similar ones in the cluster.
On
What does `ceph log last 200 debug cephadm` specifically spit out? I don't
think the log lines you've posted so far are generated by the orchestrator,
so I'm curious what the last actions it took were (and how long ago).
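A minimal sketch of pulling the orchestrator's own log output; both commands assume the cephadm mgr module is active:
# show the most recent cephadm debug log entries
ceph log last 200 debug cephadm
# or follow the cephadm log channel live
ceph -W cephadm --watch-debug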
On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm
wrote:
> To completely rule out
Hi all,
there was another election after about 2 hours. I'm trying the stop+reboot
procedure on another mon now. Just for the record, I observe that when I stop
one mon, another goes down as a consequence:
[root@ceph-02 ~]# docker stop ceph-mon
ceph-mon
[root@ceph-02 ~]# ceph status
cluster:
Yep, reading but not using LRC. Please keep it on the ceph user list for future
reference -- thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Eugen Block
Sent: Thursday, May 4, 2023 3:07 PM
To: ceph-users@ceph.io
To completely rule out hung processes, I managed to get another short
shutdown.
Now I'm seeing lots of:
mgr.server handle_open ignoring open from mds.mds01.ceph01.usujbi
v2:192.168.23.61:6800/2922006253; not ready for session (expect reconnect)
mgr finish mon failed to return metadata for
Hello,
After you delete the OSD, the now "invalid" upmap rule will be
automatically removed.
Cheers, Dan
__
Clyso GmbH | https://www.clyso.com
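A minimal sketch of inspecting the upmap entries involved; pg 2.1 is a hypothetical PG id, and the manual removal is normally not needed since the entry is dropped automatically:
# list the explicit upmap entries currently in the OSD map
ceph osd dump | grep pg_upmap
# manually remove the upmap entry for a specific PG, if ever required
ceph osd rm-pg-upmap-items 2.1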
On Wed, May 3, 2023 at 10:13 PM Nguetchouang Ngongang Kevin
wrote:
>
> Hello, I have a question: what happens when I
Hi,
What I'm seeing a lot is this: "[stats WARNING root] cmdtag not found
in client metadata". I can't make anything of it, but I guess it's not
showing the initial issue.
Now that I think of it - I started the cluster with 3 nodes which are
now only used as OSD hosts. Could it be there's something
Dear Josh,
Thanks a lot. Your clarification really gives me much more confidence in using
the pgmap tool set for re-balancing.
best regards,
Samuel
huxia...@horebdata.cn
From: Josh Baergen
Date: 2023-05-04 15:46
To: huxia...@horebdata.cn
CC: Janne Johansson; ceph-users
Subject: Re: [ceph-users] Re:
Hi Samuel,
Both pgremapper and the CERN scripts were developed against Luminous,
and in my experience 12.2.13 has all of the upmap patches needed for
the scheme that Janne outlined to work. However, if you have a complex
CRUSH map sometimes the upmap balancer can struggle, and I think
that's true
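If the cluster does go the upmap route, a minimal sketch of the usual prerequisites, assuming all clients are Luminous or newer:
# upmap requires luminous-or-newer clients
ceph osd set-require-min-compat-client luminous
# let the built-in balancer use upmap
ceph balancer mode upmap
ceph balancer on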
Thanks.
I'll set the log level to debug, try a few steps, and then come back.
On 04.05.23 14:48, Eugen Block wrote:
Hi,
try setting debug logs for the mgr:
ceph config set mgr mgr/cephadm/log_level debug
This should provide more details about what the mgr is trying and where it's
failing, hopefully.
Thanks for the reply.
"Refreshed" is "3 weeks ago" on most lines. The running mds and
osd.cost_capacity are both "-" in this column.
I already tried "mgr fail"; that didn't do anything. And I even
tried a complete shutdown during a maintenance window that was not 3
weeks ago but
Hi,
I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)
Thanks,
Eugen
Zitat von Michel Jouvin :
Hi,
I had to restart one of my OSD servers today and the problem showed
up
The first thing I always check when it seems like orchestrator commands aren't
doing anything is "ceph orch ps" and "ceph orch device ls", looking at the
REFRESHED column. If it's well above 10 minutes for orch ps or 30 minutes
for orch device ls, then it means the orchestrator is most likely hanging
on
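A minimal sketch of that check, plus the usual first remedy when the refresh is stale:
# check the REFRESHED column in both outputs
ceph orch ps
ceph orch device ls
# if the refresh is far in the past, fail over to a standby mgr
ceph mgr fail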
Hi,
try setting debug logs for the mgr:
ceph config set mgr mgr/cephadm/log_level debug
This should provide more details about what the mgr is trying and where it's
failing, hopefully. Last week this helped to identify an issue on a lower
Pacific release for me.
Do you see anything in the
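A minimal sketch of that, including how to turn the extra logging off again afterwards:
# enable debug logging for the cephadm module
ceph config set mgr mgr/cephadm/log_level debug
# read the resulting log entries
ceph log last 100 debug cephadm
# revert to the default log level when done
ceph config rm mgr mgr/cephadm/log_level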
Hi,
I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but the
following problem already existed when I was still on 17.2.5 everywhere.
I had a major issue in my cluster which could be solved with a lot of
your help and even more trial and error. Right now it seems that most is
Hi all,
I think I can reduce the defcon level a bit. Since I couldn't see anything in
the mon log, I started to check whether a specific mon causes the trouble by
shutting them down one by one for a while. I got lucky on the first try:
shutting down the leader stopped the voting from happening.
I
Janne,
thanks a lot for the detailed scheme. I totally agree that the upmap approach
would be one of the best methods; however, my current cluster is running
Luminous 12.2.13, and upmap does not seem to work reliably on Luminous.
samuel
huxia...@horebdata.cn
From: Janne Johansson
Date:
On Thu, May 4, 2023 at 11:27 AM Kamil Madac wrote:
>
> Thanks for the info.
>
> As a solution we used rbd-nbd which works fine without any issues. If we will
> have time we will also try to disable ipv4 on the cluster and will try kernel
> rbd mapping again. Are there any disadvantages when
Hi all,
I have to get back to this case. On Monday I had to restart an MDS to get rid
of a stuck client caps recall. Right after that fail-over, the MONs went into a
voting frenzy again. I already restarted all of them like last time, but this
time it doesn't help. I might be in a different
Hi,
I had to restart one of my OSD servers today and the problem showed up
again. This time I managed to capture "ceph health detail" output
showing the problem with the 2 PGs:
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
pg 56.1 is down, acting
On Thu, 4 May 2023 at 10:39, huxia...@horebdata.cn wrote:
> Dear Ceph folks,
>
> I am writing to ask for advice on best practice for expanding a Ceph cluster. We
> are running an 8-node Ceph cluster and RGW, and would like to add another 10
> nodes, each of which has 10x 12TB HDDs. The current
Thanks for the info.
As a solution we used rbd-nbd, which works fine without any issues. If we
have time, we will also try to disable IPv4 on the cluster and try kernel
rbd mapping again. Are there any disadvantages when using NBD
instead of kernel driver?
Thanks
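For comparison, a minimal sketch of both mapping paths, with mypool/myimage as placeholder names:
# kernel driver (krbd)
rbd device map mypool/myimage
# userspace mapping through NBD (requires the rbd-nbd package)
rbd device map --device-type nbd mypool/myimage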
On Wed, May 3, 2023 at
help
----
From: "ceph-users"
ceph-volume approved https://jenkins.ceph.com/job/ceph-volume-test/553/
On Wed, 3 May 2023 at 22:43, Guillaume Abrioux wrote:
> The failure seen in ceph-volume tests isn't related.
> That being said, it needs to be fixed to have a better view of the current
> status.
>
> On Wed, 3 May 2023 at
Dear Ceph folks,
I am writing to ask for advice on best practice for expanding a Ceph cluster. We
are running an 8-node Ceph cluster and RGW, and would like to add another 10
nodes, each of which has 10x 12TB HDDs. The current 8 nodes hold ca. 400TB of
user data.
I am wondering whether to add 10 nodes
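One commonly used approach, sketched below with hypothetical host names and assuming a cephadm-managed cluster, is to hold back data migration while the new OSDs are created and then let rebalancing proceed in a controlled way:
# hold back rebalancing while the new hosts/OSDs are added
ceph osd set norebalance
ceph orch host add node09
ceph orch host add node10
# ...add the remaining hosts, then let the cluster rebalance
ceph osd unset norebalance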
After running the tool for 11 hours straight, it exited with the
following exception:
Traceback (most recent call last):
File "/home/webis/first-damage.py", line 156, in
traverse(f, ioctx)
File "/home/webis/first-damage.py", line 84, in traverse
for (dnk, val) in it:
File
Hi Emmanuel,
It was a while ago, but as I recall I evicted all clients and that allowed
me to restart the MDS servers. There was something clearly "broken" in how
at least one of the clients was interacting with the system.
Peter
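A rough sketch of what that eviction looks like, assuming a file system named cephfs and a hypothetical client id:
# list the sessions currently held by rank 0
ceph tell mds.cephfs:0 client ls
# evict a specific client by id (4305 is a placeholder)
ceph tell mds.cephfs:0 client evict id=4305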
On Thu, 4 May 2023 at 07:18, Emmanuel Jaep wrote:
> Hi,
>
> did