Hi Eugen,
thanks for your answer. I gave the search another try and did indeed find
something:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/TN6WJVCHTVJ4YIA4JH2D2WYYZFZRMSXI/
Quote: " ... And I've also observed that the repair req isn't queued up -- if
the OSDs are busy with
Hi,
I'm not sure if I remember correctly, but I believe the backfill is
preventing the repair from happening. I think it has been discussed a
couple of times on this list, but I don't know right now whether you can
tweak anything to prioritize the repair; I believe there is, but I'm not
sure. It looks
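(Not from the thread, but for reference: the knobs usually mentioned in this situation look roughly like the following. Option names differ between releases, so please check against your version before relying on them.)
  # allow scrubs/repairs to be scheduled while recovery or backfill is running
  ceph config set osd osd_scrub_during_recovery true
  # newer releases also have a dedicated repair switch
  ceph config set osd osd_repair_during_recovery true
  # then re-issue the repair for the affected PG
  ceph pg repair <pgid>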
Hi all,
we have an inconsistent PG for a couple of days now (octopus latest):
# ceph status
  cluster:
    id:
    health: HEALTH_ERR
            1 scrub errors
            Possible data damage: 1 pg inconsistent
  services:
    mon: 5 daemons, quorum
I was wondering what the best practice is for monitoring drives. I am
transitioning from SATA to SAS drives, which expose less smartctl information,
not even power-on hours.
E.g., does Ceph register somewhere when an OSD has been created?
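(One thing worth a look, just a sketch and not from the original mail: the mgr devicehealth module keeps per-device health data, although what it can show for SAS drives still depends on what smartctl reports.)
  ceph device monitoring on              # make the mgr scrape and store device health data
  ceph device ls                         # devices known to the cluster and the daemons using them
  ceph device get-health-metrics <devid> # stored health/SMART data for one device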
Unfortunately I can't verify whether Ceph reports any inactive PGs. As soon as
the second host disconnects practically everything is locked; nothing
appears even when using "ceph -w". It only becomes apparent that the OSDs were
offline when dcs2 returns.
Note: Apparently there was a new update recently. When I was in
Dan,
Again, I am using 16.2.10 on Rocky 8.
I decided to take a step back and check a variety of options before I do
anything. Here are my results.
If I use this rule:
rule mypoolname {
id -5
type erasure
step take myroot
step choose indep 4 type rack
step choose indep 2 type
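(For comparison, a complete rule of that shape, 4 racks with 2 hosts each for an 8-chunk EC profile, typically looks something like the sketch below. This is a hypothetical reconstruction, not the poster's actual rule; as far as I know rule ids are non-negative, so the "id -5" above is probably a bucket id that slipped in.)
  rule mypoolname {
      id 5
      type erasure
      step set_chooseleaf_tries 5
      step set_choose_tries 100
      step take myroot
      step choose indep 4 type rack
      step chooseleaf indep 2 type host
      step emit
  }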
If you do not mind data loss, why do you care about needing to have 2x?
An alternative would be to change the replication so it is not across hosts but
just across OSDs, which can reside on one host (see the sketch after the quote below).
> Marc, is there really no mechanism to prevent the I/O pause? At the moment
> I'm not worried about data loss.
> I
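(A sketch of that alternative, not from the original mail; rule and pool names are placeholders.)
  # replicated rule with failure domain "osd" instead of "host"
  ceph osd crush rule create-replicated rep-by-osd default osd
  ceph osd pool set <poolname> crush_rule rep-by-osd
  # caveat: both replicas may then land on the same host, so losing that host loses both copies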
Could you share more details? Does ceph report inactive PGs when one
node is down? Please share:
ceph osd tree
ceph osd pool ls detail
ceph osd crush rule dump
ceph pg ls-by-pool
ceph -s
Quoting Murilo Morais:
Thanks for answering.
Marc, is there really no mechanism to prevent the I/O pause? At
Thanks for answering.
Marc, is there really no mechanism to prevent the I/O pause? At the moment
I'm not worried about data loss.
I understand that setting it to replica x1 could work, but I need it to be x2.
On Thu, Oct 13, 2022 at 12:26, Marc wrote:
>
> >
> > I'm having strange behavior on a
Hi Liang,
My guess would be this bug:
https://tracker.ceph.com/issues/44660
https://www.spinics.net/lists/ceph-users/msg30151.html
It has actually existed for at least 6 years:
https://tracker.ceph.com/issues/16767
Which occurs any time you reupload the same *part* in a single Multipart Upload
>
> I'm having strange behavior on a new cluster.
Not strange, by design
> I have 3 machines, two of them have the disks. We can name them like
> this:
> dcs1 to dcs3. The dcs1 and dcs2 machines contain the disks.
>
> I started bootstrapping through dcs1, added the other hosts and left mgr
>
I'm using Host as Failure Domain.
On Thu, Oct 13, 2022 at 11:41, Eugen Block wrote:
> What is your failure domain? If it's osd you'd have both PGs on the
> same host and then no replica is available.
>
> Quoting Murilo Morais:
>
> > Eugen, thanks for responding.
> >
> > In the
What is your failure domain? If it's osd you'd have both PGs on the
same host and then no replica is available.
Quoting Murilo Morais:
Eugen, thanks for responding.
In the current scenario there is no way to insert disks into dcs3.
My pools are size 2; at the moment we can't add more
Eugen, thanks for responding.
In the current scenario there is no way to insert disks into dcs3.
My pools are size 2; at the moment we can't add more machines with disks,
so it was sized to this proportion.
Even with min_size=1, if dcs2 stops, the I/O also stops.
On Thu, Oct 13, 2022 at
Hi,
if your pools have size 2 (don't do that except in test
environments) and host is your failure domain, then all I/O is paused if
one OSD host goes down, depending on your min_size. Can you move some
disks to dcs3 so you can have size 3 pools with min_size 2?
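(If the disks can be moved, the pool-side changes themselves are just the following; a sketch with a placeholder pool name.)
  ceph osd pool set <pool> size 3
  ceph osd pool set <pool> min_size 2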
Quoting Murilo Morais:
Good morning everyone.
I'm having strange behavior on a new cluster.
I have 3 machines, two of them have the disks. We can name them like this:
dcs1 to dcs3. The dcs1 and dcs2 machines contain the disks.
I started bootstrapping through dcs1, added the other hosts and left mgr on
dcs3 only.
Hi Yoann,
I'm not using pacific yet, but this here looks very strange to me:
  cephfs_data   data    243T   19.7T
  usage:   245 TiB used, 89 TiB / 334 TiB avail
I'm not sure if there is a mix of raw vs. stored here. Assuming the cephfs_data
allocation is right, I'm wondering what your
On 10/13/22 13:47, Yoann Moulin wrote:
Also, you mentioned you're using 7 active MDS. How's that working out
for you? Do you use pinning?
I don't really know how to do that. I have 55 worker nodes in my K8s
cluster, each of which can run pods that have access to a CephFS PVC. We have
28 cephfs
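(For reference, a minimal pinning sketch: it is done per directory via extended attributes on a mounted CephFS. Paths and ranks below are made up, and distributed ephemeral pinning needs a recent release.)
  # pin a directory subtree to MDS rank 0
  setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/volumes/group-a
  # or let the MDS spread the immediate children of a directory across ranks
  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/volumes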
Hello Patrick,
Unfortunately, increasing the number of PGs did not help much in the end; my
cluster is still in trouble...
Here is the current state of my cluster: https://pastebin.com/Avw5ybgd
Is 256 a good value in our case? We have 80 TB of data with more than 300M files.
You want at least
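(As a generic rule of thumb, not specific to this cluster: pg_num is often sized so that each OSD ends up with roughly 100 PGs, i.e. pg_num ≈ OSDs × 100 / (replica count or k+m), rounded to a power of two. For example, a hypothetical 100 OSDs with an 8-chunk EC pool would give 100 × 100 / 8 ≈ 1250, so 1024 PGs.)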
Hi Christian,
resharding is not an issue, because we only sync the metadata, like AWS S3.
But this looks very broken to me; does anyone have an idea how to fix it?
> On 13.10.2022 at 11:58, Christian Rohmann wrote:
>
> Hey Boris,
>
>> On 07/10/2022 11:30, Boris Behrens wrote:
>> I just
Hey Boris,
On 07/10/2022 11:30, Boris Behrens wrote:
I just wanted to reshard a bucket but mistyped the number of shards.
Reflexively I hit Ctrl-C and waited. It looked like the resharding did not
finish, so I canceled it, and now the bucket is in this state.
How can I fix it? It does not show
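(For what it's worth, the commands usually involved when a reshard is interrupted look roughly like this; a sketch with a placeholder bucket name, to be checked against your radosgw-admin version.)
  radosgw-admin reshard list                          # queued/ongoing reshard jobs
  radosgw-admin reshard status --bucket=<bucket>      # per-shard reshard state of the bucket
  radosgw-admin reshard cancel --bucket=<bucket>      # drop a pending/stuck reshard entry
  radosgw-admin reshard stale-instances list          # leftover bucket index instances
  radosgw-admin bucket check --bucket=<bucket> --fix  # re-check/repair the bucket index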
Hi Stefan,
the cluster is built from several old machines, with different numbers of
disks (from 8 to 16) and disk sizes (from 500 GB to 4 TB). After the PG
increase it is still recovering: the number of PGPs is at 213 and still has to
grow to 256. The balancer status gives:
{
"active": true,
Hi,
I'm seeing constant 25-50 MB/s writes to the metadata pool even when all
clients and the cluster are idle and in a clean state. This surely can't be
normal?
There are no apparent issues with the performance of the cluster, but this
write rate seems excessive and I don't know where to look for the
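(A few places to look, just a sketch and not a diagnosis; <name> is a placeholder for the MDS name.)
  ceph fs status                      # per-MDS request rates and cache sizes
  ceph tell mds.<name> perf dump      # detailed MDS counters (journal, log, cache)
  ceph tell mds.<name> session ls     # which clients hold sessions and caps
  ceph daemonperf mds.<name>          # live counter view (run on the MDS host)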
On 10/13/22 09:32, Nicola Mori wrote:
Dear Ceph users,
I'd need some help in understanding the total space in a CephFS. My
cluster is currently built from 8 machines; the one with the smallest
capacity has 8 TB of total disk space, and the total available raw space
is 153 TB. I set up a 3x
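(For anyone puzzled by the numbers, a back-of-the-envelope sketch assuming the data pool is the 6+2 EC pool from the original mail, ignoring the replicated metadata pool and any overhead:
  usable ≈ raw × k / (k + m) = 153 TB × 6 / 8 ≈ 115 TB
and as far as I understand, the MAX AVAIL shown by "ceph df" also factors in the fullest OSD and the full ratio, so the reported figure ends up lower than this naive estimate.)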
Hi,
Unfortunately, the "large omap objects" message recurred last weekend, so I ran
the script you showed to check the situation. `used_.*` is small, but `omap_.*`
is large, which is strange. Do you have any idea what it is?
id used_mbytes used_objects omap_used_mbytes omap_used_keys
Dear Ceph users,
I'd need some help in understanding the total space in a CephFS. My
cluster is currently built from 8 machines; the one with the smallest
capacity has 8 TB of total disk space, and the total available raw space
is 153 TB. I set up a 3x replicated metadata pool and a 6+2 erasure
Thank you Frank for the insight. I'd need to study the details of all of
this a bit more, but now I certainly understand it better.
Nicola
Hi All,
Is there any way to configure capabilities for a user to allow the client to
*only* create/delete snapshots? I can't find anything which suggests this is
possible on https://docs.ceph.com/en/latest/rados/operations/user-management/.
Context: I'm writing a script to automatically create
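(As far as I know there is no snapshot-only capability; the closest is the 's' flag in the MDS caps, which allows creating/deleting snapshots in addition to rw access. A sketch with a made-up client name and path:)
  ceph fs authorize cephfs client.snapper /backups rws
  # resulting MDS cap should look like: allow rws path=/backups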