[ceph-users] MDS corrupt (also RADOS-level copy?)

2023-05-31 Thread Jake Grimmett
…wanted to get a feeling from others about how dangerous this could be? We have a backup, but as there is 1.8 PB of data, it's going to take a few weeks to restore... any ideas gratefully received. Jake

[ceph-users] Re: MDS corrupt (also RADOS-level copy?)

2023-05-31 Thread Jake Grimmett
Dear All, My apologies, I forgot to state we are using Quincy 17.2.6 thanks again, Jake root@wilma-s1 15:22 [~]: ceph -v ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable) Dear All, we are trying to recover from what we suspect is a corrupt MDS :( and have been
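For context, not taken from the thread itself: the usual first step when an MDS rank looks corrupt is to back up and inspect the MDS journal before attempting any repair. A minimal sketch, assuming the file system is named cephfs and rank 0 is the suspect one:

  cephfs-journal-tool --rank=cephfs:0 journal export backup.bin   # keep a raw copy before any repair
  cephfs-journal-tool --rank=cephfs:0 journal inspect             # report journal integrity
  ceph fs status cephfs                                           # see which ranks and daemons are affected

The full sequence is described in the CephFS disaster-recovery documentation.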

[ceph-users] Re: [Ceph-announce] v18.2.4 Reef released

2024-07-29 Thread Jake Grimmett

[ceph-users] Re: Has anyone contact Data for Samsung Datacenter SSD Support ?

2021-03-16 Thread Jake Grimmett

[ceph-users] dashboard with grafana embedding in 16.2.6

2021-11-25 Thread Jake Grimmett
…Dashboard1 in grafana? The grafana install docs here: https://docs.ceph.com/en/latest/mgr/dashboard/ state: "Add Prometheus as data source to Grafana using the Grafana Web UI." If the data source is now hard coded to "Dashboard1", can we update the docs? best regards, Jake
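For reference, a rough sketch of checking the dashboard's Grafana embedding settings from the CLI (the URL below is a placeholder, not from the thread):

  ceph dashboard get-grafana-api-url
  ceph dashboard set-grafana-api-url https://grafana.example.net:3000

The data-source name itself lives in Grafana's configuration, not in these dashboard settings.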

[ceph-users] Re: dashboard with grafana embedding in 16.2.6

2021-11-26 Thread Jake Grimmett
ashboard1", so we could add a setting to customize that if required. Kind Regards, Ernesto -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. ___ ceph-users mailing lis

[ceph-users] Disk Failure Predication cloud module?

2022-01-20 Thread Jake Grimmett
…module useful? many thanks, Jake

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-21 Thread Jake Grimmett
…'s wizard. If for some reason you cannot or wish not to opt in, please share the reason with us. Thanks, Yaarit On Thu, Jan 20, 2022 at 6:39 AM Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote: Dear All, Is the cloud option for the diskprediction module deprecated…
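For reference, and not part of the thread: if the cloud predictor is unavailable, the local predictor can be enabled instead. A minimal sketch, assuming device health metrics are already being collected:

  ceph mgr module enable diskprediction_local
  ceph config set global device_failure_prediction_mode local
  ceph device predict-life-expectancy <devid>    # <devid> as shown by 'ceph device ls'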

[ceph-users] Pause cluster if node crashes?

2022-02-18 Thread Jake Grimmett
…look at turning the watchdog on, giving nagios an action, etc, but I'd rather use any tools that ceph has built in. BTW, this is an Octopus cluster 15.2.15, 580 x OSDs, using EC 8+2. best regards, Jake

[ceph-users] Re: Pause cluster if node crashes?

2022-02-18 Thread Jake Grimmett
…https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_subtree_limit The default is rack -- you want to set that to "host". Cheers, Dan On Fri., Feb. 18, 2022, 11:23 Jake Grimmett <j...@mrc-lmb.cam.ac.uk>…
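A minimal sketch of the change Dan describes, applied through the mon config database (check the current value first):

  ceph config get mon mon_osd_down_out_subtree_limit
  ceph config set mon mon_osd_down_out_subtree_limit host

With the limit at "host", Ceph will not automatically mark out all the OSDs of a whole down host, so a crashed node does not trigger a full rebalance on its own.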

[ceph-users] Bug with autoscale-status in 17.2.0 ?

2022-06-10 Thread Jake Grimmett
…[truncated 'ceph osd pool autoscale-status' row] Any ideas on what might be going on? We get a similar problem if we specify hdd as the class. best regards, Jake
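A hedged sketch of commands useful for poking at this kind of autoscaler oddity (pool name is illustrative):

  ceph osd pool autoscale-status
  ceph osd pool get <pool> pg_autoscale_mode
  ceph osd crush rule dump    # check which roots/device classes each rule actually uses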

[ceph-users] Re: Bug with autoscale-status in 17.2.0 ?

2022-06-10 Thread Jake Grimmett
ceph df
--- RAW STORAGE ---
CLASS    SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd      7.0 PiB  6.9 PiB  126 TiB  126 TiB        1.75
ssd      2.7 TiB  2.7 TiB  3.2 GiB  3.2 GiB        0.12
TOTAL    7.0 PiB  6.9 PiB  126 TiB  126 TiB        1.75

--- POOLS ---
POOL  ID  PGS  STO…

[ceph-users] Re: Suggestion to build ceph storage

2022-06-20 Thread Jake Grimmett

[ceph-users] Quincy: cephfs "df" used 6x higher than "du"

2022-07-20 Thread Jake Grimmett
…erasure-coded data pool (hdd with NVMe db/wal), and a 3x replicated default data pool (primary_fs_data - NVMe). bluestore_min_alloc_size_hdd is 4096.
ceph osd pool set ec82pool compression_algorithm lz4
ceph osd pool set ec82pool compression_mode aggressive
many thanks for any help, Jake
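Not from the thread, but a quick sketch of how to compare logical vs raw usage and confirm the settings mentioned above (pool name ec82pool as in the post):

  ceph df detail                                     # compare STORED vs USED per pool
  ceph config get osd bluestore_min_alloc_size_hdd
  ceph osd pool get ec82pool compression_algorithm
  ceph osd pool get ec82pool compression_mode

Note that bluestore_min_alloc_size_* is fixed when an OSD is created, so the current config value may not reflect what existing OSDs were built with.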

[ceph-users] Re: Quincy: cephfs "df" used 6x higher than "du"

2022-07-20 Thread Jake Grimmett
…[truncated row of 'ceph osd df' output] thanks, Jake On 20/07/2022 11:52, Jake Grimmett wrote: Dear All, We have just built a new cluster using Quincy 17.2.1. After copying ~25 TB to the cluster (from a mimic cluster), we see 152 TB used, which is ~6x disparity. Is t…

[ceph-users] Re: cephfs and samba

2022-08-19 Thread Jake Grimmett
…kernel driver in AlmaLinux 8.6, plus a recent version of Samba, together with Quincy improve performance... best regards, Jake

[ceph-users] objects misplaced jumps up at 5%

2020-09-28 Thread Jake Grimmett
…log_channel(cluster) log [DBG] : 5.157ds0 starting backfill to osd.469(7) from (0'0,0'0] MAX to 106803'6043528
2020-09-24 14:44:38.938 7f2e569e9700 0 log_channel(cluster) log [DBG] : 5.157ds0 starting backfill to osd.508(1) from (0'0,0'0] MAX to 106803'6043528
2020-09-24 14:44:38.947 7f2e569e9700 0 log_channel(clus…

[ceph-users] Re: objects misplaced jumps up at 5%

2020-09-28 Thread Jake Grimmett
> On 2020-09-28 11:45, Jake Grimmett wrote:
>> To show the cluster before and immediately after an "episode"
>> ***
>> [root@ceph7 ceph]# ceph -s
>>   cluster:
>>     id: 36ed7113-08…

[ceph-users] Re: objects misplaced jumps up at 5%

2020-09-29 Thread Jake Grimmett
…> You can check this by running "ceph osd pool ls detail" and check for the value of pg target.
> Also: looks like you've set osd_scrub_during_recovery = false; this setting can be annoying on large erasure-coded setups on HDDs that see long recovery times. It's better to get IO priorities right; search the mailing list for "osd op queue cut off high".
> Paul
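For reference, the option Paul is pointing at is osd_op_queue_cut_off; a minimal sketch of checking and raising it (OSDs generally need a restart to pick it up):

  ceph config get osd osd_op_queue_cut_off
  ceph config set osd osd_op_queue_cut_off high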

[ceph-users] Re: objects misplaced jumps up at 5%

2020-09-30 Thread Jake Grimmett
…that’s helpful.
>> On Sep 29, 2020, at 18:34, Jake Grimmett wrote:
>> Hi Paul,
>> I think you found the answer!
>> When adding 100 new OSDs to the cluster, I increased both pg and pgp from 4096 to 1…

[ceph-users] recovery_unfound

2020-02-03 Thread Jake Grimmett
0'0", "flags": "none", "locations": [ "189(8)", "263(9)" ] } ], "more": false } While it would be nice

[ceph-users] Re: recovery_unfound

2020-02-04 Thread Jake Grimmett
pg: [root@ceph1 ~]# ceph osd down 347 This doesn't change the output of "ceph pg 5.5c9 query", apart from updating the Started time, and ceph health still shows unfound objects. To fix this, do we need to issue a scrub (or deep scrub) so that the objects
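For anyone following along, the standard commands for this situation, using the pg id from the thread (mark_unfound_lost is destructive and a last resort):

  ceph pg 5.5c9 list_unfound
  ceph pg deep-scrub 5.5c9
  ceph pg 5.5c9 mark_unfound_lost revert    # or 'delete'; revert is not available on EC pools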

[ceph-users] Re: recovery_unfound

2020-02-05 Thread Jake Grimmett
…> various OSD restarts, deep-scrubs, with no change. I'm leaving things alone hoping that croit.io will update their package to 13.2.8 soonish. Maybe that will help kick it in the pants. > Chad.

[ceph-users] Fwd: PrimaryLogPG.cc: 11550: FAILED ceph_assert(head_obc)

2020-02-10 Thread Jake Grimmett
…failing its primary OSD) * thread describing the bad restart: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IRKCDRRAH7YZEVXN5CH4JT2NH4EWYRGI/#IRKCDRRAH7YZEVXN5CH4JT2NH4EWYRGI many thanks! Jake

[ceph-users] Re: Fwd: PrimaryLogPG.cc: 11550: FAILED ceph_assert(head_obc)

2020-02-11 Thread Jake Grimmett
…443 --data /dev/sdab, activate the OSD: # ceph-volume lvm activate 443 6e252371-d158-4d16-ac31-fed8f7d0cb1f. Now watching to see if the cluster recovers... best, Jake On 2/10/20 3:31 PM, Jake Grimmett wrote: > Dear All, > Following a clunky* cluster restart, we had > 23…

[ceph-users] Re: Need clarification on CephFS, EC Pools, and File Layouts

2020-03-04 Thread Jake Grimmett
…> file system at this time. Someday we would like to change this but there is no timeline.
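For reference (not part of the quoted reply): the usual pattern is a replicated default data pool plus an EC pool added as an additional data pool, with directories steered to it via a file layout. A rough sketch with illustrative pool and path names:

  ceph osd pool set ecpool allow_ec_overwrites true
  ceph fs add_data_pool cephfs ecpool
  setfattr -n ceph.dir.layout.pool -v ecpool /mnt/cephfs/bulk    # new files under this directory land in ecpool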

[ceph-users] Re: v14.2.8 Nautilus released

2020-03-17 Thread Jake Grimmett
…> this was possible and there was no suggestion to use a default replicated pool and then add the EC pool. We did it exactly the other way around :-/ > Best, Dietmar

[ceph-users] OSD: FAILED ceph_assert(clone_size.count(clone))

2020-03-23 Thread Jake Grimmett
…471'3200829 2020-01-28 15:48:35.574934. This cluster is being used to back up a live cephfs cluster and has 1.8 PB of data, including 30 days of snapshots. We are using 8+2 EC. Any help appreciated, Jake

[ceph-users] Help: corrupt pg

2020-03-25 Thread Jake Grimmett
…" or other advice gratefully received, best regards, Jake

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Jake Grimmett
…regards, Jake On 25/03/2020 14:22, Eugen Block wrote: Hi, is there any chance to recover the other failing OSDs that seem to have one chunk of this PG? Do the other OSDs fail with the same error? Quoting Jake Grimmett: Dear All, We are "in a bit of a pickle"... No reply t…

[ceph-users] Re: Help: corrupt pg

2020-03-27 Thread Jake Grimmett
…ceph_assert(clone_size.count(clone)) leaving us with a pg in a very bad state... I will see if we can buy some consulting time, the alternative is several weeks of rsync. Many thanks again for your advice, it's very much appreciated, Jake On 26/03/2020 17:21, Gregory Farnum wrote: On Wed, Mar 25…

[ceph-users] kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)

2020-04-29 Thread Jake Grimmett
…standby_replay? any advice appreciated, many thanks, Jake

[ceph-users] Re: kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)

2020-04-29 Thread Jake Grimmett
…5.656 7f3cfe5f9700 0 mds.0.cache creating system inode with ino:0x1 best regards, Jake On 29/04/2020 14:33, Jake Grimmett wrote: > Dear all, > After enabling "allow_standby_replay" on our cluster we are getting (lots) of identical errors on the client /var/log/messages…
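For anyone seeing the same kernel message, the feature is toggled per file system (fs name illustrative); if the warnings come from older kernel clients that don't recognise the standby-replay state, turning the feature off is the simple workaround:

  ceph fs set cephfs allow_standby_replay true     # enable standby-replay
  ceph fs set cephfs allow_standby_replay false    # disable again if clients complain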