[ceph-users] Mon crashes virtual void LogMonitor::update_from_paxos(bool*)

2020-01-15 Thread Kevin Hrpcek
n+1, bl); assert(err == 0); assert(bl.length()); Has anyone seen similar or have any ideas? ceph 13.2.8 Thanks! Kevin The first crash/restart Jan 14 20:47:11 sephmon5 ceph-mon: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/D

[ceph-users] January Ceph Science Group Virtual Meeting

2020-01-13 Thread Kevin Hrpcek
2.) Enter Meeting ID: 908675367 3.) Press # Want to test your video connection? https://bluejeans.com/111 Kevin -- Kevin Hrpcek NASA VIIRS Atmosphere SIPS Space Science &

[ceph-users] Ceph Science User Group Call October

2019-10-21 Thread Kevin Hrpcek
https://bluejeans.com/111 -- Kevin Hrpcek NASA VIIRS Atmosphere SIPS Space Science & Engineering Center University of Wisconsin-Madison ___ ceph-users mailing l

Re: [ceph-users] slow ops for mon slowly increasing

2019-09-20 Thread Kevin Olbrich
OK, looks like clock skew is the problem. I thought this was caused by the reboot, but it did not fix itself after some minutes (mon3 was 6 seconds ahead). After forcing a time sync from the same server, it seems to be solved now. Kevin Am Fr., 20. Sept. 2019 um 07:33 Uhr schrieb Kevin Olbrich
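A minimal sketch for confirming mon clock skew before fixing the time source (assumes chrony is the time daemon; substitute ntpd as needed):
    ceph time-sync-status        # per-mon skew as seen by the lead monitor
    ceph health detail           # look for MON_CLOCK_SKEW warnings
    chronyc makestep             # on the skewed mon host, step the clock immediately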

[ceph-users] slow ops for mon slowly increasing

2019-09-19 Thread Kevin Olbrich
"time": "2019-09-20 05:31:52.315083", "event": "psvc:dispatch" }, { "time": "2019-09-20 05:31:52.315161", "event": "auth:wait_for_readable" }, { "time": "2019-09-20 05:31:52.315167", "event": "auth:wait_for_readable/paxos" }, { "time": "2019-09-20 05:31:52.315230", "event": "paxos:wait_for_readable" } ], "info": { "seq": 1709, "src_is_mon": false, "source": "client.? [fd91:462b:4243:47e::1:3]:0/997594187", "forwarded_to_leader": false } } } This is a new situation for me. What am I supposed to do in this case? Thank you! Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Ceph Scientific Computing User Group

2019-08-27 Thread Kevin Hrpcek
https://bluejeans.com/111 Kevin On 8/2/19 12:08 PM, Mike Perez wrote: We have scheduled the next meeting on the community calendar for August 28 at 14:30 UTC. Each meeting will then take place on the last

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kevin Hrpcek
to a weight of 11 if I did anything smaller. for i in {264..311}; do ceph osd crush reweight osd.${i} 11.0;sleep 4;done Kevin On 7/24/19 12:33 PM, Xavier Trilla wrote: Hi Kevin, Yeah, that makes a lot of sense, and looks even safer than adding OSDs one by one. What do you change, the crush weight

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kevin Hrpcek
. Let the cluster balance and get healthy or close to healthy. Then repeat the previous 2 steps increasing weight by +0.5 or +1.0 until I am at the desired weight. Kevin On 7/24/19 11:44 AM, Xavier Trilla wrote: Hi, What would be the proper way to add 100 new OSDs to a cluster? I have to add
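For reference, a rough sketch of the staged reweight described above (OSD range taken from the follow-up mail; step sizes and the number of rounds are just examples):
    # raise each new OSD's crush weight in steps, letting the cluster
    # get healthy (or close) between rounds
    for w in 0.5 1.5 2.5 3.5; do
      for i in {264..311}; do ceph osd crush reweight osd.${i} ${w}; sleep 4; done
      # pause here until recovery settles before the next round
    done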

Re: [ceph-users] Ceph Scientific Computing User Group

2019-07-23 Thread Kevin Hrpcek
Update We're going to hold off until August for this so we can promote it on the Ceph twitter with more notice. Sorry for the inconvenience if you were planning on the meeting tomorrow. Keep a watch on the list, twitter, or ceph calendar for updates. Kevin On 7/5/19 11:15 PM, Kevin Hrpcek

Re: [ceph-users] Ceph Scientific Computing User Group

2019-07-05 Thread Kevin Hrpcek
a topic for meetings. I will be brainstorming some conversation starters but it would also be interesting to have people give a deep dive into their use of ceph and what they have built around it to support the science being done at their facility. Kevin On 6/17/19 10:43 AM, Kevin Hrpcek

[ceph-users] Ceph Scientific Computing User Group

2019-06-17 Thread Kevin Hrpcek
. It will be impossible to pick a time that works well for everyone but initially we considered something later in the work day for EU countries. Reply to me if you're interested and please include your timezone. Kevin ___ ceph-users mailing list ceph-users

Re: [ceph-users] QEMU/KVM client compatibility

2019-05-28 Thread Kevin Olbrich
Am Di., 28. Mai 2019 um 10:20 Uhr schrieb Wido den Hollander : > > > On 5/28/19 10:04 AM, Kevin Olbrich wrote: > > Hi Wido, > > > > thanks for your reply! > > > > For CentOS 7, this means I can switch over to the "rpm-nautilus/el7" > > repos

Re: [ceph-users] QEMU/KVM client compatibility

2019-05-28 Thread Kevin Olbrich
Hi Wido, thanks for your reply! For CentOS 7, this means I can switch over to the "rpm-nautilus/el7" repository and Qemu uses a nautilus compatible client? I just want to make sure, I understand correctly. Thank you very much! Kevin Am Di., 28. Mai 2019 um 09:46 Uhr schrieb Wido den

[ceph-users] QEMU/KVM client compatibility

2019-05-27 Thread Kevin Olbrich
much! Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
ase excuse any typos. On Fri, May 24, 2019, 4:42 AM Kevin Flöh <mailto:kevin.fl...@kit.edu>> wrote: Hi, we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" but this is just hanging forever if we are looking for unfound objects. It works fine f

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
and found nothing. This is also working for non unfound objects. Is there another way to find the corresponding file? On 24.05.19 11:12 vorm., Burkhard Linke wrote: Hi, On 5/24/19 9:48 AM, Kevin Flöh wrote: We got the object ids of the missing objects with|ceph pg 1.24c li

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
ose objects with:| ceph pg 1.24c mark_unfound_lost revert But first we would like to know which file(s) is affected. Is there a way to map the object id to the corresponding file? || On 23.05.19 3:52 nachm., Alexandre Marangone wrote: The PGs will stay active+recovery_wait+degraded until you so
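One way to map a CephFS data object back to a file, as a sketch (the mount point is an example; the hex prefix of the object name is the file's inode number):
    # convert the hex inode 10004dfce92 to decimal and search for it
    find /mnt/cephfs -inum $((16#10004dfce92))
    # or read the parent backtrace stored on the file's first data object
    rados -p ec31 getxattr 10004dfce92.00000000 parent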

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
. If anything else happens, you should stop and let us know. -- dan On Thu, May 23, 2019 at 10:59 AM Kevin Flöh wrote: This is the current status of ceph: cluster: id: 23e72372-0d44-4cad-b24f-3641b14b86f4 health: HEALTH_ERR 9/125481144 objects unfound (0.000

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
ing another PG. On Thu, May 23, 2019 at 10:53 AM Kevin Flöh wrote: Hi, we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded and instructing them to deep-scrub does not change anything. Hence, the rados report is empty. Is there a way to stop the recovery w

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going? Kevin On 22.05.19 6:03 nachm., Robert LeBlanc wrote: On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <mailto:kevin.fl...@kit.edu>> wrote: Hi, thank you, it worked. The PGs are not i

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread Kevin Flöh
to repair? Regards, Kevin On 21.05.19 4:52 nachm., Wido den Hollander wrote: On 5/21/19 4:48 PM, Kevin Flöh wrote: Hi, we gave up on the incomplete pgs since we do not have enough complete shards to restore them. What is the procedure to get rid of these pgs? You need to start with markin

Re: [ceph-users] Major ceph disaster

2019-05-21 Thread Kevin Flöh
Hi, we gave up on the incomplete pgs since we do not have enough complete shards to restore them. What is the procedure to get rid of these pgs? regards, Kevin On 20.05.19 9:22 vorm., Kevin Flöh wrote: Hi Frederic, we do not have access to the original OSDs. We exported the remaining

Re: [ceph-users] Major ceph disaster

2019-05-20 Thread Kevin Flöh
then. Best, Kevin On 17.05.19 2:36 nachm., Frédéric Nass wrote: Le 14/05/2019 à 10:04, Kevin Flöh a écrit : On 13.05.19 11:21 nachm., Dan van der Ster wrote: Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs? It would be useful to double confirm that: check with `ceph

Re: [ceph-users] Major ceph disaster

2019-05-17 Thread Kevin Flöh
-id} mark_unfound_lost revert|delete Cheers, Kevin On 15.05.19 8:55 vorm., Kevin Flöh wrote: The hdds of OSDs 4 and 23 are completely lost, we cannot access them in any way. Is it possible to use the shards which are maybe stored on working OSDs as shown in the all_participants list? On 14.05.19

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh
ceph osd pool get ec31 min_size min_size: 3 On 15.05.19 9:09 vorm., Konstantin Shalygin wrote: ceph osd pool get ec31 min_size ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh
The hdds of OSDs 4 and 23 are completely lost, we cannot access them in any way. Is it possible to use the shards which are maybe stored on working OSDs as shown in the all_participants list? On 14.05.19 5:24 nachm., Dan van der Ster wrote: On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh
Hi, since we have 3+1 ec I didn't try before. But when I run the command you suggested I get the following error: ceph osd pool set ec31 min_size 2 Error EINVAL: pool min_size must be between 3 and 4 On 14.05.19 6:18 nachm., Konstantin Shalygin wrote: peering does not seem to be blocked
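For an EC pool the accepted range is k through k+m, so with a 3+1 profile min_size can only be 3 or 4. A quick sketch of the related checks:
    ceph osd pool get ec31 erasure_code_profile     # which profile backs the pool
    ceph osd erasure-code-profile get <profile>     # shows k and m
    ceph osd pool set ec31 min_size 3               # lowest value the pool will accept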

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
": "4(1),23(2),24(0)" } ] } ], "probing_osds": [ "2(0)", "4(1)", "23(2)", "24(0)",

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 14.05.19 10:08 vorm., Dan van der Ster wrote: On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: On 13.05.19 10:51 nachm., Lionel Bouton wrote: Le 13/05/2019 à 16:20, Kevin Flöh a écrit : Dear ceph experts, [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] Here

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
he old one and copy whatever is left. Best regards, Kevin On Mon, May 13, 2019 at 4:20 PM Kevin Flöh wrote: Dear ceph experts, we have several (maybe related) problems with our ceph cluster, let me first show you the current ceph status: cluster: id: 23e72372-0d44-4cad-b24f-36

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 13.05.19 10:51 nachm., Lionel Bouton wrote: Le 13/05/2019 à 16:20, Kevin Flöh a écrit : Dear ceph experts, [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] Here is what happened: One osd daemon could not be started and therefore we decided to mark the osd as lost

[ceph-users] Major ceph disaster

2019-05-13 Thread Kevin Flöh
for the affected osds, which had no effect. Furthermore, the cluster is behind on trimming by more than 40,000 segments and we have folders and files which cannot be deleted or moved. (which are not on the 2 incomplete pgs). Is there any way to solve these problems? Best regards, Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Kevin Olbrich
Are you sure that firewalld is stopped and disabled? This looks exactly like what I saw when I missed one host in a test cluster. Kevin Am Di., 12. März 2019 um 09:31 Uhr schrieb Zhenshi Zhou : > Hi, > > I deployed a ceph cluster with good performance. But the logs > indicate that
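A quick sketch of the checks on each host (assuming a systemd-based distro; open the ceph ports instead if the firewall has to stay on):
    systemctl status firewalld
    systemctl disable --now firewalld
    # alternatively keep firewalld and allow ceph traffic:
    firewall-cmd --permanent --add-service=ceph --add-service=ceph-mon && firewall-cmd --reload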

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-26 Thread Kevin Olbrich
3660 1.0 447GiB 142GiB 305GiB 31.84 0.70 43 40 ssd 0.87329 1.0 894GiB 407GiB 487GiB 45.53 1.00 98 41 ssd 0.87329 1.0 894GiB 353GiB 541GiB 39.51 0.87 102 TOTAL 29.9TiB 13.7TiB 16.3TiB 45.66 MIN/MAX VAR: 0.63/1.72 STDDEV: 13.59 Kevin Am So., 6. Jan.

Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-26 Thread Kevin Olbrich
Am Sa., 26. Jan. 2019 um 13:43 Uhr schrieb Götz Reinicke : > > Hi, > > I have a fileserver which mounted a 4TB rbd, which is ext4 formatted. > > I grow that rbd and ext4 starting with an 2TB rbd that way: > > rbd resize testpool/disk01--size 4194304 > > resize2fs /dev/rbd0 > > Today I wanted to

Re: [ceph-users] Bluestore 32bit max_object_size limit

2019-01-18 Thread KEVIN MICHAEL HRPCEK
On 1/18/19 7:26 AM, Igor Fedotov wrote: Hi Kevin, On 1/17/2019 10:50 PM, KEVIN MICHAEL HRPCEK wrote: Hey, I recall reading about this somewhere but I can't find it in the docs or list archive and confirmation from a dev or someone who knows for sure would be nice. What I recall

[ceph-users] Bluestore 32bit max_object_size limit

2019-01-17 Thread KEVIN MICHAEL HRPCEK
ceph/blob/master/src/os/bluestore/BlueStore.cc#L12331 if (offset + length >= OBJECT_MAX_SIZE) { r = -E2BIG; } else { _assign_nid(txc, o); r = _do_write(txc, c, o, offset, length, bl, fadvise_flags); txc->write_onode(o); } Thanks

Re: [ceph-users] pgs stuck in creating+peering state

2019-01-17 Thread Kevin Olbrich
seconds. Kevin Am Do., 17. Jan. 2019 um 11:57 Uhr schrieb Johan Thomsen : > > Hi, > > I have a sad ceph cluster. > All my osds complain about failed reply on heartbeat, like so: > > osd.10 635 heartbeat_check: no reply from 192.168.160.237:6810 osd.42 > ever on either front

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
It would but you should not: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html Kevin Am Di., 8. Jan. 2019 um 15:35 Uhr schrieb Rodrigo Embeita : > > Thanks again Kevin. > If I reduce the size flag to a value of 2, that should fix the problem? > > Reg

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
You use replication 3 with failure-domain host. OSDs 2 and 4 are full, that's why your pool is also full. You need to add two disks to pf-us1-dfs3 or swap one from the larger nodes to this one. Kevin Am Di., 8. Jan. 2019 um 15:20 Uhr schrieb Rodrigo Embeita : > > Hi Yoann, thanks for your re

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
Looks like the same problem as mine: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032054.html The reported free space is a total, while Ceph is limited by the smallest free space (worst OSD). Please check your (re-)weights. Kevin Am Di., 8. Jan. 2019 um 14:32 Uhr schrieb Rodrigo Embeita
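A sketch of how to see which OSD is the limiting one (the VAR and %USE columns show the spread; MAX AVAIL in ceph df is capped by the fullest OSD):
    ceph osd df tree     # per-OSD utilisation, crush weight and reweight
    ceph df              # pool-level MAX AVAIL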

Re: [ceph-users] Balancer=on with crush-compat mode

2019-01-05 Thread Kevin Olbrich
If I understand the balancer correctly, it balances PGs, not data. This worked perfectly fine in your case. I prefer a PG count of ~100 per OSD, you are at 30. Maybe it would help to bump the PGs. Kevin Am Sa., 5. Jan. 2019 um 14:39 Uhr schrieb Marc Roos : > > > I have straw2, balancer=
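A hedged sketch of bumping the PG count (pool name and target are placeholders; increase in steps, and on pre-Nautilus releases pgp_num has to be raised as well):
    ceph osd pool set <pool> pg_num 512
    ceph osd pool set <pool> pgp_num 512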

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-05 Thread Kevin Olbrich
48.94 3.92TiB 992255 rbd_vms_ssd_014 372KiB 0662GiB 148 rbd_vms_ssd_01_ec 6 2.85TiB 68.81 1.29TiB 770506 ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable) Kevin Am Sa., 5. Jan. 2019 um 0

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
sure code pools where too many disks failed at the same time, you will then see negative values as OSD IDs. Maybe this helps a little bit. Kevin Am Sa., 5. Jan. 2019 um 00:20 Uhr schrieb Arun POONIA : > > Hi Kevin, > > I tried deleting newly added server from Ceph Cluster and loo

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
found_lost revert|delete Src: http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/ Kevin Am Fr., 4. Jan. 2019 um 20:47 Uhr schrieb Arun POONIA : > > Hi Kevin, > > Can I remove newly added server from Cluster and see if it heals cluster ? > > When I check

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
start a new one and bring back the backups (using a better PG count). Kevin Am Fr., 4. Jan. 2019 um 20:25 Uhr schrieb Arun POONIA : > > Can anyone comment on this issue please, I can't seem to bring my cluster > healthy. > > On Fri, Jan 4, 2019 at 6:26 AM Arun POONIA > wrot

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-04 Thread Kevin Olbrich
PS: Could be http://tracker.ceph.com/issues/36361 There is one HDD OSD that is out (which will not be replaced because the SSD pool will get the images and the hdd pool will be deleted). Kevin Am Fr., 4. Jan. 2019 um 19:46 Uhr schrieb Kevin Olbrich : > > Hi! > > I did what you wrote

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-04 Thread Kevin Olbrich
000 max_new 1000 log_file /var/log/ceph/ceph-mgr.mon01.ceph01.srvfarm.net.log --- end dump of recent events --- Kevin Am Mi., 2. Jan. 2019 um 17:35 Uhr schrieb Konstantin Shalygin : > > On a medium sized cluster with device-classes, I am experiencing a > problem with the SSD pool: >

[ceph-users] TCP qdisc + congestion control / BBR

2019-01-02 Thread Kevin Olbrich
of VMs with BBR but the hypervisors run fq_codel + cubic (OSDs run Ubuntu defaults). Has anyone tested qdisc and congestion control settings? Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users

[ceph-users] Usage of devices in SSD pool vary very much

2019-01-02 Thread Kevin Olbrich
ool to freeze (because the smallest OSD is taken into account for the free space calculation). This would be the worst case, as over 100 VMs would freeze, causing a lot of trouble. This is also the reason I did not try to enable the balancer again. Kind regards Ke

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
this in mind as this is still better than shutting down the whole VM. @all Thank you very much for your inputs. I will try some less important VMs and then start migration of the big one. Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.

[ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
causes de-duplication on RAM and this host runs about 10 Windows VMs. During reboots or updates, RAM can get full again. Maybe I am too cautious about live-storage-migration, maybe I am not. What are your experiences or advice? Thank you very much! Kind reg

Re: [ceph-users] Packages for debian in Ceph repo

2018-11-15 Thread Kevin Olbrich
I now had the time to test and after installing this package, uploads to rbd are working perfectly. Thank you very much for sharing this! Kevin Am Mi., 7. Nov. 2018 um 15:36 Uhr schrieb Kevin Olbrich : > Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard < > nhuill...@do

Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-13 Thread Kevin Olbrich
I read the whole thread and it looks like the write cache should always be disabled as in the worst case, the performance is the same(?). This is based on this discussion. I will test some WD4002FYYZ which don't mention "media cache". Kevin Am Di., 13. Nov. 2018 um 09:27 Uhr schri
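For reference, a sketch of toggling the volatile write cache on a single drive for such a test (device name is an example; whether the setting survives a power cycle depends on the drive):
    hdparm -W /dev/sdX      # query the current write-cache state
    hdparm -W 0 /dev/sdX    # disable the volatile write cache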

Re: [ceph-users] Ceph or Gluster for implementing big NAS

2018-11-12 Thread Kevin Olbrich
with scheduler set to noop as it is optimized to consume whole, non-shared devices. Just my 2 cents ;-) Kevin Am Mo., 12. Nov. 2018 um 15:08 Uhr schrieb Dan van der Ster < d...@vanderster.com>: > We've done ZFS on RBD in a VM, exported via NFS, for a couple years. > It's very stable and if y

Re: [ceph-users] Ceph or Gluster for implementing big NAS

2018-11-12 Thread Kevin Olbrich
mount. I had such a setup with NFS and switched to mounting CephFS directly. If using NFS with the same data, you must make sure your HA works well to avoid data corruption. With ceph-fuse you connect directly to the cluster, one less component that can break. Kevin Am Mo., 12. Nov. 2018 um 12:44 U

Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Kevin Olbrich
rors as apt is unable to use older versions (which does work on yum/dnf). That's why we are implementing "mirror-sync" / rsync with a copy of the repo and the desired packages until such a solution is available. Kevin >> Simon >> ___ >

Re: [ceph-users] Packages for debian in Ceph repo

2018-11-07 Thread Kevin Olbrich
Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard < nhuill...@dolomede.fr>: > > > It lists rbd but still fails with the exact same error. > > I stumbled upon the exact same error, and since there was no answer > anywhere, I figured it was a very simple problem: don't forget to > install

Re: [ceph-users] ceph-deploy osd creation failed with multipath and dmcrypt

2018-11-06 Thread Kevin Olbrich
I ran into the same problem. I had to create a GPT table on each disk, create a first partition over the full space and then feed these to ceph-volume (should be similar for ceph-deploy). Also I am not sure if you can combine fs-type btrfs with bluestore (afaik this is for filestore). Kevin Am Di., 6. Nov
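Roughly what that preparation looked like, as a sketch (device names are examples; dmcrypt would be added via ceph-volume's --dmcrypt flag):
    parted -s /dev/mapper/mpatha mklabel gpt
    parted -s /dev/mapper/mpatha mkpart primary 0% 100%
    ceph-volume lvm create --bluestore --data /dev/mapper/mpatha1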

Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
blkdebug blkreplay blkverify bochs cloop dmg file ftp > ftps gluster host_cdrom host_device http https iscsi iser luks nbd nfs > null-aio null-co parallels qcow qcow2 qed quorum raw rbd replication > sheepdog ssh vdi vhdx vmdk vpc vvfat It lists rbd but still fails with the exact same err

Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
Is it possible to use qemu-img with rbd support on Debian Stretch? I am on Luminous and am trying to connect my image-buildserver to load images into a ceph pool. root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2 > rbd:rbd_vms_ssd_01/test_vm > qemu-img: Unknown protocol 'rbd'

[ceph-users] Command to check last change to rbd image?

2018-10-28 Thread Kevin Olbrich
Hi! Is there an easy way to check when an image was last modified? I want to make sure that the images I want to clean up have not been used for a long time. Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com

Re: [ceph-users] nfs-ganesha version in Ceph repos

2018-10-09 Thread Kevin Olbrich
-ganesha as standalone VM. Kevin Am Di., 9. Okt. 2018 um 19:39 Uhr schrieb Erik McCormick < emccorm...@cirrusseven.com>: > On Tue, Oct 9, 2018 at 1:27 PM Erik McCormick > wrote: > > > > Hello, > > > > I'm trying to set up an nfs-ganesha server with the Ceph FSA

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi Jakub, "ceph osd metadata X" this is perfect! This also lists multipath devices which I was looking for! Kevin Am Mo., 8. Okt. 2018 um 21:16 Uhr schrieb Jakub Jaszewski < jaszewski.ja...@gmail.com>: > Hi Kevin, > Have you tried ceph osd metadata OSDid ? > > Ja
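For completeness, a sketch of pulling just the device information out of the metadata (the OSD id is an example; exact field names vary a bit between releases):
    ceph osd metadata 12 | grep -E '"devices"|bdev'
    ceph osd metadata | grep -E '"id"|"devices"'    # all OSDs at once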

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi! Yes, thank you. At least on one node this works, the other node just freezes, but this might be caused by a bad disk that I try to find. Kevin Am Mo., 8. Okt. 2018 um 12:07 Uhr schrieb Wido den Hollander : > Hi, > > $ ceph-volume lvm list > > Does that work for you? > &

[ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi! Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id? Before I migrated from filestore with simple-mode to bluestore with lvm, I was able to find the raw disk with "df". Now, I need to go from LVM LV to PV to disk every time I need to check/smartctl a di

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-08 Thread Kevin Olbrich
of new disks to replace all. Most of the current disks are of the same age. Kevin Am Mi., 3. Okt. 2018 um 13:52 Uhr schrieb Paul Emmerich < paul.emmer...@croit.io>: > There's "ceph-bluestore-tool repair/fsck" > > In your scenario, a few more log files would be interesting: try

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
of three disks back. Object corruption would not be a problem (regarding drop of a journal), as this cluster hosts backups which will fail validation and regenerate. Just marking the OSD lost does not seem to be an option. Is there some sort of fsck for BlueFS? Kevin Igor Fedotov schrieb am Mi

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Small addition: the failing disks are in the same host. This is a two-host, failure-domain OSD cluster. Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich : > Hi! > > Yesterday one of our (non-priority) clusters failed when 3 OSDs went down > (EC 8+2) together. > *This is st

[ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
I have 8 PGs down, the remaining are active and in recovery / rebalance.* Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-26 Thread KEVIN MICHAEL HRPCEK
. It seems like you may also benefit from setting mon_osd_cache_size to a very large number if you have enough memory on your mon servers. I'll hop on the irc today. Kevin On 09/25/2018 05:53 PM, by morphin wrote: After I tried too many things with so many helps on IRC. My pool health is still

Re: [ceph-users] Mimic upgrade failure

2018-09-24 Thread KEVIN MICHAEL HRPCEK
with to get all PGs active+clean, and the cephx change was rolled back to operate normally. Sage, thanks again for your assistance with this. Kevin tl;dr Cache as much as you can. On 09/24/2018 09:24 AM, Sage Weil wrote: Hi Kevin, Do you have an update on the state of the cluster? I've

Re: [ceph-users] data-pool option for qemu-img / ec pool

2018-09-23 Thread Kevin Olbrich
, is there a better way? Kevin Am So., 23. Sep. 2018 um 18:08 Uhr schrieb Paul Emmerich : > > The usual trick for clients not supporting this natively is the option > "rbd_default_data_pool" in ceph.conf which should also work here. > > > Paul > Am So., 23. Sep. 2018 um
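A sketch of the rbd_default_data_pool approach Paul mentions, plus one possible alternative of pre-creating the image and importing into it (pool and image names are examples):
    # ceph.conf on the client doing the conversion
    [client]
    rbd default data pool = rbd_data_ec
    # alternative: create the image with an explicit data pool, then import with -n
    rbd create --size 100G --data-pool rbd_data_ec rbd_meta/test_vm
    qemu-img convert -n -p -O raw /target/test-vm.qcow2 rbd:rbd_meta/test_vm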

[ceph-users] data-pool option for qemu-img / ec pool

2018-09-23 Thread Kevin Olbrich
would now take at least twice the time). Am I missing a parameter for qemu-kvm? Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread KEVIN MICHAEL HRPCEK
> >, std::less' to see where all of the encoding activity is coming from? I see two possibilities (the mon attempts to cache encoded maps, and the MOSDMap message itself will also reencode if/when that fails). Also: mon_osd_cache_size = 10 by default... try making that 500 or something.
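The cache setting referenced above, as it would be applied to the mons (the value 500 is the example from this thread):
    # ceph.conf on the monitor hosts
    [mon]
    mon osd cache size = 500
    # or injected at runtime without a restart
    ceph tell mon.* injectargs '--mon_osd_cache_size=500'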

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread KEVIN MICHAEL HRPCEK
until the cluster is stable again. Kevin On 09/20/2018 08:13 AM, David Turner wrote: Out of curiosity, what disks do you have your mons on and how does the disk usage, both utilization% and full%, look while this is going on? On Wed, Sep 19, 2018, 1:57 PM Kevin Hrpcek mailto:kevin.hrp

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
Thank you very much Paul. Kevin Am Do., 20. Sep. 2018 um 15:19 Uhr schrieb Paul Emmerich < paul.emmer...@croit.io>: > Hi, > > device classes are internally represented as completely independent > trees/roots; showing them in one tree is just syntactic sugar. > >

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
To answer my own question: ceph osd crush tree --show-shadow Sorry for the noise... Am Do., 20. Sep. 2018 um 14:54 Uhr schrieb Kevin Olbrich : > Hi! > > Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host. > I also have replication rules to distinguish between

[ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
device-class based rule)? Will the crush weight be calculated from the OSDs up to the failure-domain based on the crush rule? The only crush-weights I know and see are those shown by "ceph osd tree". Kind regards Kevin ___ ceph-users mailing

Re: [ceph-users] Mimic upgrade failure

2018-09-19 Thread Kevin Hrpcek
luminous features yet have 13.2.1 installed, so maybe that is normal. Kevin On 09/19/2018 09:35 AM, Sage Weil wrote: It's hard to tell exactly from the below, but it looks to me like there is still a lot of OSDMap reencoding going on. Take a look at 'ceph features' output and see who

Re: [ceph-users] Mimic upgrade failure

2018-09-19 Thread KEVIN MICHAEL HRPCEK
se(429) = 0 munmap(0x7f2ea8c97000, 2468005) = 0 open("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", O_RDONLY) = 429 stat("/var/lib/ceph/mon/ceph-sephmon1/store.db/26299338.sst", {st_mode=S_IFREG|0644, st_size=2484001, ...}) = 0

Re: [ceph-users] Mimic upgrade failure

2018-09-18 Thread KEVIN MICHAEL HRPCEK
355) lease_timeout -- calling new election Thanks Kevin On 09/10/2018 07:06 AM, Sage Weil wrote: I took a look at the mon log you sent. A few things I noticed: - The frequent mon elections seem to get only 2/3 mons about half of the time. - The messages coming in a mostly osd_failure, and half of th

[ceph-users] (no subject)

2018-09-18 Thread Kevin Olbrich
Hi! is the compressible hint / incompressible hint supported on qemu+kvm? http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/ If not, only aggressive would work in this case for rbd, right? Kind regards Kevin ___ ceph-users

Re: [ceph-users] Mimic upgrade failure

2018-09-12 Thread Kevin Hrpcek
d authorizer 2018-09-10 03:30:17.324 7ff0ab678700 -1 osd.84 992286 heartbeat_check: no reply from 10.1.9.28:6843 osd.578 since back 2018-09-10 03:15:35.358240 front 2018-09-10 03:15:47.879015 (cutoff 2018-09-10 03:29:17.326329) Kevin On 09/10/2018 07:06 AM, Sage Weil wrote: I took a look at

[ceph-users] nfs-ganesha FSAL CephFS: nfs_health :DBUS :WARN :Health status is unhealthy

2018-09-10 Thread Kevin Olbrich
for this problem in 2.6.3: https://github.com/nfs-ganesha/nfs-ganesha/issues/339 Can the build in the repos be compiled against this bugfix release? Thank you very much. Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http

Re: [ceph-users] Mimic upgrade failure

2018-09-10 Thread Kevin Hrpcek
the mix of luminous and mimic did not play well together for some reason. Maybe it has to do with the scale of my cluster, 871 osd, or maybe I've missed some tuning as my cluster has scaled to this size. Kevin On 09/09/2018 12:49 PM, Kevin Hrpcek wrote: Nothing too crazy for non default

Re: [ceph-users] Mimic upgrade failure

2018-09-09 Thread Kevin Hrpcek
things are, setting pause on the cluster to just finish the upgrade faster might not be a bad idea either. This should be a simple question, have you confirmed that there are no networking problems between the MONs while the elections are happening? On Sat, Sep 8, 2018, 7:52 PM Kevin Hrpcek

Re: [ceph-users] Mimic upgrade failure

2018-09-08 Thread Kevin Hrpcek
are trying to fail each other. I'll put in the rocksdb_cache_size setting. Thanks for taking a look. Kevin On 09/08/2018 06:04 PM, Sage Weil wrote: Hi Kevin, I can't think of any major luminous->mimic changes off the top of my head that would impact CPU usage, but it's always possi

[ceph-users] Mimic upgrade failure

2018-09-08 Thread Kevin Hrpcek
90% good with the finish line in sight and then the mons started their issue of reelecting every minute. Now I can't keep any decent number of PGs up for more than a few hours. This started on Wednesday. Any help would be greatly appreciated. Thanks, Kevin --Debug snippet from a

[ceph-users] SPDK/DPDK with Intel P3700 NVMe pool

2018-08-30 Thread Kevin Olbrich
-usage I would like to re-use these cards for high-end (max IO) for database VMs. Some notes or feedback about the setup (ceph-volume etc.) would be appreciated. Thank you. Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http

[ceph-users] HDD-only CephFS cluster with EC and without SSD/NVMe

2018-08-22 Thread Kevin Olbrich
but I did not test with ceph yet. Is anyone using CephFS + bluestore + ec 3/2 + without WAL/DB-dev and is it working well? Thank you. Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Running 12.2.5 without problems, should I upgrade to 12.2.7 or wait for 12.2.8?

2018-08-10 Thread Kevin Olbrich
t planning any upgrade from 12.2.5 atm. Please correct me, if I am wrong. Kevin > Quote: > The v12.2.5 release has a potential data corruption issue with erasure > coded pools. If you ran v12.2.5 with erasure coding, please see below. > > See: https://ceph.com/releases/12-2-7-l

Re: [ceph-users] v12.2.7 Luminous released

2018-07-19 Thread Kevin Olbrich
Hi, on upgrade from 12.2.4 to 12.2.5 the balancer module broke (mgr crashes minutes after service started). Only solution was to disable the balancer (service is running fine since). Is this fixed in 12.2.7? I was unable to locate the bug in bugtracker. Kevin 2018-07-17 18:28 GMT+02:00

Re: [ceph-users] Periodically activating / peering on OSD add

2018-07-14 Thread Kevin Olbrich
PS: It's luminous 12.2.5! Mit freundlichen Grüßen / best regards, Kevin Olbrich. 2018-07-14 15:19 GMT+02:00 Kevin Olbrich : > Hi, > > why do I see activating followed by peering during OSD add (refill)? > I did not change pg(p)_num. > > Is this normal? From my other clust

[ceph-users] Periodically activating / peering on OSD add

2018-07-14 Thread Kevin Olbrich
Hi, why do I see activating followed by peering during OSD add (refill)? I did not change pg(p)_num. Is this normal? From my other clusters, I don't think that happened... Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http

Re: [ceph-users] Bluestore and number of devices

2018-07-13 Thread Kevin Olbrich
You can keep the same layout as before. Most people place DB/WAL combined in one partition (similar to the journal on filestore). Kevin 2018-07-13 12:37 GMT+02:00 Robert Stanford : > > I'm using filestore now, with 4 data devices per journal device. > > I'm confused by this: "
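A sketch of that layout with ceph-volume (device names are examples; when only --block.db is given, the WAL is stored in the same partition):
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1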

[ceph-users] mds daemon damaged

2018-07-12 Thread Kevin
Sorry for the long posting but trying to cover everything I woke up to find my cephfs filesystem down. This was in the logs 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc 0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.:head I had one standby MDS, but as far as

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Kevin Olbrich
Sounds a little bit like the problem I had on OSDs: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026680.html> *Kevin Olbrich*

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-10 Thread Kevin Olbrich
2018-07-10 14:37 GMT+02:00 Jason Dillaman : > On Tue, Jul 10, 2018 at 2:37 AM Kevin Olbrich wrote: > >> 2018-07-10 0:35 GMT+02:00 Jason Dillaman : >> >>> Is the link-local address of "fe80::219:99ff:fe9e:3a86%eth0" at least >>> present on the c

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-10 Thread Kevin Olbrich
ocal when there is an ULA-prefix available. The address is available on brX on this client node. - Kevin > On Mon, Jul 9, 2018 at 3:43 PM Kevin Olbrich wrote: > >> 2018-07-09 21:25 GMT+02:00 Jason Dillaman : >> >>> BTW -- are you running Ceph on a one-node computer? I t
