Re: [ceph-users] RBD journal feature

2018-08-15 Thread Glen Baars
Is there any workaround that you can think of to correctly enable journaling on locked images? Kind regards, Glen Baars From: ceph-users On Behalf Of Glen Baars Sent: Tuesday, 14 August 2018 9:36 PM To: dilla...@redhat.com Cc: ceph-users Subject: Re: [ceph-users] RBD journal feature Hello Jaso
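
For reference, enabling the feature itself is a one-liner with the rbd CLI (journaling depends on exclusive-lock); the pool/image names below are placeholders, and this alone does not cover the locked-image case being asked about:

    rbd feature enable rbd/myimage journaling    # requires exclusive-lock to be enabled
    rbd info rbd/myimage                         # 'journaling' should now appear under features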

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Jonathan Woytek
On Wed, Aug 15, 2018 at 11:02 PM Yan, Zheng wrote: > On Thu, Aug 16, 2018 at 10:55 AM Jonathan Woytek > wrote: > > > > ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic > (stable) > > > > > > Try deleting mds0_openfiles.0 (mds1_openfiles.0 and so on if you have > multiple acti

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Yan, Zheng
On Thu, Aug 16, 2018 at 10:55 AM Jonathan Woytek wrote: > > ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable) > > Try deleting mds0_openfiles.0 (mds1_openfiles.0 and so on if you have multiple active mds) from metadata pool of your filesystem. Records in these files a
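
For anyone following along, the removal would be done with rados against the filesystem's metadata pool; the pool name cephfs_metadata below is a placeholder (check yours with `ceph fs ls` first):

    ceph fs ls                                      # shows the metadata pool name
    rados -p cephfs_metadata ls | grep openfiles    # list the openfiles objects
    rados -p cephfs_metadata rm mds0_openfiles.0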

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Jonathan Woytek
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable) On Wed, Aug 15, 2018 at 10:51 PM, Yan, Zheng wrote: > On Thu, Aug 16, 2018 at 10:50 AM Jonathan Woytek wrote: >> >> Actually, I missed it--I do see the wipe start, wipe done in the log. >> However, it is still doing v

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Yan, Zheng
On Thu, Aug 16, 2018 at 10:50 AM Jonathan Woytek wrote: > > Actually, I missed it--I do see the wipe start, wipe done in the log. > However, it is still doing verify_diri_backtrace, as described > previously. > which version of mds do you use? > jonathan > > On Wed, Aug 15, 2018 at 10:42 PM, Jon

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Jonathan Woytek
Actually, I missed it--I do see the wipe start, wipe done in the log. However, it is still doing verify_diri_backtrace, as described previously. jonathan On Wed, Aug 15, 2018 at 10:42 PM, Jonathan Woytek wrote: > On Wed, Aug 15, 2018 at 9:40 PM, Yan, Zheng wrote: >> How many client reconnected

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Jonathan Woytek
On Wed, Aug 15, 2018 at 9:40 PM, Yan, Zheng wrote: > How many clients reconnected when the mds restarts? The issue is likely > because reconnected clients held too many inodes, mds was opening > these inodes in rejoin state. Try starting mds with option > mds_wipe_sessions = true. The option makes m
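
If it helps anyone later, that option would presumably go into the [mds] section of ceph.conf on the affected MDS host (and be removed again once the MDS is back); a minimal sketch:

    [mds]
        mds_wipe_sessions = true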

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Yan, Zheng
On Wed, Aug 15, 2018 at 11:44 PM Jonathan Woytek wrote: > > Hi list people. I was asking a few of these questions in IRC, too, but > figured maybe a wider audience could see something that I'm missing. > > I'm running a four-node cluster with cephfs and the kernel-mode driver as the > primary ac

Re: [ceph-users] FreeBSD rc.d script: sta.rt not found

2018-08-15 Thread Willem Jan Withagen
On 15/08/2018 19:46, Norman Gray wrote: Greetings. I'm having difficulty starting up the ceph monitor on FreeBSD.  The rc.d/ceph script appears to be doing something ... odd. I'm following the instructions on . I've configure

Re: [ceph-users] Clock skew

2018-08-15 Thread Sean Crosby
Hi Dominique, The clock skew warning shows up when your NTP daemon is not synced. You can see the sync in the output of ntpq -p. This is a synced NTP:
# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==
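
For what it's worth, the peer ntpd is currently syncing from is marked with a leading '*' in that output; an in-sync line looks roughly like this (hostname and numbers are illustrative only):

    *ntp1.example.com  192.0.2.10       2 u   37   64  377    0.812   -0.031   0.044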

[ceph-users] FreeBSD rc.d script: sta.rt not found

2018-08-15 Thread Norman Gray
Greetings. I'm having difficulty starting up the ceph monitor on FreeBSD. The rc.d/ceph script appears to be doing something ... odd. I'm following the instructions on . I've configured a monitor called mon.pochhammer When

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
On 08/15/2018 06:15 PM, Robert Stanford wrote: > >  The workload is relatively high read/write of objects through radosgw.  > Gbps+ in both directions.  The OSDs are spinning disks, the journals (up > until now filestore) are on SSDs.  Four OSDs / journal disk. > RGW isn't always a heavy enoug

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Robert Stanford
The workload is relatively high read/write of objects through radosgw. Gbps+ in both directions. The OSDs are spinning disks, the journals (up until now filestore) are on SSDs. Four OSDs / journal disk. On Wed, Aug 15, 2018 at 10:58 AM, Wido den Hollander wrote: > > > On 08/15/2018 05:57 PM,

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
On 08/15/2018 05:57 PM, Robert Stanford wrote: > >  Thank you Wido.  I don't want to make any assumptions so let me verify, > that's 10GB of DB per 1TB storage on that OSD alone, right?  So if I > have 4 OSDs sharing the same SSD journal, each 1TB, there are 4 10 GB DB > partitions for each? >
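
To make that concrete: with BlueStore every OSD gets its own block.db partition on the shared SSD, so 4 OSDs on one SSD means 4 separate DB partitions. A rough sketch with placeholder device names:

    # one pre-created DB partition per OSD on the shared SSD
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
    ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p2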

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Robert Stanford
Thank you Wido. I don't want to make any assumptions so let me verify, that's 10GB of DB per 1TB storage on that OSD alone, right? So if I have 4 OSDs sharing the same SSD journal, each 1TB, there are 4 10 GB DB partitions for each? On Wed, Aug 15, 2018 at 1:59 AM, Wido den Hollander wrote: >

[ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-15 Thread Jonathan Woytek
Hi list people. I was asking a few of these questions in IRC, too, but figured maybe a wider audience could see something that I'm missing. I'm running a four-node cluster with cephfs and the kernel-mode driver as the primary access method. Each node has 72 * 10TB OSDs, for a total of 288 OSDs. Ea

Re: [ceph-users] Clock skew

2018-08-15 Thread Brent Kennedy
For clock skew, I set up NTPD on one of the monitors with a public time server to pull from. Then I set up NTPD on all the servers with them pulling time only from the local monitor server. Restart the time service on each server until they get relatively close. If you have a time server setup
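
A minimal ntp.conf sketch of that layout (hostnames are placeholders):

    # on the chosen monitor: pull from a public pool
    server 0.pool.ntp.org iburst
    server 1.pool.ntp.org iburst

    # on every other node: pull only from that monitor
    server mon01.internal iburst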

[ceph-users] cephfs fuse versus kernel performance

2018-08-15 Thread Chad William Seys
Hi all, Anyone know of benchmarks of cephfs through fuse versus kernel? Thanks! Chad.
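
No numbers to offer, but for anyone wanting to measure it themselves, the two clients are mounted roughly like this (monitor address, credentials and mount points are placeholders):

    # kernel client
    mount -t ceph 192.0.2.1:6789:/ /mnt/cephfs-kernel \
        -o name=admin,secretfile=/etc/ceph/admin.secret

    # FUSE client
    ceph-fuse -n client.admin /mnt/cephfs-fuse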

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Götz Reinicke
Hi, > On 15.08.2018 at 15:11, Steven Vacaroaia wrote: > > Thank you all > > Since all concerns were about reliability I am assuming the performance impact > of having the OS running on an SD card is minimal / negligible Some time ago we had some Cisco blades booting VMware ESXi from SD cards and

[ceph-users] upgraded centos7 (not collectd nor ceph) now json failed error

2018-08-15 Thread Marc Roos
I upgraded centos7, not ceph or collectd. Ceph was already 12.2.7 and collectd was already 5.8.0-2 (and collectd-ceph-5.8.0-2). Now I have this error:
Aug 14 22:43:34 c01 collectd[285425]: ceph plugin: ds FinisherPurgeQueue.queueLen was not properly initialized.
Aug 14 22:43:34 c01 collectd[

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Steven Vacaroaia
Thank you all. Since all concerns were about reliability, I am assuming the performance impact of having the OS running on an SD card is minimal / negligible. In other words, an OSD server is not writing/reading from the Linux OS partitions too much (especially with logs at minimum), so its performance is not de

Re: [ceph-users] Ceph upgrade Jewel to Luminous

2018-08-15 Thread Jaime Ibar
Hi Tom, thanks for the info. That's what I thought but I asked just in case as breaking the entire cluster would be very bad news. Thanks again. Jaime On 14/08/18 20:18, Thomas White wrote: Hi Jaime, Upgrading directly should not be a problem. It is usually recommended to go to the la
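
For the archives, the usual high-level order for that jump (mons first, then OSDs, then MDS/RGW) looks roughly like this; a sketch only, the Luminous release notes are the authoritative procedure:

    ceph osd set noout
    # upgrade and restart all ceph-mon daemons first,
    # then the ceph-osd hosts one by one, then MDS/RGW
    ceph osd require-osd-release luminous   # once every OSD is running Luminous
    ceph osd unset noout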

Re: [ceph-users] Ceph logging into graylog

2018-08-15 Thread Roman Steinhart
Hi, thanks for your reply. May I ask which type of input you use in graylog? "GELF UDP" or another one? And which version of graylog/ceph do you use? Thanks, Roman On Aug 9 2018, at 7:47 pm, Rudenko Aleksandr wrote: > > Hi, > > All our settings for this: > > mon cluster log to graylog = true
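
As far as I know Ceph ships these messages as GELF over UDP, so a "GELF UDP" input is the one that should match settings like the above; a sketch of the ceph.conf side (host is a placeholder, 12201 is just the common GELF default):

    [global]
        mon cluster log to graylog = true
        mon cluster log to graylog host = graylog.example.com
        mon cluster log to graylog port = 12201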

[ceph-users] Segmentation fault in Ceph-mon

2018-08-15 Thread Arif A.
Dear all, I am facing a problem deploying ceph-mon (Segmentation fault). I am deploying Ceph on a single-board Raspberry Pi 3 with Hyperiot Debian 8.0 Jessie OS. I downloaded ceph packages from the following repository: deb http://mirrordirector.raspbian.org/raspbian/ testing main contrib non-free rpi

[ceph-users] Clock skew

2018-08-15 Thread Dominque Roux
Hi all, We have recently been facing clock skews from time to time. This means that sometimes everything is fine, but hours later the warning appears again. NTPD is running and configured with the same pool. Has someone else already had the same issue and could perhaps help us fix this? Thanks a lot!
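
Two quick checks that help narrow this down while the warning is present (plain ceph CLI, nothing cluster-specific):

    ceph health detail      # shows which mons report the skew and by how much
    ceph time-sync-status   # per-monitor clock offsets as seen by the leader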

Re: [ceph-users] Help needed for debugging slow_requests

2018-08-15 Thread Konstantin Shalygin
Now here's the thing: Some weeks ago Proxmox upgraded from kernel 4.13 to 4.15. Since then I'm getting slow requests that cause blocked IO inside the VMs that are running on the cluster (but not necessarily on the host with the OSD causing the slow request). If I boot back into 4.13 then Ceph
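
When the slow requests are active, the affected OSDs can usually be pinned down like this (osd.12 is a placeholder for whichever OSD health detail points at):

    ceph health detail                      # lists OSDs with blocked/slow requests
    ceph daemon osd.12 dump_ops_in_flight   # run on that OSD's host via the admin socket
    ceph daemon osd.12 dump_historic_ops    # recently completed slow ops with timings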

Re: [ceph-users] Enable daemonperf - no stats selected by filters

2018-08-15 Thread Marc Roos
This is working again after I upgraded centos7. Is it not the idea that 12.2.x releases are 'somewhat' compatible with the running OS? Add some rpm dependency, or even better, make sure it just works with the 'older' rpms. [@c01 ~]# ceph daemonperf mds.a ---mds -

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Janne Johansson
On Wed 15 Aug 2018 at 10:04, Wido den Hollander wrote: > > This is the case for filesystem journals (xfs, ext4, almost all modern > > filesystems). Been there, done that, had two storage systems failing due > > to SD wear > > > > I've been running OS on the SuperMicro 64 and 128GB SATA-DOMs

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Wido den Hollander
On 08/14/2018 09:12 AM, Burkhard Linke wrote: > Hi, > > > AFAIK SD cards (and SATA DOMs) do not have any kind of wear-leveling > support. Even if the crappy write endurance of these storage systems > would be enough to operate a server for several years on average, you > will always have some

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Vladimir Prokofev
I'm running a small Ceph cluster of 9 OSD nodes, with systems hosted on USB sticks for exactly the same reason - not enough disk slots. It has worked fine for almost 2 years now. 2018-08-15 1:13 GMT+03:00 Paul Emmerich : > I've seen the OS running on SATA DOMs and cheap USB sticks. > It works well for some