Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread Richard Hesketh
Another option would be adding a boot time script which uses ntpdate (or something) to force an immediate sync with your timeservers before ntpd starts - this is actually suggested in ntpdate's man page! Rich On 15/05/2019 13:00, Marco Stuurman wrote: > Hi Yenya, > > You could try to
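A minimal sketch of that boot-time approach, assuming a systemd host with the ntp and ntpdate packages installed (the unit name and pool servers below are placeholders only):

    # /etc/systemd/system/clock-step.service  (hypothetical unit name)
    [Unit]
    Description=Step the clock once before ntpd starts
    Before=ntp.service
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    ExecStart=/usr/sbin/ntpdate -b 0.pool.ntp.org 1.pool.ntp.org

    [Install]
    WantedBy=multi-user.target

Alternatively, ntpd's -g flag lets ntpd itself make one arbitrarily large initial correction, which may be enough on its own.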

Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-11-22 Thread Richard Hesketh
Bionic's mimic packages do seem to depend on libcurl4 already, for what that's worth: root@vm-gw-1:/# apt-cache depends ceph-common ceph-common ... Depends: libcurl4 On 22/11/2018 12:40, Matthew Vernon wrote: > Hi, > > The ceph.com ceph luminous packages for Ubuntu Bionic still depend on >

Re: [ceph-users] WAL/DB size

2018-09-07 Thread Richard Hesketh
It can get confusing. There will always be a WAL, and there will always be a metadata DB, for a bluestore OSD. However, if a separate device is not specified for the WAL, it is kept in the same device/partition as the DB; in the same way, if a separate device is not specified for the DB, it is
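To check where an existing bluestore OSD is actually keeping these, the OSD metadata can be inspected; osd.3 below is just an example id and the exact key names may differ slightly between releases:

    ceph osd metadata 3 | grep -E 'bluefs_(db|wal)_partition_path|bluestore_bdev_partition_path'
    # if no separate WAL was given at creation time, no dedicated WAL partition will be listed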

Re: [ceph-users] BlueStore performance: SSD vs on the same spinning disk

2018-08-07 Thread Richard Hesketh
On 07/08/18 17:10, Robert Stanford wrote: > >  I was surprised to see an email on this list a couple of days ago, > which said that write performance would actually fall with BlueStore.  I > thought the reason BlueStore existed was to increase performance.  > Nevertheless, it seems like filestore

Re: [ceph-users] Best way to replace OSD

2018-08-06 Thread Richard Hesketh
; replaced drive, and doesn’t move data around. > That script is specific for filestore to bluestore somewhat, as the > flush-journal command is no longer used in bluestore. > > Hope thats helpful. > > Reed > >> On Aug 6, 2018, at 9:30 AM, Richard Hesketh >>

Re: [ceph-users] Best way to replace OSD

2018-08-06 Thread Richard Hesketh
Waiting for rebalancing is considered the safest way, since it ensures you retain your normal full number of replicas at all times. If you take the disk out before rebalancing is complete, you will be causing some PGs to lose a replica. That is a risk to your data redundancy, but it might be an
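For illustration, the safer sequence looks roughly like this (osd.12 and the polling interval are just examples):

    ceph osd out 12            # start draining PGs off osd.12
    watch -n 30 ceph -s        # wait until every PG is active+clean again
    # only then stop the daemon and physically remove the disk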

Re: [ceph-users] Bluestore : Where is my WAL device ?

2018-06-05 Thread Richard Hesketh
On 05/06/18 14:49, rafael.diazmau...@univ-rennes1.fr wrote: > Hello, > > I run proxmox 5.2 with ceph 12.2 (bluestore). > > I've created an OSD on a Hard Drive (/dev/sda) and tried to put both WAL and > Journal on a SSD part (/dev/sde1) like this : > pveceph createosd /dev/sda --wal_dev

Re: [ceph-users] Ceph Jewel and Ubuntu 16.04

2018-04-17 Thread Richard Hesketh
a just fine and everything did > startup in a good state post OS reinstall.  > > Thanks again for your help on this issue. > > Shain > > > On 04/17/2018 06:00 AM, Richard Hesketh wrote: >> On 16/04/18 18:32, Shain Miley wrote: >>> Hello,

Re: [ceph-users] Ceph Jewel and Ubuntu 16.04

2018-04-17 Thread Richard Hesketh
On 16/04/18 18:32, Shain Miley wrote: > Hello, > > We are currently running Ceph Jewel (10.2.10) on Ubuntu 14.04 in production.  > We have been running into a kernel panic bug off and on for a while and I am > starting to look into upgrading as a possible solution.  We are currently > running

Re: [ceph-users] Fwd: Separate --block.wal --block.db bluestore not working as expected.

2018-04-10 Thread Richard Hesketh
No, you shouldn't invoke it that way, you should just not specify a WAL device at all if you want it to be stored with the DB - if not otherwise specified the WAL is automatically stored with the other metadata on the DB device. You should do something like: ceph-volume lvm prepare --bluestore
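The command above is truncated by the archive; a fuller sketch of the intended form (device names are placeholders) would be something like:

    ceph-volume lvm prepare --bluestore --data /dev/sda --block.db /dev/sde1
    # no --block.wal is given, so the WAL lives on the same partition as the DB (/dev/sde1)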

Re: [ceph-users] split brain case

2018-03-29 Thread Richard Hesketh
On 29/03/18 09:25, ST Wong (ITSC) wrote: > Hi all, > > We put 8 (4+4) OSD and 5 (2+3) MON servers in server rooms in 2 buildings for > redundancy.  The buildings are connected through direct connection. > > While servers in each building have alternate uplinks.   What will happen in > case the

Re: [ceph-users] Bluestore bluestore_prefer_deferred_size and WAL size

2018-03-09 Thread Richard Hesketh
I am also curious about this, in light of the reported performance regression switching from Filestore to Bluestore (when using SSDs for journalling/metadata db). I didn't get any responses when I asked, though. The major consideration that seems obvious is that this potentially hugely

Re: [ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Richard Hesketh
On 02/02/18 08:33, Kevin Olbrich wrote: > Hi! > > I am planning a new Flash-based cluster. In the past we used SAMSUNG PM863a > 480G as journal drives in our HDD cluster. > After a lot of tests with luminous and bluestore on HDD clusters, we plan to > re-deploy our whole RBD pool (OpenNebula

[ceph-users] WAL size constraints, bluestore_prefer_deferred_size

2018-01-08 Thread Richard Hesketh
I recently came across the bluestore_prefer_deferred_size family of config options, for controlling the upper size threshold on deferred writes. Given a number of users suggesting that write performance in filestore is better than write performance in bluestore - because filestore writing to an
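For reference, these are per-device-class OSD options; a hedged ceph.conf sketch for raising the HDD threshold (the value is purely illustrative and defaults differ between releases):

    [osd]
    # writes at or below this size are deferred through the WAL before hitting the data device
    bluestore_prefer_deferred_size_hdd = 131072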

Re: [ceph-users] Different Ceph versions on OSD/MONs and Clients?

2018-01-05 Thread Richard Hesketh
Whoops meant to reply to-list Forwarded Message Subject: Re: [ceph-users] Different Ceph versions on OSD/MONs and Clients? Date: Fri, 5 Jan 2018 15:10:56 + From: Richard Hesketh <richard.hesk...@rd.bbc.co.uk> To: Götz Reinicke <goetz.reini...@filmakademie.de> On

Re: [ceph-users] question on rbd resize

2018-01-03 Thread Richard Hesketh
No, most filesystems can be expanded pretty trivially (shrinking is a more complex operation but usually also doable). Assuming the likely case of an ext2/3/4 filesystem, the command "resize2fs /dev/rbd0" should resize the FS to cover the available space in the block device. Rich On 03/01/18
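The typical sequence for growing an RBD image and its ext4 filesystem, with placeholder pool/image names (note that older releases interpret --size in megabytes):

    rbd resize mypool/myimage --size 20480   # grow the image to 20 GiB (20480 MB)
    resize2fs /dev/rbd0                      # grow the filesystem to fill the enlarged device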

Re: [ceph-users] in the same ceph cluster, why the object in the same osd some are 8M and some are 4M?

2018-01-02 Thread Richard Hesketh
On 02/01/18 02:36, linghucongsong wrote: > Hi, all! > > I just use ceph rbd for openstack. > > my ceph version is 10.2.7. > > I found a surprising thing: for the objects saved in the osd, in some pgs the > objects are 8M, and in some pgs the objects are 4M. Can someone tell me why?  > thanks! >

Re: [ceph-users] Slow backfilling with bluestore, ssd and metadata pools

2017-12-21 Thread Richard Hesketh
On 21/12/17 10:28, Burkhard Linke wrote: > OSD config section from ceph.conf: > > [osd] > osd_scrub_sleep = 0.05 > osd_journal_size = 10240 > osd_scrub_chunk_min = 1 > osd_scrub_chunk_max = 1 > max_pg_per_osd_hard_ratio = 4.0 > osd_max_pg_per_osd_hard_ratio = 4.0 > bluestore_cache_size_hdd =

Re: [ceph-users] Proper way of removing osds

2017-12-21 Thread Richard Hesketh
On 21/12/17 10:21, Konstantin Shalygin wrote: >> Is this the correct way to removes OSDs, or am I doing something wrong ? > Generic way for maintenance (e.g. disk replace) is rebalance by change osd > weight: > > > ceph osd crush reweight osdid 0 > > cluster migrate data "from this osd" > >
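Once the crush weight is 0 and the cluster is back to active+clean, the rest of the removal is the usual sequence (osd.12 is an example id):

    ceph osd out 12
    systemctl stop ceph-osd@12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12

On Luminous and later the last three steps can be collapsed into a single ceph osd purge.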

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Richard Hesketh
On 06/12/17 09:17, Caspar Smit wrote: > > 2017-12-05 18:39 GMT+01:00 Richard Hesketh <richard.hesk...@rd.bbc.co.uk>: > > On 05/12/17 17:10, Graham Allan wrote: > > On 12/05/2017 07:20 AM, Wido den Hollander wrote:

Re: [ceph-users] Luminous v12.2.2 released

2017-12-05 Thread Richard Hesketh
You are safe to upgrade packages just by doing an apt-get update; apt-get upgrade, and you will then want to restart your ceph daemons to bring them to the new version - though you should of course stagger your restarts of each type to ensure your mons remain quorate (don't restart more than
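A hedged sketch of that staggered restart, run host by host and assuming the packaged systemd targets; noout avoids pointless rebalancing while OSDs bounce:

    ceph osd set noout
    systemctl restart ceph-mon.target   # one monitor host at a time, waiting for quorum to re-form
    systemctl restart ceph-osd.target   # one OSD host at a time, waiting for active+clean
    ceph osd unset noout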

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-05 Thread Richard Hesketh
On 05/12/17 17:10, Graham Allan wrote: > On 12/05/2017 07:20 AM, Wido den Hollander wrote: >> Hi, >> >> I haven't tried this before but I expect it to work, but I wanted to >> check before proceeding. >> >> I have a Ceph cluster which is running with manually formatted >> FileStore XFS disks,

Re: [ceph-users] Adding multiple OSD

2017-12-05 Thread Richard Hesketh
On 05/12/17 09:20, Ronny Aasen wrote: > On 05. des. 2017 00:14, Karun Josy wrote: >> Thank you for detailed explanation! >> >> Got one another doubt, >> >> This is the total space available in the cluster : >> >> TOTAL : 23490G >> Use  : 10170G >> Avail : 13320G >> >> >> But ecpool shows max avail

Re: [ceph-users] Sharing Bluestore WAL

2017-11-24 Thread Richard Hesketh
On 23/11/17 17:19, meike.talb...@women-at-work.org wrote: > Hello, > > in our preset Ceph cluster we used to have 12 HDD OSDs per host. > All OSDs shared a common SSD for journaling. > The SSD was used as root device and the 12 journals were files in the > /usr/share directory, like this: > >

Re: [ceph-users] Journal / WAL drive size?

2017-11-23 Thread Richard Hesketh
On 23/11/17 16:13, Rudi Ahlers wrote: > Hi Caspar,  > > Thanx. I don't see any mention that it's a bad idea to have the WAL and DB on > the same SSD, but I guess it could improve performance? It's not that it's a bad idea to put WAL and DB on the same device - it's that if not otherwise

Re: [ceph-users] Journal / WAL drive size?

2017-11-23 Thread Richard Hesketh

Re: [ceph-users] Separation of public/cluster networks

2017-11-15 Thread Richard Hesketh
On 15/11/17 12:58, Micha Krause wrote: > Hi, > > I've built a few clusters with separated public/cluster networks, but I'm > wondering if this is really > the way to go. > > http://docs.ceph.com/docs/jewel/rados/configuration/network-config-ref > > states 2 reasons: > > 1. There is more
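For reference, the split is driven by two keys in ceph.conf; the subnets below are placeholders:

    [global]
    public network  = 192.0.2.0/24      # client and monitor traffic
    cluster network = 198.51.100.0/24   # OSD replication and recovery traffic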

Re: [ceph-users] how to upgrade CEPH journal?

2017-11-09 Thread Richard Hesketh

Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-09 Thread Richard Hesketh
getting a performance gain from putting the journal >> >>> on a fast device (ssd,nvme) when using filestore backend. >> >>> it's not when it comes to bluestore - are there any resources, >> >>> performance test, etc. out there how a fast wal,db device impacts >>

Re: [ceph-users] Small cluster for VMs hosting

2017-11-07 Thread Richard Hesketh
On 07/11/17 13:16, Gandalf Corvotempesta wrote: > Hi to all > I've been far from ceph from a couple of years (CephFS was still unstable) > > I would like to test it again, some questions for a production cluster for > VMs hosting: > > 1. Is CephFS stable? Yes, CephFS is stable and safe (though

Re: [ceph-users] how does recovery work

2017-10-19 Thread Richard Hesketh
On 19/10/17 11:00, Dennis Benndorf wrote: > Hello @all, > > given the following config: > > * ceph.conf: > > ... > mon osd down out subtree limit = host > osd_pool_default_size = 3 > osd_pool_default_min_size = 2 > ... > > * each OSD has its

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-10-16 Thread Richard Hesketh
On 16/10/17 13:45, Wido den Hollander wrote: >> On 26 September 2017 at 16:39, Mark Nelson wrote: >> On 09/26/2017 01:10 AM, Dietmar Rieder wrote: >>> thanks David, >>> >>> that's confirming what I was assuming. Too bad that there is no >>> estimate/method to calculate the db

Re: [ceph-users] Backup VM (Base image + snapshot)

2017-10-16 Thread Richard Hesketh
On 16/10/17 03:40, Alex Gorbachev wrote: > On Sat, Oct 14, 2017 at 12:25 PM, Oscar Segarra > wrote: >> Hi, >> >> In my VDI environment I have configured the suggested ceph >> design/arquitecture: >> >> http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/ >> >> Where I have

Re: [ceph-users] MGR Dashboard hostname missing

2017-10-12 Thread Richard Hesketh
On 12/10/17 17:15, Josy wrote: > Hello, > > After taking down a couple of OSDs, the dashboard is not showing the > corresponding hostname. Ceph-mgr is known to have issues with associating services with hostnames sometimes, e.g. http://tracker.ceph.com/issues/20887 Fixes look to be incoming.

[ceph-users] "ceph osd status" fails

2017-10-06 Thread Richard Hesketh
When I try to run the command "ceph osd status" on my cluster, I just get an error. Luckily unlike the last issue I had with ceph fs commands it doesn't seem to be crashing any of the daemons. root@vm-ds-01:/var/log/ceph# ceph osd status Error EINVAL: Traceback (most recent call last): File

Re: [ceph-users] "ceph fs" commands hang forever and kill monitors

2017-09-28 Thread Richard Hesketh
On 27/09/17 19:35, John Spray wrote: > On Wed, Sep 27, 2017 at 1:18 PM, Richard Hesketh > <richard.hesk...@rd.bbc.co.uk> wrote: >> On 27/09/17 12:32, John Spray wrote: >>> On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh >>> <richard.hesk...@rd.bbc.co.uk>

Re: [ceph-users] Different recovery times for OSDs joining and leaving the cluster

2017-09-27 Thread Richard Hesketh

Re: [ceph-users] "ceph fs" commands hang forever and kill monitors

2017-09-27 Thread Richard Hesketh
On 27/09/17 12:32, John Spray wrote: > On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh > <richard.hesk...@rd.bbc.co.uk> wrote: >> As the subject says... any ceph fs administrative command I try to run hangs >> forever and kills monitors in the background -

[ceph-users] "ceph fs" commands hang forever and kill monitors

2017-09-27 Thread Richard Hesketh
As the subject says... any ceph fs administrative command I try to run hangs forever and kills monitors in the background - sometimes they come back, on a couple of occasions I had to manually stop/restart a suffering mon. Trying to load the filesystem tab in the ceph-mgr dashboard dumps an

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-22 Thread Richard Hesketh
I asked the same question a couple of weeks ago. No response I got contradicted the documentation but nobody actively confirmed the documentation was correct on this subject, either; my end state was that I was relatively confident I wasn't making some horrible mistake by simply specifying a

Re: [ceph-users] Bluestore "separate" WAL and DB (and WAL/DB size?) [and recovery sleep]

2017-09-14 Thread Richard Hesketh
> activity for X seconds. Theoretically more advanced heuristics might cover > this, but in the interim it seems to me like this would solve the very > specific problem you are seeing while still throttling recovery when IO is > happening. > > Mark > > On 09/14/2017 06:19 AM, R

Re: [ceph-users] Bluestore "separate" WAL and DB (and WAL/DB size?) [and recovery sleep]

2017-09-14 Thread Richard Hesketh
/09/17 11:16, Richard Hesketh wrote: > Hi Mark, > > No, I wasn't familiar with that work. I am in fact comparing speed of > recovery to maintenance work I did while the cluster was in Jewel; I haven't > manually done anything to sleep settings, only adjusted max backfills OSD

Re: [ceph-users] Bluestore "separate" WAL and DB (and WAL/DB size?)

2017-09-12 Thread Richard Hesketh
http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/ > > On Mon, Sep 11, 2017 at 8:45 PM, Richard Hesketh > <richard.hesk...@rd.bbc.co.uk> wrote: >> On 08/09/17 11:44, Richard Hesketh wrote: >>> Hi, >>> >>> Reading the ceph-users lis

Re: [ceph-users] Bluestore "separate" WAL and DB (and WAL/DB size?)

2017-09-11 Thread Richard Hesketh
On 08/09/17 11:44, Richard Hesketh wrote: > Hi, > > Reading the ceph-users list I'm obviously seeing a lot of people talking > about using bluestore now that Luminous has been released. I note that many > users seem to be under the impression that they need separat

[ceph-users] Bluestore "separate" WAL and DB

2017-09-08 Thread Richard Hesketh
n reusing the partitions I set up for journals on my SSDs as DB devices for Bluestore HDDs without specifying anything to do with the WAL, and I'd like to know sooner rather than later if I'm making some sort of horrible mistake. Rich -- Richard Hesketh signature.asc Desc

Re: [ceph-users] Ceph Maintenance

2017-08-01 Thread Richard Hesketh
On 01/08/17 12:41, Osama Hasebou wrote: > Hi, > > What would be the best possible and efficient way for big Ceph clusters when > maintenance needs to be performed ? > > Lets say that we have 3 copies of data, and one of the servers needs to be > maintained, and maintenance might take 1-2 days

Re: [ceph-users] Client behavior when adding and removing mons

2017-07-31 Thread Richard Hesketh
On 31/07/17 14:05, Edward R Huyer wrote: > I’m migrating my Ceph cluster to entirely new hardware. Part of that is > replacing the monitors. My plan is to add new monitors and remove old ones, > updating config files on client machines as I go. > > I have clients actively using the cluster.

Re: [ceph-users] best practices for expanding hammer cluster

2017-07-19 Thread Richard Hesketh
the recovery/refilling operation on your clients' data traffic? > What setting have you used to avoid slow requests? > > Kind regards, > Laszlo > > > On 19.07.2017 17:40, Richard Hesketh wrote: >> On 19/07/17 15:14, Laszlo Budai wrote: >>> Hi David, >>

Re: [ceph-users] best practices for expanding hammer cluster

2017-07-19 Thread Richard Hesketh
On 19/07/17 15:14, Laszlo Budai wrote: > Hi David, > > Thank you for that reference about CRUSH. It's a nice one. > There I could read about expanding the cluster, but in one of my cases we > want to do more: we want to move from host failure domain to chassis failure > domain. Our concern is:

Re: [ceph-users] missing feature 400000000000000 ?

2017-07-14 Thread Richard Hesketh
On 14/07/17 11:03, Ilya Dryomov wrote: > On Fri, Jul 14, 2017 at 11:29 AM, Riccardo Murri > wrote: >> Hello, >> >> I am trying to install a test CephFS "Luminous" system on Ubuntu 16.04. >> >> Everything looks fine, but the `mount.ceph` command fails (error 110, >>

Re: [ceph-users] Degraded objects while OSD is being added/filled

2017-07-12 Thread Richard Hesketh
On 11/07/17 20:05, Eino Tuominen wrote: > Hi Richard, > > Thanks for the explanation, that makes perfect sense. I've missed the > difference between ceph osd reweight and ceph osd crush reweight. I have to > study that better. > > Is there a way to get ceph to prioritise fixing degraded

Re: [ceph-users] Migrating RGW from FastCGI to Civetweb

2017-07-12 Thread Richard Hesketh
solve your problem. Rich On 12/07/17 10:40, Richard Hesketh wrote: > Best guess, apache is munging together everything it picks up using the > aliases and translating the host to the ServerName before passing on the > request. Try setting ProxyPreserveHost on as per > https://http

Re: [ceph-users] Migrating RGW from FastCGI to Civetweb

2017-07-12 Thread Richard Hesketh
Best guess, apache is munging together everything it picks up using the aliases and translating the host to the ServerName before passing on the request. Try setting ProxyPreserveHost on as per https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypreservehost ? Rich On 11/07/17 21:47,
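A minimal sketch of the relevant vhost fragment, assuming civetweb is listening on its default port 7480 on the same host (the ServerName is illustrative):

    <VirtualHost *:80>
        ServerName s3.example.com
        ProxyPreserveHost On
        ProxyPass        / http://127.0.0.1:7480/
        ProxyPassReverse / http://127.0.0.1:7480/
    </VirtualHost>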

Re: [ceph-users] Migrating RGW from FastCGI to Civetweb

2017-07-11 Thread Richard Hesketh
On 11/07/17 17:08, Roger Brown wrote: > What are some options for migrating from Apache/FastCGI to Civetweb for > RadosGW object gateway *without* breaking other websites on the domain? > > I found documention on how to migrate the object gateway to Civetweb >

Re: [ceph-users] Degraded objects while OSD is being added/filled

2017-07-11 Thread Richard Hesketh
First of all, your disk removal process needs tuning. "ceph osd out" sets the disk reweight to 0 but NOT the crush weight; this is why you're seeing misplaced objects after removing the osd, because the crush weights have changed (even though reweight meant that disk currently held no data).
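The difference in command form, using osd.7 as an example:

    ceph osd out 7                    # sets the reweight to 0; the crush weight is untouched
    ceph osd crush reweight osd.7 0   # removes its crush weight, so deleting the OSD later moves no further data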

[ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Richard Hesketh
Is there a way, either by individual PG or by OSD, I can prioritise backfill/recovery on a set of PGs which are currently particularly important to me? For context, I am replacing disks in a 5-node Jewel cluster, on a node-by-node basis - mark out the OSDs on a node, wait for them to clear,

Re: [ceph-users] Reg: PG

2017-05-04 Thread Richard Hesketh
The extra pools are probably the data and metadata pools that are automatically created for cephfs. http://ceph.com/pgcalc/ is a useful tool for helping to work out how many PGs your pools should have. Rich On 04/05/17 15:41, David Turner wrote: > I'm guessing you have more than just the 1 pool
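The rule of thumb pgcalc applies is roughly the following (exact targets depend on per-pool weighting):

    # total PGs across all pools ≈ (number of OSDs * 100) / replica count,
    # rounded up to a power of two
    # e.g. 12 OSDs with size 3:  (12 * 100) / 3 = 400  ->  512 PGs shared between the pools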

Re: [ceph-users] SSD Primary Affinity

2017-04-20 Thread Richard Hesketh
the pool had an SSD for a primary, so I think this is a reliable way of doing it. You would of course end up with an acting primary on one of the slow spinners for a brief period if you lost an SSD for whatever reason and it needed to rebalance. The only downside is that if you have your SSD and

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Richard Hesketh
the HDDs and SSDs into separate pools, and just using the SSD pool for VMs/datablocks which needed to be snappier. For most of my users it didn't matter that the backing pool was kind of slow, and only a few were wanting to do I/O intensive workloads where the speed was required, so putting so much

Re: [ceph-users] Hammer upgrade stuck all OSDs down

2017-04-12 Thread Richard Hesketh
On 12/04/17 09:47, Siniša Denić wrote: > Hi to all, my cluster got stuck after upgrade from hammer 0.94.5 to luminous. > It seems somehow the osds are stuck at the hammer version despite > > Can I somehow overcome this situation, and what could have happened during the > upgrade? > I performed the upgrade from hammer

Re: [ceph-users] What's the actual justification for min_size?

2017-03-22 Thread Richard Hesketh
I definitely saw it on a Hammer cluster, though I decided to check my IRC logs for more context and found that in my specific cases it was due to PGs going incomplete. `ceph health detail` offered the following, for instance: pg 8.31f is remapped+incomplete, acting [39] (reducing pool one

[ceph-users] What's the actual justification for min_size? (was: Re: I/O hangs with 2 node failure even if one node isn't involved in I/O)

2017-03-21 Thread Richard Hesketh
On 21/03/17 17:48, Wes Dillingham wrote: > a min_size of 1 is dangerous though because it means you are 1 hard disk > failure away from losing the objects within that placement group entirely. a > min_size of 2 is generally considered the minimum you want but many people > ignore that advice,
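For reference, min_size is a per-pool setting that can be checked and adjusted at any time (the pool name is an example):

    ceph osd pool get rbd min_size
    ceph osd pool set rbd min_size 2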

Re: [ceph-users] Question regarding CRUSH algorithm

2017-02-17 Thread Richard Hesketh
On 16/02/17 20:44, girish kenkere wrote: > Thanks David, > > It's not quite what I was looking for. Let me explain my question in more > detail - > > This is an excerpt from the CRUSH paper; it explains how the crush algo running on > each client/osd maps a pg to an osd during the write operation [lets