Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread DHilsbos
Paul; So is the 3/30/300GB a limit of RocksDB, or of Bluestore? The percentages you list, are they used DB / used data? If so... Where do you get the used DB data from? Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc.

Re: [ceph-users] [External Email] RE: Beginner questions

2020-01-16 Thread DHilsbos
Dave; I don't like reading inline responses, so... I have zero experience with EC pools, so I won't pretend to give advice in that area. I would think that small NVMe for DB would be better than nothing, but I don't know. Once I got the hang of building clusters, it was relatively easy to

Re: [ceph-users] Beginner questions

2020-01-16 Thread DHilsbos
Dave; I'd like to expand on this answer, briefly... The information in the docs is wrong. There have been many discussions about changing it, but no good alternative has been suggested, thus it hasn't been changed. The 3rd party project that Ceph's BlueStore uses for its database (RocksDB),

Re: [ceph-users] Separate disk sets for high IO?

2019-12-16 Thread DHilsbos
Philip; Ah, ok. I suspect that isn't documented because the developers don't want average users doing it. It's also possible that it won't work as expected, as there is discussion on the web of device classes being changed at startup of the OSD daemon. That said... "ceph osd crush class
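A minimal sketch of the device-class approach being discussed (class, rule, and pool names are illustrative; Nautilus-era syntax assumed):

  # Classes are normally (re)applied at OSD start; turn that off if overriding them by hand
  #   osd_class_update_on_start = false
  ceph osd crush rm-device-class osd.5
  ceph osd crush set-device-class ssd osd.5

  # Create a replicated rule restricted to the class, then point a pool at it
  ceph osd crush rule create-replicated fast-rule default host ssd
  ceph osd pool set fastpool crush_rule fast-rule

  # Verify
  ceph osd crush class ls
  ceph osd crush tree --show-shadow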

Re: [ceph-users] Separate disk sets for high IO?

2019-12-16 Thread DHilsbos
Philip; There isn't any documentation that shows specifically how to do that, though the below comes close. Here's the documentation, for Nautilus, on CRUSH operations: https://docs.ceph.com/docs/nautilus/rados/operations/crush-map/ About a third of the way down the page is a discussion of

Re: [ceph-users] HEALTH_WARN 1 MDSs report oversized cache

2019-12-05 Thread DHilsbos
Patrick; I agree with Ranjan, though not in the particulars. The issue is that "oversized" is ambiguous, though undersized is also ambiguous. I personally prefer unambiguous error messages which also suggest solutions, like: "1 MDSs reporting cache exceeds 'mds cache memory limit,' of: ." My

Re: [ceph-users] Large OMAP Object

2019-11-20 Thread DHilsbos
All; Since I haven't heard otherwise, I have to assume that the only way to get this to go away is to dump the contents of the RGW bucket(s), and recreate it (them)? How did this get past release approval? A change which makes a valid cluster state invalid, with no mitigation other than

Re: [ceph-users] Large OMAP Object

2019-11-15 Thread DHilsbos
Wido; Ok, yes, I have tracked it down to the index for one of our buckets. I missed the ID in the ceph df output previously. Next time I'll wait to read replies until I've finished my morning coffee. How would I go about correcting this? The content for this bucket is basically just junk,
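Since the bucket content is described as junk, a hedged sketch of the two usual ways to clear this (bucket name and shard count are illustrative; the PG id is the one from the scrub log in this thread):

  # Option 1: remove the bucket and everything in it
  radosgw-admin bucket rm --bucket=junk-bucket --purge-objects

  # Option 2: reshard the index so no single shard holds too many keys
  radosgw-admin bucket reshard --bucket=junk-bucket --num-shards=64

  # The warning typically clears once the affected PG is deep-scrubbed again
  ceph pg deep-scrub 56.7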

Re: [ceph-users] Large OMAP Object

2019-11-15 Thread DHilsbos
Paul; I upgraded the cluster in question from 14.2.2 to 14.2.4 just before this came up, so that makes sense. Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com -Original Message- From:

Re: [ceph-users] Large OMAP Object

2019-11-15 Thread DHilsbos
All; Thank you for your help so far. I have found the log entries from when the object was found, but don't see a reference to the pool. Here are the logs: 2019-11-14 03:10:16.508601 osd.1 (osd.1) 21 : cluster [DBG] 56.7 deep-scrub starts 2019-11-14 03:10:18.325881 osd.1 (osd.1) 22 : cluster
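For anyone following along: the pool can be read straight off the PG id in those lines, since the number before the dot (56 in 56.7) is the pool id. A quick sketch:

  # List pools with their ids; pool 56 is the one holding the large object
  ceph osd lspools
  ceph osd pool ls detail

  # ceph health detail also names the offending object and its key/byte counts
  ceph health detail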

[ceph-users] Large OMAP Object

2019-11-14 Thread DHilsbos
All; We had a warning about a large OMAP object pop up in one of our clusters overnight. The cluster is configured for CephFS, but nothing mounts a CephFS, at this time. The cluster mostly uses RGW. I've checked the cluster log, the MON log, and the MGR log on one of the mons, with no

Re: [ceph-users] rgw: multisite support

2019-10-04 Thread DHilsbos
Swami; For 12.2.11 (Luminous), the previously linked document would be: https://docs.ceph.com/docs/luminous/radosgw/multisite/#migrating-a-single-site-system-to-multi-site Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc.

[ceph-users] Mutliple CephFS Filesystems Nautilus (14.2.2)

2019-08-21 Thread DHilsbos
All; How experimental is the multiple CephFS filesystems per cluster feature? We plan to use different sets of pools (meta / data) per filesystem. Are there any known issues? While we're on the subject, is it possible to assign a different active MDS to each filesystem? Thank you, Dominic
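A minimal sketch of a second-filesystem setup (pool and filesystem names and PG counts are illustrative); on 14.2.x the multi-fs flag still has to be enabled explicitly, and pinning a particular MDS daemon to a filesystem (mds_join_fs) only arrived in later releases, if memory serves:

  ceph fs flag set enable_multiple true --yes-i-really-mean-it

  # Each filesystem gets its own metadata/data pools
  ceph osd pool create fs2_meta 16
  ceph osd pool create fs2_data 64
  ceph fs new cephfs2 fs2_meta fs2_data

  # Shows which MDS ranks are active for each filesystem
  ceph fs status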

Re: [ceph-users] New Cluster Failing to Start (Resolved)

2019-08-14 Thread DHilsbos
All; We found the problem, we had the v2 ports incorrect in the monmap. Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com -Original Message- From: ceph-users
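For anyone hitting the same symptom, a hedged sketch of inspecting and correcting the monmap (mon id, IP, and paths are illustrative; the mon must be stopped first):

  ceph-mon -i mon1 --extract-monmap /tmp/monmap
  monmaptool --print /tmp/monmap

  # Replace the bad entry with the correct v2 (3300) and v1 (6789) addresses
  monmaptool --rm mon1 /tmp/monmap
  monmaptool --addv mon1 '[v2:10.0.200.110:3300,v1:10.0.200.110:6789]' /tmp/monmap

  # Inject the fixed map and start the mon again
  ceph-mon -i mon1 --inject-monmap /tmp/monmap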

[ceph-users] New Cluster Failing to Start

2019-08-14 Thread DHilsbos
All; We're working to deploy our first production Ceph cluster, and we've run into a snag. The MONs start, but the "cluster" doesn't appear to come up. ceph -s never returns. These are the last lines in the log of one of the mons: 2019-08-13 16:20:03.706 7f668108f180 0 starting

Re: [ceph-users] WAL/DB size

2019-08-13 Thread DHilsbos
Wido / Hemant; Current recommendations (since at least Luminous) say that a block.db device should be at least 4% of the block device. For a 6 TB drive, this would be 240 GB, not 60 GB. Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc.
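As a quick worked check of that figure:

  0.04 × 6 TB = 0.04 × 6,000 GB = 240 GB of block.db per 6 TB data device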

Re: [ceph-users] More than 100% in a dashboard PG Status

2019-08-13 Thread DHilsbos
All; I also noticed this behavior. It may have started after inducing a failure in the cluster in order to observe the self-healing behavior. In the "PG Status" section of the dashboard, I have "Clean (200%)." This has not seemed to affect the functioning of the cluster. Cluster is a new

Re: [ceph-users] Error Mounting CephFS

2019-08-07 Thread DHilsbos
JC; Excellent, thank you! I apologize, normally I'm better about RTFM... Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com From: JC Lopez [mailto:jelo...@redhat.com] Sent: Wednesday, August 07,

Re: [ceph-users] Error Mounting CephFS

2019-08-07 Thread DHilsbos
All; Thank you for your assistance, this led me to the fact that I hadn't set up the Ceph repo on this client server, and the ceph-common I had installed was version 10. I got all of that squared away, and it all works. I do have a couple of follow-up questions: Can more than one system mount

[ceph-users] RadosGW (Ceph Object Gateway) Pools

2019-08-06 Thread DHilsbos
All; Based on the PG Calculator, on the Ceph website, I have this list of pools to pre-create for my Object Gateway: .rgw.root default.rgw.control default.rgw.data.root default.rgw.gc default.rgw.log default.rgw.intent-log default.rgw.meta default.rgw.usage default.rgw.users.keys
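A minimal sketch of pre-creating them (PG counts are illustrative and should come from the calculator; the application tag avoids the "application not enabled" health warning):

  for pool in .rgw.root default.rgw.control default.rgw.meta default.rgw.log; do
    ceph osd pool create ${pool} 8
    ceph osd pool application enable ${pool} rgw
  done
  # ...and likewise for the remaining pools in the list above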

[ceph-users] Error Mounting CephFS

2019-08-06 Thread DHilsbos
All; I have a server running CentOS 7.6 (1810), that I want to set up with CephFS (full disclosure, I'm going to be running samba on the CephFS). I can mount the CephFS fine when I use the option secret=, but when I switch to secretfile=, I get an error "No such process." I installed
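For reference, a hedged sketch of the two mount variants (mon address, client name, and key are illustrative); secretfile= is handled by the mount.ceph helper from ceph-common, which is the component that was out of date here (see the follow-up above):

  # Works without the helper: pass the base64 key directly
  mount -t ceph 10.0.200.110:6789:/ /mnt/cephfs -o name=samba,secret=AQD...

  # Preferred: keep the key off the command line via a secret file
  mount -t ceph 10.0.200.110:6789:/ /mnt/cephfs -o name=samba,secretfile=/etc/ceph/samba.secret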

Re: [ceph-users] even number of monitors

2019-08-05 Thread DHilsbos
All; While most discussion of MONs and their failure modes revolves around the failure of the MONs themselves, the recommendation for odd numbers of MONs has nothing to do with the loss of one or more MONs. It's actually in response to the split-brain problem. Imagine you have the following
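A quick worked example of the quorum arithmetic behind that:

  quorum = floor(N / 2) + 1 monitors must agree

  N = 3: quorum 2 -> survives 1 loss; a 2/1 split leaves the majority side running
  N = 4: quorum 3 -> still survives only 1 loss, and a 2/2 split has no majority at all
  N = 5: quorum 3 -> survives 2 losses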

Re: [ceph-users] [Disarmed] Re: ceph-ansible firewalld blocking ceph comms

2019-07-25 Thread DHilsbos
Nathan; I'm not an expert on firewalld, but shouldn't you have a list of open ports? ports: ? Here's the configuration on my test cluster: public (active) target: default icmp-block-inversion: no interfaces: bond0 sources: services: ssh dhcpv6-client ports: 6789/tcp 3300/tcp
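A hedged sketch of opening those ports with firewalld (zone name is illustrative; newer firewalld also ships ceph/ceph-mon service definitions that can be used instead of raw ports):

  firewall-cmd --zone=public --permanent --add-port=6789/tcp --add-port=3300/tcp
  # or, where the predefined services exist:
  firewall-cmd --zone=public --permanent --add-service=ceph-mon --add-service=ceph
  firewall-cmd --reload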

Re: [ceph-users] how to power off a cephfs cluster cleanly

2019-07-25 Thread DHilsbos
Dan; I don't have a lot of experience with Ceph, but I generally set all of the following before taking a cluster offline: ceph osd set noout ceph osd set nobackfill ceph osd set norecover ceph osd set norebalance ceph osd set nodown ceph osd set pause I then unset them in the opposite order:
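For completeness, a sketch of that sequence; the flags are cluster-wide, so they only need to be issued once from any node with an admin keyring:

  # Before powering down
  ceph osd set noout
  ceph osd set nobackfill
  ceph osd set norecover
  ceph osd set norebalance
  ceph osd set nodown
  ceph osd set pause

  # After powering back up, in reverse order
  ceph osd unset pause
  ceph osd unset nodown
  ceph osd unset norebalance
  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset noout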

[ceph-users] Kernel, Distro & Ceph

2019-07-24 Thread DHilsbos
All; There's been a lot of discussion of various kernel versions on this list lately, so I thought I'd seek some clarification. I prefer to run CentOS, and I prefer to keep the number of "extra" repositories to a minimum. Ceph requires adding a Ceph repo, and the EPEL repo. Updating the

[ceph-users] MON / MDS Storage Location

2019-07-22 Thread DHilsbos
All; Where, in the filesystem, do MONs and MDSs store their data? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com ___ ceph-users mailing list
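For what it's worth, a quick way to look on a running node (paths assume the default cluster name and a daemon id matching the short hostname):

  # Monitor store (RocksDB) lives under the mon data directory
  ls /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

  # The MDS directory holds little more than its keyring; CephFS metadata
  # itself lives in RADOS, in the filesystem's metadata pool
  ls /var/lib/ceph/mds/ceph-$(hostname -s)/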

[ceph-users] MON DNS Lookup & Version 2 Protocol

2019-07-17 Thread DHilsbos
All; I'm trying to firm up my understanding of how Ceph works, and ease of management tools and capabilities. I stumbled upon this: http://docs.ceph.com/docs/nautilus/rados/configuration/mon-lookup-dns/ It got me wondering; how do you convey protocol version 2 capabilities in this format?
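For context, the SRV record layout that page describes looks roughly like this (domain, hosts, TTL, priority, and weight are illustrative); whether an entry on port 3300 is enough to advertise a v2-capable endpoint is exactly the open question here:

  _ceph-mon._tcp.example.com. 3600 IN SRV 10 20 6789 mon1.example.com.
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 20 6789 mon2.example.com.
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 20 6789 mon3.example.com.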

Re: [ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread DHilsbos
Paul; If I understand you correctly: I will have 2 clusters, each named "ceph" (internally). As such, each will have a configuration file at: /etc/ceph/ceph.conf I would copy the other cluster's configuration file to something like: /etc/ceph/remote.conf Then the commands (run on the
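A hedged sketch of what that looks like in practice (pool name, user, and the "remote" nickname are illustrative); the nickname only has to match the .conf and .keyring file names under /etc/ceph on the node running rbd-mirror:

  # Local cluster:  /etc/ceph/ceph.conf   + ceph.client.admin.keyring
  # Remote cluster: /etc/ceph/remote.conf + remote.client.admin.keyring

  rbd mirror pool enable rbd pool --cluster ceph
  rbd mirror pool enable rbd pool --cluster remote

  # Register each side as a peer of the other
  rbd mirror pool peer add rbd client.admin@remote --cluster ceph
  rbd mirror pool peer add rbd client.admin@ceph --cluster remote

  rbd mirror pool status rbd --cluster ceph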

[ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread DHilsbos
All; I'm digging deeper into the capabilities of Ceph, and I ran across this: http://docs.ceph.com/docs/nautilus/rbd/rbd-mirroring/ Which seems really interesting, except... This feature seems to require custom cluster naming to function, which is deprecated in Nautilus, and not all commands

Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread DHilsbos
Matt; Yep, that would certainly explain it. My apologies, I almost searched for that information before sending the email. Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com -Original

[ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread DHilsbos
All; I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster. I'm using Amazon's S3 SDK, and I've run into an annoying little snag. My code looks like this: amazonS3 = builder.build(); ListObjectsV2Request req = new
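For anyone wanting to reproduce this outside the SDK, a hedged sketch using the aws CLI against the RGW endpoint (endpoint URL, bucket, and token variable are illustrative; if memory serves the CLI exposes the token as --continuation-token):

  aws --endpoint-url http://rgw.example.com:7480 s3api list-objects-v2 \
      --bucket demo-bucket --max-keys 100

  # Feed the NextContinuationToken from the first response back in
  aws --endpoint-url http://rgw.example.com:7480 s3api list-objects-v2 \
      --bucket demo-bucket --max-keys 100 --continuation-token "$TOKEN"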

Re: [ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread DHilsbos
Eugen; All services are running, yes, though they didn't all start when I brought the host up (configured not to start, because the last thing I had done was physically relocate the entire cluster). All services are running, and happy. # ceph status cluster: id:

[ceph-users] MGR Logs after Failure Testing

2019-06-27 Thread DHilsbos
All; I built a demonstration and testing cluster, just 3 hosts (10.0.200.110, 111, 112). Each host runs mon, mgr, osd, mds. During the demonstration yesterday, I pulled the power on one of the hosts. After bringing the host back up, I'm getting several error messages every second or so:

Re: [ceph-users] Nautilus HEALTH_WARN for msgr2 protocol

2019-06-14 Thread DHilsbos
Bob; Have you verified that port 3300 is open for TCP on that host? The extra host firewall rules for v2 protocol caused me all kinds of grief when I was setting up my MONs. Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International Inc.

[ceph-users] Ceph Cluster Replication / Disaster Recovery

2019-06-12 Thread DHilsbos
All; I'm testing and evaluating Ceph for the next generation of storage architecture for our company, and so far I'm fairly impressed, but I've got a couple of questions around cluster replication and disaster recovery. First; intended uses. Ceph Object Gateway will be used to support new

Re: [ceph-users] radosgw dying

2019-06-09 Thread DHilsbos
All; Thank you to all who assisted, this was the problem! My default PG/pool was too high for my total OSD count, and it was unable to create all of these pools. I removed the other pools I had created, and reduced the default PGs / pool, and radosgw was able to create all of its default
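For reference, the knob involved can be set in ceph.conf ("osd pool default pg num") or centrally as sketched below (the value 8 is illustrative); the ceiling being hit is mon_max_pg_per_osd, which defaults to 250 PGs per OSD:

  ceph config set global osd_pool_default_pg_num 8
  ceph config set global osd_pool_default_pgp_num 8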

Re: [ceph-users] radosgw dying

2019-06-09 Thread DHilsbos
Certainly. Output of ceph osd df:
ID CLASS WEIGHT   REWEIGHT SIZE   RAW USE DATA   OMAP META  AVAIL  %USE VAR  PGS STATUS
 2  hdd  11.02950 1.0      11 TiB 120 GiB 51 MiB 0 B  1 GiB 11 TiB 1.07 1.00 227 up
 3  hdd  11.02950 1.0      11 TiB 120 GiB 51 MiB 0 B  1 GiB 11 TiB 1.07 1.00

Re: [ceph-users] radosgw dying

2019-06-09 Thread DHilsbos
Huan; I get that, but the pool already exists, why is radosgw trying to create one? Dominic Hilsbos Get Outlook for Android On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun" mailto:hjwsm1...@gmail.com>> wrote: >From the error message, i'm decline to that

[ceph-users] radosgw dying

2019-06-07 Thread DHilsbos
All; I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD per host), and I'm trying to add a 4th host for gateway purposes. The radosgw process keeps dying with: 2019-06-07 15:59:50.700 7fc4ef273780 0 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972)

[ceph-users] ceph-volume ignores cluster name?

2019-05-13 Thread DHilsbos
All; I'm working on spinning up a demonstration cluster using ceph, and yes, I'm installing it manually, for the purpose of learning. I can't seem to correctly create an OSD, as ceph-volume seems to only work if the cluster name is the default. If I rename my configuration file (at

Re: [ceph-users] Bluestore Hardwaresetup

2018-02-15 Thread DHilsbos
Peter; I was just looking at this myself. With regards to BlueStore, the Config Reference is useful: http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/ As far as tiering goes, the OSD Config Reference talks about it:

Re: [ceph-users] Shutting down half / full cluster

2018-02-14 Thread DHilsbos
All; This might be a noob type question, but this thread is interesting, and there's one thing I would like clarified. David Turner mentions setting 3 flags on OSDs, Götz has mentioned 5 flags, do the commands need to be run on all OSD nodes, or just one? Thank you, Dominic L. Hilsbos, MBA

Re: [ceph-users] BlueStore & Journal

2018-02-14 Thread DHilsbos
David; Thank you for responding so quickly. I believe I've been looking at Master. I found the information on BlueStore five or ten minutes after I sent the email, but I appreciate the summary. Thank you, Dominic L. Hilsbos, MBA Director – Information Technology Perform Air International

[ceph-users] BlueStore & Journal

2018-02-13 Thread DHilsbos
All; I'm sorry if this question has been asked before. I'm reading through Ceph's documentation in preparation to build a cluster, and I keep coming across the recommendation to place journals on SSDs. Does BlueStore use journals, or was this a nod to improving XFS and BTRFS performance?