Dear all,
in real-world use, is there a significant performance
benefit in using 4kn instead of 512e HDDs (using
Ceph bluestore with block-db on NVMe-SSD)?
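(For reference, the sector format can be checked with something like
the following; the device name is just an example:

$ lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda

A 512e drive should report 512/4096 here, a 4kn drive 4096/4096.)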
Cheers and thanks for any advice,
Oliver
It's about 2x faster than the P3700 we had,
and allows us to get more out of our flash drives.
--------
*From:* Oliver Schulz
*Sent:* Wednesday, 18 July 2018 12:00:14 PM
*To:* Linh Vu; ceph-users
*Subject:* Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?
[...] don't do any such special allocation. 😊
Cheers,
Linh
--------
*From:* Oliver Schulz
*Sent:* Tuesday, 17 July 2018 11:39:26 PM
*To:* Linh Vu; ceph-users
*Subject:* Re: [ceph-users] CephFS with erasure coding, do I need a
cache-pool?
On 18.07.2018 00:43, Gregory Farnum wrote:
> But you could also do a workaround like letting it choose (K+M)/2
> racks and putting two shards in each rack.
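(For illustration, a rule along those lines for k=4, m=2 across three
racks might look like this in the decompiled crush map; the rule name,
id and sizes are placeholders:

rule ec_two_per_rack {
    id 2
    type erasure
    min_size 3
    max_size 6
    step take default
    step choose indep 3 type rack
    step chooseleaf indep 2 type host
    step emit
})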
Oh yes, you are more susceptible to top-of-rack switch failures in this
case or whatever. It's just one option; many people are [...]
[...] you more throughput but increase latency, especially for small
files, so it also depends on how important performance is and what
kind of file sizes you store on your CephFS.
Cheers,
Linh
*From:* ceph-users on behalf of
Oliver Schulz
*Sent:* Sunday, 15 July 2018 9:46:16 PM
*To:* ceph-users
*Subject:* [ceph-users] CephFS with erasure coding, do I need a cache-pool?
Hi Greg,
On 17.07.2018 03:01, Gregory Farnum wrote:
Since Luminous, you can use an erasure coded pool (on bluestore)
directly as a CephFS data pool, no cache pool needed.
More than that, we'd really prefer you didn't use cache pools for
anything. Just Say No. :)
Thanks for the confirmation!
Dear John,
On 16.07.2018 16:25, John Spray wrote:
Since Luminous, you can use an erasure coded pool (on bluestore)
directly as a CephFS data pool, no cache pool needed.
Great! I'll be happy to go without
a cache pool then.
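If I understand the docs correctly, something like this should do it
(pool name, profile and PG count are just placeholders):

$ ceph osd pool create cephfs_data_ec 128 128 erasure myprofile
$ ceph osd pool set cephfs_data_ec allow_ec_overwrites true
$ ceph fs add_data_pool cephfs cephfs_data_ec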
Thanks for your help, John,
Oliver
Dear all,
we're planning a new Ceph cluster, with CephFS as the
main workload, and would like to use erasure coding to
use the disks more efficiently. Access patterns will
probably be more read- than write-heavy, on average.
I don't have any practical experience with erasure-
coded pools so far.
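From the docs, I gather the first step would be defining a profile,
e.g. (k/m and failure domain here are just an example):

$ ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=host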
Hi Konstantin,
thanks! "set-all-straw-buckets-to-straw2" was what I was
looking for. Didn't see it in the docs. Thanks again!
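For anyone searching the archives later, the full command is:

$ ceph osd crush set-all-straw-buckets-to-straw2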
Cheers,
Oliver
On 23.06.2018 06:39, Konstantin Shalygin wrote:
Yes, I know that section of the docs, but can't find how
to change the crush rules after "ceph osd cru[...]"
2018-06-20 18:27 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
Thanks, Paul - I could probably activate the Jewel tunables
profile without losing too many clients - most are running
at least kernel 4.2, I think. I'll go hunting for older
clients [...]
[...] without breaking the oldest clients. Incrementing choose*tries in the
crush rule or tunables is probably sufficient.
But since you are apparently running into data balance problems you'll
have to update that to something more modern sooner or later.
You can also play around with crushtool [...]
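For example, to check a rule offline for bad mappings (rule id and
replica count are placeholders):

$ ceph osd getcrushmap -o crushmap
$ crushtool -i crushmap --test --rule 1 --num-rep 6 --show-bad-mappings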
Values used
for EC are set_chooseleaf_tries = 5 and set_choose_tries = 100.
You can configure them by adding them as the first steps of the rule.
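As a sketch, in a decompiled crush map that placement looks roughly
like this (rule name, id and sizes are placeholders):

rule ecpool {
    id 2
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}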
You can also configure an upmap exception.
But in general it is often not the best idea to have only 3 racks for [...]
On 06/20/2018 04:00 PM, Paul Emmerich wrote:
Can you post the full output of "ceph -s", "ceph health detail", and
"ceph osd df tree"?
Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
Paul
2018-06-20 15:25 GMT+02:00 Oliver Schulz <oliver.sch...@tu[...]
Dear all,
we (somewhat) recently extended our Ceph cluster,
and updated it to Luminous. By now, the fill level
on some OSDs is quite high again, so I'd like to
re-balance via "OSD reweight".
I'm running into the following problem, however:
No matter what I do (reweight a little, or a lot,
or only [...]
[...] cephfs disaster recovery tools. Your cluster was
offline here because it couldn't do some writes, but it should still be
self-consistent.
On Thu, Jun 14, 2018 at 4:52 PM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
They are recovered now, looks like it just took a bit
ceph osd getcrushmap -o crushmap
crushtool -d crushmap -o crushmap.txt
Paul
2018-06-14 22:39 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
Thanks, Greg!!
I reset all the OSD weights to 1.00, and I think I'm in a much
better state now. The only trouble left in [...]
[...] with it in various versions,
and forcing that kind of priority without global decision making is
prone to issues.
But yep, looks like things will eventually become all good now. :)
On Thu, Jun 14, 2018 at 4:39 PM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
Thanks, Greg!
[...] "recovery_wait", after.
Cheers,
Oliver
On 14.06.2018 22:09, Gregory Farnum wrote:
On Thu, Jun 14, 2018 at 4:07 PM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
Hi Greg,
I increased the hard limit and rebooted everything. The
PG without acting OSDs still [...]
> [...] undoing the remap-by-utilization
> changes.
How do I do that, best? Just set all the weights back to 1.00?
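(I guess a loop like the following would do it, untested:

$ for osd in $(ceph osd ls); do ceph osd reweight "$osd" 1.0; done
)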
Cheers,
Oliver
P.S.: Thanks so much for helping!
On 14.06.2018 21:37, Gregory Farnum wrote:
On Thu, Jun 14, 2018 at 3:26 PM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
Ah, I see some OSDs actually are over the 200 PG
limit - I'll increase the hard limit and restart
everything.
On 14.06.2018 21:26, Oliver Schulz wrote:
But the contents of the remapped PGs should still be
OK, right? What confuses me is that they don't
backfill - why don't the [...]
[...] the pg overdose protection that
was added in Luminous. Check the list archives for the exact name, but
you'll want to increase the pg hard limit and restart the osds that
exceeded the previous/current setting.
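For later readers: the options in question should be the following,
set in ceph.conf (values purely illustrative):

[global]
mon_max_pg_per_osd = 300
osd_max_pg_per_osd_hard_ratio = 3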
-Greg
On Thu, Jun 14, 2018 at 2:33 PM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
[...] the pg reporting, but I believe if it’s
reporting the state as unknown that means *no* running osd which
contains any copy of that pg. That’s not something which ceph could do
on its own without failures of osds. What’s the output of “ceph -s”?
On Thu, Jun 14, 2018 at 2:15 PM Oliver Schulz
[...] should still be getting
reported as inactive.
On Thu, Jun 14, 2018 at 8:40 AM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
Dear all,
I have a serious problem with our Ceph cluster: One of our PGs somehow
ended up in this state (reported by "ceph health detail"):
Dear all,
I have a serious problem with our Ceph cluster: One of our PGs somehow
ended up in this state (reported by "ceph health detail"):
pg 1.XXX is stuck inactive for ..., current state unknown, last acting []
Also, "ceph pg map 1.xxx" reports:
osdmap e525812 pg 1.721 (1.721) -> up [...]
Dear all,
we have a Ceph cluster that has slowly evolved over several
years and Ceph versions (started with 18 OSDs and 54 TB
in 2013, now about 200 OSDs and 1.5 PB, still the same
cluster, with data continuity). So there are some
"early sins" in the cluster configuration, left over from
the earl
[...] that much more longevity to the life of the drive.
You cannot change the size of any part of a bluestore osd after creation.
On Sat, May 12, 2018, 3:09 PM Oliver Schulz
<oliver.sch...@tu-dortmund.de> wrote:
Dear David,
On 11.05.2018 22:10, David Turner wrote:
As for whether you should do WAL only on the NVMe vs. use a filestore journal,
that depends on your write patterns, use case, etc.
we mostly use CephFS, for scientific data processing. It's
mainly larger files (10 MB to 10 GB, but sometimes also
a bu[...]
Dear David,
thanks a lot for the detailed answer(s) and clarifications!
Can I ask just a few more questions?
On 11.05.2018 18:46, David Turner wrote:
[...] partitions is 10GB per 1TB of OSD. If your OSD is a 4TB disk you should
be looking closer to a 40GB block.db partition. If your block.db
partition [...]
[...] your question, you could do something like:
$ ceph-deploy osd create --bluestore --data=/dev/sdb --block-db
/dev/nvme0n1p1 $HOSTNAME
$ ceph-deploy osd create --bluestore --data=/dev/sdc --block-db
/dev/nvme0n1p2 $HOSTNAME
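Assuming the NVMe partitions were created beforehand, e.g. with
something like this (sizes per the 10GB-per-1TB rule above, partition
names arbitrary):

$ sgdisk -n 0:0:+40G -c 0:osd-db-sdb /dev/nvme0n1
$ sgdisk -n 0:0:+40G -c 0:osd-db-sdc /dev/nvme0n1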
On Fri, May 11, 2018 at 10:35 AM Oliver Schulz
<oliver.sch...@tu-d[...]
> ├─nvme0n1p14 259:14 0 576M 0 part
> ├─nvme0n1p15 259:15 0 1G 0 part
> ├─nvme0n1p16 259:16 0 576M 0 part
> ├─nvme0n1p17 259:17 0 1G 0 part
> ├─nvme0n1p18 259:18 0 576M 0 part
> ├─nvme0n1p19 259:19 0 1G 0 part
> ├─nvme0n1p20 259:20 0 576M 0 part
> ├─[...]
Dear Ceph Experts,
I'm trying to set up some new OSD storage nodes, now with
bluestore (our existing nodes still use filestore). I'm
a bit unclear on how to specify WAL/DB devices: Can
several OSDs share one WAL/DB partition? So, can I do
ceph-deploy osd create --bluestore --osd-db=/dev/nvme[...]
[...] find 'MON_HOSTNAME' in monmap
Any ideas?
Cheers,
Oliver
On 04/20/2018 11:46 AM, Stefan Kooman wrote:
Quoting Oliver Schulz (oliver.sch...@tu-dortmund.de):
Dear Ceph Experts,
I'm trying to switch an old Ceph cluster from manual administration to
ceph-deploy, but I'm running [...]
Dear Ceph Experts,
I'm trying to switch an old Ceph cluster from manual administration to
ceph-deploy, but I'm running into the following error:
# ceph-deploy gatherkeys HOSTNAME
[HOSTNAME][INFO ] Running command: /usr/bin/ceph --connect-timeout=25
--cluster=ceph --admin-daemon=/var/run/ceph/cep[...]
Dear Ceph Experts,
after upgrading our Ceph cluster from Hammer to Jewel,
the MDS (after a few days) found some metadata damage:
# ceph status
[...]
health HEALTH_ERR
mds0: Metadata damage detected
[...]
The output of
# ceph tell mds.0 damage ls
is:
[
{
[...] consider using either a more
recent kernel or a fuse client.
John
On 30/07/15 08:32, Oliver Schulz wrote:
Hello Ceph Experts,
lately, "ceph status" on our cluster often states:
mds0: Client CLIENT_ID failing to respond to capability release
How can I identify which client is at fault (hostname or IP address)
from the CLIENT_ID?
Hello Ceph Experts,
lately, "ceph status" on our cluster often states:
mds0: Client CLIENT_ID failing to respond to capability release
How can I identify which client is at fault (hostname or IP address)
from the CLIENT_ID?
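(From the docs, I'd expect the MDS admin socket to list the sessions
along with client addresses, something like the following, where
"mds.0" stands for the daemon name:

# ceph daemon mds.0 session ls
)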
What could be the source of the "failing to respond to capability
release" [...]
Dear Greg,
On 18.02.2015 23:41, Gregory Farnum wrote:
is it possible to define a Ceph user/key with privileges
that allow for read-only CephFS access but do not allow
...and deletes, unfortunately. :( I don't think this is presently a
thing it's possible to do until we get a much better user auth [...]
Hi Florian,
On 18.02.2015 22:58, Florian Haas wrote:
is it possible to define a Ceph user/key with privileges
that allow for read-only CephFS access but do not allow
All you should need to do is [...]
However, I've just tried the above with ceph-fuse on firefly, and [...]
So I believe you've un[...]
Dear Ceph Experts,
is it possible to define a Ceph user/key with privileges
that allow for read-only CephFS access but do not allow
write or other modifications to the Ceph cluster?
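What I have in mind is something along these lines (client name and
pool are placeholders; I don't know whether the MDS actually honors
read-only caps like this):

# ceph auth get-or-create client.webro mon 'allow r' mds 'allow r' osd 'allow r pool=data'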
I would like to export a sub-tree of our CephFS via HTTPS.
Alas, web-servers are inviting targets, so in the (hope[...]
Hello Ceph-Gurus,
a short while ago I reported some trouble we had with our cluster
suddenly going into a state of "blocked requests".
We did a few tests, and we can reproduce the problem:
During / after deleting a substantial chunk of data on
CephFS (a few TB), ceph health shows blocked requests [...]
> our Ceph cluster suddenly went into a state of OSDs constantly having
> blocked or slow requests, rendering the cluster unusable. This happened
> during normal use, there were no updates, etc.
our cluster seems to have recovered overnight and is back
to normal behaviour. This morning, everything [...]
Hi Michael,
> Sounds like what I was having starting a couple of days ago, played
[...]
yes, that sounds only too familiar. :-(
> Updated to 3.12 kernel and restarted all of the ceph nodes and it's now
> happily churning through a rados -p rbd bench 300 write -t 120 that
Weird - but if that [...]
Dear Ceph Experts,
our Ceph cluster suddenly went into a state of OSDs constantly having
blocked or slow requests, rendering the cluster unusable. This happened
during normal use, there were no updates, etc.
All disks seem to be healthy (smartctl, iostat, etc.). A complete
hardware reboot including [...]
Dear Greg,
I believe 3.8 is after CRUSH_TUNABLES v1 was implemented in the
kernel, so it shouldn't hurt you to turn them on if you need them.
(And the crush tool is just out of date; we should update that text!)
However, if you aren't having distribution issues on your cluster I
wouldn't bother.
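If you do want to change them by hand, the usual crushtool round-trip
should work (filenames arbitrary):

$ ceph osd getcrushmap -o crushmap
$ crushtool -d crushmap -o crushmap.txt
(edit the "tunable ..." lines at the top of crushmap.txt)
$ crushtool -c crushmap.txt -o crushmap.new
$ ceph osd setcrushmap -i crushmap.new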
Dear Ceph Experts,
We're running a production Ceph cluster with Ceph Dumpling,
with Ubuntu 12.04.3 (kernel 3.8) on the cluster nodes and
all clients. We're mainly using CephFS (kernel) and RBD
(kernel and user-space/libvirt).
Would you recommend activating CRUSH_TUNABLES (1, not 2) for
our use case? [...]
Dear Ceph Experts,
I remember reading that, at least in the past, it wasn't recommended
to mount Ceph storage on a Ceph cluster node. Given a recent kernel
(3.8/3.9) and sufficient CPU and memory resources on the nodes,
would it now be safe to
* Mount RBD or CephFS on a Ceph cluster node?
* Run a [...]
Hello,
the Ceph "Hard disk and file system recommendations" page states
that XFS is the recommended OSD file system for production systems.
Does that still hold true for the latest kernel versions
(e.g. Ubuntu 12.04 with lts-raring kernel 3.8.5)?
Would btrfs provide a significant performance increase [...]