Hello,
a colleague of mine gave a presentation at FOSDEM about how we (OVH) do
RBD backups. You might find it interesting:
https://archive.fosdem.org/2018/schedule/event/backup_ceph_at_scale/
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
Welcome Mike! You're the perfect person for this role!
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
From: ceph-users on behalf of Sage Weil
Sent: Wednesday, August 29, 2018 03:13
To: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com;
Hello Josef,
I would suggest setting up a bigger disk (if not a physical one, then maybe an
LVM volume made from 2 smaller disks), cloning the OSD data dir to the new disk
(remember about extended attributes!), and then trying to bring the OSD back
into the cluster.
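A minimal sketch of the cloning step (the OSD id, stop command, and mount
points are only illustrative, assuming a Filestore OSD; adjust to your setup):

  systemctl stop ceph-osd@87
  # -a keeps owner/perms/times, -A ACLs, -X extended attributes, -H hard links
  rsync -aAXH /var/lib/ceph/osd/ceph-87/ /mnt/new-bigger-disk/
  # then mount the new volume at /var/lib/ceph/osd/ceph-87 and start the OSD again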
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
If the PG cannot be queried, I would bet on the OSD message throttler. Check
with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD holding
this PG whether the message throttler's current value has reached its max. If
it has, increase the max value in ceph.conf and restart the OSD.
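For example (a sketch; the socket path and OSD id are illustrative), dump the
perf counters and compare "val" against "max" in the throttle-* sections:

  ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok perf dump | python -m json.tool | less
  # look at the "throttle-*" sections; if "val" has reached "max", that throttler is saturated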
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
On 14.02.2017 at 17:25, Wido den Hollander wrote:
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
anyone who might run into the same problem.
2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko :
> Hi,
>
> I have a production cluster on which 1 OSD on a failing disk was slowing
> the whole cluster down. I removed the OSD (osd.87) as usual in such cases,
> but this time it resulted in 17 un
d85016af6bd7879ef272ca5639/raw/d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump
ceph health detail:
https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail
--
Regards,
Tomasz Kuzemko
tom...@kuzemko.net
> numbers. I never found any sort of calculator which can say "Oh, you have
> this hardware? Then a repl size of x y z is what you need."
>
> HTH a bit . Regards . Götz
--
Tomasz Kuzemko
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
> - I have been at HEALTH_OK every day, but overnight scrubbing has been
> uncovering problematic pgs I've had to repair every single night so
> far. This morning was when it went beyond my ability to repair.
>
>
>
> I have read many times the post "incomplete pgs, oh my"
> I think my case is different.
> The broken disk is completely broken.
> So how can I simply mark incomplete pgs as complete?
> Should I stop ceph before?
>
>
> On Wed, 29
ual machine
>>> can boot
>>> because ceph has stopped i/o.
>>>
>>> I can accept to lose some data, but not ALL data!
>>> Can you help me please?
>>> Thanks,
>>> Mario
>>>
Hi,
my team did some benchmarks in the past to answer this question. I don't
have the results at hand, but the conclusion was that it depends on how many
disks/OSDs you have in a single host: above 9 there was more benefit from
more cores than from higher clock speed (6-core 3.5 GHz vs 10-core 2.4 GHz, AFAIR).
--
Tomasz Kuzemko
ECC will not always be able to recover the data, but it will be able to
detect that the data is corrupted. AFAIK under Linux an uncorrectable memory
error results in an immediate halt of the system, so the OSD would never get
the chance to report the bad checksum data during deep-scrub.
--
Tomasz Kuzemko
tomasz.kuze...@corp.ovh.com
Hi,
I have also seen inconsistent PGs despite the md5 being the same on all
objects; however, all my hardware uses ECC RAM, which as I understand it
should prevent this type of error. To be clear - in your case were you
using ECC or non-ECC modules?
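For reference, this is roughly how the replicas can be compared on Filestore
OSDs (a sketch; the PG id and paths are illustrative, run on each host holding
a replica of the PG):

  find /var/lib/ceph/osd/ceph-*/current/2.7f_head -type f -exec md5sum {} + 2>/dev/null | sort -k2
  # compare the checksums between hosts; identical md5s on an inconsistent PG
  # suggest the mismatch is in metadata rather than object data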
--
Tomasz Kuzemko
tomasz.kuze...@ovh.net
ine will probably take effect only for PGs on which backfill has not yet
started, which could explain why you did not see an immediate effect after
changing these on the fly.
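For what it's worth, runtime changes of this kind are usually injected like
this (a sketch; I'm assuming the backfill/recovery throttles were the settings
in question, and the values are only illustrative):

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
  # verify on a single OSD:
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_max_backfills|osd_recovery_max_active'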
--
Tomasz Kuzemko
tom...@kuzemko.net
2015-11-26 0:24 GMT+01:00 Robert LeBlanc :
> "filestore_split_multiple": "2",
> "filestore_update_to": "1000",
> "filestore_blackhole": "false",
> "filestore_fd_cache_size": "128
On Sun, Dec 28, 2014 at 02:49:08PM +0900, Christian Balzer wrote:
> You really, really want size 3 and a third node for both performance
> (reads) and redundancy.
How does it benefit read performance? I thought all reads are made only
from the active primary OSD.
--
Tomasz Kuzemko
tomas
Try lowering "filestore max sync interval" and "filestore min sync interval".
It looks like during the hang period data is being flushed from some overly
large buffer. If this does not help, you can monitor perf stats on the OSDs
to see if some queue is unusually large.
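For example, in ceph.conf (the values below are illustrative, not a
recommendation; tune them for your hardware):

  [osd]
  filestore max sync interval = 1
  filestore min sync interval = 0.01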
--
Tomasz Kuzemko
Be very careful with running "ceph pg repair". Have a look at this
thread:
http://thread.gmane.org/gmane.comp.file-systems.ceph.user/15185
--
Tomasz Kuzemko
tomasz.kuze...@ovh.net
On Thu, Dec 11, 2014 at 10:57:22AM +, Luis Periquito wrote:
> Hi,
>
> I've stoppe
For metadata corruption you would have to modify the object file's extended
attributes (with the xattr tools, for example).
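A rough sketch with getfattr/setfattr (the object path is made up; on
Filestore the object metadata is kept in xattrs such as user.ceph._):

  # list the extended attributes of an object file
  getfattr -d -m . /var/lib/ceph/osd/ceph-0/current/2.7f_head/obj__head_XXXXXXXX__2
  # overwrite one attribute to simulate metadata corruption (destructive!)
  setfattr -n user.ceph._ -v garbage /var/lib/ceph/osd/ceph-0/current/2.7f_head/obj__head_XXXXXXXX__2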
--
Tomasz Kuzemko
tomasz.kuze...@ovh.net
On Thu, Dec 04, 2014 at 02:26:56PM +0100, Sebastien Han wrote:
> AFAIK there is no tool to do this.
> You simply rm object or dd a new
.intent-log' replicated size 3 min_size 2 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 213 flags hashpspool
> stripe_width 0
> pool 15 '' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
> pg_num 8 pgp_num 8 last_change 238 flags hashpspool stripe_width 0
>
Some of your pools have size = 3.
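For reference, you can list the replication size of every pool with:

  ceph osd dump | grep 'replicated size'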
>
> --
> With regards,
> Stanislav Butkeev
--
Tomasz Kuzemko
tomasz.kuze...@ovh.net
On Tue, Nov 25, 2014 at 07:10:26AM -0800, Sage Weil wrote:
> On Tue, 25 Nov 2014, Tomasz Kuzemko wrote:
> > Hello,
> > as far as I can tell, Ceph does not make any guarantee that reads from an
> > object return what was actually written to it. In other words, it does not
>
se it in production? Are there any
>> considerations one should make before enabling it? Is it safe to
>> enable it on an existing cluster?
>>
>> --
>>
>> Tomasz Kuzemko
>> tom...@kuzemko.net
>>
it's merged since Emperor.
Getting back to my actual question - what is the state of "filestore sloppy
crc"? Does someone actually use it in production? Are there any
considerations one should make before enabling it? Is it safe to enable it
on an existing cluster?
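For context, enabling it would amount to something like this in ceph.conf
(the option names are the filestore sloppy crc settings; the block size value
is only illustrative, and whether this is safe on a live cluster is exactly
the open question):

  [osd]
  filestore sloppy crc = true
  filestore sloppy crc block size = 65536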