Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-07 Thread Jelle de Jong
38 Stddev Latency(s): 0.0843661 Max latency(s): 0.48464 Min latency(s): 0.0467124 On 2020-01-06 20:44, Jelle de Jong wrote: Hello everybody, I have issues with very slow requests on a simple three-node cluster here, four WDC enterprise disks and Intel Optane NVMe journa

Re: [ceph-users] Random slow requests without any load

2020-01-06 Thread Jelle de Jong
Hi, What are the full commands you used to setup this iptables config? iptables --table raw --append OUTPUT --jump NOTRACK iptables --table raw --append PREROUTING --jump NOTRACK Does not create the same output, it needs some more. Kind regards, Jelle de Jong On 2019-07-17 14:59, Kees
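For reference, a minimal sketch of a raw-table NOTRACK setup limited to Ceph traffic; the port ranges are assumptions based on Ceph defaults, not the configuration from the original thread:
    iptables --table raw --append PREROUTING --protocol tcp --dport 6789 --jump NOTRACK
    iptables --table raw --append PREROUTING --protocol tcp --dport 6800:7300 --jump NOTRACK
    iptables --table raw --append OUTPUT --protocol tcp --sport 6789 --jump NOTRACK
    iptables --table raw --append OUTPUT --protocol tcp --sport 6800:7300 --jump NOTRACK
Restricting NOTRACK to the MON/OSD ports keeps connection tracking active for everything else on the host.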

[ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-06 Thread Jelle de Jong
Hello everybody, I have issues with very slow requests on a simple three-node cluster here, four WDC enterprise disks and Intel Optane NVMe journal on identical high memory nodes, with 10GB networking. It was all working fine with Ceph Hammer on Debian Wheezy, but I wanted to upgrade to a

[ceph-users] help! pg inactive and slow requests after filestore to bluestore migration, version 12.2.12

2019-12-12 Thread Jelle de Jong
Hello everybody, I got a three-node ceph cluster made of E3-1220v3, 24GB ram, 6 hdd osd's with 32GB Intel Optane NVMe journal, 10GB networking. I wanted to move to bluestore due to dropping support of filestore, our cluster was working fine with filestore and we could take complete nodes out
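A minimal sketch of the per-OSD filestore-to-bluestore conversion discussed in this thread, assuming Luminous ceph-volume; the OSD id and device paths are placeholders, and each OSD should be drained (cluster healthy) before the next one is converted:
    ceph osd out 3
    # wait until all PGs are active+clean again
    systemctl stop ceph-osd@3
    ceph osd destroy 3 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdc --destroy
    ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p3 --osd-id 3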

[ceph-users] help! pg inactive and slow requests after filestore to bluestore migration, version 12.2.12

2019-12-06 Thread Jelle de Jong
Hello everybody, [fix confusing typo] I got a three-node ceph cluster made of E3-1220v3, 24GB ram, 6 hdd osd's with 32GB Intel Optane NVMe journal, 10GB networking. I wanted to move to bluestore due to dropping support of filestore, our cluster was working fine with filestore and we could

[ceph-users] help! pg inactive and slow requests after filestore to bluestore migration, version 12.2.12

2019-12-06 Thread Jelle de Jong
Hello everybody, I got a three-node ceph cluster made of E3-1220v3, 24GB ram, 6 hdd osd's with 32GB Intel Optane NVMe journal, 10GB networking. I wanted to move to bluestore due to dropping support of filestore, our cluster was working fine with filestore and we could take complete nodes

Re: [ceph-users] Scaling out

2019-11-21 Thread Alfredo De Luca
e. > > On Thu, Nov 21, 2019 at 7:46 AM Alfredo De Luca > wrote: > > > > Hi all. > > We are doing some tests on how to scale out nodes on Ceph Nautilus. > > Basically we want to try to install Ceph on one node and scale up to 2+ > nodes. How to do so? >

[ceph-users] Scaling out

2019-11-21 Thread Alfredo De Luca
Hi all. We are doing some tests on how to scale out nodes on Ceph Nautilus. Basically we want to try to install Ceph on one node and scale up to 2+ nodes. How to do so? Every node has 6 disks and maybe we can use the crushmap to achieve this? Any thoughts/ideas/recommendations? Cheers --

Re: [ceph-users] ceph-objectstore-tool crash when trying to recover pg from OSD

2019-11-07 Thread Eugene de Beste
Hi, does anyone have any feedback for me regarding this? Here's the log I get when trying to restart the OSD via systemctl: https://pastebin.com/tshuqsLP On Mon, 4 Nov 2019 at 12:42, Eugene de Beste <eug...@sanbi.ac.za> wrote: > Hi everyone > > I have a cluster that was init

[ceph-users] ceph-objectstore-tool crash when trying to recover pg from OSD

2019-11-04 Thread Eugene de Beste
Hi everyone I have a cluster that was initially set up with bad defaults in Luminous. After upgrading to Nautilus I've had a few OSDs crash on me, due to errors seemingly related to https://tracker.ceph.com/issues/42223 and https://tracker.ceph.com/issues/22678. One of my pools has been
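For anyone following along, a hedged sketch of the ceph-objectstore-tool export/import cycle being attempted here; OSD ids, paths and the PG id are placeholders, and the OSDs involved must be stopped first:
    systemctl stop ceph-osd@12
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 1.2a --op export --file /root/pg1.2a.export
    # import into another stopped OSD that should host the PG:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
        --op import --file /root/pg1.2a.export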

Re: [ceph-users] ssd requirements for wal/db

2019-10-04 Thread Stijn De Weirdt
hi all, maybe to clarify a bit, e.g. https://indico.cern.ch/event/755842/contributions/3243386/attachments/1784159/2904041/2019-jcollet-openlab.pdf clearly shows that the db+wal disks are not saturated, but we are wondering what is really needed/acceptable wrt throughput and latency (eg is a

Re: [ceph-users] process stuck in D state on cephfs kernel mount

2019-01-21 Thread Stijn De Weirdt
hi marc, > - how to prevent the D state process to accumulate so much load? you can't. in linux, uninterruptible tasks themselves count as "load"; this does not mean you e.g. ran out of cpu resources. stijn > > Thanks, > > > > > > ___ > ceph-users
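A quick way to see which uninterruptible (D-state) tasks are driving the load average; plain procps/sysrq tooling, nothing Ceph-specific:
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
    # kernel stack traces of all blocked tasks (needs sysrq enabled):
    echo w > /proc/sysrq-trigger; dmesg | tail -n 50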

Re: [ceph-users] Encryption questions

2019-01-11 Thread Sergio A. de Carvalho Jr.
Thanks for the answers, guys! Am I right to assume msgr2 (http://docs.ceph.com/docs/mimic/dev/msgr2/) will provide encryption between Ceph daemons as well as between clients and daemons? Does anybody know if it will be available in Nautilus? On Fri, Jan 11, 2019 at 8:10 AM Tobias Florek

[ceph-users] Encryption questions

2019-01-10 Thread Sergio A. de Carvalho Jr.
Hi everyone, I have some questions about encryption in Ceph. 1) Are RBD connections encrypted or is there an option to use encryption between clients and Ceph? From reading the documentation, I have the impression that the only option to guarantee encryption in transit is to force clients to

[ceph-users] Lost machine with MON and MDS

2018-10-26 Thread Maiko de Andrade
Hi, I have 3 machines with ceph configured with cephfs. But I lost one machine, the one with mon and mds. Is it possible to recover cephfs? If yes, how? ceph: Ubuntu 16.05.5 (lost this machine) - mon - mds - osd ceph-osd-1: Ubuntu 16.05.5 - osd ceph-osd-2: Ubuntu 16.05.5 - osd []´s Maiko de Andrade MAX

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-10-04 Thread Webert de Souza Lima
DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, May 16, 2018 at 5:15 PM Webert de Souza Lima wrote: > Thanks Jack. > > That's good to know. It is definitely something to consider. > In a distributed storage scenario we might build a de

Re: [ceph-users] rados rm objects, still appear in rados ls

2018-09-28 Thread Frank de Bot (lists)
pool? The pool had 2 snaps. After removing those, the ls command returned no 'non-existing' objects. I expected that ls would only return objects of the current contents, I did not specify -s for working with snaps of the pool. > > John > >> >> I use Centos 7.5 with mi
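For reference, a short sketch of listing and removing pool-level snapshots, which is what cleared the stray entries from rados ls here (pool and snapshot names are placeholders):
    rados -p mypool lssnap
    rados -p mypool rmsnap snap1
    rados -p mypool ls            # should now list only current objects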

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-09-04 Thread Jones de Andrade
Hi Eugen. Just tried everything again here by removing the /sda4 partitions and letting it so that either salt-run proposal-populate or salt-run state.orch ceph.stage.configure could try to find the free space on the partitions to work with: unsuccessfully again. :( Just to make things clear:

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-31 Thread Jones de Andrade
eveal anything run stage.3 again and watch the logs. > > Regards, > Eugen > > > Zitat von Jones de Andrade : > > > Hi Eugen. > > > > Ok, edited the file /etc/salt/minion, uncommented the "log_level_logfile" > > line and set it to "debug" level

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-30 Thread Jones de Andrade
e master, I was expecting it to have logs from the other > > too) and, moreover, no ceph-osd* files. Also, I'm looking the logs I have > > available, and nothing "shines out" (sorry for my poor english) as a > > possible error. > > the logging is not configured to be centr

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-29 Thread Jones de Andrade
gen > > [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage > > Zitat von Jones de Andrade : > > > Hi Eugen. > > > > Thanks for the suggestion. I'll look for the logs (since it's our first > > attempt with ceph, I'll have to discover w

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-26 Thread Jones de Andrade
; Since the deployment stage fails at the OSD level, start with the OSD > logs. Something's not right with the disks/partitions, did you wipe > the partition from previous attempts? > > Regards, > Eugen > > Zitat von Jones de Andrade : > > > (Please forgive my previous e

[ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-24 Thread Jones de Andrade
(Please forgive my previous email: I was using another message and completely forgot to update the subject) Hi all. I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :) When I run

Re: [ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-24 Thread Jones de Andrade
Hi all. I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :) When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going up to here: ### [14/71] ceph.sysctl on

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Webert de Souza Lima
t restart it every time. > > Webert de Souza Lima wrote on Wed, Aug 8, 2018 at 10:33 PM: > >> Hi Zhenshi, >> >> if you still have the client mount hanging but no session is connected, >> you probably have some PID waiting with blocked IO from cephfs mount. >> I face that

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Webert de Souza Lima
e. > So I cannot get useful information from the command you provided. > > Thanks > > Webert de Souza Lima wrote on Wed, Aug 8, 2018 at 10:10 PM: > >> You could also see open sessions at the MDS server by issuing `ceph >> daemon mds.XX session ls` >> >> Regards, >> >

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Webert de Souza Lima
>>> >>> This is not a Ceph-specific thing -- it can also affect similar >>> >>> systems like Lustre. >>> >>> >>> >>> The classic case is when under some memory pressure, the kernel tries >>> >>> to free memory by f

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread Webert de Souza Lima
healthy 'OSD::osd_op_tp thread 0x7fdabd897700' had > timed out after 90 > > > > (I updated it to 90s instead of 15s) > > > > Regards, > > > > > > > > *From:* ceph-users *On behalf of* > Webert de Souza Lima > *Sent:* 07 August 2018 16:28 >

Re: [ceph-users] cephfs kernel client hangs

2018-08-07 Thread Webert de Souza Lima
client at this > point, but that isn’t etched in stone. > > > > Curious if there is more to share. > > > > Reed > > > > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima > wrote: > > > > > > Yan, Zheng wrote on Tue, Aug 7, 2018 at 7:51 PM: > >

Re: [ceph-users] cephfs kernel client hangs

2018-08-07 Thread Webert de Souza Lima
Yan, Zheng wrote on Tue, Aug 7, 2018 at 7:51 PM: > On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou wrote: > this can cause memory deadlock. you should avoid doing this > > > Yan, Zheng wrote on Tue, Aug 7, 2018 at 19:12: > >> > >> did you mount cephfs on the same machines that run ceph-osd? > >> I didn't know about this. I

Re: [ceph-users] Whole cluster flapping

2018-08-07 Thread Webert de Souza Lima
Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Tue, Aug 7, 2018 at 10:47 AM CUZA Frédéric wrote: > Pool is already deleted and no longer present in stats. > > > > Regards, > > > > *From:* ceph-users *On behalf of* > Webert de Souza Lima >

Re: [ceph-users] Whole cluster flapping

2018-08-07 Thread Webert de Souza Lima
ole cluster keeps flapping, it is > never the same OSDs that go down. > > Is there a way to get the progress of this recovery ? (The pool that I > deleted is no longer present (for a while now)) > > In fact, there is a lot of i/o activity on the server where osds go down. > > >

Re: [ceph-users] Whole cluster flapping

2018-07-31 Thread Webert de Souza Lima
The pool deletion might have triggered a lot of IO operations on the disks and the process might be too busy to respond to heartbeats, so the mons mark them as down due to no response. Check also the OSD logs to see if they are actually crashing and restarting, and disk IO usage (i.e. iostat).

Re: [ceph-users] MDS damaged

2018-07-13 Thread Alessandro De Salvo
Alessandro De Salvo wrote: However, I cannot reduce the number of mdses anymore; I used to do that with e.g.: ceph fs set cephfs max_mds 1 Trying this with 12.2.6 has apparently no effect, I am left with 2 active mdses. Is this another bug? Are you following this procedure? http://docs.ceph.com/docs
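For context: on 12.2.x, lowering max_mds does not stop an already-active rank by itself; the extra rank has to be deactivated explicitly. A sketch of the Luminous-era procedure (later releases retire the deactivate step and shrink automatically):
    ceph fs set cephfs max_mds 1
    ceph mds deactivate cephfs:1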

Re: [ceph-users] MDS damaged

2018-07-13 Thread Alessandro De Salvo
, Jul 12, 2018 at 11:39 PM Alessandro De Salvo wrote: Some progress, and more pain... I was able to recover the 200. using the ceph-objectstore-tool for one of the OSDs (all identical copies) but trying to re-inject it just with rados put was giving no error while the get was still

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
error) Can I safely try to do the same as for object 200.? Should I check something before trying it? Again, checking the copies of the object, they have identical md5sums on all the replicas. Thanks,     Alessandro On 12/07/18 16:46, Alessandro De Salvo wrote: Unfortunately

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
up when trying to read an object, but not on scrubbing, that magically disappeared after restarting the OSD. However, in my case it was clearly related to https://tracker.ceph.com/issues/22464 which doesn't seem to be the issue here. Paul 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
On 12/07/18 11:20, Alessandro De Salvo wrote: On 12/07/18 10:58, Dan van der Ster wrote: On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote: On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo wrote: OK, I found where the object is: ceph osd map cephfs_metadata

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
On 12/07/18 10:58, Dan van der Ster wrote: On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum wrote: On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo wrote: OK, I found where the object is: ceph osd map cephfs_metadata 200. osdmap e632418 pool 'cephfs_metadata' (10) object

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo
> On 11 Jul 2018, at 23:25, Gregory Farnum > wrote: > >> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo >> wrote: >> OK, I found where the object is: >> >> >> ceph osd map cephfs_metadata 200. >>

Re: [ceph-users] v10.2.11 Jewel released

2018-07-11 Thread Webert de Souza Lima
Cheers! Thanks for all the backports and fixes. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, Jul 11, 2018 at 1:46 PM Abhishek Lekshmanan wrote: > > We're glad to announce v10.2.11 release of the Jewel stable release >

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
e OSDs with 10.14 are on a SAN system and one on a different one, so I would tend to exclude that they both had (silent) errors at the same time. Thanks,     Alessandro On 11/07/18 18:56, John Spray wrote: On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo wrote: Hi John, in fact I get an I/O

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
, 2018 at 4:10 PM Alessandro De Salvo wrote: Hi, after the upgrade to luminous 12.2.6 today, all our MDSes have been marked as damaged. Trying to restart the instances only result in standby MDSes. We currently have 2 filesystems active and 2 MDSes each. I found the following error messages

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
age before issuing the "repaired" command? What is the history of the filesystems on this cluster? On Wed, Jul 11, 2018 at 8:10 AM Alessandro De Salvo <alessandro.desa...@roma1.infn.it> wrote: Hi, after the upgrade to luminous 12.2.6 today, all our MDSes ha

[ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo
Hi, after the upgrade to luminous 12.2.6 today, all our MDSes have been marked as damaged. Trying to restart the instances only result in standby MDSes. We currently have 2 filesystems active and 2 MDSes each. I found the following error messages in the mon: mds.0 :6800/2412911269

[ceph-users] Looking for some advise on distributed FS: Is Ceph the right option for me?

2018-07-10 Thread Jones de Andrade
Hi all. I'm looking for some information on several distributed filesystems for our application. It looks like it finally came down to two candidates, Ceph being one of them. But there are still a few questions about it that I would really like to clarify, if possible. Our plan, initially on 6

Re: [ceph-users] SSD for bluestore

2018-07-09 Thread Webert de Souza Lima
bluestore doesn't have a journal like the filestore does, but there is the WAL (Write-Ahead Log), which looks like a journal but works differently. You can (or must, depending on your needs) have SSDs to serve this WAL (and the RocksDB). Regards, Webert Lima DevOps Engineer at MAV Tecnologia
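A minimal sketch of placing the RocksDB and WAL of a bluestore OSD on faster devices with ceph-volume; the device paths are assumptions, and if --block.wal is omitted the WAL simply lives inside the DB device:
    ceph-volume lvm create --bluestore --data /dev/sdb \
        --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2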

[ceph-users] FreeBSD Initiator with Ceph iscsi

2018-06-26 Thread Frank de Bot (lists)
Hi, In my test setup I have a ceph iscsi gateway (configured as in http://docs.ceph.com/docs/luminous/rbd/iscsi-overview/ ). I would like to use this with a FreeBSD (11.1) initiator, but I fail to make a working setup in FreeBSD. Is it known if the FreeBSD initiator (with gmultipath) can work

[ceph-users] Intel SSD DC P3520 PCIe for OSD 1480 TBW good idea?

2018-06-25 Thread Jelle de Jong
I want to try using NUMA to also run KVM guests besides the OSD. I should have enough cores and only have a few osd processes. Kind regards, Jelle de Jong ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph

Re: [ceph-users] Frequent slow requests

2018-06-19 Thread Frank de Bot (lists)
, all slow requests are blocked by OSDs on a single host. How can I debug this problem further? I can't find any errors or other strange things on the host with OSDs that are seemingly not sending a response to an op. Regards, Frank de Bot ___ ceph-users

Re: [ceph-users] Minimal MDS for CephFS on OSD hosts

2018-06-19 Thread Webert de Souza Lima
Keep in mind that the mds server is cpu-bound, so during heavy workloads it will eat up CPU usage, so the OSD daemons can affect or be affected by the MDS daemon. But it does work well. We've been running a few clusters with MON, MDS and OSDs sharing the same hosts for a couple of years now.

Re: [ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster

2018-06-14 Thread Alessandro De Salvo
Hi, On 14/06/18 06:13, Yan, Zheng wrote: On Wed, Jun 13, 2018 at 9:35 PM Alessandro De Salvo wrote: Hi, On 13/06/18 14:40, Yan, Zheng wrote: On Wed, Jun 13, 2018 at 7:06 PM Alessandro De Salvo wrote: Hi, I'm trying to migrate a cephfs data pool to a different one in order

Re: [ceph-users] cephfs: bind data pool via file layout

2018-06-13 Thread Webert de Souza Lima
n’t. The backtrace does > create another object but IIRC it’s a maximum one IO per create/rename (on > the file). > On Wed, Jun 13, 2018 at 1:12 PM Webert de Souza Lima < > webert.b...@gmail.com> wrote: > >> Thanks for clarifying that, Gregory. >> >> As said

Re: [ceph-users] cephfs: bind data pool via file layout

2018-06-13 Thread Webert de Souza Lima
isn’t > available you would stack up pending RADOS writes inside of your mds but > the rest of the system would continue unless you manage to run the mds out > of memory. > -Greg > On Wed, Jun 13, 2018 at 9:25 AM Webert de Souza Lima < > webert.b...@gmail.com> wrote: > >>

Re: [ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster

2018-06-13 Thread Alessandro De Salvo
Hi, On 13/06/18 14:40, Yan, Zheng wrote: On Wed, Jun 13, 2018 at 7:06 PM Alessandro De Salvo wrote: Hi, I'm trying to migrate a cephfs data pool to a different one in order to reconfigure with new pool parameters. I've found some hints but no specific documentation to migrate pools

Re: [ceph-users] cephfs: bind data pool via file layout

2018-06-13 Thread Webert de Souza Lima
the overhead may be acceptable for us. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, Jun 13, 2018 at 9:51 AM Yan, Zheng wrote: > On Wed, Jun 13, 2018 at 3:34 AM Webert de Souza Lima > wrote: > > > > hello,

[ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster

2018-06-13 Thread Alessandro De Salvo
Hi, I'm trying to migrate a cephfs data pool to a different one in order to reconfigure with new pool parameters. I've found some hints but no specific documentation to migrate pools. I'm currently trying with rados export + import, but I get errors like these: Write

[ceph-users] cephfs: bind data pool via file layout

2018-06-12 Thread Webert de Souza Lima
hello, is there any performance impact on cephfs for using file layouts to bind a specific directory in cephfs to a given pool? Of course, such pool is not the default data pool for this cephfs. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK -
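For reference, a sketch of what such a binding looks like; the pool name and mount path are placeholders. Newly created files under the directory are then stored in that pool:
    ceph fs add_data_pool cephfs cephfs_data_ssd
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/somedir
    getfattr -n ceph.dir.layout /mnt/cephfs/somedir   # verify the layout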

Re: [ceph-users] (yet another) multi active mds advise needed

2018-05-19 Thread Webert de Souza Lima
Hi Daniel, Thanks for clarifying. I'll have a look at the dirfrag option. Regards, Webert Lima On Sat, May 19, 2018 at 01:18, Daniel Baumann <daniel.baum...@bfh.ch> wrote: > On 05/19/2018 01:13 AM, Webert de Souza Lima wrote: > > New question: will it make any difference i

Re: [ceph-users] (yet another) multi active mds advise needed

2018-05-18 Thread Webert de Souza Lima
Hi Patrick On Fri, May 18, 2018 at 6:20 PM Patrick Donnelly wrote: > Each MDS may have multiple subtrees they are authoritative for. Each > MDS may also replicate metadata from another MDS as a form of load > balancing. Ok, its good to know that it actually does some load

[ceph-users] (yet another) multi active mds advise needed

2018-05-18 Thread Webert de Souza Lima
Hi, We're migrating from a Jewel / filestore based cephfs architecture to a Luminous / bluestore based one. One MUST HAVE is multiple Active MDS daemons. I'm still lacking knowledge of how it actually works. After reading the docs and ML we learned that they work by sort of dividing the

Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Webert de Souza Lima
Hello, On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann wrote: > additionally: if rank 0 is lost, the whole FS stands still (no new > client can mount the fs; no existing client can change a directory, etc.). > > my guess is that the root of a cephfs (/; which is always

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
NICK - WebertRLZ* On Wed, May 16, 2018 at 4:45 PM Jack <c...@jack.fr.eu.org> wrote: > On 05/16/2018 09:35 PM, Webert de Souza Lima wrote: > > We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore > > backend. > > We'll have to do some work on h

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
n production, but you can try it to run a POC. > > For more information check out my slides from Ceph Day London 2018: > https://dalgaaf.github.io/cephday-london2018-emailstorage/#/cover-page > > The project can be found on github: > https://github.com/ceph-dovecot/ > > -D

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
> and will help you a lot: > - Compression (classic, https://wiki.dovecot.org/Plugins/Zlib) > - Single-Instance-Storage (aka sis, aka "attachment deduplication" : > https://www.dovecot.org/list/dovecot/2013-December/094276.html) > > Regards, > On 05/16/2018 08:37 PM, Webert de So

[ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
I'm sending this message to both dovecot and ceph-users ML so please don't mind if something seems too obvious for you. Hi, I have a question for both dovecot and ceph lists and below I'll explain what's going on. Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox), when using

Re: [ceph-users] Node crash, filesytem not usable

2018-05-15 Thread Webert de Souza Lima
"osd_peering_wq_threads": "2", > "osd_recovery_thread_suicide_timeout": "300", > "osd_recovery_thread_timeout": "30", > "osd_remove_thread_suicide_timeout": "36000", > "osd_remove_th

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-05-14 Thread Webert de Souza Lima
On Sat, May 12, 2018 at 3:11 AM Alexandre DERUMIER wrote: > The documentation (luminous) say: > > >mds cache size > > > >Description:The number of inodes to cache. A value of 0 indicates an > unlimited number. It is recommended to use mds_cache_memory_limit to limit >

Re: [ceph-users] Question: CephFS + Bluestore

2018-05-11 Thread Webert de Souza Lima
> On Fri, May 11, 2018 at 2:39 PM Webert de Souza Lima < > webert.b...@gmail.com> wrote: > >> I think ceph doesn't have IO metrics with filters by pool, right? I see IO >> metrics from clients only: >> >> ceph_client_io_ops >> ceph_client_io_read

Re: [ceph-users] Question: CephFS + Bluestore

2018-05-11 Thread Webert de Souza Lima
e/read)_bytes(_total) Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, May 9, 2018 at 2:23 PM Webert de Souza Lima <webert.b...@gmail.com> wrote: > Hey Jon! > > On Wed, May 9, 2018 at 12:11 PM, John Spray <jsp

Re: [ceph-users] Node crash, filesytem not usable

2018-05-11 Thread Webert de Souza Lima
This message seems to be very concerning: >mds0: Metadata damage detected but for the rest, the cluster seems still to be recovering. you could try to speed things up with ceph tell, like: ceph tell osd.* injectargs --osd_max_backfills=10 ceph tell osd.* injectargs

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-05-11 Thread Webert de Souza Lima
;imported": 0, "imported_inodes": 0 } } Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Fri, May 11, 2018 at 3:13 PM Alexandre DERUMIER <aderum...@odiso.com> wrote: > Hi, > > I'm still seeing

Re: [ceph-users] howto: multiple ceph filesystems

2018-05-11 Thread Webert de Souza Lima
Basically what we're trying to figure out looks like what is being done here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020958.html But instead of using LIBRADOS to store EMAILs directly into RADOS we're still using CEPHFS for it, just figuring out if it makes sense to

Re: [ceph-users] Question: CephFS + Bluestore

2018-05-09 Thread Webert de Souza Lima
Hey Jon! On Wed, May 9, 2018 at 12:11 PM, John Spray wrote: > It depends on the metadata intensity of your workload. It might be > quite interesting to gather some drive stats on how many IOPS are > currently hitting your metadata pool over a week of normal activity. > Any

Re: [ceph-users] Can't get MDS running after a power outage

2018-03-29 Thread Webert de Souza Lima
I'd also try to boot up only one mds until it's fully up and running, not both of them. Sometimes they keep switching states between each other. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Thu, Mar 29, 2018 at 7:32 AM, John Spray

Re: [ceph-users] CephFS very unstable with many small files

2018-02-25 Thread Stijn De Weirdt
hi, can you give some more details on the setup? number and size of osds. are you using EC or not? and if so, what EC parameters? thanks, stijn On 02/26/2018 08:15 AM, Linh Vu wrote: > Sounds like you just need more RAM on your MDS. Ours have 256GB each, and the > OSD nodes have 128GB each.

Re: [ceph-users] CephFS very unstable with many small files

2018-02-25 Thread Stijn De Weirdt
hi oliver, >>> in preparation for production, we have run very successful tests with large >>> sequential data, >>> and just now a stress-test creating many small files on CephFS. >>> >>> We use a replicated metadata pool (4 SSDs, 4 replicas) and a data pool with >>> 6 hosts with 32 OSDs each,

Re: [ceph-users] Ceph Bluestore performance question

2018-02-18 Thread Stijn De Weirdt
hi oliver, the IPoIB network is not 56gb, it's probably a lot less (20gb or so). the ib_write_bw test is verbs/rdma based. do you have iperf tests between hosts, and if so, can you share those results? stijn > we are just getting started with our first Ceph cluster (Luminous 12.2.2) and >

Re: [ceph-users] Luminous 12.2.2 OSDs with Bluestore crashing randomly

2018-01-31 Thread Alessandro De Salvo
, 2018 at 5:49 AM Alessandro De Salvo <alessandro.desa...@roma1.infn.it> wrote: Hi, we have several times a day different OSDs running Luminous 12.2.2 and Bluestore crashing with errors like this: starting osd.2 at - osd_d

[ceph-users] Luminous 12.2.2 OSDs with Bluestore crashing randomly

2018-01-30 Thread Alessandro De Salvo
Hi, we have several times a day different OSDs running Luminous 12.2.2 and Bluestore crashing with errors like this: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal 2018-01-30 13:45:28.440883 7f1e193cbd00 -1 osd.2 107082 log_to_monitors {default=true}

Re: [ceph-users] ceph df shows 100% used

2018-01-22 Thread Webert de Souza Lima
Hi, On Fri, Jan 19, 2018 at 8:31 PM, zhangbingyin wrote: > 'MAX AVAIL' in the 'ceph df' output represents the amount of data that can > be used before the first OSD becomes full, and not the sum of all free > space across a set of OSDs. > Thank you very much. I

Re: [ceph-users] ceph df shows 100% used

2018-01-19 Thread Webert de Souza Lima
available space. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Thu, Jan 18, 2018 at 8:21 PM, Webert de Souza Lima <webert.b...@gmail.com > wrote: > With the help of robbat2 and llua on IRC channel I was able to solve this &g

Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
With the help of robbat2 and llua on IRC channel I was able to solve this situation by taking down the 2-OSD only hosts. After crush reweighting OSDs 8 and 23 from host mia1-master-fe02 to 0, ceph df showed the expected storage capacity usage (about 70%) With this in mind, those guys have told
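The reweighting step described above, written out as commands (OSD ids taken from the message, but this is a sketch rather than the poster's exact invocation):
    ceph osd crush reweight osd.8 0
    ceph osd crush reweight osd.23 0
    ceph osd df tree    # compare per-OSD %USE and MAX AVAIL afterwards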

Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
*Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Thu, Jan 18, 2018 at 8:05 PM, David Turner <drakonst...@gmail.com> wrote: > `ceph osd df` is a good command for you to see what's going on. Compare > the osd numbers with `ceph osd tree`. > > >> >> On Thu, Jan 1

Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Sorry I forgot, this is a ceph jewel 10.2.10 Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Also, there is no quota set for the pools. Here is "ceph osd pool get xxx all": http://termbin.com/ix0n Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* ___ ceph-users mailing list

[ceph-users] ceph df shows 100% used

2018-01-18 Thread Webert de Souza Lima
Hello, I'm running a nearly out-of-service radosgw (very slow to write new objects) and I suspect it's because ceph df is showing 100% usage in some pools, though I don't know where that information comes from. Pools: #~ ceph osd pool ls detail -> http://termbin.com/lsd0 Crush Rules (important

Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-11 Thread Alessandro De Salvo
018 05:40 PM, Alessandro De Salvo wrote: > > Thanks Lincoln, > > > > indeed, as I said the cluster is recovering, so there are pending ops: > > > > > > pgs: 21.034% pgs not active > > 1692310/24980804 objects degraded (6.774%) >

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Webert de Souza Lima
Good to know. I don't think this should trigger HEALTH_ERR though, but HEALTH_WARN makes sense. It makes sense to keep the backfillfull_ratio greater than nearfull_ratio as one might need backfilling to avoid OSD getting full on reweight operations. Regards, Webert Lima DevOps Engineer at MAV
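On Luminous the three ratios can be adjusted at runtime if they ever end up out of order; a sketch using the usual defaults:
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95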

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Webert de Souza Lima
On Wed, Jan 10, 2018 at 12:44 PM, Mark Schouten wrote: > > Thanks, that's a good suggestion. Just one question, will this affect > RBD- > > access from the same (client)host? i'm sorry that this didn't help. No, it does not affect rbd clients, as MDS is related only to cephfs.

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Webert de Souza Lima
know the simple solution is to just reboot the server, but the server > holds > quite a lot of VM's and Containers, so I'd prefer to fix this without a > reboot. > > Anybody with some clever ideas? :) > > -- > Kerio Operator in de Cloud? https://www.kerioindecloud.nl/ > M

Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-08 Thread Alessandro De Salvo
+0100, Alessandro De Salvo wrote: Hi, I'm running on ceph luminous 12.2.2 and my cephfs suddenly degraded. I have 2 active mds instances and 1 standby. All the active instances are now in replay state and show the same error in the logs: mds1 2018-01-08 16:04:15.765637 7fc2e92451c0  0

[ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-08 Thread Alessandro De Salvo
Hi, I'm running on ceph luminous 12.2.2 and my cephfs suddenly degraded. I have 2 active mds instances and 1 standby. All the active instances are now in replay state and show the same error in the logs: mds1 2018-01-08 16:04:15.765637 7fc2e92451c0  0 ceph version 12.2.2

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-05 Thread Stijn De Weirdt
e applied to guest- will affect >> librbd performance in the hypervisors. >> >> Does anybody have some information about how Meltdown or Spectre affect ceph >> OSDs and clients? >> >> Also, regarding Meltdown patch, seems to be a compilation option, meaning >>

[ceph-users] PGs stuck in "active+undersized+degraded+remapped+backfill_wait", recovery speed is extremely slow

2018-01-03 Thread ignaqui de la fila
Hello all, I have a ceph Luminous setup with filestore and bluestore OSDs. This cluster was deployed initially as Hammer, then I upgraded it to Jewel and eventually to Luminous. It's heterogeneous; we have SSDs, SAS 15K and 7.2K HDDs in it (see crush map attached). Earlier I converted 7.2K HDD from
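When backfill_wait PGs drain this slowly, the usual (temporary) recovery knobs look like the sketch below; the values are illustrative, not from this thread, and should be reverted once recovery completes:
    ceph tell osd.* injectargs '--osd_max_backfills 4 --osd_recovery_max_active 4'
    ceph tell osd.* injectargs '--osd_recovery_sleep 0'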

Re: [ceph-users] [luminous 12.2.2] Cluster write performance degradation problem(possibly tcmalloc related)

2017-12-22 Thread Webert de Souza Lima
On Thu, Dec 21, 2017 at 12:52 PM, shadow_lin wrote: > > After 18:00 suddenly the write throughput dropped and the osd latency > increased. TCMalloc started reclaiming the page heap freelist much more > frequently. All of this happened very fast and every osd had the identical >

Re: [ceph-users] MDS locations

2017-12-22 Thread Webert de Souza Lima
it depends on how you use it. for me, it runs fine on the OSD hosts but the mds server consumes loads of RAM, so be aware of that. if the system load average goes too high due to osd disk utilization the MDS server might run into troubles too, as delayed response from the host could cause the MDS

Re: [ceph-users] cephfs mds millions of caps

2017-12-22 Thread Webert de Souza Lima
On Fri, Dec 22, 2017 at 3:20 AM, Yan, Zheng wrote: > idle client shouldn't hold so many caps. > i'll try to make it reproducible for you to test. yes. For now, it's better to run "echo 3 >/proc/sys/vm/drop_caches" > after cronjob finishes Thanks. I'll adopt that for now.
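The suggested workaround as a one-liner that could be appended to the cronjob (a hypothetical example; sync first so dirty pages are written back before dropping the caches):
    sync && echo 3 > /proc/sys/vm/drop_caches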

Re: [ceph-users] cephfs mds millions of caps

2017-12-21 Thread Webert de Souza Lima
Horizonte - Brasil* *IRC NICK - WebertRLZ* On Thu, Dec 21, 2017 at 11:55 AM, Yan, Zheng <uker...@gmail.com> wrote: > On Thu, Dec 21, 2017 at 7:33 PM, Webert de Souza Lima > <webert.b...@gmail.com> wrote: > > I have upgraded the kernel on a client node (one that has close-t

Re: [ceph-users] cephfs mds millions of caps

2017-12-21 Thread Webert de Souza Lima
id" : "admin" }, "replay_requests" : 0 }, still 1.4M caps used. is upgrading the client kernel enough ? Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Fri, Dec 15, 2017 at 11:16 AM, Webert de Souz
