Re: [ceph-users] Major ceph disaster

2019-05-13 Thread Lionel Bouton
Le 13/05/2019 à 16:20, Kevin Flöh a écrit : > Dear ceph experts, > > [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] > Here is what happened: One osd daemon could not be started and > therefore we decided to mark the osd as lost and set it up from > scratch. Ceph started r

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Lionel Bouton
Le 11/12/2018 à 15:51, Konstantin Shalygin a écrit : > >> Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes >> and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous >> cluster (which already holds lot's of images). >> The server has access to both local and clu

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-05-31 Thread Lionel Bouton
On 31/05/2018 14:41, Simon Ironside wrote: > On 24/05/18 19:21, Lionel Bouton wrote: > >> Unfortunately I just learned that Supermicro found an incompatibility >> between this motherboard and SM863a SSDs (I don't have more information >> yet) and they propos

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-05-24 Thread Lionel Bouton
Hi, On 22/02/2018 23:32, Mike Lovell wrote: > hrm. intel has, until a year ago, been very good with ssds. the > description of your experience definitely doesn't inspire confidence. > intel also dropping the entire s3xxx and p3xxx series last year before > having a viable replacement has been driv

Re: [ceph-users] HW Raid vs. Multiple OSD

2017-11-13 Thread Lionel Bouton
Le 13/11/2017 à 15:47, Oscar Segarra a écrit : > Thanks Mark, Peter,  > > For clarification, the configuration with RAID5 is having many servers > (2 or more) with RAID5 and CEPH on top of it. Ceph will replicate data > between servers. Of course, each server will have just one OSD daemon > managin

Re: [ceph-users] dropping filestore+btrfs testing for luminous

2017-07-04 Thread Lionel Bouton
Le 04/07/2017 à 19:00, Jack a écrit : > You may just upgrade to Luminous, then replace filestore by bluestore You don't just "replace" filestore by bluestore on a production cluster : you transition over several weeks/months from the first to the second. The two must be rock stable and have predic

Re: [ceph-users] dropping filestore+btrfs testing for luminous

2017-07-04 Thread Lionel Bouton
Le 30/06/2017 à 18:48, Sage Weil a écrit : > On Fri, 30 Jun 2017, Lenz Grimmer wrote: >> Hi Sage, >> >> On 06/30/2017 05:21 AM, Sage Weil wrote: >> >>> The easiest thing is to >>> >>> 1/ Stop testing filestore+btrfs for luminous onward. We've recommended >>> against btrfs for a long time and are

Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-18 Thread Lionel Bouton
Le 18/04/2017 à 11:24, Jogi Hofmüller a écrit : > Hi, > > thanks for all your comments so far. > > Am Donnerstag, den 13.04.2017, 16:53 +0200 schrieb Lionel Bouton: >> Hi, >> >> Le 13/04/2017 à 10:51, Peter Maloney a écrit : >>> Ceph snapshots really

Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread Lionel Bouton
Le 13/04/2017 à 17:47, mj a écrit : > Hi, > > On 04/13/2017 04:53 PM, Lionel Bouton wrote: >> We use rbd snapshots on Firefly (and Hammer now) and I didn't see any >> measurable impact on performance... until we tried to remove them. > > What exactly do you m

Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread Lionel Bouton
Hi, Le 13/04/2017 à 10:51, Peter Maloney a écrit : > [...] > Also more things to consider... > > Ceph snapshots really slow things down. We use rbd snapshots on Firefly (and Hammer now) and I didn't see any measurable impact on performance... until we tried to remove them. We usually have at l

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-10 Thread Lionel Bouton
Hi, Le 10/01/2017 à 19:32, Brian Andrus a écrit : > [...] > > > I think the main point I'm trying to address is - as long as the > backing OSD isn't egregiously handling large amounts of writes and it > has a good journal in front of it (that properly handles O_DSYNC [not > D_SYNC as Sebastien's a

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-07 Thread Lionel Bouton
Le 07/01/2017 à 14:11, kevin parrikar a écrit : > Thanks for your valuable input. > We were using these SSD in our NAS box(synology) and it was giving > 13k iops for our fileserver in raid1.We had a few spare disks which we > added to our ceph nodes hoping that it will give good performance same >

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-07 Thread Lionel Bouton
Hi, Le 07/01/2017 à 04:48, kevin parrikar a écrit : > i really need some help here :( > > replaced all 7.2k rpm SAS disks with new Samsung 840 evo 512Gb SSD with > no separate journal Disk. Now both OSD nodes are with 2 ssd disks > with a replica of *2* . > Total number of OSD process in the clust

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-19 Thread Lionel Bouton
Le 19/11/2016 à 00:52, Brian :: a écrit : > This is like your mother telling not to cross the road when you were 4 > years of age but not telling you it was because you could be flattened > by a car :) > > Can you expand on your answer? If you are in a DC with AB power, > redundant UPS, dual feed f

Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Lionel Bouton
Hi, On 19/07/2016 13:06, Wido den Hollander wrote: >> Op 19 juli 2016 om 12:37 schreef M Ranga Swami Reddy : >> >> >> Thanks for the correction...so even one OSD reaches to 95% full, the >> total ceph cluster IO (R/W) will be blocked...Ideally read IO should >> work... > That should be a config op
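
A minimal sketch of the usual way to spot and temporarily relieve a near-full OSD (the OSD id and weight below are made-up examples, not recommendations):

  ceph health detail        # lists the OSDs that are near-full or full
  ceph osd df               # per-OSD utilisation (available on Hammer and later)
  ceph osd reweight 12 0.85 # temporarily push some PGs off the overfull OSD

The reweight is reversible (set it back to 1.0) and only buys time; the real fix is adding capacity or rebalancing CRUSH weights.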

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-12 Thread Lionel Bouton
Hi, Le 12/07/2016 02:51, Brad Hubbard a écrit : > [...] This is probably a fragmentation problem : typical rbd access patterns cause heavy BTRFS fragmentation. >>> To the extent that operations take over 120 seconds to complete? Really? >> Yes, really. I had these too. By default Ceph/R

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-11 Thread Lionel Bouton
Le 11/07/2016 11:56, Brad Hubbard a écrit : > On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton > wrote: >> Le 11/07/2016 04:48, 한승진 a écrit : >>> Hi cephers. >>> >>> I need your help for some issues. >>> >>> The ceph cluster version is Jewel

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-11 Thread Lionel Bouton
Le 11/07/2016 04:48, 한승진 a écrit : > Hi cephers. > > I need your help for some issues. > > The ceph cluster version is Jewel(10.2.1), and the filesytem is btrfs. > > I run 1 Mon and 48 OSD in 4 Nodes(each node has 12 OSDs). > > I've experienced one of OSDs was killed himself. > > Always it issued s

Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-29 Thread Lionel Bouton
Hi, Le 29/06/2016 18:33, Stefan Priebe - Profihost AG a écrit : >> Am 28.06.2016 um 09:43 schrieb Lionel Bouton >> : >> >> Hi, >> >> Le 28/06/2016 08:34, Stefan Priebe - Profihost AG a écrit : >>> [...] >>> Yes but at least BTRFS is still no

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Lionel Bouton
Hi, Le 29/06/2016 12:00, Mario Giammarco a écrit : > Now the problem is that ceph has put out two disks because scrub has > failed (I think it is not a disk fault but due to mark-complete) There is something odd going on. I've only seen deep-scrub failing (ie detect one inconsistency and marking

Re: [ceph-users] pg scrub and auto repair in hammer

2016-06-28 Thread Lionel Bouton
Hi, Le 28/06/2016 08:34, Stefan Priebe - Profihost AG a écrit : > [...] > Yes but at least BTRFS is still not working for ceph due to > fragmentation. I've even tested a 4.6 kernel a few weeks ago. But it > doubles its I/O after a few days. BTRFS autodefrag is not working over the long term. Tha

Re: [ceph-users] Pinpointing performance bottleneck / would SSD journals help?

2016-06-27 Thread Lionel Bouton
Le 27/06/2016 17:42, Daniel Schneller a écrit : > Hi! > > We are currently trying to pinpoint a bottleneck and are somewhat stuck. > > First things first, this is the hardware setup: > > 4x DELL PowerEdge R510, 12x4TB OSD HDDs, journal colocated on HDD > 96GB RAM, 2x6 Cores + HT > 2x1GbE bonded i

Re: [ceph-users] dense storage nodes

2016-05-18 Thread Lionel Bouton
Hi, I'm not yet familiar with Jewel, so take this with a grain of salt. Le 18/05/2016 16:36, Benjeman Meekhof a écrit : > We're in process of tuning a cluster that currently consists of 3 > dense nodes with more to be added. The storage nodes have spec: > - Dell R730xd 2 x Xeon E5-2650 v3 @ 2.30

Re: [ceph-users] Deprecating ext4 support

2016-04-11 Thread Lionel Bouton
Le 12/04/2016 01:40, Lindsay Mathieson a écrit : > On 12/04/2016 9:09 AM, Lionel Bouton wrote: >> * If the journal is not on a separate partition (SSD), it should >> definitely be re-created NoCoW to avoid unnecessary fragmentation. From >> memory : stop OSD, touch j
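
A sketch of that recipe for one OSD (the id 2 and the default paths are assumptions; adapt before use), the key point being that chattr +C only sticks on an empty file:

  # with the OSD stopped:
  ceph-osd -i 2 --flush-journal
  rm /var/lib/ceph/osd/ceph-2/journal
  touch /var/lib/ceph/osd/ceph-2/journal
  chattr +C /var/lib/ceph/osd/ceph-2/journal
  lsattr /var/lib/ceph/osd/ceph-2/journal   # should now show the 'C' (NoCoW) flag
  ceph-osd -i 2 --mkjournal
  # then start the OSD again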

Re: [ceph-users] Deprecating ext4 support

2016-04-11 Thread Lionel Bouton
Hi, Le 11/04/2016 23:57, Mark Nelson a écrit : > [...] > To add to this on the performance side, we stopped doing regular > performance testing on ext4 (and btrfs) sometime back around when ICE > was released to focus specifically on filestore behavior on xfs. > There were some cases at the time

Re: [ceph-users] ZFS or BTRFS for performance?

2016-03-20 Thread Lionel Bouton
Hi, Le 20/03/2016 15:23, Francois Lafont a écrit : > Hello, > > On 20/03/2016 04:47, Christian Balzer wrote: > >> That's not protection, that's an "uh-oh, something is wrong, you better >> check it out" notification, after which you get to spend a lot of time >> figuring out which is the good repl

Re: [ceph-users] ZFS or BTRFS for performance?

2016-03-19 Thread Lionel Bouton
Le 19/03/2016 18:38, Heath Albritton a écrit : > If you google "ceph bluestore" you'll be able to find a couple slide > decks on the topic. One of them by Sage is easy to follow without the > benefit of the presentation. There's also the " Redhat Ceph Storage > Roadmap 2016" deck. > > In any case

Re: [ceph-users] ZFS or BTRFS for performance?

2016-03-18 Thread Lionel Bouton
Hi, Le 18/03/2016 20:58, Mark Nelson a écrit : > FWIW, from purely a performance perspective Ceph usually looks pretty > fantastic on a fresh BTRFS filesystem. In fact it will probably > continue to look great until you do small random writes to large > objects (like say to blocks in an RBD volum

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Lionel Bouton
Le 29/02/2016 22:50, Shinobu Kinjo a écrit : >> the fact that they are optimized for benchmarks and certainly not >> Ceph OSD usage patterns (with or without internal journal). > Are you assuming that SSHD is causing the issue? > If you could elaborate on this more, it would be helpful. Probably n

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Lionel Bouton
Le 29/02/2016 20:43, Mario Giammarco a écrit : > [...] > I said SSHD that is a standard hdd with ssd cache. It is 7200rpms but in > benchmarks it is better than a 1rpm disk. Lies, damn lies and benchmarks... SSHD usually have very small flash caches (16GB or less for 500GB of data or more) and

Re: [ceph-users] How to properly deal with NEAR FULL OSD

2016-02-19 Thread Lionel Bouton
Le 19/02/2016 17:17, Don Laursen a écrit : > > Thanks. To summarize > > Your data, images+volumes = 27.15% space used > > Raw used = 81.71% used > > > > This is a big difference that I can’t account for? Can anyone? So is > your cluster actually full? > I believe this is the pool size being acco
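
The arithmetic behind that explanation, assuming a replicated pool with size = 3 (which matches the reported ratio):

  27.15 % of pool data x 3 replicas ≈ 81.45 % of raw space used

which is close to the observed 81.71 %; the small remainder is filesystem and journal overhead on the OSDs. So the raw capacity really is nearly exhausted even though the pools themselves look only ~27 % used.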

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-13 Thread Lionel Bouton
Hi, Le 13/02/2016 15:52, Christian Balzer a écrit : > [..] > > Hum that's surprisingly long. How much data (size and nb of files) do > you have on this OSD, which FS do you use, what are the mount options, > what is the hardware and the kind of access ? > > I already mentioned the HW, Areca RAID c

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-13 Thread Lionel Bouton
Le 13/02/2016 06:31, Christian Balzer a écrit : > [...] > --- > So from shutdown to startup about 2 seconds, not that bad. > However here is where the cookie crumbles massively: > --- > 2016-02-12 01:33:50.263152 7f75be4d57c0 0 filestore(/var/lib/ceph/osd/ceph-2) limited size xattrs > 2016-02-12 0

Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
Le 09/02/2016 20:18, Lionel Bouton a écrit : > Le 09/02/2016 20:07, Kris Jurka a écrit : >> >> On 2/9/2016 10:11 AM, Lionel Bouton wrote: >> >>> Actually if I understand correctly how PG splitting works the next spike >>> should be N times smaller and sprea

Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
Le 09/02/2016 20:07, Kris Jurka a écrit : > > > On 2/9/2016 10:11 AM, Lionel Bouton wrote: > >> Actually if I understand correctly how PG splitting works the next spike >> should be N times smaller and spread over N times the period (where >> N is the number of subd

Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
Le 09/02/2016 19:11, Lionel Bouton a écrit : > Actually if I understand correctly how PG splitting works the next spike > should be N times smaller and spread over N times the period (where > N is the number of subdirectories created during each split which > seems to be 15 typo : 16 >

Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Lionel Bouton
Hi, Le 09/02/2016 17:07, Kris Jurka a écrit : > > > On 2/8/2016 9:16 AM, Gregory Farnum wrote: >> On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka wrote: >>> >>> I've been testing the performance of ceph by storing objects through >>> RGW. >>> This is on Debian with Hammer using 40 magnetic OSDs, 5 mon

Re: [ceph-users] K is for Kraken

2016-02-08 Thread Lionel Bouton
Le 08/02/2016 20:09, Robert LeBlanc a écrit : > Too bad K isn't an LTS. It would be fun to release the Kraken many times. Kraken is an awesome release name ! How I will miss being able to say/write to our clients that we just released the Kraken on their infra :-/ Lionel

Re: [ceph-users] SSD Journal

2016-01-29 Thread Lionel Bouton
Le 29/01/2016 16:25, Jan Schermer a écrit : > > [...] > > > But if I understand correctly, there is indeed a log of the recent > modifications in the filestore which is used when a PG is recovering > because another OSD is lagging behind (not when Ceph reports a full > backfill

Re: [ceph-users] SSD Journal

2016-01-29 Thread Lionel Bouton
Le 29/01/2016 01:12, Jan Schermer a écrit : > [...] >> Second I'm not familiar with Ceph internals but OSDs must make sure that >> their PGs are synced so I was under the impression that the OSD content for >> a PG on the filesystem should always be guaranteed to be on all the other >> active OS

Re: [ceph-users] SSD Journal

2016-01-28 Thread Lionel Bouton
Le 28/01/2016 22:32, Jan Schermer a écrit : > P.S. I feel very strongly that this whole concept is broken > fundamentally. We already have a journal for the filesystem which is > time proven, well behaved and above all fast. Instead there's this > reinvented wheel which supposedly does it better in

[ceph-users] Repository with some internal utils

2016-01-19 Thread Lionel Bouton
Hi, someone asked me if he could get access to the BTRFS defragmenter we used for our Ceph OSDs. I took a few minutes to put together a small github repository with : - the defragmenter I've been asked about (tested on 7200 rpm drives and designed to put low IO load on them), - the scrub scheduler

Re: [ceph-users] Ceph cache tier and rbd volumes/SSD primary, HDD replica crush rule!

2016-01-12 Thread Lionel Bouton
Le 12/01/2016 18:27, Mihai Gheorghe a écrit : > One more question. Seeing that cache tier holds data on it until it > reaches % ratio, i suppose i must set replication to 2 or higher on > the cache pool to not lose hot data not written to the cold storage in > case of a drive failure, right? > > A

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Lionel Bouton
Le 23/12/2015 18:37, Mart van Santen a écrit : > So, maybe you are right and the HBA is the bottleneck (LSI Logic / > Symbios Logic MegaRAID SAS 2108). Under all circumstances, I do not get > close to the numbers of the PM863 quoted by Sebastien. But his site > does not state what kind of HBA he is

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Lionel Bouton
Le 23/12/2015 16:18, Mart van Santen a écrit : > Hi all, > > > On 12/22/2015 01:55 PM, Wido den Hollander wrote: >> On 22-12-15 13:43, Andrei Mikhailovsky wrote: >>> Hello guys, >>> >>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to >>> see how it performs? IMHO the 48

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Lionel Bouton
Le 22/12/2015 17:36, Tyler Bishop a écrit : > Write endurance is kinda bullshit. > > We have crucial 960gb drives storing data and we've only managed to take 2% > off the drives life in the period of a year and hundreds of tb written weekly. This is not really helpful without more context. This

Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Lionel Bouton
Le 22/12/2015 13:43, Andrei Mikhailovsky a écrit : > Hello guys, > > Was wondering if anyone has done testing on Samsung PM863 120 GB version to > see how it performs? IMHO the 480GB version seems like a waste for the > journal as you only need to have a small disk size to fit 3-4 osd journals.
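
For reference, the filestore documentation sizes journals as roughly twice the expected throughput times the filestore max sync interval; a worked example with assumed numbers (not a recommendation):

  2 x 100 MB/s (one spinning OSD) x 5 s (default sync interval) = 1 GB per journal

so three or four journals of a few GB each fit comfortably on the 120 GB model, which is the point being made about the 480 GB version being oversized for this role.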

[ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-21 Thread Lionel Bouton
Hi, Sébastien Han just added the test results I reported for these SSDs on the following page : http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ The table in the original post has the most important numbers and more details can be found in
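
The figures on that page come from fio run in synchronous, direct 4k write mode, something along these lines (the device name is a placeholder and the test overwrites it, so never point it at a disk holding data):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test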

Re: [ceph-users] SSD only pool without journal

2015-12-17 Thread Lionel Bouton
Hi, Le 17/12/2015 16:47, Misa a écrit : > Hello everyone, > > does it make sense to create SSD only pool from OSDs without journal? No, because AFAIK you can't have OSDs without journals yet. IIRC there is work done for alternate stores where you wouldn't need journals anymore but it's not yet pr

Re: [ceph-users] Global, Synchronous Blocked Requests

2015-11-28 Thread Lionel Bouton
Hi, Le 28/11/2015 04:24, Brian Felton a écrit : > Greetings Ceph Community, > > We are running a Hammer cluster (0.94.3-1) in production that recently > experienced asymptotic performance degradation. We've been migrating > data from an older non-Ceph cluster at a fairly steady pace for the > pas

Re: [ceph-users] Scrubbing question

2015-11-26 Thread Lionel Bouton
Le 26/11/2015 15:53, Tomasz Kuzemko a écrit : > ECC will not be able to recover the data, but it will always be able to > detect that data is corrupted. No. That's a theoretical impossibility as the detection is done by some kind of hash over the memory content which brings the possibility of hash

Re: [ceph-users] CEPH over SW-RAID

2015-11-23 Thread Lionel Bouton
Le 23/11/2015 21:58, Jose Tavares a écrit : > > AFAIK, people are complaining about lots of bad blocks in the new big > disks. The hardware list seems to be small and unable to replace > these blocks. Note that if by big disks you mean SMR-based disks, they can exhibit what looks like bad blocks

Re: [ceph-users] CEPH over SW-RAID

2015-11-23 Thread Lionel Bouton
Le 23/11/2015 21:01, Jose Tavares a écrit : > > > > > My new question regarding Ceph is if it isolates this bad sectors where > it found bad data when scrubbing? or there will be always a replica of > something over a known bad block..? > Ceph OSDs don't know about bad sectors, they deleg

Re: [ceph-users] CEPH over SW-RAID

2015-11-23 Thread Lionel Bouton
Le 23/11/2015 19:58, Jose Tavares a écrit : > > > On Mon, Nov 23, 2015 at 4:15 PM, Lionel Bouton wrote: > > Hi, > > Le 23/11/2015 18:37, Jose Tavares a écrit : > > Yes, but with SW-RAID, when we

Re: [ceph-users] CEPH over SW-RAID

2015-11-23 Thread Lionel Bouton
Hi, Le 23/11/2015 18:37, Jose Tavares a écrit : > Yes, but with SW-RAID, when we have a block that was read and does not match > its checksum, the device falls out of the array I don't think so. Under normal circumstances a device only falls out of a md array if it doesn't answer IO queries afte

Re: [ceph-users] CEPH over SW-RAID

2015-11-23 Thread Lionel Bouton
Le 23/11/2015 18:17, Jan Schermer a écrit : > SW-RAID doesn't help with bit-rot if that's what you're afraid of. > If you are afraid bit-rot you need to use a fully checksumming filesystem > like ZFS. > Ceph doesn't help there either when using replicas - not sure how strong > error detection+cor

Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-08 Thread Lionel Bouton
Le 07/10/2015 13:44, Paweł Sadowski a écrit : > Hi, > > Can anyone tell if deep scrub is done using O_DIRECT flag or not? I'm > not able to verify that in source code. > > If not would it be possible to add such feature (maybe config option) to > help keeping Linux page cache in better shape? Note

Re: [ceph-users] Simultaneous CEPH OSD crashes

2015-10-03 Thread Lionel Bouton
Hi, Le 29/09/2015 19:06, Samuel Just a écrit : > It's an EIO. The osd got an EIO from the underlying fs. That's what > causes those asserts. You probably want to redirect to the relevant > fs maling list. Thanks. I didn't get any answer on this from BTRFS developers yet. The problem seems har

Re: [ceph-users] Predict performance

2015-10-02 Thread Lionel Bouton
Hi, Le 02/10/2015 18:15, Christian Balzer a écrit : > Hello, > On Fri, 2 Oct 2015 15:31:11 +0200 Javier C.A. wrote: > > Firstly, this has been discussed countless times here. > For one of the latest recurrences, check the archive for: > > "calculating maximum number of disk and node failure that c

Re: [ceph-users] Simultaneous CEPH OSD crashes

2015-09-29 Thread Lionel Bouton
Le 27/09/2015 10:25, Lionel Bouton a écrit : > Le 27/09/2015 09:15, Lionel Bouton a écrit : >> Hi, >> >> we just had a quasi simultaneous crash on two different OSD which >> blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9. >> >> the first OSD to go

Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Lionel Bouton
Hi, Le 29/09/2015 13:32, Jiri Kanicky a écrit : > Hi Lionel. > > Thank you for your reply. In this case I am considering to create > separate partitions for each disk on the SSD drive. Would be good to > know what is the performance difference, because creating partitions > is kind of waste of spa

Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Lionel Bouton
Le 29/09/2015 07:29, Jiri Kanicky a écrit : > Hi, > > Is it possible to create journal in directory as explained here: > http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal, sta
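
A condensed sketch of those steps for a single OSD (the id 2, /dev/sdb1 and the init commands are assumptions; adapt to your setup):

  service ceph stop osd.2            # or: systemctl stop ceph-osd@2
  ceph-osd -i 2 --flush-journal
  # point the OSD at the new journal, either with a symlink...
  ln -sf /dev/sdb1 /var/lib/ceph/osd/ceph-2/journal
  # ...or with "osd journal = /dev/sdb1" in the [osd.2] section of ceph.conf
  ceph-osd -i 2 --mkjournal
  service ceph start osd.2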

Re: [ceph-users] Simultaneous CEPH OSD crashes

2015-09-27 Thread Lionel Bouton
Le 27/09/2015 09:15, Lionel Bouton a écrit : > Hi, > > we just had a quasi simultaneous crash on two different OSD which > blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9. > > the first OSD to go down had this error : > > 2015-09-27 06:30:33.257133 7f7ac7fef70

[ceph-users] Simultaneous CEPH OSD crashes

2015-09-27 Thread Lionel Bouton
. I made copies of the ceph osd logs (including the stack trace and the recent events) if needed. Can anyone put some light on why these OSDs died ? Best regards, Lionel Bouton

Re: [ceph-users] question on reusing OSD

2015-09-15 Thread Lionel Bouton
Le 16/09/2015 01:21, John-Paul Robinson a écrit : > Hi, > > I'm working to correct a partitioning error from when our cluster was > first installed (ceph 0.56.4, ubuntu 12.04). This left us with 2TB > partitions for our OSDs, instead of the 2.8TB actually available on > disk, a 29% space hit. (Th

Re: [ceph-users] Hammer reduce recovery impact

2015-09-10 Thread Lionel Bouton
Le 11/09/2015 01:24, Lincoln Bryant a écrit : > On 9/10/2015 5:39 PM, Lionel Bouton wrote: >> For example deep-scrubs were a problem on our installation when at >> times there were several going on. We implemented a scheduler that >> enforces limits on simultaneous deep-scru

Re: [ceph-users] Hammer reduce recovery impact

2015-09-10 Thread Lionel Bouton
Le 11/09/2015 00:20, Robert LeBlanc a écrit : > I don't think the script will help our situation as it is just setting > osd_max_backfill from 1 to 0. It looks like that change doesn't go > into effect until after it finishes the PG. That was what I was afraid of. Note that it should help a little

Re: [ceph-users] Hammer reduce recovery impact

2015-09-10 Thread Lionel Bouton
Le 10/09/2015 22:56, Robert LeBlanc a écrit : > We are trying to add some additional OSDs to our cluster, but the > impact of the backfilling has been very disruptive to client I/O and > we have been trying to figure out how to reduce the impact. We have > seen some client I/O blocked for more than
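
The knobs usually turned for this on a running cluster, shown here as a hedged example with illustrative values (they can also be made permanent in the [osd] section of ceph.conf):

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

As the follow-ups note, settings injected this way only take effect for new backfill/recovery work, not for the PG currently being processed.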

[ceph-users] backfilling on a single OSD and caching controllers

2015-09-09 Thread Lionel Bouton
Hi, just a tip I just validated on our hardware. I'm currently converting an OSD from xfs with journal on same platter to btrfs with journal on SSD. To avoid any unwanted movement, I reused the same OSD number, weight and placement : so Ceph is simply backfilling all PGs previously stored on the o

Re: [ceph-users] Corruption of file systems on RBD images

2015-09-02 Thread Lionel Bouton
Le 02/09/2015 18:16, Mathieu GAUTHIER-LAFAYE a écrit : > Hi Lionel, > > - Original Message - >> From: "Lionel Bouton" >> To: "Mathieu GAUTHIER-LAFAYE" , >> ceph-us...@ceph.com >> Sent: Wednesday, 2 September, 2015 4:40:26 PM >>

Re: [ceph-users] Corruption of file systems on RBD images

2015-09-02 Thread Lionel Bouton
Hi Mathieu, Le 02/09/2015 14:10, Mathieu GAUTHIER-LAFAYE a écrit : > Hi All, > > We have some troubles regularly with virtual machines using RBD storage. When > we restart some virtual machines, they starts to do some filesystem checks. > Sometime it can rescue it, sometime the virtual machine d

Re: [ceph-users] EXT4 for Production and Journal Question?

2015-08-24 Thread Lionel Bouton
Le 24/08/2015 19:34, Robert LeBlanc a écrit : > Building off a discussion earlier this month [1], how "supported" is > EXT4 for OSDs? It seems that some people are getting good results with > it and I'll be testing it in our environment. > > The other question is if the EXT4 journal is even necessa

Re: [ceph-users] Ceph for multi-site operation

2015-08-24 Thread Lionel Bouton
Le 24/08/2015 15:11, Julien Escario a écrit : > Hello, > First, let me advise I'm really a noob with Ceph since I have only read some > documentation. > > I'm now trying to deploy a Ceph cluster for testing purposes. The cluster is > based on 3 (more if necessary) hypervisors running proxmox 3.4. >

Re: [ceph-users] btrfs w/ centos 7.1

2015-08-07 Thread Lionel Bouton
Le 07/08/2015 22:05, Ben Hines a écrit : > Howdy, > > The Ceph docs still say btrfs is 'experimental' in one section, but > say it's the long term ideal for ceph in the later section. Is this > still accurate with Hammer? Is it mature enough on centos 7.1 for > production use? > > (kernel is 3.10.

Re: [ceph-users] CephFS vs RBD

2015-07-22 Thread Lionel Bouton
Le 22/07/2015 21:17, Lincoln Bryant a écrit : > Hi Hadi, > > AFAIK, you can’t safely mount RBD as R/W on multiple machines. You > could re-export the RBD as NFS, but that’ll introduce a bottleneck and > probably tank your performance gains over CephFS. > > For what it’s worth, some of our RBDs are

Re: [ceph-users] how to recover from: 1 pgs down; 10 pgs incomplete; 10 pgs stuck inactive; 10 pgs stuck unclean

2015-07-15 Thread Lionel Bouton
Le 15/07/2015 10:55, Jelle de Jong a écrit : > On 13/07/15 15:40, Jelle de Jong wrote: >> I was testing a ceph cluster with osd_pool_default_size = 2 and while >> rebuilding the OSD on one ceph node a disk in an other node started >> getting read errors and ceph kept taking the OSD down, and instea

Re: [ceph-users] Issue with journal on another drive

2015-07-13 Thread Lionel Bouton
On 07/14/15 00:08, Rimma Iontel wrote: > Hi all, > > [...] > Is there something that needed to be done to journal partition to > enable sharing between multiple OSDs? Or is there something else > that's causing the isssue? > IIRC you can't share a volume between multiple OSDs. What you could do i
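
One hedged way to carve a shared SSD into per-OSD journal partitions instead (the device name and sizes are placeholders):

  parted -s /dev/sdX mklabel gpt
  parted -s /dev/sdX mkpart journal-osd0 0%  25%
  parted -s /dev/sdX mkpart journal-osd1 25% 50%
  parted -s /dev/sdX mkpart journal-osd2 50% 75%

Each OSD then gets its own partition as its journal rather than all of them pointing at the same block device.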

Re: [ceph-users] Real world benefit from SSD Journals for a more read than write cluster

2015-07-12 Thread Lionel Bouton
On 07/12/15 05:55, Alex Gorbachev wrote: > FWIW. Based on the excellent research by Mark Nelson > (http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/) > we have dropped SSD journals altogether, and instead went for the > battery protected controller writeback c

Re: [ceph-users] How to prefer faster disks in same pool

2015-07-10 Thread Lionel Bouton
On 07/10/15 02:13, Christoph Adomeit wrote: > Hi Guys, > > I have a ceph pool that is mixed with 10k rpm disks and 7.2 k rpm disks. > > There are 85 osds and 10 of them are 10k > Size is not an issue, the pool is filled only 20% > > I want to somehow prefer the 10 k rpm disks so that they get more
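
If the goal is mainly to serve reads from the faster disks, primary affinity is the usual lever; a hedged example (the OSD ids and values are made up, and writes still go to every replica regardless):

  # needed on Firefly/Hammer-era clusters, in ceph.conf:
  #   mon osd allow primary affinity = true
  ceph osd primary-affinity osd.3 1.0    # 10k rpm disk, preferred as primary
  ceph osd primary-affinity osd.42 0.25  # 7.2k rpm disk, rarely primary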

Re: [ceph-users] FW: Ceph data locality

2015-07-07 Thread Lionel Bouton
On 07/07/15 18:20, Dmitry Meytin wrote: > Exactly because of that issue I've reduced the number of Ceph replications to > 2 and the number of HDFS copies is also 2 (so we're talking about 4 copies). > I want (but didn't tried yet) to change Ceph replication to 1 and change HDFS > back to 3. You

Re: [ceph-users] FW: Ceph data locality

2015-07-07 Thread Lionel Bouton
On 07/07/15 17:41, Dmitry Meytin wrote: > Hi Lionel, > Thanks for the answer. > The missing info: > 1) Ceph 0.80.9 "Firefly" > 2) map-reduce makes sequential reads of blocks of 64MB (or 128 MB) > 3) HDFS which is running on top of Ceph is replicating data for 3 times > between VMs which could be l

Re: [ceph-users] FW: Ceph data locality

2015-07-07 Thread Lionel Bouton
Hi Dmitry, On 07/07/15 14:42, Dmitry Meytin wrote: > Hi Christian, > Thanks for the thorough explanation. > My case is Elastic Map Reduce on top of OpenStack with Ceph backend for > everything (block, object, images). > With default configuration, performance is 300% worse than bare metal. > I di

Re: [ceph-users] Ceph Journal Disk Size

2015-07-02 Thread Lionel Bouton
On 07/02/15 19:13, Shane Gibson wrote: > > Lionel - thanks for the feedback ... inline below ... > > On 7/2/15, 9:58 AM, "Lionel Bouton" wrote: > > > Ouch. These spinning disks are probably a bottleneck: there are &

Re: [ceph-users] Ceph Journal Disk Size

2015-07-02 Thread Lionel Bouton
On 07/02/15 18:27, Shane Gibson wrote: > > On 7/2/15, 9:21 AM, "Nate Curry" > wrote: > > Are you using the 4TB disks for the journal? > > > Nate - yes, at the moment the Journal is on 4 TB 7200 rpm disks as > well as the OSDS. It's what I've got for hardware ... si

Re: [ceph-users] Where does 130IOPS come from?

2015-07-02 Thread Lionel Bouton
On 07/02/15 17:53, Steffen Tilsch wrote: > > Hello Cephers, > > Whenever I read about HDDs for OSDs it is told that "they will deliver > around 130 IOPS". > Where does this number come from and how it was measured (random/seq, > how big were the IOs, which queue-depth, at what latency) or is it more
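
A back-of-the-envelope calculation for random small IOs on a 7200 rpm drive, which is roughly where such rules of thumb come from (the seek time is an assumed typical value):

  rotational latency = (60 s / 7200 rpm) / 2 ≈ 4.2 ms
  average seek time                          ≈ 8.5 ms
  time per IO ≈ 4.2 ms + 8.5 ms ≈ 12.7 ms  ->  about 80 IOPS

Faster 10k/15k rpm or short-stroked drives land closer to 130-200 IOPS, and sequential or cache-friendly workloads can be far higher, so the single figure only means something once IO size, randomness, queue depth and latency target are spelled out.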

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-07-02 Thread Lionel Bouton
On 07/02/15 13:49, German Anders wrote: > output from iostat: > > CEPHOSD01: > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util > sdc(ceph-0) 0.00 0.00 1.00 389.00 0.00 35.98 188.96 60.32 120.1

Re: [ceph-users] any recommendation of using EnhanceIO?

2015-07-02 Thread Lionel Bouton
On 07/02/15 12:48, German Anders wrote: > The idea is to cache rbd at a host level. Also could be possible to > cache at the osd level. We have high iowait and we need to lower it a > bit, since we are getting the max from our sas disks 100-110 iops per > disk (3TB osd's), any advice? Flashcache?

Re: [ceph-users] Unexpected issues with simulated 'rack' outage

2015-06-24 Thread Lionel Bouton
On 06/24/15 14:44, Romero Junior wrote: > > Hi, > > > > We are setting up a test environment using Ceph as the main storage > solution for my QEMU-KVM virtualization platform, and everything works > fine except for the following: > > > > When I simulate a failure by powering off the switches on

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-23 Thread Lionel Bouton
On 06/23/15 11:43, Gregory Farnum wrote: > On Tue, Jun 23, 2015 at 9:50 AM, Erik Logtenberg wrote: >> Thanks! >> >> Just so I understand correctly, the btrfs snapshots are mainly useful if >> the journals are on the same disk as the osd, right? Is it indeed safe >> to turn them off if the journals

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-22 Thread Lionel Bouton
On 06/22/15 17:21, Erik Logtenberg wrote: > I have the journals on a separate disk too. How do you disable the > snapshotting on the OSD? http://ceph.com/docs/master/rados/configuration/filestore-config-ref/ : filestore btrfs snap = false

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-22 Thread Lionel Bouton
On 06/19/15 13:23, Erik Logtenberg wrote: > I believe this may be the same issue I reported some time ago, which is > as of yet unsolved. > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg19770.html > > I used strace to figure out that the OSD's were doing an incredible > amount of getx

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-22 Thread Lionel Bouton
On 06/22/15 11:27, Jan Schermer wrote: > I don’t run Ceph on btrfs, but isn’t this related to the btrfs > snapshotting feature ceph uses to ensure a consistent journal? It's possible: if I understand correctly the code, the btrfs filestore backend creates a snapshot when syncing the journal. I'm a

Re: [ceph-users] Fwd: Re: Unexpected disk write activity with btrfs OSDs

2015-06-19 Thread Lionel Bouton
On 06/19/15 13:42, Burkhard Linke wrote: > > Forget the reply to the list... > > Forwarded Message > Subject: Re: [ceph-users] Unexpected disk write activity with btrfs OSDs > Date: Fri, 19 Jun 2015 09:06:33 +0200 > From: Burkhard Linke

Re: [ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-18 Thread Lionel Bouton
btrfs. On 06/18/15 23:28, Lionel Bouton wrote: > Hi, > > I've just noticed an odd behaviour with the btrfs OSDs. We monitor the > amount of disk writes on each device, our granularity is 10s (every 10s > the monitoring system collects the total amount of sector written and > w

[ceph-users] Unexpected disk write activity with btrfs OSDs

2015-06-18 Thread Lionel Bouton
Hi, I've just noticed an odd behaviour with the btrfs OSDs. We monitor the amount of disk writes on each device, our granularity is 10s (every 10s the monitoring system collects the total amount of sector written and write io performed since boot and computes both the B/s and IO/s). With only res
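
A minimal sketch of that kind of measurement straight from /proc/diskstats (the device name sdb is a placeholder; field 10 is sectors written, 512 bytes each):

  awk '$3 == "sdb" { print $10 * 512, "bytes written since boot" }' /proc/diskstats

Sampling this every 10 s and taking the difference gives the B/s figure described above.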

Re: [ceph-users] Is Ceph right for me?

2015-06-11 Thread Lionel Bouton
On 05/20/15 23:34, Trevor Robinson - Key4ce wrote: > > Hello, > > > > Could somebody please advise me if Ceph is suitable for our use? > > > > We are looking for a file system which is able to work over different > locations which are connected by VPN. If one locations was to go > offline then

Re: [ceph-users] Discuss: New default recovery config settings

2015-06-01 Thread Lionel Bouton
On 06/01/15 09:43, Jan Schermer wrote: > We had to disable deep scrub or the cluster would be unusable - we need to > turn it back on sooner or later, though. > With minimal scrubbing and recovery settings, everything is mostly good. > Turned out many issues we had were due to too few PGs - once
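
The cluster-wide switches usually used for that, as a hedged example:

  ceph osd set nodeep-scrub     # stop scheduling new deep-scrubs
  ceph osd set noscrub          # same for regular scrubs, if needed
  ceph osd unset nodeep-scrub   # re-enable once the cluster copes again

Raising "osd deep scrub interval" in ceph.conf is the gentler alternative to leaving them off entirely.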

Re: [ceph-users] Performance and CPU load on HP servers running ceph (DL380 G6, should apply to others too)

2015-05-26 Thread Lionel Bouton
On 05/26/15 10:06, Jan Schermer wrote: > Turbo Boost will not hurt performance. Unless you have 100% load on > all cores it will actually improve performance (vastly, in terms of > bursty workloads). > The issue you have could be related to CPU cores going to sleep mode. Another possibility is tha
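
Typical things checked when idle states are suspected, as a sketch (tool names vary by distro, and the kernel parameter is only relevant on Intel CPUs):

  cpupower frequency-set -g performance                 # pin the frequency governor
  cat /sys/devices/system/cpu/cpu0/cpuidle/state*/name  # list available C-states
  # deep states can be capped at boot with intel_idle.max_cstate=1 if they prove costly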

Re: [ceph-users] Btrfs defragmentation

2015-05-12 Thread Lionel Bouton
On 05/06/15 20:28, Lionel Bouton wrote: > Hi, > > On 05/06/15 20:07, Timofey Titovets wrote: >> 2015-05-06 20:51 GMT+03:00 Lionel Bouton : >>> Is there something that would explain why initially Btrfs creates the >>> 4MB files with 128k extents (32 extent

Re: [ceph-users] Btrfs defragmentation

2015-05-07 Thread Lionel Bouton
Hi, On 05/07/15 12:30, Burkhard Linke wrote: > [...] > Part of the OSD boot up process is also the handling of existing > snapshots and journal replay. I've also had several btrfs based OSDs > that took up to 20-30 minutes to start, especially after a crash. > During journal replay the OSD daemon
