Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-23 Thread Alex Moore
As another data point, I recently bought a few 240GB SM863s, and found I was getting 79 MB/s on the single job test. In my case the SSDs are running off the onboard Intel C204 chipset's SATA controllers on a couple of systems with single Xeon E3-1240v2 CPUs. Alex On 23/12/2015 6:39 PM, Lione
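For context, the "single job" test referred to here is commonly run with fio in O_DSYNC mode against the raw device; a minimal sketch of that style of test (device name, block size and runtime are assumptions, not the poster's exact invocation):

    # destructive: writes directly to the device, so only run it on a blank SSD
    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test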

[ceph-users] SSD journals killed by VMs generating 500 IOPs (4kB) non-stop for a month, seemingly because of a syslog-ng bug

2015-11-22 Thread Alex Moore
I just had 2 of the 3 SSD journals in my small 3-node cluster fail within 24 hours of each other (not fun, although thanks to a replication factor of 3x, at least I didn't lose any data). The journals were 128 GB Samsung 850 Pros. However I have determined that it wasn't really their fault...
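As a rough back-of-the-envelope check (my arithmetic, not the poster's), 500 IOPS of 4 kB writes sustained for a month is on the order of 5 TB of host writes, before any journal or filesystem write amplification:

    # 500 ops/s * 4 KiB * 86400 s/day * 30 days, expressed in GiB
    echo "500 * 4 * 86400 * 30 / 1024 / 1024" | bc    # ~4943 GiB, roughly 5 TB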

Re: [ceph-users] Monitors not reaching quorum. (SELinux off, IPtables off, can see tcp traffic)

2015-06-03 Thread Alex Moore
Surely this is to be expected... 1500 is the IP MTU, and 1518 is the Ethernet MTU including 4 bytes for the optional 802.1q VLAN tag. Interface MTU typically means the IP MTU, whereas a layer-2 switch cares more about layer-2 Ethernet frames, and so MTU in that context means the Ethernet MTU. On
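The arithmetic behind those figures, assuming the switch counts the frame without its 4-byte FCS (an assumption on my part about the convention in use):

    # 1500 bytes IP payload + 14 bytes Ethernet header + 4 bytes 802.1q tag = 1518
    # a host's interface MTU is the layer-3 figure; check it with (interface name assumed):
    ip link show eth0 | grep -o 'mtu [0-9]*'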

Re: [ceph-users] 1 unfound object (but I can find it on-disk on the OSDs!)

2015-05-03 Thread Alex Moore
was at least quite happy mounting the filesystem, so I'm hoping all is well... Alex On 03/05/2015 12:55 PM, Alex Moore wrote: Hi all, I need some help getting my 0.87.1 cluster back into a healthy state... Overnight, a deep scrub detected an inconsistent object in a PG. Ceph health detail said

[ceph-users] 1 unfound object (but I can find it on-disk on the OSDs!)

2015-05-03 Thread Alex Moore
Hi all, I need some help getting my 0.87.1 cluster back into a healthy state... Overnight, a deep scrub detected an inconsistent object in a PG. Ceph health detail said the following: # ceph health detail HEALTH_ERR 1 pgs inconsistent; 2 scrub errors pg 2.3b is active+clean+inconsistent, acting [1
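For readers hitting the same symptom, the usual follow-up commands look roughly like this (the PG id 2.3b comes from the output above; whether a repair or marking the object lost is appropriate depends on what the OSDs actually hold, which is what this thread is about):

    ceph pg 2.3b query        # see the acting set and what the primary thinks is missing
    ceph pg repair 2.3b       # ask the primary OSD to repair the inconsistent PG
    # only as a last resort, if the object genuinely cannot be recovered:
    # ceph pg 2.3b mark_unfound_lost revert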

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread Alex Moore
Just want to add my own experience, as I'm using consumer Samsung SSDs at the moment (Ceph 0.87.1, replication 3, 16 Gbps InfiniBand). Originally I only had Samsung 840 EVO 1TB SSDs, which I partitioned with an initial small partition for the journal and the rest for the OSD (using XFS). I don'
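A minimal sketch of the layout described above (journal partition plus XFS data partition on the same SSD); the device name and journal size are assumptions, not the poster's values:

    sgdisk --new=1:0:+10G --change-name=1:'ceph journal' /dev/sdX   # raw journal partition
    sgdisk --new=2:0:0    --change-name=2:'ceph data'    /dev/sdX   # remainder for the OSD
    mkfs.xfs /dev/sdX2    # only the data partition gets a filesystem; the journal stays raw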

Re: [ceph-users] Have 2 different public networks

2014-12-20 Thread Alex Moore
Thought I'd share details of my setup as I am effectively achieving this (i.e. making monitors accessible over multiple interfaces) with IP routing as follows: My Ceph hosts each have a /32 IP address on a loopback interface. And that is the IP address that all their Ceph daemons are bound to. I
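A sketch of that arrangement (all addresses and interface names here are invented for illustration, not taken from the poster's setup):

    # on the Ceph node: a stable /32 on loopback, which the Ceph daemons bind to
    ip addr add 192.0.2.11/32 dev lo
    # on a client: reach that /32 via whichever physical network it shares with the node
    ip route add 192.0.2.11/32 via 10.0.1.11
    # ceph.conf on clients then refers to the /32 monitor addresses, e.g.
    #   mon host = 192.0.2.11,192.0.2.12,192.0.2.13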

Re: [ceph-users] Ceph and TRIM on SSD disks

2014-09-07 Thread Alex Moore
Andrei Mikhailovsky writes: > > Hello guys, > > was wondering if it is a good idea to enable TRIM (mount option discard) on the ssd disks which are used for > either cache pool or osd journals? As far as the journals are concerned, isn't this irrelevant if you're assigning a block device to
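To see whether a given OSD's journal really is a raw block device (in which case there is no filesystem to mount with -o discard), something like the following works; the OSD id and data path follow the usual layout and are assumptions on my part:

    readlink -f /var/lib/ceph/osd/ceph-0/journal
    # a result like /dev/sdb1 means the journal is written to the raw partition,
    # so the discard mount option never comes into play for it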

[ceph-users] Delays while waiting_for_osdmap according to dump_historic_ops

2014-09-07 Thread Alex Moore
I recently found out about the "ceph --admin-daemon /var/run/ceph/ceph-osd..asok dump_historic_ops" command, and noticed something unexpected in the output on my cluster, after checking numerous output samples... It looks to me like "normal" write ops on my cluster spend roughly: <1ms between
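The command in question, with a concrete OSD id substituted as an example (the archive stripped the id from the snippet above). Each op it returns carries a per-event timeline from which per-stage delays like the ones described can be read off; field names vary a little between releases, so piping through a JSON pretty-printer is the simplest way to inspect it:

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops \
        | python -m json.tool | less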