In your example of EC 5 + 3, your min_size is 5. As long as you have 5
hosts up, you should still be serving content. My home cluster uses 2+1 and
has 3 nodes. I can reboot any node (leaving 2 online) as long as the PGs in
the cluster are healthy. If I were to actually lose a node, I would have to
On Wed, Aug 23, 2017 at 2:28 PM, Christian Balzer wrote:
> On Wed, 23 Aug 2017 13:38:25 +0800 Nick Tan wrote:
>
> > Thanks for the advice Christian. I think I'm leaning more towards the
> > 'traditional' storage server with 12 disks - as you say they give a lot
> > more flexibility with the perf
All,
I am looking for a Grafana dashboard to monitor Ceph. I am using telegraf to
collect the metrics and InfluxDB to store the values.
Does anyone have the dashboard JSON file?
Thanks,
Saravans
On 29 August 2017 at 00:21, Haomai Wang wrote:
> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote:
>> - And more broadly, if a user wants to use the performance benefits of
>> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
>> what are their options? RoCE?
>
> roce v2 i
Hi Oscar,
the mount command accepts multiple MON addresses.
mount -t ceph monhost1,monhost2,monhost3:/ /mnt/foo
If not specified, the port defaults to 6789.
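For a persistent mount, an /etc/fstab entry would look similar (a sketch; the
name= and secretfile= options depend on your auth setup):
monhost1,monhost2,monhost3:/ /mnt/foo ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0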
JC
> On Aug 28, 2017, at 13:54, Oscar Segarra wrote:
>
> Hi,
>
> In Ceph, by design there is no single point of failure I terms of s
Rule of thumb with batteries is:
- the closer to their “proper temperature” you run them, the more life you get out of them
- the more the battery is overpowered for your application, the longer it will survive.
Get yourself an LSI 94** controller and use it as an HBA and you will be fine. But
get MORE DRIVES ! …
Thank you Tomasz and Ronny. I'll have to order some HDDs soon and try these
out. The car battery idea is nice! I may try that.. =) Do they last longer?
Ones that fit the UPS's original battery spec didn't last very long... part of
the reason why I gave up on them.. =P My wife probably won't like
Sorry for being brutal … anyway
1. get a battery for the UPS (a car battery will do as well; I’ve modded a UPS
in the past with a truck battery and it worked like a charm :D )
2. get spare drives and put them in, because your cluster CANNOT get out of
the error state due to lack of space
3. Follow advi
Tomasz,
Those machines are behind a surge protector. Doesn't appear to be a good one!
I do have a UPS... but it is my fault... no battery. Power was pretty reliable
for a while... and UPS was just beeping every chance it had, disrupting some
sleep.. =P So running on surge protector only. I
> [SNIP - bad drives]
Generally, when a disk is displaying bad blocks to the OS, the drive has
been remapping blocks for ages in the background and the disk is really
on its last legs. It's a bit unlikely that you would get so many disks dying at
the same time, though; but the problem may have been silent
So to decode a few things about your disk:
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       37
37 read errors and only one sector marked as pending - fun disk :/
181 Program_Fail_Cnt_Total   0x0022   099   099   000    Old_age   Always       -       35325174
S
I think you are looking at something more like this :
https://www.google.co.uk/imgres?imgurl=https%3A%2F%2Fthumbs.dreamstime.com%2Fz%2Fhard-drive-being-destroyed-hammer-16668693.jpg&imgrefurl=https%3A%2F%2Fwww.dreamstime.com%2Fstock-photos-hard-drive-being-destroyed-hammer-image16668693&docid=Ofi7
Hi,
In Ceph, by design there is no single point of failure in terms of server
roles; nevertheless, from the client's point of view, one might exist.
In my environment:
Mon1: 192.168.100.101:6789
Mon2: 192.168.100.102:6789
Mon3: 192.168.100.103:6789
Client: 192.168.100.104
I have created a line in
So.. would doing something like this potentially bring it back to life? =)
Analyzing a Faulty Hard Disk using Smartctl - Thomas-Krenn-Wiki
On Monday,
Marc,
These rpms (and debs) are built with the latest ganesha 2.5 stable release
and the latest luminous release on download.ceph.com:
http://download.ceph.com/nfs-ganesha/
I just put them up late last week, and I will be maintaining them in the future.
-Ali
- Original Message -
> From
I think you’ve got your answer:
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
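For reference, the attribute tables quoted in this thread come from something like
the following (the device name is just an example):
smartctl -A /dev/sdb    # attribute table only
smartctl -x /dev/sdb    # full output, including the error and self-test logs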
> On 28 Aug 2017, at 21:22, hjcho616 wrote:
>
> Steve,
>
> I thought that was odd too..
>
> Below is from the log, This captures transition from good to bad. Looks like
Steve,
I thought that was odd too..
Below is from the log; this captures the transition from good to bad. Looks like
there is "Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors".
And it looks like I did a repair with /dev/sdb1... =P
# grep sdb syslog.1
Aug 27 06:27:22 OSD1 smartd[1031]:
I'm jumping in a little late here, but running xfs_repair on your partition
can't frag your partition table. The partition table lives outside the
partition block device and xfs_repair doesn't have access to it when run
against /dev/sdb1. I haven't actually tested it, but it seems unlikely that
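For what it's worth, the partition table itself can be inspected (and backed up)
independently of the filesystem, e.g. (device name is just an example):
fdisk -l /dev/sdb                            # print the partition table
sgdisk --backup=/root/sdb-gpt.bak /dev/sdb   # back up a GPT table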
Tomasz,
Looks like when I did xfs_repair -L /dev/sdb1 it did something to the partition
table and I don't see /dev/sdb1 anymore... or maybe I missed the 1 in
/dev/sdb1? =(. Yes.. that extra power outage did some pretty good damage... =P I
am hoping 0.007% is very small...=P Any recommendations on fix
comments inline
On 28.08.2017 18:31, hjcho616 wrote:
I'll see what I can do on that... Looks like I may have to add another
OSD host as I utilized all of the SATA ports on those boards. =P
Ronny,
I am running with size=2 min_size=1. I created everything with
ceph-deploy and didn't touch
The vast majority of the sync error list is "failed to sync bucket
instance: (16) Device or resource busy". I can't find anything on Google
about this error message in relation to Ceph. Does anyone have any idea
what this means? and/or how to fix it?
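For context, the error list above comes from the standard multisite sync commands,
e.g.:
radosgw-admin sync error list
radosgw-admin sync status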
On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley
Sorry mate, I’ve just noticed the
"unfound (0.007%)”
I think that your main culprit here is osd.0. You need to have all OSDs on one
host to get all the data back.
Also, for the time being I would just change size and min_size down to 1 and try to
figure out which OSD you actually need to get all the
Thank you all for the suggestions!
Maged,
I'll see what I can do on that... Looks like I may have to add another OSD host
as I utilized all of the SATA ports on those boards. =P
Ronny,
I am running with size=2 min_size=1. I created everything with ceph-deploy and
didn't touch much of that pool setti
On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas wrote:
> On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang wrote:
>> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote:
>>> Hello everyone,
>>>
>>> I'm trying to get a handle on the current state of the async messenger's
>>> RDMA transport in Luminous,
Hi Florian,
On Wed, 23 Aug 2017 10:26:45 +0200, Florian Haas wrote:
> - In case there is no such support in the kernel yet: What's the current
> status of RDMA support (and testing) with regard to
> * libcephfs?
> * the Samba Ceph VFS?
On the client side, SMB3 added an SMB-Direct protoco
Did you follow these instructions (https://community.mellanox.com/docs/DOC-2693)?
On Mon, Aug 28, 2017 at 6:40 AM, Jeroen Oldenhof wrote:
> Hi All!
>
> I'm trying to run CEPH over RDMA, using a batch of Infiniband Mellanox
> MT25408 20GBit (4x DDR) cards.
>
> RDMA is running, rping works between all
On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang wrote:
> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote:
>> Hello everyone,
>>
>> I'm trying to get a handle on the current state of the async messenger's
>> RDMA transport in Luminous, and I've noticed that the information
>> available is a littl
On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote:
> Hello everyone,
>
> I'm trying to get a handle on the current state of the async messenger's
> RDMA transport in Luminous, and I've noticed that the information
> available is a little bit sparse (I've found
> https://community.mellanox.com/do
Hi All!
I'm trying to run Ceph over RDMA, using a batch of InfiniBand Mellanox
MT25408 20GBit (4x DDR) cards.
RDMA is running, rping works between all hosts, and I've configured
10.0.0.x addressing on the ib0 interfaces.
But when enabling RDMA in ceph.conf:
ms_type = async+rdma
ms_asyn
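For comparison, a minimal RDMA-related section generally looks something like this
(a sketch only; the device name is an example, and the available ms_async_rdma_*
options vary by release):
[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0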
I was able to drill it down further.
The messages get logged when I create a VM image snapshot like: "rbd snap
create libvirt/wiki@backup"
and while the snapshot gets deleted at the end.
Btw, I'm running Ceph 10.2.3.
I saw this: http://tracker.ceph.com/issues/18990 and thought this migh
The rbd CLI's "lock"-related commands are advisory locks that require
an outside process to manage. The exclusive-lock feature replaces the
advisory locks (and purposely conflicts with them so you cannot use both
concurrently). I'd imagine at some point those CLI commands should be
deprecated, but th
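A quick illustration of the two mechanisms (pool/image/lock names and the locker id
are placeholders):
rbd lock add rbd/myimage mylock                 # advisory lock, managed by an outside process
rbd lock list rbd/myimage                       # shows the lock and the locker (e.g. client.4173)
rbd lock remove rbd/myimage mylock client.4173
rbd feature enable rbd/myimage exclusive-lock   # automatic, librbd-managed locking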
Personally I would suggest to:
- change the minimal replication type (CRUSH failure domain) to OSD (from the
default, host) - see the sketch after this list
- remove the OSDs from the host with all those "down OSDs" (note that they are
down, not out, which makes it more weird)
- let the single-node cluster stabilise; yes, performance will suck, but at least you
will h
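On a Luminous-era cluster the first point boils down to something like this (rule
and pool names are placeholders; older releases need a manual CRUSH map edit
instead):
ceph osd crush rule create-replicated replicated-osd default osd
ceph osd pool set <pool> crush_rule replicated-osd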
Hi.
When trying to take down a host for maintenance purposes I encountered an
I/O stall along with some PGs marked 'peered' unexpectedly.
Cluster stats: 96/96 OSDs, healthy prior to incident, 5120 PGs, 4 hosts
consisting of 24 OSDs each. Ceph version 11.2.0, using standard filestore
(with LVM jou
On 28. aug. 2017 08:01, hjcho616 wrote:
Hello!
I've been using ceph for a long time, mostly for network CephFS storage,
even before the Argonaut release! It's been working very well for me. Yes,
I had some power outages before, asked a few questions on this list
before, and got them resolved happily!
Hi Marcelo,
On 26/08/17 00:05, lis...@marcelofrota.info wrote:
> Some days ago, I read about the commands rbd lock add and rbd lock
> remove. Will these commands still be maintained in Ceph in future versions, or
> is the preferred way to lock in Ceph going to be exclusive-lock, and will these
> commands go
I am looking for any materials which can help me to track and troubleshoot
the performance of my cluster, and particularly the RADOS gateway. I am using
the command "ceph daemon 'daemon-name' perf dump", and in summary I get a
ridiculous number of various metrics, but where can I find their
descriptions?
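One thing that may help: alongside "perf dump", the admin socket can also print a
schema which, on recent releases, includes a short description of each counter:
ceph daemon <daemon-name> perf schema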
I would suggest either adding 1 new disk on each of the 2 machines, or
increasing the osd_backfill_full_ratio to something like 90 or 92 from the
default of 85.
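If you go the ratio route, note that the option takes a ratio rather than a
percentage; a runtime change would look roughly like this (pre-Luminous; Luminous
moves this to "ceph osd set-backfillfull-ratio"):
ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.92'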
/Maged
On 2017-08-28 08:01, hjcho616 wrote:
> Hello!
>
> I've been using ceph for a long time, mostly for network CephFS storage, even
> before
Hello,
We plan to change our filestore OSDs to the bluestore backend, and are doing a survey now.
Two questions where we need your help:
1. Is there any way to dump the RocksDB so that we can check its content?
2. How can we get the space usage information of the DB partition? We want to
figure out a reasonable size for
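Not a complete answer, but two possible starting points (OSD id and path are
placeholders; ceph-kvstore-tool needs the OSD to be stopped, and the supported
backend argument varies by release):
ceph daemon osd.0 perf dump bluefs                             # db_total_bytes / db_used_bytes
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list   # dump RocksDB keys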