[ceph-users] Ceph MDS Q Size troubleshooting

2017-07-18 Thread James Wilkins
Hello list, I'm looking for some more information relating to CephFS and the 'Q' size, specifically how to diagnose what contributes towards it rising up.
Ceph Version: 11.2.0.0
OS: CentOS 7
Kernel (Ceph Servers): 3.10.0-514.10.2.el7.x86_64
Kernel (CephFS Clients): 4.4.76-1.el7.elrepo.x86_64 - usi
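Not part of the original post, but a minimal way to watch the MDS queue and related counters from the MDS host is the admin socket; a sketch, where mds.a is a placeholder daemon name:

    # Live per-second view of MDS perf counters on the MDS host
    ceph daemonperf mds.a
    # Full counter dump for closer inspection of what is queueing up
    ceph daemon mds.a perf dump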

Re: [ceph-users] XFS attempt to access beyond end of device

2017-07-18 Thread Dan van der Ster
On Tue, Jul 18, 2017 at 6:08 AM, Marcus Furlong wrote: > On 22 March 2017 at 05:51, Dan van der Ster wrote: >> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong >> wrote: >>> Hi, >>> >>> I'm experiencing the same issue as outlined in this post: >>> >>> >>> http://lists.ceph.com/pipermail/ceph-user

Re: [ceph-users] how to list and reset the scrub schedules

2017-07-18 Thread Dan van der Ster
On Fri, Jul 14, 2017 at 10:40 PM, Gregory Farnum wrote: > On Fri, Jul 14, 2017 at 5:41 AM Dan van der Ster wrote: >> >> Hi, >> >> Occasionally we want to change the scrub schedule for a pool or whole >> cluster, but we want to do this by injecting new settings without >> restarting every daemon.
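For context, a hedged sketch of the injection approach being discussed, i.e. changing scrub settings on running OSDs without restarts (option names are the Jewel/Kraken-era scrub intervals; values are only examples):

    # Push new scrub intervals to all running OSDs
    ceph tell osd.* injectargs '--osd-scrub-min-interval 86400 --osd-scrub-max-interval 604800'
    # Confirm the running value on one daemon via its admin socket
    ceph daemon osd.0 config get osd_scrub_min_interval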

Re: [ceph-users] installing specific version of ceph-common

2017-07-18 Thread Buyens Niels
I've been looking into this again and have been able to install it now (10.2.9 is newest now instead of 10.2.8 when I first asked the question): Looking at the dependency resolving, we can see it's going to install libradosstriper1 version 10.2.9 and because of that also librados 10.2.9 ... --->
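For reference, pinning a specific version on CentOS generally means naming the matching library packages explicitly so yum resolves a consistent set; a hedged sketch (the exact package list depends on what is already installed):

    yum install ceph-common-10.2.9 librados2-10.2.9 libradosstriper1-10.2.9 librbd1-10.2.9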

[ceph-users] Installing ceph on Centos 7.3

2017-07-18 Thread Brian Wallis
I’m failing to get an install of ceph to work on a new Centos 7.3.1611 server. I’m following the instructions at http://docs.ceph.com/docs/master/start/quick-ceph-deploy/ to no avail. First question, is it possible to install ceph on Centos 7.3 or should I choose a different version or differen

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Gencer W . Genç
I have 3 pools. 0 rbd,1 cephfs_data,2 cephfs_metadata cephfs_data has 1024 as a pg_num, total pg number is 2113 POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RDWR_OPS WR cephfs_data 4000M1000 0 2000 0

Re: [ceph-users] Installing ceph on Centos 7.3

2017-07-18 Thread Götz Reinicke - IT Koordinator
Hi, Am 18.07.17 um 10:51 schrieb Brian Wallis: > I’m failing to get an install of ceph to work on a new Centos 7.3.1611 > server. I’m following the instructions > at http://docs.ceph.com/docs/master/start/quick-ceph-deploy/ to no > avail. > > First question, is it possible to install ceph on Cent

Re: [ceph-users] Installing ceph on Centos 7.3

2017-07-18 Thread Marc Roos
We are running on Linux c01 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux, CentOS Linux release 7.3.1611 (Core), and didn’t have any issues installing/upgrading, but we are not using ceph-deploy. In fact I am surprised at how easy it is to install.
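As a hedged sketch of the non-ceph-deploy route mentioned here: point yum at the upstream repository and install the packages directly (repo contents below are illustrative; adjust the release name in the baseurl):

    # /etc/yum.repos.d/ceph.repo
    [ceph]
    name=Ceph packages
    baseurl=https://download.ceph.com/rpm-jewel/el7/x86_64/
    enabled=1
    gpgcheck=1
    gpgkey=https://download.ceph.com/keys/release.asc

    # then
    yum install -y ceph ceph-common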

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-18 Thread Martin Palma
Can the "sortbitwise" also be set if we have a cluster running OSDs on 10.2.6 and some OSDs on 10.2.9? Or should we wait that all OSDs are on 10.2.9? Monitor nodes are already on 10.2.9. Best, Martin On Fri, Jul 14, 2017 at 1:16 PM, Dan van der Ster wrote: > On Mon, Jul 10, 2017 at 5:06 PM, Sag

Re: [ceph-users] How's cephfs going?

2017-07-18 Thread David McBride
On Mon, 2017-07-17 at 02:59 +, 许雪寒 wrote: > Hi, everyone. > > We intend to use cephfs of Jewel version, however, we don’t know its status. > Is it production ready in Jewel? Does it still have lots of bugs? Is it a > major effort of the current ceph development? And who are using cephfs now?

[ceph-users] Updating 12.1.0 -> 12.1.1

2017-07-18 Thread Marc Roos
I just updated packages on one CentOS7 node and am getting these errors:
Jul 18 12:03:34 c01 ceph-mon: 2017-07-18 12:03:34.537510 7f4fa1c14e40 -1 WARNING: the following dangerous and experimental features are enabled: bluestore
Jul 18 12:03:34 c01 ceph-mon: 2017-07-18 12:03:34.537510 7f4fa1c14e40

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Gencer W . Genç
Patrick, I did timing tests. Rsync is not a tool that I would trust for speed tests. I simply do "cp" and extra write tests to the ceph cluster. It is very very fast indeed. Rsync itself copies a 1GB file slowly and takes 5-7 seconds to complete. Cp itself does it in 0,901s. (Not even 1 second

[ceph-users] Mon's crashing after updating

2017-07-18 Thread Ashley Merrick
Hello, I just updated to the latest CEPH Lum RC; all was working fine with my 3 Mon's/Mgr's online, then I went to enable the Dashboard with the command: ceph mgr module enable dashboard Now only one of the 3 MON's will run; every time I try and start a failed mon it will either fail or stay online and

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Peter Maloney
On 07/17/17 22:49, gen...@gencgiyen.com wrote: > I have a seperate 10GbE network for ceph and another for public. > Are you sure? Your config didn't show this. > No they are not NVMe, unfortunately. > What kind of devices are they? did you do the journal test? http://www.sebastien-han.fr/blog/2014
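The linked journal test boils down to a small synchronous dd against the raw device; a hedged sketch (destructive to the target, /dev/sdX is a placeholder, use a spare disk or partition):

    # O_DIRECT + O_DSYNC write test, the access pattern a filestore journal uses
    dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync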

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-18 Thread Dan van der Ster
Hi Martin, We had sortbitwise set on other jewel clusters well before 10.2.9 was out. 10.2.8 added the warning if it is not set, but the flag should be safe in 10.2.6. -- Dan On Tue, Jul 18, 2017 at 11:43 AM, Martin Palma wrote: > Can the "sortbitwise" also be set if we have a cluster running

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Gencer W . Genç
>> Are you sure? Your config didn't show this. Yes. I have a dedicated 10GbE network between ceph nodes. Each ceph node has a separate network with a 10GbE network card and speed. Do I have to set anything in the config for 10GbE? >> What kind of devices are they? did you do the journal test? Th

Re: [ceph-users] Mon's crashing after updating

2017-07-18 Thread John Spray
On Tue, Jul 18, 2017 at 12:43 PM, Ashley Merrick wrote: > Hello, > > > > I just updated to latest CEPH Lum RC, all was working fine with my 3 > Mon’s/Mgr’s online, went to enable the Dashboard with the command : ceph mgr > module enable dashboard > > > > Now only one of the 3 MON’s will run, every

Re: [ceph-users] Mon's crashing after updating

2017-07-18 Thread Ashley Merrick
Hello, Thanks for the quick response. As it seems to be related to when I tried to enable the dashboard (if I’m correct), is there a way I can try and disable the dashboard via admin socket etc. or another workaround till a version is released with the patch? Thanks, Ashley Sent from my iPhone On

Re: [ceph-users] Mon's crashing after updating

2017-07-18 Thread John Spray
On Tue, Jul 18, 2017 at 1:20 PM, Ashley Merrick wrote: > Hello, > > Thanks for quick response, as it seems to be related to when I tried to > enable the dashboard (if I’m correct) is their a way I can try and disable > the dashboard via admin socket e.t.c or another work around till version is > r

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Ansgar Jazdzewski
Hi, I will try to join in and help. As far as I understood, you only have HDDs in your cluster? You use the journal on the HDD? And you have a replication of 3 set on your pools? With that in mind you can do some calculations. Ceph needs to: 1. write the data and metadata into the journal 2. copy the
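A rough version of that calculation, assuming 3x replication with journals colocated on the same HDDs (numbers are illustrative):

    each client write -> 3 replica writes -> each hits the disk twice (journal + data)
    write amplification ~ 3 x 2 = 6
    e.g. 20 HDDs x ~100 MB/s raw ~ 2000 MB/s aggregate
    expected client write ceiling ~ 2000 / 6 ~ 330 MB/s, before any other overhead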

[ceph-users] best practices for expanding hammer cluster

2017-07-18 Thread Laszlo Budai
Dear all, we are planning to add new hosts to our existing hammer clusters, and I'm looking for best practices recommendations. currently we have 2 clusters with 72 OSDs and 6 nodes each. We want to add 3 more nodes (36 OSDs) to each cluster, and we have some questions about what would be the

Re: [ceph-users] Mon's crashing after updating

2017-07-18 Thread Ashley Merrick
Perfect, seems to have worked, so looks like it was the same bug. Thanks, Ashley -Original Message- From: John Spray [mailto:jsp...@redhat.com] Sent: Tuesday, 18 July 2017 8:27 PM To: Ashley Merrick Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] Mon's crashing after updating On Tue,
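For reference (not the workaround used in this thread): on a build that contains the fix, the dashboard module is toggled with the standard mgr commands:

    ceph mgr module ls
    ceph mgr module enable dashboard
    ceph mgr module disable dashboard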

[ceph-users] v12.1.1 Luminous RC released

2017-07-18 Thread Abhishek Lekshmanan
This is the second release candidate for Luminous, the next long term stable release. Please note that this is still a *release candidate* and not the final release, and hence not yet recommended on production clusters, testing is welcome & we would love feedback and bug reports. Ceph Luminous (v

Re: [ceph-users] updating the documentation

2017-07-18 Thread John Spray
On Wed, Jul 12, 2017 at 8:28 PM, Sage Weil wrote: > On Wed, 12 Jul 2017, Patrick Donnelly wrote: >> On Wed, Jul 12, 2017 at 11:29 AM, Sage Weil wrote: >> > In the meantime, we can also avoid making the problem worse by requiring >> > that all pull requests include any relevant documentation updat

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Peter Maloney
On 07/18/17 14:10, Gencer W. Genç wrote: >>> Are you sure? Your config didn't show this. > Yes. I have dedicated 10GbE network between ceph nodes. Each ceph node has > seperate network that have 10GbE network card and speed. Do I have to set > anything in the config for 10GbE? Not for 10GbE, but
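A minimal ceph.conf sketch of the public vs cluster network split being referred to (subnets are placeholders):

    [global]
    # client and monitor traffic
    public network = 192.168.1.0/24
    # OSD replication and backfill traffic
    cluster network = 10.10.10.0/24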

Re: [ceph-users] Problems getting nfs-ganesha with cephfs backend to work.

2017-07-18 Thread David
You mentioned the Kernel client works but the Fuse mount would be a better test in relation to the Ganesha FSAL. The following config didn't give me the error you describe in 1) but I'm mounting on the client with NFSv4, not sure about 2), is that dm-nfs? EXPORT { Export_ID = 1; Path = "/
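For comparison, a trimmed-down nfs-ganesha export over the Ceph FSAL might look like the sketch below (paths and values are placeholders; check them against your Ganesha version's documentation):

    EXPORT {
        Export_ID = 1;
        # path inside CephFS to export
        Path = "/";
        # NFSv4 pseudo-root path clients mount
        Pseudo = "/cephfs";
        Access_Type = RW;
        Squash = No_Root_Squash;
        Protocols = 4;
        FSAL {
            Name = CEPH;
        }
    }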

[ceph-users] Modify pool size not allowed with permission osd 'allow rwx pool=test'

2017-07-18 Thread Marc Roos
With ceph auth I have set permissions like below. I can add and delete objects in the test pool, but cannot set the size of the test pool. What permission do I need to add for this user to modify the size of this test pool? mon 'allow r' mds 'allow r' osd 'allow rwx pool=test'

Re: [ceph-users] Modify pool size not allowed with permission osd 'allow rwx pool=test'

2017-07-18 Thread Wido den Hollander
> Op 18 juli 2017 om 17:40 schreef Marc Roos : > > > > > With ceph auth I have set permissions like below, I can add and delete > objects in the test pool, but cannot set size of a the test pool. What > permission do I need to add for this user to modify the size of this > test pool? > >
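Changing size is a pool operation handled by the monitors, so the client also needs write access on the mon; a hedged sketch assuming the user is client.test:

    # grant mon write in addition to the existing caps
    ceph auth caps client.test mon 'allow rw' mds 'allow r' osd 'allow rwx pool=test'
    # afterwards the client should be able to run
    ceph osd pool set test size 3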

[ceph-users] skewed osd utilization

2017-07-18 Thread Ashley Merrick
Hello, On an updated Lum cluster I am getting the following health warning (skewed osd utilization). The reason for this is I have a set of SSD’s in a cache which are much emptier than my standard SAS disks, putting the ratio off massively. Is it possible to tell it to exclude certain disks from

Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-18 Thread David Turner
It was recommended to set sort_bitwise in the upgrade from Hammer to Jewel when Jewel was first released. 10.2.6 is definitely safe to enable it. On Tue, Jul 18, 2017, 8:05 AM Dan van der Ster wrote: > Hi Martin, > > We had sortbitwise set on other jewel clusters well before 10.2.9 was out. > 10
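The flag itself is a one-liner; per this thread it should only be set once every OSD in the cluster is running Jewel:

    ceph osd set sortbitwise
    # confirm it appears in the flags line
    ceph osd dump | grep flags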

Re: [ceph-users] best practices for expanding hammer cluster

2017-07-18 Thread David Turner
This was recently covered on the mailing list. I believe this will cover all of your questions. https://www.spinics.net/lists/ceph-users/msg37252.html On Tue, Jul 18, 2017, 9:07 AM Laszlo Budai wrote: > Dear all, > > we are planning to add new hosts to our existing hammer clusters, and I'm > lo

[ceph-users] Ceph-Kraken: Error installing calamari

2017-07-18 Thread Oscar Segarra
Hi, I have created a VM called vdiccalamari where I'm trying to install the calamari server in order to view ceph status from a gui:
[vdicceph@vdicnode01 ceph]$ sudo ceph status
    cluster 656e84b2-9192-40fe-9b81-39bd0c7a3196
     health HEALTH_OK
     monmap e2: 1 mons at {vdicnode01=192.168.10

[ceph-users] Updating 12.1.0 -> 12.1.1 mon / osd wont start

2017-07-18 Thread Marc Roos
I just updated packages on one CentOS7 node and am getting these errors. Anybody an idea how to resolve this?
Jul 18 12:03:34 c01 ceph-mon: 2017-07-18 12:03:34.537510 7f4fa1c14e40 -1 WARNING: the following dangerous and experimental features are enabled: bluestore
Jul 18 12:03:34 c01 ceph-mon:

Re: [ceph-users] updating the documentation

2017-07-18 Thread Gregory Farnum
On Tue, Jul 18, 2017 at 6:51 AM, John Spray wrote: > On Wed, Jul 12, 2017 at 8:28 PM, Sage Weil wrote: >> On Wed, 12 Jul 2017, Patrick Donnelly wrote: >>> On Wed, Jul 12, 2017 at 11:29 AM, Sage Weil wrote: >>> > In the meantime, we can also avoid making the problem worse by requiring >>> > that

Re: [ceph-users] cephfs metadata damage and scrub error

2017-07-18 Thread Mazzystr
Any update to this? I also have the same problem.
# for i in $(cat pg_dump | grep 'stale+active+clean' | awk {'print $1'}); do echo -n "$i: "; rados list-inconsistent-obj $i; echo; done
107.ff: {"epoch":10762,"inconsistents":[]}
. and so on for 49 pg's that I think I had a problem with
#

Re: [ceph-users] Yet another performance tuning for CephFS

2017-07-18 Thread Gencer W . Genç
>> Not for 10GbE, but for public vs cluster network, for example: Applied. Thanks! >> Then I'm not sure what to expect... probably poor performance with sync >> writes on filestore, and not sure what would happen with >> bluestore... >> probably much better than filestore though if you use a lar

Re: [ceph-users] Updating 12.1.0 -> 12.1.1

2017-07-18 Thread Gregory Farnum
Yeah, some of the message formats changed (incompatibly) during development. If you update all your nodes it should go away; that one I think is just ephemeral state. On Tue, Jul 18, 2017 at 3:09 AM Marc Roos wrote: > > I just updated packages on one CentOS7 node and getting these errors: > > Ju
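A hedged note on the warning itself: it is emitted because of the experimental-features switch in ceph.conf that earlier builds required for BlueStore; since BlueStore is no longer experimental in the Luminous RCs, the line can likely be dropped once the whole cluster is on 12.1.1:

    # ceph.conf entry that older builds needed; candidate for removal after the upgrade
    enable experimental unrecoverable data corrupting features = bluestore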

Re: [ceph-users] skewed osd utilization

2017-07-18 Thread Gregory Farnum
On Tue, Jul 18, 2017 at 9:41 AM Ashley Merrick wrote: > Hello, > > On a updated Lum cluster I am getting the following health warning (skewed > osd utilization). The reason for this is I have a set of SSD’s in a cache > which are much emptier than my standard SAS disks putting the ration off > ma

Re: [ceph-users] updating the documentation

2017-07-18 Thread John Spray
On Tue, Jul 18, 2017 at 9:03 PM, Gregory Farnum wrote: > On Tue, Jul 18, 2017 at 6:51 AM, John Spray wrote: >> On Wed, Jul 12, 2017 at 8:28 PM, Sage Weil wrote: >>> On Wed, 12 Jul 2017, Patrick Donnelly wrote: On Wed, Jul 12, 2017 at 11:29 AM, Sage Weil wrote: > In the meantime, we ca

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-18 Thread Josh Durgin
On 07/17/2017 10:04 PM, Anton Dmitriev wrote: My cluster stores more than 1.5 billion objects in RGW, cephfs I dont use. Bucket index pool stored on separate SSD placement. But compaction occurs on all OSD, also on those, which doesn`t contain bucket indexes. After restarting 5 times every OSD

[ceph-users] Moving OSD node from root bucket to defined 'rack' bucket

2017-07-18 Thread Mike Cave
Greetings, I’m trying to figure out the best way to move our hosts from the root=default bucket into their rack buckets. Our crush map has the notion of three racks which will hold all of our osd nodes. As we have added new nodes, we have assigned them to their correct rack location in the ma

[ceph-users] undersized pgs after removing smaller OSDs

2017-07-18 Thread Roger Brown
Problem: I have some pgs with only two OSDs instead of 3 like all the other pgs have. This is causing active+undersized+degraded status. History: 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for a 1TB drive. 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive. 3. Rem

Re: [ceph-users] undersized pgs after removing smaller OSDs

2017-07-18 Thread Roger Brown
I also tried ceph pg query, but it gave no helpful recommendations for any of the stuck pgs. On Tue, Jul 18, 2017 at 7:45 PM Roger Brown wrote: > Problem: > I have some pgs with only two OSDs instead of 3 like all the other pgs > have. This is causing active+undersized+degraded status. > > Hist

Re: [ceph-users] undersized pgs after removing smaller OSDs

2017-07-18 Thread Brad Hubbard
ID WEIGHT  TYPE NAME
-5 1.0     host osd1
-6 9.09560 host osd2
-2 9.09560 host osd3
The weight allocated to host "osd1" should presumably be the same as the other two hosts? Dump your crushmap and take a good look at it, specifically the weighting of "osd1". On Wed, Jul 19, 2017
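A hedged sketch of that dump-and-inspect round trip (filenames are arbitrary):

    # grab the current map and decompile it to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ...inspect/edit the host weights in crushmap.txt (e.g. "host osd1")...
    # recompile and inject the fixed map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new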

Re: [ceph-users] undersized pgs after removing smaller OSDs

2017-07-18 Thread Roger Brown
Ah, that was the problem! So I edited the crushmap ( http://docs.ceph.com/docs/master/rados/operations/crush-map/) with a weight of 10.000 for all three 10TB OSD hosts. The instant result was all those pgs with only 2 OSDs were replaced with 3 OSDs while the cluster started rebalancing the data. I

Re: [ceph-users] undersized pgs after removing smaller OSDs

2017-07-18 Thread Roger Brown
Resolution confirmed!
$ ceph -s
  cluster:
    id: eea7b78c-b138-40fc-9f3e-3d77afb770f0
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum desktop,mon1,nuc2
    mgr: desktop(active), standbys: mon1
    osd: 3 osds: 3 up, 3 in
  data:
    pools: 19 pools, 372 pgs
    objects: 5424

Re: [ceph-users] Moving OSD node from root bucket to defined 'rack' bucket

2017-07-18 Thread David Turner
You do not need to empty the host before moving it in the crush map. It will just cause data movement because you are removing an item under root and changing the crush weight of the rack. There is no way I am aware of to really ease into this data movement other than to stare it head on and util
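A hedged sketch of the CRUSH commands involved (rack and host names are placeholders; expect data movement as soon as a host is moved):

    # create the rack bucket once and attach it under the root
    ceph osd crush add-bucket rack1 rack
    ceph osd crush move rack1 root=default
    # move a host into its rack
    ceph osd crush move osdhost01 rack=rack1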

Re: [ceph-users] Moving OSD node from root bucket to defined 'rack' bucket

2017-07-18 Thread David Turner
I would still always recommend having at least having n+1 failure domain in any production cluster where n is your replica size. On Tue, Jul 18, 2017, 11:20 PM David Turner wrote: > You do not need to empty the host before moving it in the crush map. It > will just cause data movement because y

[ceph-users] Re: How's cephfs going?

2017-07-18 Thread 许雪寒
Is there anyone else willing to share some usage information on cephfs? Could developers tell whether cephfs is a major effort in the whole ceph development? From: 许雪寒 Sent: 17 July 2017 11:00 To: ceph-users@lists.ceph.com Subject: How's cephfs going? Hi, everyone. We intend to use cephfs of Jewel ve

Re: [ceph-users] undersized pgs after removing smaller OSDs

2017-07-18 Thread David Turner
I would recommend sticking with the weight of 9.09560 for the osds, as that is the TiB size of the osds that ceph defaults to, as opposed to the TB size of the osds. New osds will have their weights based on the TiB value. What is your `ceph osd df` output, just to see what things look like? Hopefully

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-18 Thread Anton Dmitriev
root@storage07:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty
root@storage07:~$ uname -a
Linux storage07 4.4.0-83-generic #106~14.04.1-Ubuntu SMP Mon Jun 26 18:10:19 UTC 2017 x86_64 x86_64 x86

Re: [ceph-users] best practices for expanding hammer cluster

2017-07-18 Thread Laszlo Budai
Hi David, thank you for pointing this out. Google wasn't able to find it ... As far as I understand that thread is talking about a situation when you add hosts to an existing CRUSH bucket. That sounds good, and probably that will be our solution for cluster2. I wonder whether there are any rec
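One commonly used way to soften the impact when the new hosts go in, sketched under the assumption of Hammer/Jewel option names (values and the OSD id are only examples):

    # throttle backfill/recovery while expanding
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # bring a new OSD in at a low crush weight and raise it in steps
    ceph osd crush reweight osd.72 0.2
    ceph osd crush reweight osd.72 0.5
    # ...continue until it reaches its full TiB-based weight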