Re: [ceph-users] Ceph mgr dashboard, no socket could be created

2017-09-22 Thread John Spray
On Thu, Sep 21, 2017 at 3:29 PM, Bryan Banister wrote: > I’m not sure what happened but the dashboard module can no longer start up > now: I'm curious about the "no longer" part -- from the log it looks like you only just enabled the dashboard module ("module list changed...")? Was it definitely

Re: [ceph-users] trying to understanding crush more deeply

2017-09-22 Thread Maged Mokhtar
Per section 3.4.4, the default bucket type straw computes the hash of (PG number, replica number, bucket id) for all buckets using the Jenkins integer hashing function, then multiplies this by the bucket weight (for OSD disks a weight of 1 corresponds to 1 TB; for higher-level buckets it is the sum of the contained weights

[ceph-users] erasure code profile

2017-09-22 Thread Luis Periquito
Hi all, I've been trying to think what will be the best erasure code profile, but I don't really like the one I came up with... I have 3 rooms that are part of the same cluster, and I need to design it so that we can lose any one of the 3. As this is a backup cluster I was thinking of doing a k=2 m=1 co
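
For concreteness, the kind of profile being discussed can be sketched roughly as follows (a hedged example, not from the thread; profile/pool names and PG counts are illustrative, and while Luminous spells the option crush-failure-domain, older releases use ruleset-failure-domain):

    # k=2 data chunks + m=1 coding chunk, one chunk per room
    ceph osd erasure-code-profile set ec-rooms k=2 m=1 crush-failure-domain=room
    ceph osd erasure-code-profile get ec-rooms
    # pool built from the profile (PG count is illustrative)
    ceph osd pool create backup-ec 256 256 erasure ec-rooms

With k=2 m=1 and failure domain "room", each PG keeps one chunk per room, so losing a whole room is survivable, but losing any two chunks (for example two disks in two different rooms) is not; that is exactly the concern raised in the replies below.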

Re: [ceph-users] erasure code profile

2017-09-22 Thread Dietmar Rieder
Hmm... not sure what happens if you lose 2 disks in 2 different rooms, isn't there a risk that you lose data? Dietmar On 09/22/2017 10:39 AM, Luis Periquito wrote: > Hi all, > > I've been trying to think what will be the best erasure code profile, > but I don't really like the one I came

Re: [ceph-users] erasure code profile

2017-09-22 Thread Luis Periquito
On Fri, Sep 22, 2017 at 9:49 AM, Dietmar Rieder wrote: > Hmm... > > not sure what happens if you lose 2 disks in 2 different rooms, isn't > there a risk that you lose data? yes, and that's why I don't really like the profile...

Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-09-22 Thread Richard Hesketh
I asked the same question a couple of weeks ago. No response I got contradicted the documentation but nobody actively confirmed the documentation was correct on this subject, either; my end state was that I was relatively confident I wasn't making some horrible mistake by simply specifying a bi
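
For readers hitting the same question, the knobs under discussion are the block.db / block.wal sizes consumed when the OSD is created; a hedged sketch of how they are typically set (sizes and device names are illustrative examples, not recommendations):

    # ceph.conf options read at OSD creation time
    [osd]
    bluestore block db size = 32212254720    # 30 GiB
    bluestore block wal size = 1073741824    # 1 GiB

    # ceph-disk placing db and wal on a faster device
    ceph-disk prepare --bluestore /dev/sdc --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1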

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread Wido den Hollander
> On 22 September 2017 at 8:03, Adrian Saul wrote: > > > > Thanks for bringing this to attention, Wido - it's of interest to us as we are > currently looking to migrate mail platforms onto Ceph using NFS, but this > seems far more practical. > Great! Keep in mind this is still in a very

Re: [ceph-users] Ceph mgr dashboard, no socket could be created

2017-09-22 Thread Bryan Banister
Hi John, Yes, it was working for some time, and then I tried updating the run_dir on the cluster for another reason, so I had to restart the cluster. Now I get the issue with the socket creation. I tried reverting the run_dir configuration to default and restarted, but the issue persists.
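
Not part of the original exchange, but for context: in Luminous the dashboard binds to the address/port stored under config-key, and "no socket could be created" usually means it is trying to bind to an address that does not exist on the active mgr host. A hedged sketch of resetting it (0.0.0.0 for all IPv4 interfaces and the usual port 7000 are just examples):

    ceph config-key put mgr/dashboard/server_addr 0.0.0.0
    ceph config-key put mgr/dashboard/server_port 7000
    # bounce the module so it rebinds
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard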

[ceph-users] luminous: index gets heavy read IOPS with index-less RGW pool?

2017-09-22 Thread Yuri Gorshkov
Hi all, Recently, we've noticed a strange behaviour on one of our test clusters. The cluster was configured to serve RGW and is running Luminous. Our standard procedure is to create blind (non-indexed) buckets so that our software manages the metadata by itself and we get less load on the index p
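
Two quick checks that can help narrow this down (hedged; the pool name below is the Luminous default and may differ in your zone):

    # watch client IO per pool to confirm the index pool is really the one being read
    ceph osd pool stats default.rgw.buckets.index
    # list a few objects: blind buckets should leave no .dir.* bucket-index objects behind
    rados -p default.rgw.buckets.index ls | head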

Re: [ceph-users] OSD memory usage

2017-09-22 Thread Sage Weil
Just a follow-up here: I'm chasing down a bug with memory accounting. On my luminous cluster I am seeing lots of memory usage that is triggered by scrub. Pretty sure this is a bluestore cache mempool issue (making it use more memory than it thinks it is); hopefully I'll have a fix shortly. Reco
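
For anyone who wants to watch this on their own cluster, the mempool accounting Sage mentions is visible through the OSD admin socket; a hedged sketch (osd.0 is just an example, run on the host carrying that OSD):

    # per-mempool memory accounting; watch the bluestore_* pools during scrub
    ceph daemon osd.0 dump_mempools
    # the cache tunables currently in effect on this OSD
    ceph daemon osd.0 config show | grep bluestore_cache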

Re: [ceph-users] access ceph filesystem at storage level and not via ethernet

2017-09-22 Thread James Okken
Thanks again Ronny. Ocfs2 is working well so far. I have 3 nodes sharing the same 7TB MSA FC lun. Hoping to add 3 more... James Okken

[ceph-users] Stuck IOs

2017-09-22 Thread Matthew Stroud
It appears I have three stuck IOs after switching my tunables to optimal. We are running 10.2.9 and the offending pool is for gnocchi (which has caused us quite a bit of pain at this point). Here are the stuck IOs: 2017-09-22 09:05:40.095125 osd.2 ##:6802/1453572 164 : cluster [WRN] 3 slow

Re: [ceph-users] Stuck IOs

2017-09-22 Thread David Turner
The request remains blocked if you issue `ceph osd down 2`? Marking the offending OSD as down usually clears up blocked requests for me... at least it resets the timer on it and the requests start blocking again if the OSD is starting to fail. On Fri, Sep 22, 2017 at 11:51 AM Matthew Stroud wrot
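
A hedged sketch of that workflow, using osd.2 from the log excerpt above (the admin-socket command must be run on the host carrying that OSD):

    # which OSDs the blocked requests are sitting on, and for how long
    ceph health detail
    # what the OSD thinks those ops are waiting for
    ceph daemon osd.2 dump_ops_in_flight
    # kick it: the OSD is marked down, re-peers, and the client ops get retried
    ceph osd down 2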

Re: [ceph-users] trying to understanding crush more deeply

2017-09-22 Thread Will Zhao
Thanks! I still have a question. Like the code in bucket_straw2_choose below: u = crush_hash32_3(bucket->h.hash, x, ids[i], r); u &= 0xffff; ln = crush_ln(u) - 0x1000000000000ll; draw = div64_s64(ln, weights[i]); Because x, id and r don't change, the ln won't change for an old bucket; add

Re: [ceph-users] Stuck IOs

2017-09-22 Thread Matthew Stroud
Got one to clear: 2017-09-22 10:06:23.030648 osd.3 [WRN] 2 slow requests, 1 included below; oldest blocked for > 120.959814 secs 2017-09-22 10:06:23.030657 osd.3 [WRN] slow request 120.959814 seconds old, received at 2017-09-22 10:04:22.070785: osd_op(client.301013529.0:2418 7.e637a4b3 measure

Re: [ceph-users] Stuck IOs

2017-09-22 Thread David Turner
It shows that the blocked requests also reset and are now only a few minutes old instead of nearly a full day. What is your full `ceph status`? The blocked requests are referring to missing objects. On Fri, Sep 22, 2017 at 12:09 PM Matthew Stroud wrote: > Got one to clear: > > > > 2017-09-22 10

Re: [ceph-users] Stuck IOs

2017-09-22 Thread Matthew Stroud
^C[root@mon01 ceph]# ceph status
    cluster 55ebbc2d-c5b7-4beb-9688-0926cefee155
     health HEALTH_WARN
            2 requests are blocked > 32 sec
     monmap e1: 3 mons at {mon01=##:6789/0,mon02=##:6789/0,mon03=##:6789/0}
            election epoch 74, quorum 0,1,2 mon0

Re: [ceph-users] trying to understanding crush more deeply

2017-09-22 Thread Maged Mokhtar
If you have a random number generator rand() and variables A, B: A = rand(), B = rand(). If you loop 100 times to see which is bigger, A or B, on average A will win 50 times and B will win 50 times. Now assume you want to make A win twice as many times; you can add a weight: A = 3 x rand(), B = 1 x rand(). If yo
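
To connect this with the bucket_straw2_choose code quoted earlier in the thread: straw2 does the weighting with ln(u)/w rather than w * rand(), which keeps wins exactly proportional to weight. A minimal stand-alone simulation of that draw (an illustration written for this digest, not Ceph code):

    /* Each item draws ln(u)/w with u uniform in (0,1]; the largest (least
     * negative) draw wins.  Over many trials wins are proportional to the
     * weights, and an item's draw is independent of the other items, which
     * is why adding or reweighting one item only moves data to or from it. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(void)
    {
        const double w[3] = {3.0, 1.0, 1.0};  /* illustrative weights */
        long wins[3] = {0, 0, 0};
        srand(42);
        for (long t = 0; t < 1000000; t++) {
            int best = 0;
            double best_draw = -INFINITY;
            for (int i = 0; i < 3; i++) {
                double u = (rand() + 1.0) / ((double)RAND_MAX + 1.0); /* (0,1] */
                double draw = log(u) / w[i];
                if (draw > best_draw) { best_draw = draw; best = i; }
            }
            wins[best]++;
        }
        for (int i = 0; i < 3; i++)
            printf("item %d (weight %.0f): %ld wins\n", i, w[i], wins[i]);
        return 0;  /* expect roughly 600k / 200k / 200k wins */
    }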

[ceph-users] can't figure out why I have HEALTH_WARN in luminous

2017-09-22 Thread Michael Kuriger
I have a few running ceph clusters. I built a new cluster using luminous, and I also upgraded a cluster running hammer to luminous. In both cases, I have a HEALTH_WARN that I can't figure out. The cluster appears healthy except for the HEALTH_WARN in overall status. For now, I’m monitoring h
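
A hedged starting point for digging into a bare HEALTH_WARN on Luminous (plain commands, nothing cluster-specific assumed):

    # names the individual health checks behind the summary status
    ceph health detail
    # the structured health section lists each check with its severity and message
    ceph status --format json-pretty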

Re: [ceph-users] monitor takes long time to join quorum: STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH got BADAUTHORIZER

2017-09-22 Thread Gregory Farnum
On Thu, Sep 21, 2017 at 3:02 AM, Sean Purdy wrote: > On Wed, 20 Sep 2017, Gregory Farnum said: >> That definitely sounds like a time sync issue. Are you *sure* they matched >> each other? > > NTP looked OK at the time. But see below. > > >> Is it reproducible on restart? > > Today I did a straigh
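
For reference, two hedged checks relevant to the time-sync theory (ntpq assumes ntpd is the time daemon in use; chrony has its own equivalents):

    # on each mon host: offsets against the NTP peers
    ntpq -pn
    # skew as measured between the monitors themselves
    ceph time-sync-status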

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread Gregory Farnum
On Thu, Sep 21, 2017 at 1:40 AM, Wido den Hollander wrote: > Hi, > > A tracker issue has been out there for a while: > http://tracker.ceph.com/issues/12430 > > Storing e-mail in RADOS with Dovecot, the IMAP/POP3/LDA server with a huge > marketshare. > > It took a while, but last year Deutsche Te

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread Danny Al-Gaaf
On 22.09.2017 at 22:59, Gregory Farnum wrote: [..] > This is super cool! Is there anything written down that explains this > for Ceph developers who aren't familiar with the workings of Dovecot? > I've got some questions I see going through it, but they may be very > dumb. > > *) Why are indexes

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread Gregory Farnum
On Fri, Sep 22, 2017 at 2:49 PM, Danny Al-Gaaf wrote: > On 22.09.2017 at 22:59, Gregory Farnum wrote: > [..] >> This is super cool! Is there anything written down that explains this >> for Ceph developers who aren't familiar with the workings of Dovecot? >> I've got some questions I see going thr

Re: [ceph-users] Ceph release cadence

2017-09-22 Thread Sage Weil
Here is a concrete proposal for everyone to summarily shoot down (or heartily endorse, depending on how your friday is going): - 9 month cycle - enforce a predictable release schedule with a freeze date and a release date. (The actual .0 release of course depends on no blocker bugs being o

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-22 Thread Danny Al-Gaaf
On 22.09.2017 at 23:56, Gregory Farnum wrote: > On Fri, Sep 22, 2017 at 2:49 PM, Danny Al-Gaaf > wrote: >> On 22.09.2017 at 22:59, Gregory Farnum wrote: >> [..] >>> This is super cool! Is there anything written down that explains this >>> for Ceph developers who aren't familiar with the working

[ceph-users] lost bluestore metadata but still have data

2017-09-22 Thread Jared Watts
Hi everyone, in the case where I’ve lost the entire directory below that contains a bluestore OSD’s config and metadata, but all the bluestore devices are intact (block, block.db, block.wal), how can I get the OSD up and running again? I tried to do a ceph-osd --mkfs again, which seemed to regen
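
Not a confirmed answer, but one avenue worth noting: in Luminous, ceph-bluestore-tool can rebuild the small OSD directory from the labels written at the start of the BlueStore devices. A hedged sketch (device path and OSD id are illustrative; what can be recovered depends on how the OSD was originally provisioned, and the block/block.db/block.wal symlinks may still need to be recreated by hand):

    # inspect the label bluestore keeps at the start of the device
    ceph-bluestore-tool show-label --dev /dev/sdb2
    # regenerate /var/lib/ceph/osd/ceph-0 from that label metadata
    ceph-bluestore-tool prime-osd-dir --dev /dev/sdb2 --path /var/lib/ceph/osd/ceph-0
    chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
    systemctl start ceph-osd@0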

Re: [ceph-users] Ceph release cadence

2017-09-22 Thread Gregory Farnum
On Fri, Sep 22, 2017 at 3:28 PM, Sage Weil wrote: > Here is a concrete proposal for everyone to summarily shoot down (or > heartily endorse, depending on how your friday is going): > > - 9 month cycle > - enforce a predictable release schedule with a freeze date and > a release date. (The actua

Re: [ceph-users] Ceph release cadence

2017-09-22 Thread Sage Weil
On Fri, 22 Sep 2017, Gregory Farnum wrote: > On Fri, Sep 22, 2017 at 3:28 PM, Sage Weil wrote: > > Here is a concrete proposal for everyone to summarily shoot down (or > > heartily endorse, depending on how your friday is going): > > > > - 9 month cycle > > - enforce a predictable release schedule

Re: [ceph-users] Ceph release cadence

2017-09-22 Thread Blair Bethwaite
On 23 September 2017 at 11:58, Sage Weil wrote: > I'm *much* happier with 2 :) so no complaint from me. I just heard a lot > of "2 years" and 2 releases (18 months) doesn't quite cover it. Maybe > it's best to start with that, though? It's still an improvement over the > current ~12 months. FW

Re: [ceph-users] Ceph 12.2.0 on 32bit?

2017-09-22 Thread Dyweni - Ceph-Users
It crashes with SimpleMessenger as well (ms_type = simple). I've also tried with and without these two settings, but it still crashes. bluestore cache size = 536870912 bluestore cache kv max = 268435456 When using SimpleMessenger, it tells me it is crashing (Segmentation Fault) in 'thread_name:ms_

Re: [ceph-users] Ceph release cadence

2017-09-22 Thread Brady Deetz
I'll be the first to admit that most of my comments are anecdotal. But I suspect that when it comes to storage, many of us don't require a lot to get scared back into our dark corners. In short, it seems that the dev team should get better at selecting features and delivering on the existing scheduled cadenc