Re: [ceph-users] New install error

2017-08-09 Thread Brad Hubbard
is your public and cluster network configuration and which interface is it using to try to connect to 192.168.100.11:6789? You can use wireshark or similar to find out. ceph01 needs to be able to communicate with 192.168.100.11 on port 6789, so you need to find out why it currently can't.
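A quick way to confirm that reachability, sketched with generic tools (the address is the one from the thread; substitute your own monitor IP):

    $ ip route get 192.168.100.11                                        # which interface/source address is used to reach the mon
    $ timeout 3 bash -c 'echo > /dev/tcp/192.168.100.11/6789' && echo "mon port reachable"
    # and on the monitor host itself:
    $ ss -tlnp | grep 6789                                               # is ceph-mon listening on the expected address?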

Re: [ceph-users] PG reported as inconsistent in status, but no inconsistencies visible to rados

2017-08-21 Thread Brad Hubbard
Could you provide the output of 'ceph-bluestore-tool fsck' for one of these OSDs? On Tue, Aug 22, 2017 at 2:53 AM, Edward R Huyer wrote: > This is an odd one. My cluster is reporting an inconsistent pg in ceph > status and ceph health detail. However, rados list-inconsistent-obj and > rados lis
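For reference, a minimal sketch of the fsck being asked for, assuming the OSD is stopped first and uses the default data path (the OSD id 12 is made up):

    $ systemctl stop ceph-osd@12
    $ ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12
    # a deep fsck (--deep) also reads and checksums object data, but takes much longer
    $ systemctl start ceph-osd@12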

Re: [ceph-users] PG reported as inconsistent in status, but no inconsistencies visible to rados

2017-08-23 Thread Brad Hubbard
t way we can continue the investigation there. > > -Original Message- > From: Brad Hubbard [mailto:bhubb...@redhat.com] > Sent: Monday, August 21, 2017 7:05 PM > To: Edward R Huyer > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] PG reported as inconsistent in status, bu

Re: [ceph-users] XFS attempt to access beyond end of device

2017-08-27 Thread Brad Hubbard
On Fri, Aug 25, 2017 at 4:19 PM, Götz Reinicke - IT Koordinator wrote: > Hi, > > Am 28.07.17 um 04:06 schrieb Brad Hubbard: >> An update on this. >> >> The "attempt to access beyond end of device" messages are created due to a >> kernel bug

Re: [ceph-users] Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd

2017-09-06 Thread Brad Hubbard
These error logs look like they are being generated here, https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L8987-L8993 or possibly here, https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L9230-L9236. Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs cep

Re: [ceph-users] Blocked requests

2017-09-07 Thread Brad Hubbard
Is it this? https://bugzilla.redhat.com/show_bug.cgi?id=1430588 On Fri, Sep 8, 2017 at 7:01 AM, Matthew Stroud wrote: > After some troubleshooting, the issues appear to be caused by gnocchi using > rados. I’m trying to figure out why. > > > > Thanks, > > Matthew Stroud > > > > From: Brian Andrus

Re: [ceph-users] Bluestore "separate" WAL and DB (and WAL/DB size?)

2017-09-11 Thread Brad Hubbard
Take a look at these which should answer at least some of your questions. http://ceph.com/community/new-luminous-bluestore/ http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/ On Mon, Sep 11, 2017 at 8:45 PM, Richard Hesketh wrote: > On 08/09/17 11:44, Richard Hesketh wrot
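As a rough illustration of what a "separate" WAL and DB means in practice, a BlueStore OSD can be prepared with data, DB and WAL on different devices; the device names below are placeholders and the exact tooling (ceph-disk vs ceph-volume) depends on the release:

    $ ceph-volume lvm prepare --bluestore --data /dev/sdb \
          --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2
    # if no WAL device is given, the WAL is kept on the DB device (or the data device if neither is given)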

Re: [ceph-users] ceph-mgr SIGABRTs on startup after cluster upgrade from Kraken to Luminous

2017-09-11 Thread Brad Hubbard
Looks like there is a tracker opened for this. http://tracker.ceph.com/issues/21197 Please add your details there. On Tue, Sep 12, 2017 at 11:04 AM, Katie Holly wrote: > Hi, > > I recently upgraded one of our clusters from Kraken to Luminous (the cluster > was initialized with Jewel) on Ubuntu

Re: [ceph-users] ceph-mgr SIGABRTs on startup after cluster upgrade from Kraken to Luminous

2017-09-11 Thread Brad Hubbard
this issue. > > -- > Katie > > On 2017-09-12 03:15, Brad Hubbard wrote: >> Looks like there is a tracker opened for this. >> >> http://tracker.ceph.com/issues/21197 >> >> Please add your details there. >> >> On Tue, Sep 12, 2017 at 11:04

Re: [ceph-users] ceph-mgr SIGABRTs on startup after cluster upgrade from Kraken to Luminous

2017-09-11 Thread Brad Hubbard
issues. Maybe a race condition? Maybe. That at least narrows it down. Could you add this information to the tracker please? The original description in the tracker appears to show ceph-mgr segfaulting on a report from an MDS so it's not completely restricted to reports from rgws. > >

Re: [ceph-users] ceph-mgr SIGABRTs on startup after cluster upgrade from Kraken to Luminous

2017-09-11 Thread Brad Hubbard
>>> Scaling this Docker radosgw cluster down to just 1 instance seems to >>> allow ceph-mgr to run without issues, but as soon as I increase the amount >>> of radosgw instances, the risk of ceph-mgr crashing at any random time also >>> increases. >>>

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-13 Thread Brad Hubbard
On Wed, Sep 13, 2017 at 8:40 PM, Florian Haas wrote: > Hi everyone, > > > disclaimer upfront: this was seen in the wild on Hammer, and on 0.94.7 > no less. Reproducing this on 0.94.10 is a pending process, and we'll > update here with findings, but my goal with this post is really to Just making

Re: [ceph-users] Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

2017-09-14 Thread Brad Hubbard
On Thu, Sep 14, 2017 at 5:42 PM, Florian Haas wrote: > On Thu, Sep 14, 2017 at 3:15 AM, Brad Hubbard wrote: >> On Wed, Sep 13, 2017 at 8:40 PM, Florian Haas wrote: >>> Hi everyone, >>> >>> >>> disclaimer upfront: this was seen in the wild on Hammer

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-18 Thread Brad Hubbard
Well OK now. Before we go setting off the fire alarms all over town let's work out what is happening, and why. I spent some time reproducing this and it is indeed tied to selinux being (at least) permissive. It does not happen when selinux is disabled. If we look at the journalctl output in the

Re: [ceph-users] Jewel -> Luminous upgrade, package install stopped all daemons

2017-09-18 Thread Brad Hubbard
On Sat, Sep 16, 2017 at 8:34 AM, David Turner wrote: > I don't understand a single use case where I want updating my packages using > yum, apt, etc to restart a ceph daemon. ESPECIALLY when there are so many > clusters out there with multiple types of daemons running on the same > server. > > My

Re: [ceph-users] OSD assert hit suicide timeout

2017-09-20 Thread Brad Hubbard
Start gathering something akin to what sysstat gathers (there are of course numerous options for this) and go over it carefully. Tools like perf or oprofile can also provide important clues. This "looks" like a network issue or some sort of resource shortage. Comprehensive monitoring and gatheri

Re: [ceph-users] librmb: Mail storage on RADOS with Dovecot

2017-09-21 Thread Brad Hubbard
This looks great Wido! Kudos to all involved. On Thu, Sep 21, 2017 at 6:40 PM, Wido den Hollander wrote: > Hi, > > A tracker issue has been out there for a while: > http://tracker.ceph.com/issues/12430 > > Storing e-mail in RADOS with Dovecot, the IMAP/POP3/LDA server with a huge > marketshar

Re: [ceph-users] ceph/systemd startup bug (was Re: Some OSDs are down after Server reboot)

2017-09-28 Thread Brad Hubbard
This looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1458007 or one of the bugs/trackers attached to that. On Thu, Sep 28, 2017 at 11:14 PM, Sean Purdy wrote: > On Thu, 28 Sep 2017, Matthew Vernon said: >> Hi, >> >> TL;DR - the timeout setting in ceph-disk@.service is (far) too small

Re: [ceph-users] ceph/systemd startup bug (was Re: Some OSDs are down after Server reboot)

2017-09-29 Thread Brad Hubbard
On Fri, Sep 29, 2017 at 8:58 PM, Matthew Vernon wrote: > Hi, > > On 29/09/17 01:00, Brad Hubbard wrote: >> This looks similar to >> https://bugzilla.redhat.com/show_bug.cgi?id=1458007 or one of the >> bugs/trackers attached to that. > > Yes, although increasing t
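For anyone hitting the same timeout, a generic way to override the packaged unit without editing it in place is a systemd drop-in. Whether the relevant limit is a unit-level timeout or a 'timeout' wrapper hard-coded in ExecStart depends on the ceph-disk version, so treat the names and values below as illustrative only:

    $ systemctl edit ceph-disk@.service
    # in the drop-in file that opens, e.g.:
        [Service]
        Environment=CEPH_DISK_TIMEOUT=10000
        TimeoutStartSec=600
    $ systemctl daemon-reload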

Re: [ceph-users] 1 osd Segmentation fault in test cluster

2017-10-03 Thread Brad Hubbard
Looks like there is one already. http://tracker.ceph.com/issues/21259 On Tue, Oct 3, 2017 at 1:15 AM, Gregory Farnum wrote: > Please file a tracker ticket with all the info you have for stuff like this. > They’re a lot harder to lose than emails are. ;) > > On Sat, Sep 30, 2017 at 8:31 AM Marc R

Re: [ceph-users] [Jewel] Crash Osd with void Hit_set_trim

2017-10-18 Thread Brad Hubbard
On Wed, Oct 18, 2017 at 11:16 PM, pascal.pu...@pci-conseil.net wrote: > hello, > > For 2 week, I lost sometime some OSD : > Here trace : > > 0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc: In > function '*void ReplicatedPG::hit_set_trim(*ReplicatedPG::OpContextUPtr&, > unsig

Re: [ceph-users] Not able to start OSD

2017-10-19 Thread Brad Hubbard
On Fri, Oct 20, 2017 at 6:32 AM, Josy wrote: > Hi, > >>> have you checked the output of "ceph-disk list” on the nodes where the >>> OSDs are not coming back on? > > Yes, it shows all the disk correctly mounted. > >>> And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages >>> produced

Re: [ceph-users] Slow requests

2017-10-19 Thread Brad Hubbard
I guess you have both read and followed http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests What was the result? On Fri, Oct 20, 2017 at 2:50 AM, J David wrote: > On Wed, Oct 18, 2017 at 8:12 AM, Ольга Ухина wrote: >> I have a p
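That troubleshooting page largely boils down to asking the affected OSDs what their blocked requests are waiting on; a short example, with osd.3 as a placeholder id, run on the host carrying that OSD:

    $ ceph daemon osd.3 dump_ops_in_flight      # currently blocked ops, with their age and last event
    $ ceph daemon osd.3 dump_historic_ops       # recently completed slow ops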

Re: [ceph-users] Slow requests

2017-10-19 Thread Brad Hubbard
On Fri, Oct 20, 2017 at 1:09 PM, J David wrote: > On Thu, Oct 19, 2017 at 9:42 PM, Brad Hubbard wrote: >> I guess you have both read and followed >> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/?highlight=backfill#debugging-slow-requests >> &

Re: [ceph-users] Not able to start OSD

2017-10-20 Thread Brad Hubbard
gnment=false > k=5 > m=3 > plugin=jerasure > technique=reed_sol_van > w=8 Sorry, can you post the output of 'ceph osd dump' as well please? > > > On 20-10-2017 06:52, Brad Hubbard wrote: >> >> On Fri, Oct 20, 2017 at 6:32 AM, Josy wrote: >>>

Re: [ceph-users] Slow requests

2017-10-20 Thread Brad Hubbard
I've read thread > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021588.html. > Very similar problem, can it be connected to Proxmox? I have quite old > version of proxmox-ve: 4.4-80, and ceph jewel clients on pve nodes. Anything's possible. > > Regards

Re: [ceph-users] Inconsistent PG won't repair

2017-10-20 Thread Brad Hubbard
On Sat, Oct 21, 2017 at 1:59 AM, Richard Bade wrote: > Hi Lincoln, > Yes the object is 0-bytes on all OSD's. Has the same filesystem > date/time too. Before I removed the rbd image (migrated disk to > different pool) it was 4MB on all the OSD's and md5 checksum was the > same on all so it seems th

Re: [ceph-users] [Jewel] Crash Osd with void Hit_set_trim

2017-10-22 Thread Brad Hubbard
> Regards, > > PS: Last sunday, I lost RBD header during remove of cache tier... a lot of > thanks to http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/, > to recreate it and resurrect RBD disk :) > Le 19/10/2017 à 00:19, Brad Hubbard a écrit : > > On Wed, Oc

Re: [ceph-users] [Jewel] Crash Osd with void Hit_set_trim

2017-10-23 Thread Brad Hubbard
On Mon, Oct 23, 2017 at 4:51 PM, pascal.pu...@pci-conseil.net < pascal.pu...@pci-conseil.net> wrote: > Hello, > Le 23/10/2017 à 02:05, Brad Hubbard a écrit : > > 2017-10-22 17:32:56.031086 7f3acaff5700 1 osd.14 pg_epoch: 72024 > pg[37.1c( v 71593'41657 (60849'38594

Re: [ceph-users] [Jewel] Crash Osd with void Hit_set_trim

2017-10-23 Thread Brad Hubbard
On Tue, Oct 24, 2017 at 3:49 PM, Brad Hubbard wrote: > > > On Mon, Oct 23, 2017 at 4:51 PM, pascal.pu...@pci-conseil.net < > pascal.pu...@pci-conseil.net> wrote: > >> Hello, >> Le 23/10/2017 à 02:05, Brad Hubbard a écrit : >> >> 2017-10-22 17:32:56.

Re: [ceph-users] 回复: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-11-02 Thread Brad Hubbard
On Wed, Nov 1, 2017 at 11:54 PM, Mazzystr wrote: > I experienced this as well on tiny Ceph cluster testing... > > HW spec - 3x > Intel i7-4770K quad core > 32Gb m2/ssd > 8Gb memory > Dell PERC H200 > 6 x 3Tb Seagate > Centos 7.x > Ceph 12.x > > I also run 3 memory hungry procs on the Ceph nodes.

Re: [ceph-users] Luminous LTS: `ceph osd crush class create` is gone?

2017-11-02 Thread Brad Hubbard
On Fri, Nov 3, 2017 at 4:04 PM, Linh Vu wrote: > Hi all, > > > Back in Luminous Dev and RC, I was able to do this: > > > `ceph osd crush class create myclass` This was removed as part of https://github.com/ceph/ceph/pull/16388 It looks like the set-device-class command is the replacement or equi
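For reference, the replacement workflow looks roughly like this (class name and OSD ids are illustrative; an existing class has to be removed before a new one can be set):

    $ ceph osd crush rm-device-class osd.2 osd.3
    $ ceph osd crush set-device-class nvme osd.2 osd.3
    $ ceph osd crush class ls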

Re: [ceph-users] OSD is near full and slow in accessing storage from client

2017-11-12 Thread Brad Hubbard
On Mon, Nov 13, 2017 at 4:57 AM, David Turner wrote: > You cannot reduce the PG count for a pool. So there isn't anything you can > really do for this unless you create a new FS with better PG counts and > migrate your data into it. > > The problem with having more PGs than you need is in the m

Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Brad Hubbard
# ps axHo %cpu,stat,pid,tid,pgid,ppid,comm,wchan | grep ceph-osd To find the actual thread that is using 100% CPU. # for x in `seq 1 5`; do gdb -batch -p [PID] -ex "thr appl all bt"; echo; done > /tmp/osd.stack.dump Then look at the stacks for the thread that was using all the CPU and see what i

Re: [ceph-users] ceph-volume lvm for bluestore for newer disk

2017-11-30 Thread Brad Hubbard
On Thu, Nov 30, 2017 at 5:30 PM, nokia ceph wrote: > Hello, > > I'm following > http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#ceph-volume-lvm-prepare-bluestore > to create new OSD's. > > I took the latest branch from https://shaman.ceph.com/repos/ceph/luminous/ > > # ceph -v > ceph v

Re: [ceph-users] ceph-volume lvm for bluestore for newer disk

2017-12-01 Thread Brad Hubbard
On Fri, Dec 1, 2017 at 7:28 PM, nokia ceph wrote: > THanks brad, that got worked.. :) No problem. I created http://tracker.ceph.com/issues/22297 > > On Fri, Dec 1, 2017 at 12:18 PM, Brad Hubbard wrote: >> >> >> >> On Thu, Nov 30, 2017 at 5:30 PM,

Re: [ceph-users] PG::peek_map_epoch assertion fail

2017-12-03 Thread Brad Hubbard
A debug log captured when this happens with debug_osd set to at least 15 should tell us. On Sun, Dec 3, 2017 at 10:54 PM, Gonzalo Aguilar Delgado wrote: > Hello, > > What can make fail this assertion? > > > int r = store->omap_get_values(coll, pgmeta_oid, keys, &values); > if (r == 0) { >
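A sketch of raising that debug level, with osd.5 as a placeholder id. injectargs only works on a running daemon; for an OSD that asserts during start-up the ceph.conf route is the one that matters:

    $ ceph tell osd.5 injectargs '--debug_osd 15 --debug_ms 1'
    # or, in ceph.conf on the OSD host before restarting it:
        [osd]
            debug osd = 15
            debug ms = 1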

Re: [ceph-users] injecting args output misleading

2017-12-04 Thread Brad Hubbard
On Tue, Dec 5, 2017 at 6:12 AM, Brady Deetz wrote: > I'm not sure if this is a bug where ceph incorrectly reports to the user or > if this is just a matter of misleading language. Thought I might bring it up > in any case. > > I under stand that "may require restart" is fairly direct in its ambigu

Re: [ceph-users] OSD down with Ceph version of Kraken

2017-12-05 Thread Brad Hubbard
On Tue, Dec 5, 2017 at 8:14 PM, wrote: > Hi, > > > > Our Ceph version is Kraken and for the storage node we have up to 90 hard > disks that can be used for OSD, we configured the messenger type as > “simple”, I noticed that “simple” type here might create lots of threads and > hence occupied lots

Re: [ceph-users] Hangs with qemu/libvirt/rbd when one host disappears

2017-12-05 Thread Brad Hubbard
On Wed, Dec 6, 2017 at 4:09 AM, Marcus Priesch wrote: > Dear Ceph Users, > > first of all, big thanks to all the devs and people who made all this > possible, ceph is amazing !!! > > ok, so let me get to the point where i need your help: > > i have a cluster of 6 hosts, mixed with ssd's and hdd's.

Re: [ceph-users] Any way to get around selinux-policy-base dependency

2017-12-06 Thread Brad Hubbard
On Thu, Dec 7, 2017 at 4:23 AM, Bryan Banister wrote: > Thanks Ken, that's understandable, > -Bryan > > -Original Message- > From: Ken Dreyer [mailto:kdre...@redhat.com] > Sent: Wednesday, December 06, 2017 12:03 PM > To: Bryan Banister > Cc: Ceph Users ; Rafael Suarez > > Subject: Re:

Re: [ceph-users] Hangs with qemu/libvirt/rbd when one host disappears

2017-12-07 Thread Brad Hubbard
On Thu, Dec 7, 2017 at 6:59 PM, Marcus Priesch wrote: > Hello Brad, Hi, >> You don't really have six MONs do you (although I know the answer to >> this question)? I think you need to take another look at some of the >> docs about monitors. Sorry, I could have phrased this much better in hindsig

Re: [ceph-users] Rados: Undefined symbol error

2015-08-31 Thread Brad Hubbard
- Original Message - > From: "Aakanksha Pudipeddi-SSI" > To: "Brad Hubbard" > Cc: ceph-us...@ceph.com > Sent: Tuesday, 1 September, 2015 3:33:38 AM > Subject: RE: [ceph-users] Rados: Undefined symbol error > > Hello Brad, > > Sorry for

Re: [ceph-users] Rados: Undefined symbol error

2015-08-31 Thread Brad Hubbard
- Original Message - > From: "Aakanksha Pudipeddi-SSI" > To: "Brad Hubbard" > Cc: ceph-us...@ceph.com > Sent: Tuesday, 1 September, 2015 7:27:04 AM > Subject: RE: [ceph-users] Rados: Undefined symbol error > > Hello Brad, > > When I type

Re: [ceph-users] Rados: Undefined symbol error

2015-08-31 Thread Brad Hubbard
- Original Message - > From: "Aakanksha Pudipeddi-SSI" > To: "Brad Hubbard" > Cc: "ceph-users" > Sent: Tuesday, 1 September, 2015 7:58:33 AM > Subject: RE: [ceph-users] Rados: Undefined symbol error > > Brad, > > Yes, you are ri

Re: [ceph-users] Rados: Undefined symbol error

2015-08-31 Thread Brad Hubbard
- Original Message - > From: "Brad Hubbard" > To: "Aakanksha Pudipeddi-SSI" > Cc: "ceph-users" > Sent: Tuesday, 1 September, 2015 8:36:33 AM > Subject: Re: [ceph-users] Rados: Undefined symbol error > > - Original Message -

Re: [ceph-users] Rados: Undefined symbol error

2015-09-01 Thread Brad Hubbard
- Original Message - > From: "Aakanksha Pudipeddi-SSI" > To: "Brad Hubbard" > Sent: Wednesday, 2 September, 2015 6:25:49 AM > Subject: RE: [ceph-users] Rados: Undefined symbol error > > Hello Brad, > > I wanted to clarify the "mak

Re: [ceph-users] Cannot add/create new monitor on ceph v0.94.3

2015-09-06 Thread Brad Hubbard
- Original Message - > From: "Fangzhe Chang (Fangzhe)" > To: ceph-users@lists.ceph.com > Sent: Saturday, 5 September, 2015 6:26:16 AM > Subject: [ceph-users] Cannot add/create new monitor on ceph v0.94.3 > > > > Hi, > > I’m trying to add a second monitor using ‘ceph-deploy mon new hos

Re: [ceph-users] Cannot add/create new monitor on ceph v0.94.3

2015-09-08 Thread Brad Hubbard
I'd suggest starting the mon with debugging turned right up and taking a good look at the output. Cheers, Brad - Original Message - > From: "Fangzhe Chang (Fangzhe)" > To: "Brad Hubbard" > Cc: ceph-users@lists.ceph.com > Sent: Wednesday, 9 Sep
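"Debugging turned right up" would look something like the following; the mon id is a placeholder and the exact option spelling can vary slightly between releases:

    $ ceph-mon -i <mon-id> -d --debug_mon 20 --debug_ms 1
    # -d keeps the monitor in the foreground and sends the log to stderr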

Re: [ceph-users] 9 PGs stay incomplete

2015-09-11 Thread Brad Hubbard
- Original Message - > From: "Wido den Hollander" > To: "ceph-users" > Sent: Friday, 11 September, 2015 6:46:11 AM > Subject: [ceph-users] 9 PGs stay incomplete > > Hi, > > I'm running into a issue with Ceph 0.94.2/3 where after doing a recovery > test 9 PGs stay incomplete: > > osdmap

Re: [ceph-users] OSD crash

2015-09-22 Thread Brad Hubbard
- Original Message - > From: "Alex Gorbachev" > To: "ceph-users" > Sent: Wednesday, 9 September, 2015 6:38:50 AM > Subject: [ceph-users] OSD crash > Hello, > We have run into an OSD crash this weekend with the following dump. Please > advise what this could be. Hello Alex, As you kn

Re: [ceph-users] osd crash and high server load - ceph-osd crashes with stacktrace

2015-10-25 Thread Brad Hubbard
- Original Message - > From: "Jacek Jarosiewicz" > To: ceph-users@lists.ceph.com > Sent: Sunday, 25 October, 2015 8:48:59 PM > Subject: Re: [ceph-users] osd crash and high server load - ceph-osd crashes > with stacktrace > > We've upgraded ceph to 0.94.4 and kernel to 3.16.0-51-generic >

Re: [ceph-users] fedora core 22

2015-10-27 Thread Brad Hubbard
- Original Message - > From: "Andrew Hume" > To: ceph-users@lists.ceph.com > Sent: Tuesday, 27 October, 2015 11:13:04 PM > Subject: [ceph-users] fedora core 22 > > a while back, i had installed ceph (firefly i believe) on my fedora core > system and all went smoothly. > i went to repeat t

Re: [ceph-users] infernalis osd activation on centos 7

2015-12-02 Thread Brad Hubbard
- Original Message - > From: "Dan Nica" > To: ceph-us...@ceph.com > Sent: Thursday, 3 December, 2015 1:39:16 AM > Subject: [ceph-users] infernalis osd activation on centos 7 > Hi guys, > After managing to get the mons up, I am stuck at activating the osds with the > error below > [cep

Re: [ceph-users] ceph new installation of ceph 0.9.2 issue and crashing osds

2015-12-08 Thread Brad Hubbard
Looks like it's failing to create a thread. Try setting kernel.pid_max to 4194303 in /etc/sysctl.conf Cheers, Brad - Original Message - > From: "Kenneth Waegeman" > To: ceph-users@lists.ceph.com > Sent: Tuesday, 8 December, 2015 10:45:11 PM > Subject: [ceph-users] ceph new installation
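Applying that setting without waiting for a reboot, using the value suggested in the reply:

    # echo "kernel.pid_max = 4194303" >> /etc/sysctl.conf
    # sysctl -p
    # sysctl kernel.pid_max        # verify the new value took effect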

Re: [ceph-users] OSD error

2015-12-08 Thread Brad Hubbard
+ceph-devel - Original Message - > From: "Dan Nica" > To: ceph-us...@ceph.com > Sent: Tuesday, 8 December, 2015 7:54:20 PM > Subject: [ceph-users] OSD error > Hi guys, > Recently I installed ceph cluster version 9.2.0, and on my osd logs I see > these errors: > 2015-12-08 04:49:12.93

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-24 Thread Brad Hubbard
/usr/bin/ceph is a python script, so it's not segfaulting itself but some binary it launches is, and there doesn't appear to be much information about it in the log you uploaded. Are you able to capture a core file and generate a stack trace from gdb? The following may help to get some data. $ ulimit
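A hedged sketch of that capture; the binary name is a placeholder since the crashing child process has to be identified first, and the core location depends on kernel.core_pattern:

    $ ulimit -c unlimited
    $ ceph status                          # reproduce the crash
    $ ls core*                             # locate the core file (path may differ with abrt/systemd-coredump)
    $ gdb /usr/bin/<crashing-binary> core.<pid> -ex 'bt' -ex 'quit'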

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-05-25 Thread Brad Hubbard
Hi John, This looks a lot like http://tracker.ceph.com/issues/12417 which is, of course, fixed. Worth gathering debug-auth=20 ? Maybe on the MON end as well? Cheers, Brad - Original Message - > From: "Mathias Buresch" > To: jsp...@redhat.com > Cc: ceph-us...@ceph.com > Sent: Thursday,

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-06-01 Thread Brad Hubbard
t=..., in=..., this=0x7fffea882470) > at auth/cephx/../Crypto.h:110 > #1 encode_encrypt_enc_bl (cct=, > error="", out=..., key=..., t=) > at auth/cephx/CephxProtocol.h:464 > #2 encode_encrypt (cct=, error="", > out=..., key=..., t=) > at auth/ceph

Re: [ceph-users] Crashing OSDs (suicide timeout, following a single pool)

2016-06-02 Thread Brad Hubbard
On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP wrote: > The only way that I was able to get back to Health_OK was to export/import. > * Please note, any time you use the ceph_objectstore_tool you risk data > loss if not done carefully. Never remove a PG until you have a known good
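For context, a PG export/import with the objectstore tool looks roughly like this; OSD ids, pgid and paths are placeholders, both OSDs must be stopped, and (as the poster warns) the tool can destroy data if misused:

    $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
          --journal-path /var/lib/ceph/osd/ceph-21/journal \
          --pgid 1.28 --op export --file /tmp/pg.1.28.export
    $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 \
          --journal-path /var/lib/ceph/osd/ceph-37/journal \
          --op import --file /tmp/pg.1.28.export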

Re: [ceph-users] Ceph Status - Segmentation Fault

2016-06-13 Thread Brad Hubbard
nd saw your bug this morning. Cheers, Brad -Original Message- > From: Brad Hubbard > To: Mathias Buresch > Cc: jsp...@redhat.com , ceph-us...@ceph.com e...@ceph.com> > Subject: Re: [ceph-users] Ceph Status - Segmentation Fault > Date: Thu, 2 Jun 2016 09:50:20 +1000 > &g

Re: [ceph-users] [Ceph-community] Regarding Technical Possibility of Configuring Single Ceph Cluster on Different Networks

2016-06-16 Thread Brad Hubbard
On Fri, Jun 10, 2016 at 3:01 AM, Venkata Manojawa Paritala wrote: > Hello Friends, > > I am Manoj Paritala, working in Vedams Software Solutions India Pvt Ltd, > Hyderabad, India. We are developing a POC with the below specification. I > would like to know if it is technically possible to configur

Re: [ceph-users] Installing ceph monitor on Ubuntu denial: segmentation fault

2016-06-17 Thread Brad Hubbard
On Fri, May 20, 2016 at 7:32 PM, Daniel Wilhelm wrote: > Hi > > > > I relieved to have found a solution to this problem. > > > > The ansible script for generating the key did not pass the key to the > following command line and sent therefore an empty string to this script > (see monitor_secret).

Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-21 Thread Brad Hubbard
On Wed, Jun 22, 2016 at 1:35 PM, 王海涛 wrote: > Hi All > > I'm using ceph-10.1.1 to map a rbd image ,but it dosen't work ,the error > messages are: > > root@heaven:~#rbd map rbd/myimage --id admin > 2016-06-22 11:16:34.546623 7fc87ca53d80 -1 WARNING: the following dangerous > and experimental featur

Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-22 Thread Brad Hubbard
put of the following command please? # ceph osd crush show-tunables -f json-pretty I believe you'll need to use "ceph osd crush tunables <profile>" to adjust this. > > Thanks! > > Kind Regards, > Haitao Wang > > > At 2016-06-22 12:33:42, "Brad Hubbard" wrote:

Re: [ceph-users] Ceph 10.1.1 rbd map fail

2016-06-23 Thread Brad Hubbard
s: layering, exclusive-lock, object-map, fast-diff, deep-flatten > flags: > > It looks like that some of the features are not supported by my rbd kernel > module. > Because when I get rid of the last 4 features, and only keep the "layering" > feature, > the image see
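Dropping the features the kernel client does not understand is typically done like this (the image name matches the one used earlier in the thread; rbd_default_features in ceph.conf controls what new images get):

    $ rbd feature disable rbd/myimage deep-flatten fast-diff object-map exclusive-lock
    $ rbd info rbd/myimage                 # confirm only 'layering' remains
    $ rbd map rbd/myimage --id admin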

Re: [ceph-users] ceph not replicating to all osds

2016-06-27 Thread Brad Hubbard
On Tue, Jun 28, 2016 at 1:00 AM, Ishmael Tsoaela wrote: > Hi ALL, > > Anyone can help with this issue would be much appreciated. > > I have created an image on one client and mounted it on both 2 client I > have setup. > > When I write data on one client, I cannot access the data on another clien

Re: [ceph-users] VM shutdown because of PG increase

2016-06-28 Thread Brad Hubbard
On Tue, Jun 28, 2016 at 7:39 PM, Torsten Urbas wrote: > Hello, > > are you sure about your Ceph version? Below’s output states "0.94.1“. I suspect it's quite likely that the cluster was upgraded but not the clients or, if the clients were upgraded, that the VMs were not restarted so they still ha

Re: [ceph-users] ceph not replicating to all osds

2016-06-28 Thread Brad Hubbard
On Tue, Jun 28, 2016 at 4:17 PM, Ishmael Tsoaela wrote: > Hi, > > I am new to Ceph and most of the concepts are new. > > image mounted on nodeA, FS is XFS > > sudo mkfs.xfs /dev/rbd/data/data_01 > > sudo mount /dev/rbd/data/data_01 /mnt > > cluster_master@nodeB:~$ mount|grep rbd > /dev/rbd0 on /m

Re: [ceph-users] Hammer: PGs stuck creating

2016-06-29 Thread Brad Hubbard
On Thu, Jun 30, 2016 at 3:22 AM, Brian Felton wrote: > Greetings, > > I have a lab cluster running Hammer 0.94.6 and being used exclusively for > object storage. The cluster consists of four servers running 60 6TB OSDs > each. The main .rgw.buckets pool is using k=3 m=1 erasure coding and > cont

Re: [ceph-users] Hammer: PGs stuck creating

2016-06-30 Thread Brad Hubbard
rush rule ls For each rule listed by the above command. $ ceph osd crush rule dump [rule_name] I'd then dump out the crushmap and test it showing any bad mappings with the commands listed here; http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
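The crushmap test referred to there looks roughly like the following; rule number, replica count and the range of test inputs are placeholders:

    $ ceph osd getcrushmap -o crushmap.bin
    $ crushtool -i crushmap.bin --test --show-bad-mappings \
          --rule 1 --num-rep 4 --min-x 0 --max-x 1000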

Re: [ceph-users] RADOSGW buckets via NFS?

2016-07-03 Thread Brad Hubbard
On Sun, Jul 3, 2016 at 9:07 PM, Sean Redmond wrote: > Hi, > > I noticed in the jewel release notes: > > "You can now access radosgw buckets via NFS (experimental)." > > Are there any docs that explain the configuration of NFS to access RADOSGW > buckets? Here's what I found. http://tracker.ceph.

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 12:13 PM, Shinobu Kinjo wrote: > Can you reproduce with debug client = 20? In addition to this I would suggest making sure you have debug symbols in your build and capturing a core file. You can do that by setting "ulimit -c unlimited" in the environment where ceph-fuse is

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly wrote: > Hi Goncalo, > > I believe this segfault may be the one fixed here: > > https://github.com/ceph/ceph/pull/10027 Ah, nice one Patrick. Goncalo, the patch is fairly simple, just the addition of a lock on two lines to resolve the race. Could

Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-05 Thread Brad Hubbard
On Sun, Jul 3, 2016 at 7:51 AM, Alex Gorbachev wrote: >> Thank you Stefan and Campbell for the info - hope 4.7rc5 resolves this >> for us - please note that my workload is purely RBD, no QEMU/KVM. >> Also, we do not have CFQ turned on, neither scsi-mq and blk-mq, so I >> am surmising ceph-osd must

Re: [ceph-users] Should I restart VMs when I upgrade ceph client version

2016-07-05 Thread Brad Hubbard
On Wed, Jul 6, 2016 at 3:28 PM, 한승진 wrote: > Hi Cephers, > > I implemented Ceph with OpenStack. > > Recently, I upgrade Ceph server from Hammer to Jewel. > > Also, I plan to upgrade ceph clients that are OpenStack Nodes. > > There are a lot of VMs running in Compute Nodes. > > Should I restart the

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Brad Hubbard
On Thu, Jul 7, 2016 at 12:31 AM, Patrick Donnelly wrote: > > The locks were missing in 9.2.0. There were probably instances of the > segfault unreported/unresolved. Or even unseen :) Race conditions are funny things and extremely subtle changes in timing introduced by any number of things can af

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Brad Hubbard
Hi Goncalo, If possible it would be great if you could capture a core file for this with full debugging symbols (preferably glibc debuginfo as well). How you do that will depend on the ceph version and your OS, but we can offer help if required, I'm sure. Once you have the core, do the following.
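On an RPM-based system, turning such a core into a usable backtrace usually looks something like this; package names and paths are placeholders and differ per distribution and ceph version:

    # debuginfo-install ceph-fuse glibc          # or install the matching *-debuginfo packages by hand
    # gdb /usr/bin/ceph-fuse /path/to/core
    (gdb) thread apply all bt full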

Re: [ceph-users] Data recovery stuck

2016-07-08 Thread Brad Hubbard
On Sat, Jul 9, 2016 at 1:20 AM, Pisal, Ranjit Dnyaneshwar wrote: > Hi All, > > > > I am in process of adding new OSDs to Cluster however after adding second > node Cluster recovery seems to be stopped. > > > > Its more than 3 days but Objects degraded % has not improved even by 1%. > > > > Will ad

Re: [ceph-users] ceph master build fails on src/gmock, workaround?

2016-07-09 Thread Brad Hubbard
On Sat, Jul 09, 2016 at 10:43:52AM +, Kevan Rehm wrote: > Greetings, > > I cloned the master branch of ceph at https://github.com/ceph/ceph.git > onto a Centos 7 machine, then did > > ./autogen.sh > ./configure --enable-xio > make > > but the build fails when it references the src/gmock subd

Re: [ceph-users] ceph master build fails on src/gmock, workaround?

2016-07-10 Thread Brad Hubbard
On Sat, Jul 09, 2016 at 10:43:52AM +, Kevan Rehm wrote: > Greetings, > > I cloned the master branch of ceph at https://github.com/ceph/ceph.git > onto a Centos 7 machine, then did > > ./autogen.sh > ./configure --enable-xio > make BTW, you should be defaulting to cmake if you don't have a sp
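The cmake route from a fresh checkout is roughly as follows (do_cmake.sh lives at the top of the ceph tree; build options are whatever your environment needs):

    $ ./do_cmake.sh
    $ cd build
    $ make -j$(nproc)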

Re: [ceph-users] ceph admin socket protocol

2016-07-10 Thread Brad Hubbard
On Sun, Jul 10, 2016 at 09:32:33PM +0200, Stefan Priebe - Profihost AG wrote: > > Am 10.07.2016 um 16:33 schrieb Daniel Swarbrick: > > If you can read C code, there is a collectd plugin that talks directly > > to the admin socket: > > > > https://github.com/collectd/collectd/blob/master/src/ceph.
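Besides collectd, the admin socket can be queried directly with the ceph CLI; the socket path below is the default for osd.0 and will differ per daemon and cluster name:

    $ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    $ ceph daemon osd.0 perf schema        # shorthand that resolves the socket path itself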

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-10 Thread Brad Hubbard
On Mon, Jul 11, 2016 at 11:48:57AM +0900, 한승진 wrote: > Hi cephers. > > I need your help for some issues. > > The ceph cluster version is Jewel(10.2.1), and the filesytem is btrfs. > > I run 1 Mon and 48 OSD in 4 Nodes(each node has 12 OSDs). > > I've experienced one of OSDs was killed himself.

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-10 Thread Brad Hubbard
On Mon, Jul 11, 2016 at 1:21 PM, Brad Hubbard wrote: > On Mon, Jul 11, 2016 at 11:48:57AM +0900, 한승진 wrote: >> Hi cephers. >> >> I need your help for some issues. >> >> The ceph cluster version is Jewel(10.2.1), and the filesytem is btrfs. >> >> I run

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-11 Thread Brad Hubbard
On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton wrote: > Le 11/07/2016 04:48, 한승진 a écrit : >> Hi cephers. >> >> I need your help for some issues. >> >> The ceph cluster version is Jewel(10.2.1), and the filesytem is btrfs. >> >> I run 1 Mon and 48 OSD in 4 Nodes(each node has 12 OSDs). >> >> I've

Re: [ceph-users] Fwd: Ceph OSD suicide himself

2016-07-11 Thread Brad Hubbard
On Mon, Jul 11, 2016 at 04:53:36PM +0200, Lionel Bouton wrote: > Le 11/07/2016 11:56, Brad Hubbard a écrit : > > On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton > > wrote: > >> Le 11/07/2016 04:48, 한승진 a écrit : > >>> Hi cephers. > >>> > >>

Re: [ceph-users] ceph master build fails on src/gmock, workaround?

2016-07-12 Thread Brad Hubbard
This was resolved in http://tracker.ceph.com/issues/16646 On Sun, Jul 10, 2016 at 5:09 PM, Brad Hubbard wrote: > On Sat, Jul 09, 2016 at 10:43:52AM +, Kevan Rehm wrote: >> Greetings, >> >> I cloned the master branch of ceph at https://github.com/ceph/ceph.git >> on

Re: [ceph-users] osd failing to start

2016-07-13 Thread Brad Hubbard
On Thu, Jul 14, 2016 at 06:06:58AM +0200, Martin Wilderoth wrote: > Hello, > > I have a ceph cluster where the one osd is failng to start. I have been > upgrading ceph to see if the error dissappered. Now I'm running jewel but I > still get the error message. > > -1> 2016-07-13 17:04:22.061

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
the methods and functions do mean something, > > > the > > > log seems related to issues on the management of objects in cache. This > > > pointed to a memory related problem. > > > > > > 4./ On the cluster where the application run successfully, machine

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
ve one instance of the user application running, ceph-fuse (in > > 10.2.2) slowly rises with time up to 10 GB of memory usage. > > > > if I submit a large number of user applications simultaneously, ceph-fuse > > goes very fast to ~10GB. > > > > PID USER PR N

Re: [ceph-users] Try to install ceph hammer on CentOS7

2016-07-22 Thread Brad Hubbard
On Sat, Jul 23, 2016 at 1:41 AM, Ruben Kerkhof wrote: > Please keep the mailing list on the CC. > > On Fri, Jul 22, 2016 at 3:40 PM, Manuel Lausch wrote: >> oh. This was a copy&pase failure. >> Of course I checked my config again. Some other variations of configurating >> didn't help as well. >>

Re: [ceph-users] Recovery stuck after adjusting to recent tunables

2016-07-22 Thread Brad Hubbard
On Sat, Jul 23, 2016 at 12:17 AM, Kostis Fardelas wrote: > Hello, > being in latest Hammer, I think I hit a bug with more recent than > legacy tunables. > > Being in legacy tunables for a while, I decided to experiment with > "better" tunables. So first I went from argonaut profile to bobtail > an

Re: [ceph-users] Recovery stuck after adjusting to recent tunables

2016-07-25 Thread Brad Hubbard
is always better to use powers >>> of 2) and see if the recover completes.. >>> >>> Cheers >>> G. >>> >>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Kostis >>> Fardelas [dan

Re: [ceph-users] mon_osd_nearfull_ratio (unchangeable) ?

2016-07-25 Thread Brad Hubbard
On Tue, Jul 26, 2016 at 11:01:49AM +1000, Goncalo Borges wrote: > Dear Cephers... Hi Goncalo, > > I am a bit confused about the 'unchachable' message we get in Jewel 10.2.2 > when I try to change some cluster configs. > > For example: > > 1./ if I try to change mon_osd_nearfull_ratio from 0.85

Re: [ceph-users] mon_osd_nearfull_ratio (unchangeable) ?

2016-07-25 Thread Brad Hubbard
On Tue, Jul 26, 2016 at 12:16:35PM +1000, Goncalo Borges wrote: > Hi Brad > > Thanks for replying. > > Answers inline. > > > > > I am a bit confused about the 'unchachable' message we get in Jewel 10.2.2 > > > when I try to change some cluster configs. > > > > > > For example: > > > > > > 1./

Re: [ceph-users] mon_osd_nearfull_ratio (unchangeable) ?

2016-07-26 Thread Brad Hubbard
On Tue, Jul 26, 2016 at 09:37:37AM +0200, Dan van der Ster wrote: > On Tue, Jul 26, 2016 at 3:52 AM, Brad Hubbard wrote: > >> 1./ if I try to change mon_osd_nearfull_ratio from 0.85 to 0.90, I get > >> > >># ceph tell mon.* injectargs "--mon_osd_nearfull

Re: [ceph-users] syslog broke my cluster

2016-07-26 Thread Brad Hubbard
On Tue, Jul 26, 2016 at 03:48:33PM +0100, Sergio A. de Carvalho Jr. wrote: > As per my previous messages on the list, I was having a strange problem in > my test cluster (Hammer 0.94.6, CentOS 6.5) where my monitors were > literally crawling to a halt, preventing them to ever reach quorum and > cau

Re: [ceph-users] installing multi osd and monitor of ceph in single VM

2016-08-09 Thread Brad Hubbard
On Wed, Aug 10, 2016 at 12:26 AM, agung Laksono wrote: > > Hi Ceph users, > > I am new in ceph. I've been succeed installing ceph in 4 VM using Quick > installation guide in ceph documentation. > > And I've also done to compile > ceph from source code, build and install in single vm. > > What I wa
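For a from-source, single-machine test cluster the usual helper is vstart.sh in the ceph tree; a sketch assuming a cmake build directory (daemon counts and options vary between releases):

    $ cd ceph/build
    $ MON=3 OSD=3 MDS=0 ../src/vstart.sh -d -n -x
    $ ./bin/ceph -s        # talks to the just-started development cluster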

Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-09 Thread Brad Hubbard
On Tue, Aug 9, 2016 at 7:39 AM, George Mihaiescu wrote: > Look in the cinder db, the volumes table to find the Uuid of the deleted > volume. You could also look through the logs at the time of the delete and I suspect you should be able to see how the rbd image was prefixed/named at the time of

Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-09 Thread Brad Hubbard
On Wed, Aug 10, 2016 at 3:16 PM, Georgios Dimitrakakis wrote: > > Hello! > > Brad, > > is that possible from the default logging or verbose one is needed?? > > I 've managed to get the UUID of the deleted volume from OpenStack but don't > really know how to get the offsets and OSD maps since "r
