Re: domino-style OSD crash

2012-07-10 Thread Tommi Virtanen
On Tue, Jul 10, 2012 at 10:36 AM, Yann Dupont wrote: >> Fundamentally, it comes down to this: the two clusters will still have >> the same fsid, and you won't be isolated from configuration errors or > (CEPH-PROD is the old btrfs volume). /CEPH is a new xfs volume, completely > redone & reformatted

Re: OSD crash on 0.48.2argonaut

2012-11-15 Thread Josh Durgin
On 11/14/2012 11:31 PM, eric_yh_c...@wiwynn.com wrote: Dear All: I met this issue on one of the osd nodes. Is this a known issue? Thanks! ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe) 1: /usr/bin/ceph-osd() [0x6edaba] 2: (()+0xfcb0) [0x7f08b112dcb0] 3: (gsignal(

Re: osd crash during resync

2012-01-24 Thread Gregory Farnum
On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand wrote: > Hi, > today I tried the btrfs patch mentioned on the btrfs ml. Therefore I > rebooted osd.0 with a new kernel and created a new btrfs on the osd.0, then > I took the osd.0 into the cluster. During the resync of osd.0, osd.2 and > osd.3 c

Re: osd crash during resync

2012-01-24 Thread Martin Mailand
Hi Greg, ok, do you guys still need the core files, or could I delete them? -martin On 24.01.2012 at 22:13, Gregory Farnum wrote: On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand wrote: Hi, today I tried the btrfs patch mentioned on the btrfs ml. Therefore I rebooted osd.0 with a new kernel an

Re: osd crash during resync

2012-01-24 Thread Gregory Farnum
On Tue, Jan 24, 2012 at 1:22 PM, Martin Mailand wrote: > Hi Greg, > ok, do you guys still need the core files, or could I delete them? Sam thinks probably not since we have the backtraces and the logs...thanks for asking, though! :) -Greg

Re: osd crash during resync

2012-01-25 Thread Sage Weil
Hi Martin, On Tue, 24 Jan 2012, Martin Mailand wrote: > Hi, > today I tried the btrfs patch mentioned on the btrfs ml. Therefore I rebooted > osd.0 with a new kernel and created a new btrfs on the osd.0, then I took the > osd.0 into the cluster. During the resync of osd.0, osd.2 and osd.3 > cra

Re: osd crash during resync

2012-01-26 Thread Martin Mailand
Hi Sage, I uploaded the osd.0 log as well. http://85.214.49.87/ceph/20120124/osd.0.log.bz2 -martin On 25.01.2012 at 23:08, Sage Weil wrote: Hi Martin, On Tue, 24 Jan 2012, Martin Mailand wrote: Hi, today I tried the btrfs patch mentioned on the btrfs ml. Therefore I rebooted osd.0 with a new

Problem after ceph-osd crash

2012-02-20 Thread Oliver Francke
Hi, we are just in trouble after some mess with trying to include a new OSD-node into our cluster. We get some weird "libceph: corrupt inc osdmap epoch 880 off 102 (c9001db8990a of c9001db898a4-c9001db89dae)" on the console. The whole system is in a state a la: 2012-02-20 17:56:2

OSD Crash for xattr "_" absent issue.

2014-11-26 Thread Wenjunh
> Hi, Samuel & Sage > > In our current production environment, there are osd crashes caused by the > inconsistency of data when reading the “_” xattr, which is described in the > issue: > > http://tracker.ceph.com/issues/10117. > > And I also find a t
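For anyone who wants to probe whether a filestore object file has lost its object-info attribute before the OSD trips over it, a rough sketch follows. It assumes an XFS- or ext4-backed filestore where the "_" attribute is exposed as the "user.ceph._" xattr; the attribute name and the little probe program are illustrative assumptions, not something taken from the report or the tracker issue above.

    // Rough sketch (assumption: filestore object files expose the "_"
    // attribute as the "user.ceph._" xattr on XFS/ext4 backing stores).
    #include <sys/xattr.h>
    #include <cerrno>
    #include <cstdio>

    int main(int argc, char** argv) {
        if (argc != 2) {
            std::fprintf(stderr, "usage: %s <object-file-path>\n", argv[0]);
            return 2;
        }
        // A zero-sized buffer makes getxattr() report only the attribute size.
        ssize_t len = getxattr(argv[1], "user.ceph._", nullptr, 0);
        if (len < 0) {
            if (errno == ENODATA)
                std::printf("\"_\" xattr is ABSENT on %s\n", argv[1]);
            else
                std::perror("getxattr");
            return 1;
        }
        std::printf("\"_\" xattr present (%zd bytes) on %s\n", len, argv[1]);
        return 0;
    }

The same attributes can also be inspected interactively with getfattr -d -m ".*" on the object file.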

Re: osd crash when deep-scrubbing

2015-10-19 Thread changtao381
readPool::worker(ThreadPool::WorkThread*)+0x53d) [0x9e05dd] > 17: (ThreadPool::WorkThread::entry()+0x10) [0x9e1760] > 18: (()+0x7a51) [0x7f384b6b0a51] > 19: (clone()+0x6d) [0x7f384a6409ad] > > ceph version is v0.80.9; manually executing `ceph pg deep-scrub 3.d70` would also > cau

Re: OSD crash, ceph version 0.56.1

2013-01-09 Thread Sage Weil
On Wed, 9 Jan 2013, Ian Pye wrote: > Hi, > > Every time I try and bring up an OSD, it crashes and I get the > following: "error (121) Remote I/O error not handled on operation 20" This error code (EREMOTEIO) is not used by Ceph. What fs are you using? Which kernel version? Anything else unusua
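For context, error 121 is EREMOTEIO in the Linux errno table, which is why it points at the backing filesystem or block layer rather than at Ceph itself. A minimal confirmation of the mapping, assuming a Linux/glibc toolchain:

    // Minimal sketch (assumes Linux/glibc): show that errno 121 is
    // EREMOTEIO, i.e. the "Remote I/O error" seen in the OSD failure.
    #include <cerrno>
    #include <cstring>
    #include <cstdio>

    int main() {
        std::printf("EREMOTEIO = %d: %s\n", EREMOTEIO, std::strerror(EREMOTEIO));
        // Prints: EREMOTEIO = 121: Remote I/O error
        return 0;
    }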

Re: OSD crash, ceph version 0.56.1

2013-01-09 Thread Ian Pye
On Wed, Jan 9, 2013 at 4:38 PM, Sage Weil wrote: > On Wed, 9 Jan 2013, Ian Pye wrote: >> Hi, >> >> Every time I try and bring up an OSD, it crashes and I get the >> following: "error (121) Remote I/O error not handled on operation 20" > > This error code (EREMOTEIO) is not used by Ceph. What fs ar

Re: Problem after ceph-osd crash

2012-02-20 Thread Sage Weil
a, 3734 GB used, 26059 GB / > 29794 GB avail; 272914/1349073 degraded (20.230%) > > and sometimes the ceph-osd on node0 is crashing. At the moment of writing, the > degrading continues to shrink down below 20%. How did ceph-osd crash? Is there a dump in the log? sage > > Any

Re: Problem after ceph-osd crash

2012-02-20 Thread Oliver Francke
, 26059 GB / 29794 GB avail; 272914/1349073 degraded (20.230%) and sometimes the ceph-osd on node0 is crashing. At the moment of writing, the degrading continues to shrink down below 20%. How did ceph-osd crash? Is there a dump in the log? 'course I will provide all logs, uhm, a bit later,

Re: Problem after ceph-osd crash

2012-02-20 Thread Sage Weil
> > 29794 GB avail; 272914/1349073 degraded (20.230%) > > > > > > and sometimes the ceph-osd on node0 is crashing. At the moment of writing, > > > the > > > degrading continues to shrink down below 20%. > > How did ceph-osd crash? Is there a dum

osd crash with object store set to newstore

2015-06-01 Thread Srikanth Madugundi
Hi Sage and all, I built ceph from the wip-newstore branch on RHEL7 and am running performance tests to compare with filestore. After a few hours of running the tests the osd daemons started to crash. Here is the stack trace; the osd crashes immediately after the restart, so I could not get the osd up and ru

Bobtail to dumpling (was: OSD crash during repair)

2013-09-10 Thread Chris Dunlop
On Fri, Sep 06, 2013 at 08:21:07AM -0700, Sage Weil wrote: > On Fri, 6 Sep 2013, Chris Dunlop wrote: >> On Thu, Sep 05, 2013 at 07:55:52PM -0700, Sage Weil wrote: >>> Also, you should upgrade to dumpling. :) >> >> I've been considering it. It was initially a little scary with >> the various issue

osd crash in ReplicatedPG::add_object_context_to_pg_stat(ReplicatedPG::ObjectContext*, pg_stat_t*)

2012-10-11 Thread Yann Dupont
-up-to-date version, then 0.48, 49,50,51... Data store is on XFS. I'm currently in the process of growing my ceph from 6 nodes to 12 nodes. 11 nodes are currently in ceph, for a 130 TB total. Declaring new osd was OK, the data has moved "quite" ok (in fact I had some OSD crash - no

Re: osd crash with object store set to newstore

2015-06-01 Thread Sage Weil
On Mon, 1 Jun 2015, Srikanth Madugundi wrote: > Hi Sage and all, > > I built ceph from the wip-newstore branch on RHEL7 and am running performance > tests to compare with filestore. After a few hours of running the tests > the osd daemons started to crash. Here is the stack trace; the osd > crashes immediate

Re: osd crash with object store set to newstore

2015-06-01 Thread Srikanth Madugundi
Hi Sage, The assertion failed at line 1639; here is the log message: 2015-05-30 23:17:55.141388 7f0891be0700 -1 os/newstore/NewStore.cc: In function 'virtual int NewStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t>*, ghobject_t*)' thread 7f0891be0700 time 2015-05-

Re: osd crash with object store set to newstore

2015-06-01 Thread Sage Weil
I pushed a commit to wip-newstore-debuglist... can you reproduce the crash with that branch with 'debug newstore = 20' and send us the log? (You can just do 'ceph-post-file <filename>'.) Thanks! sage On Mon, 1 Jun 2015, Srikanth Madugundi wrote: > Hi Sage, > > The assertion failed at line 1639; here is

Re: osd crash with object store set to newstore

2015-06-01 Thread Srikanth Madugundi
Hi Sage, Unfortunately I purged the cluster yesterday and restarted the backfill tool. I did not see the osd crash yet on the cluster. I am monitoring the OSDs and will update you once I see the crash. With the new backfill run I have reduced the rps by half, not sure if this is the reason for

Re: osd crash with object store set to newstore

2015-06-03 Thread Srikanth Madugundi
e > backfill tool. I did not see the osd crash yet on the cluster. I am > monitoring the OSDs and will update you once I see the crash. > > With the new backfill run I have reduced the rps by half, not sure if > this is the reason for not seeing the crash yet. > > Regards > Srik

Re: osd crash with object store set to newstore

2015-06-05 Thread Srikanth Madugundi
Let me know if you need anything else. > > Regards > Srikanth > > > On Mon, Jun 1, 2015 at 10:25 PM, Srikanth Madugundi > wrote: >> Hi Sage, >> >> Unfortunately I purged the cluster yesterday and restarted the >> backfill tool. I did not see the os

Re: osd crash with object store set to newstore

2015-06-05 Thread Sage Weil
sted. > > > > ceph-post-file: ddfcf940-8c13-4913-a7b9-436c1a7d0804 > > > > Let me know if you need anything else. > > > > Regards > > Srikanth > > > > > > On Mon, Jun 1, 2015 at 10:25 PM, Srikanth Madugundi > > wrote: > >> H

Re: Bobtail to dumpling (was: OSD crash during repair)

2013-09-10 Thread Sage Weil
On Wed, 11 Sep 2013, Chris Dunlop wrote: > On Fri, Sep 06, 2013 at 08:21:07AM -0700, Sage Weil wrote: > > On Fri, 6 Sep 2013, Chris Dunlop wrote: > >> On Thu, Sep 05, 2013 at 07:55:52PM -0700, Sage Weil wrote: > >>> Also, you should upgrade to dumpling. :) > >> > >> I've been considering it. It w

Re: osd crash in ReplicatedPG::add_object_context_to_pg_stat(ReplicatedPG::ObjectContext*, pg_stat_t*)

2012-10-15 Thread Samuel Just
s to 12 nodes. 11 > nodes are currently in ceph, for a 130 TB total. Declaring new osd was OK, > the data has moved "quite" ok (in fact I had some OSD crash - not > definitive, the osd restart ok-, maybe related to an error in my new nodes > network configuration that I dis

Re: osd crash in ReplicatedPG::add_object_context_to_pg_stat(ReplicatedPG::ObjectContext*, pg_stat_t*)

2012-10-22 Thread Yann Dupont
On 22/10/2012 at 17:37, Yann Dupont wrote: On 15/10/2012 at 21:57, Samuel Just wrote: debug filestore = 20 debug osd = 20 ok, I just had a core; you can grab it here: http://filex.univ-nantes.fr/get?k=xojcpgmGoN4pR1rpqf5 now, I'll run with debug options cheers Ok, I've collected a big log. I'

Re: osd crash in ReplicatedPG::add_object_context_to_pg_stat(ReplicatedPG::ObjectContext*, pg_stat_t*)

2012-10-22 Thread Samuel Just
Yeah, I think I've seen that before, but not yet with logs. filestore and osd logging would help greatly if it's reproducible. I've put it in as #3386. -Sam On Mon, Oct 22, 2012 at 10:40 AM, Yann Dupont wrote: > On 22/10/2012 at 17:37, Yann Dupont wrote: > >> On 15/10/2012 at 21:57, Samuel Just

Should an OSD crash when journal device is out of space?

2012-06-20 Thread Travis Rhoden
Not sure if this is a bug or not. It was definitely user error -- but since the OSD process bailed, figured I would report it. I had /tmpfs mounted with 2.5GB of space: tmpfs on /tmpfs type tmpfs (rw,size=2560m) Then I decided to increase my journal size to 5G, but forgot to increase the limit

Re: Should an OSD crash when journal device is out of space?

2012-06-20 Thread Matthew Roy
I hit this a couple times and wondered the same thing. Why does the OSD need to bail when it runs out of journal space? On Wed, Jun 20, 2012 at 3:56 PM, Travis Rhoden wrote: > Not sure if this is a bug or not.  It was definitely user error -- but > since the OSD process bailed, figured I would re

Re: Should an OSD crash when journal device is out of space?

2012-07-02 Thread Gregory Farnum
Hey guys, Thanks for the problem report. I've created an issue to track it at http://tracker.newdream.net/issues/2687. It looks like we just assume that if you're using a file, you've got enough space for it. It shouldn't be a big deal to at least do some startup checks which will fail gracefully.
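A startup check of the kind described here could be as simple as comparing the configured journal size against the free space on the filesystem that holds the journal before preallocating it. The sketch below only illustrates that idea; it is not Ceph's actual journal code, and the helper name is made up:

    // Illustration only (not Ceph's journal code): refuse to start if the
    // filesystem behind the journal path cannot hold the configured size.
    #include <sys/statvfs.h>
    #include <cstdint>
    #include <cstdio>

    static bool journal_space_ok(const char* journal_dir, uint64_t journal_bytes) {
        struct statvfs vfs;
        if (statvfs(journal_dir, &vfs) != 0) {
            std::perror("statvfs");
            return false;
        }
        uint64_t free_bytes = (uint64_t)vfs.f_bavail * vfs.f_frsize;
        if (free_bytes < journal_bytes) {
            std::fprintf(stderr,
                         "journal needs %llu bytes but %s has only %llu free; "
                         "refusing to start\n",
                         (unsigned long long)journal_bytes, journal_dir,
                         (unsigned long long)free_bytes);
            return false;  // fail gracefully at startup instead of crashing later
        }
        return true;
    }

    int main() {
        // The scenario above: a 5G journal on a 2.5G tmpfs should fail here.
        return journal_space_ok("/tmpfs", 5ULL << 30) ? 0 : 1;
    }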

OSD-crash on 0.48.1argonaut, error "void ReplicatedPG::recover_got(hobject_t, eversion_t)" not seen on list

2012-09-19 Thread Oliver Francke
Hi all, after adding a new node to our ceph-cluster yesterday, we had a crash of one OSD. I have found this kind of message in the bugtracker as being solved (http://tracker.newdream.net/issues/2075); I will update this one for my convenience and attach the corresponding log (due to producti

FW: [Ceph - Bug #10080] Pipe::connect() cause osd crash when osd reconnect to its peer

2014-11-24 Thread GuangYang
ct: [Ceph - Bug #10080] Pipe::connect() cause osd crash when osd > reconnect to its peer > > Issue #10080 has been updated by Guang Yang. > > I am wondering if the following race occurred: > > Let us assume A and B are two OSDs having the connection (pipe) between > e
