Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-19 Thread Yan, Zheng
rthreading) > and I am not so sure anymore that this is memory related. > > For further debugging, I've updated > http://tracker.ceph.com/issues/16610 > with a summary of my findings plus some log files: > - The gdb.txt I get after running > $ gdb /path/to/ceph-fuse

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-18 Thread Goncalo Borges
The debug.out (gzipped) I get after running ceph-fuse in debug mode with 'debug client = 20' and 'debug objectcacher = 20' Cheers Goncalo From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray;

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
I get after running ceph-fuse in debug mode with 'debug client = 20' and 'debug objectcacher = 20' Cheers Goncalo From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject:

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
- The gdb.txt I get after running > > $ gdb /path/to/ceph-fuse core. > > (gdb) set pag off > > (gdb) set log on > > (gdb) thread apply all bt > > (gdb) thread apply all bt full > > as advised by Brad > > - The debug.out (gzipped) I get after runn

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Yan, Zheng
(gdb) set log on > (gdb) thread apply all bt > (gdb) thread apply all bt full > as advised by Brad > - The debug.out (gzipped) I get after running ceph-fuse in debug mode with > 'debug client = 20' and 'debug objectcacher = 20' > > Cheers > Goncalo >

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
b.txt I get after running > >$ gdb /path/to/ceph-fuse core.XXXX > >(gdb) set pag off > >(gdb) set log on > >(gdb) thread apply all bt > >(gdb) thread apply all bt full > >as advised by Brad > > - The debug.out (gzipped) I get after runnin
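
Editorial sketch: the gdb session quoted above (pagination off, logging on, then two "thread apply all" backtraces) can be run non-interactively. The helper below only builds the command line; the function name build_gdb_command is hypothetical, the binary and core paths are the placeholders from the thread, and it assumes a gdb that still accepts "set logging on", which writes to gdb.txt by default.

```python
import shlex


def build_gdb_command(binary, core):
    """Build a batch-mode gdb invocation that mirrors the quoted session."""
    gdb_commands = [
        "set pagination off",        # "set pag off" in the thread
        "set logging on",            # "set log on"; output goes to gdb.txt by default
        "thread apply all bt",
        "thread apply all bt full",
    ]
    argv = ["gdb", "--batch"]
    for command in gdb_commands:
        argv += ["-ex", command]
    # The executable and the core file are the two positional operands.
    argv += [binary, core]
    return argv


if __name__ == "__main__":
    # Placeholder paths copied from the thread; substitute real ones.
    print(shlex.join(build_gdb_command("/path/to/ceph-fuse", "core.XXXX")))
```

The resulting list can be passed to subprocess.run on a machine that has gdb and the matching debuginfo packages installed.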

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2) Oh, is this one of your custom-built packages? Are they using tcmalloc? That difference between VSZ and RSS looks l

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-12 Thread Goncalo Borges
ectcacher = 20' Cheers Goncalo From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2) Oh, is this one of your custom-built packages? A

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Yan, Zheng
On Tue, Jul 12, 2016 at 1:07 AM, Gregory Farnum wrote: > Oh, is this one of your custom-built packages? Are they using > tcmalloc? That difference between VSZ and RSS looks like a glibc > malloc problem. > -Greg > ceph-fuse at http://download.ceph.com/rpm-jewel/el7/x86_64/ is not linked to libtcm

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Gregory Farnum
Oh, is this one of your custom-built packages? Are they using tcmalloc? That difference between VSZ and RSS looks like a glibc malloc problem. -Greg On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges wrote: > Hi John... > > Thank you for replying. > > Here is the result of the tests you asked for but I

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Patrick Donnelly
Hi Goncalo, On Fri, Jul 8, 2016 at 3:01 AM, Goncalo Borges wrote: > 5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB of > virtual memory when there are no applications using the filesystem. > > 7152 root 20 0 1108m 12m 5496 S 0.0 0.0 0:00.04 ceph-fuse > > When I onl
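
Editorial sketch: the top line quoted above shows a large virtual size (1108m) next to a small resident size (12m). On Linux the same two figures are available as VmSize and VmRSS in /proc/<pid>/status, so the gap can be computed directly. The function names and the SAMPLE text are illustrative assumptions; the sample values are simply the quoted figures converted to kB on the assumption that top's "m" suffix means MiB.

```python
import re


def parse_vm_status(status_text):
    """Return VmSize and VmRSS (in kB) from /proc/<pid>/status-style text."""
    fields = {}
    for name in ("VmSize", "VmRSS"):
        match = re.search(rf"^{name}:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
        if match:
            fields[name] = int(match.group(1))
    return fields


def vsz_rss_gap_kb(status_text):
    """Virtual size minus resident size, in kB."""
    fields = parse_vm_status(status_text)
    return fields["VmSize"] - fields["VmRSS"]


# Illustrative input only: 1108 MiB and 12 MiB expressed in kB.
SAMPLE = "Name:\tceph-fuse\nVmSize:\t 1134592 kB\nVmRSS:\t   12288 kB\n"

if __name__ == "__main__":
    print(vsz_rss_gap_kb(SAMPLE))
```

For a live process, pass the contents of /proc/<pid>/status (for example, the pid of the running ceph-fuse) instead of SAMPLE.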

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread John Spray
On Mon, Jul 11, 2016 at 8:04 AM, Goncalo Borges wrote: > Hi John... > > Thank you for replying. > > Here is the result of the tests you asked for but I do not see anything abnormal. Thanks for running through that. Yes, nothing in the output struck me as unreasonable either :-/ > Actually, your sugg

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Goncalo Borges
On 07/11/2016 05:04 PM, Goncalo Borges wrote: Hi John... Thank you for replying. Here is the result of the tests you asked for but I do not see anything abnormal. Actually, your suggestions made me see that: 1) ceph-fuse 9.2.0 is presenting the same behaviour but with less memory consumption,

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Goncalo Borges
Hi John... Thank you for replying. Here is the result of the tests you asked for but I do not see anything abnormal. Actually, your suggestions made me see that: 1) ceph-fuse 9.2.0 is presenting the same behaviour but with less memory consumption, probably low enough that it doesn't break c

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread John Spray
On Fri, Jul 8, 2016 at 8:01 AM, Goncalo Borges wrote: > Hi Brad, Patrick, All... > > I think I've understood this second problem. In summary, it is memory > related. > > This is how I found the source of the problem: > > 1./ I copied and adapted the user application to run in another cluster of >

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread Goncalo Borges
Hi Brad, Patrick, All... I think I've understood this second problem. In summary, it is memory related. This is how I found the source of the problem: 1./ I copied and adapted the user application to run in another cluster of ours. The idea was for me to understand the application an

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Brad Hubbard
Hi Goncalo, If possible, it would be great if you could capture a core file for this with full debugging symbols (preferably glibc debuginfo as well). How you do that will depend on the ceph version and your OS, but we can offer help if required, I'm sure. Once you have the core do the following.

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Patrick Donnelly
On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges wrote: > Unfortunately, the other user application breaks ceph-fuse again (It is a > completely different application than in my previous test). > > We have tested it on 4 machines with 4 cores. The user is submitting 16 > single core jobs which are a

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Goncalo Borges
My previous email did not go through because of its size. Here goes a new attempt: Cheers Goncalo --- * --- Hi Patrick, Brad... Unfortunately, the other user application breaks ceph-fuse again (It is a completely different application than in my previous test). We have tested it on 4 machi

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Brad Hubbard
On Thu, Jul 7, 2016 at 12:31 AM, Patrick Donnelly wrote: > > The locks were missing in 9.2.0. There were probably instances of the > segfault unreported/unresolved. Or even unseen :) Race conditions are funny things and extremely subtle changes in timing introduced by any number of things can af

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Patrick Donnelly
Hi Goncalo, On Wed, Jul 6, 2016 at 2:18 AM, Goncalo Borges wrote: > Just to confirm that, after applying the patch and recompiling, we are no > longer seeing segfaults. > > I just tested with a user application which would kill ceph-fuse almost > instantaneously. Now it is running for quite some

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-05 Thread Goncalo Borges
Hi All... Just to confirm that, after applying the patch and recompiling, we are no longer seeing segfaults. I just tested with a user application which would kill ceph-fuse almost instantaneously. Now it is running for quite some time, reading and updating the files that it should. I sho

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges
Will do, Brad. From your answer it should be a safe thing to do. Will report later. Thanks for the help Cheers Goncalo On 07/05/2016 02:42 PM, Brad Hubbard wrote: On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly wrote: Hi Goncalo, I believe this segfault may be the one fixed here: https:

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges
Hi Brad, Shinobu, Patrick... Indeed if I run with 'debug client = 20' it seems I get a very similar log to what Patrick has in the patch. However it is difficult for me to really say if it is exactly the same thing. One thing I could try is simply to apply the fix in the source code and reco

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly wrote: > Hi Goncalo, > > I believe this segfault may be the one fixed here: > > https://github.com/ceph/ceph/pull/10027 Ah, nice one Patrick. Goncalo, the patch is fairly simple, just the addition of a lock on two lines to resolve the race. Could
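
Editorial sketch: the message above describes the fix as a lock added around two lines to close a race. The example below is not the Ceph patch (which is C++ and lives in the linked pull request); it is a generic illustration, with hypothetical names, of how holding a lock around a read-modify-write makes concurrent updates safe.

```python
import threading


class SharedCounter:
    """Counter whose update is guarded by a lock (illustrative only)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Holding the lock makes the read-modify-write atomic with respect
        # to other threads calling increment().
        with self._lock:
            self.value += 1


def run_workers(n_threads=4, n_increments=10000):
    """Increment one shared counter from several threads; return the total."""
    counter = SharedCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

With the lock in place the total always equals n_threads * n_increments; without it, the result would depend on thread timing, which is the kind of behaviour the thread describes as hard to reproduce.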

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Patrick Donnelly
Hi Goncalo, I believe this segfault may be the one fixed here: https://github.com/ceph/ceph/pull/10027 (Sorry for brief top-post. I'm on mobile.) On Jul 4, 2016 9:16 PM, "Goncalo Borges" wrote: > > Dear All... > > We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2. > > W

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 12:13 PM, Shinobu Kinjo wrote: > Can you reproduce with debug client = 20? In addition to this I would suggest making sure you have debug symbols in your build and capturing a core file. You can do that by setting "ulimit -c unlimited" in the environment where ceph-fuse is
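
Editorial sketch: the message above suggests setting "ulimit -c unlimited" in the environment that starts ceph-fuse so a core file is written on a crash. A launcher script could do the equivalent with Python's resource module. The function name is hypothetical, and the sketch raises the soft limit only as far as the hard limit, which matches "unlimited" only when the hard limit is itself unlimited.

```python
import resource


def enable_core_dumps():
    """Raise this process's soft core-size limit to its hard limit.

    Child processes started afterwards inherit the new limit.
    Returns the resulting (soft, hard) pair.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    unlimited = soft == resource.RLIM_INFINITY
    print("core limit:", "unlimited" if unlimited else soft)
```

Where the core file ends up still depends on the system's core pattern settings, which this sketch does not touch.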

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Shinobu Kinjo
Can you reproduce with debug client = 20? On Tue, Jul 5, 2016 at 10:16 AM, Goncalo Borges < goncalo.bor...@sydney.edu.au> wrote: > Dear All... > > We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2. > > We are currently using ceph-fuse to mount cephfs in a number of client

[ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges
Dear All... We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2. We are currently using ceph-fuse to mount cephfs in a number of clients. ceph-fuse 10.2.2 client is segfaulting in some situations. One of the scenarios where ceph-fuse segfaults is when a user submits a pa