Thank you. In the meantime, turning off parallel readdir should prevent the
first crash.
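A minimal sketch of that command, assuming the volume name atlasglust used elsewhere in this thread:

gluster volume set atlasglust performance.parallel-readdir off
# check the current value afterwards
gluster volume get atlasglust performance.parallel-readdir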
On 20 June 2018 at 21:42, mohammad kashif wrote:
Hi Nithya
Thanks for the bug report. This new crash has happened only once, and only at
one client, in the last 6 days. I will let you know if it happens again or
more frequently.
Cheers
Kashif
On Wed, Jun 20, 2018 at 12:28 PM, Nithya Balachandran wrote:
Hi Mohammad,
This is a different crash. How often does it happen?
We have managed to reproduce the first crash you reported and a bug has
been filed at [1].
We will work on a fix for this.
Regards,
Nithya
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1593199
On 18 June 2018 at 14:09, mohammad kashif wrote:
Hi
The problem appeared again after a few days. This time, the client
is glusterfs-3.10.12-1.el6.x86_64 and performance.parallel-readdir is off.
The log level was set to ERROR and I got this log at the time of the crash:
[2018-06-14 08:45:43.551384] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/usr/lib64/lib
On Mon, Jun 18, 2018 at 9:39 AM, Raghavendra Gowdappa wrote:
On Mon, Jun 18, 2018 at 8:11 AM, Raghavendra Gowdappa wrote:
From the bt:
#8 0x7f6ef977e6de in rda_readdirp (frame=0x7f6eec862320,
this=0x7f6ef4019f20, fd=0x7f6ed40077b0, size=357, off=2,
xdata=0x7f6eec0085a0) at readdir-ahead.c:266
#9 0x7f6ef952db4c in dht_readdirp_cbk (frame=,
cookie=0x7f6ef4019f20, this=0x7f6ef40218a0, op_ret=2, op_errno=0,
or
Hi Nithya
The FUSE volfile is here, after disabling parallel-readdir:
http://www-pnp.physics.ox.ac.uk/~mohammad/atlasglust.tcp-fuse.vol
Unfortunately I can't take the risk of enabling parallel-readdir as the cluster
is in heavy use and likely to kill many jobs if clients unmount again.
There is one th
On 15 June 2018 at 13:45, Nithya Balachandran wrote:
Hi Mohammad,
I was unable to reproduce this on a volume created on a system running
3.12.9.
Can you send me the FUSE volfiles for the volume atlasglust? They will be
in /var/lib/glusterd/vols/atlasglust/ on any of the gluster servers
hosting the volume and called *.tcp-fuse.vol.
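For example, something along these lines should collect them, assuming the standard glusterd working directory:

ls /var/lib/glusterd/vols/atlasglust/*.tcp-fuse.vol
# copy them somewhere convenient to attach or upload
cp /var/lib/glusterd/vols/atlasglust/*.tcp-fuse.vol /tmp/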
Thanks,
Nithya
Hi Nithya
It seems that the problem can be solved by either turning parallel-readdir off
or downgrading the client to 3.10.12-1. Yesterday I downgraded some clients to
3.10.12-1 and it seems to have fixed the problem. Today when I saw your email
I turned parallel-readdir off and the current client 3.12.9
+Poornima who works on parallel-readdir.
@Poornima, Have you seen anything like this before?
On 14 June 2018 at 10:07, Nithya Balachandran wrote:
This is not the same issue as the one you are referring to - that was in the
RPC layer and caused the bricks to crash. This one is different as it seems
to be in the dht and rda layers. It does look like a stack overflow, though.
@Mohammad,
Please send the following information:
1. gluster volume in
+Nithya
Nithya,
Do these logs [1] look similar to the recursive readdir() issue that you
encountered just a while back?
i.e. recursive readdir() response definition in the XDR
[1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif wrote:
Hi Milind
Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here:
http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
I am trying to understand it but am still not able to make sense of it.
Thanks
Kashif
On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire wrote:
Kashif,
FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
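A rough sketch of wiring that up on the client; the repo file name and gpgcheck setting here are assumptions, not verified against the Storage SIG packaging:

# debuginfo-install comes from yum-utils
cat > /etc/yum.repos.d/centos-storage-debuginfo.repo <<'EOF'
[centos-storage-debuginfo]
name=CentOS-6 Storage SIG debuginfo
baseurl=http://debuginfo.centos.org/centos/6/storage/x86_64/
enabled=1
gpgcheck=0
EOF
debuginfo-install glusterfs-fuse-3.12.9-1.el6.x86_64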
On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif wrote:
Hi Milind
There is no glusterfs-debuginfo available for gluster-3.12 from
http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo. Do you
know from where I can get it?
Also when I run gdb, it says
Missing separate debuginfos, use: debuginfo-install
glusterfs-fuse-3.12.9-1.el6.x86_64
Hi Milind
I will send you links for the logs.
I collected these core dumps on the client, and there is no glusterd process
running on the client.
Kashif
On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire wrote:
Hi Vijay
I have enabled TRACE for the client and there are lots of trace messages in the
log but no 'crash'.
The only error I can see is about the inode context being NULL:
[io-cache.c:564:ioc_open_cbk] 0-atlasglust-io-cache: inode context is NULL
(748157d2-274f-4595-9bb6-afb1fb5a0642) [Invalid argument]
Kashif
Kashif,
Could you also send over the client/mount log file as Vijay suggested?
Or maybe just the lines around the crash backtrace.
Also, you've mentioned that you straced glusterd, but when you ran gdb, you
ran it over /usr/sbin/glusterfs
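To make sure the right process is traced, something like this should work; the FUSE client runs as /usr/sbin/glusterfs, while glusterd is only the management daemon:

pgrep -af glusterfs       # list glusterfs processes with their command lines
strace -f -tt -p <PID>    # attach strace to the client (mount) process
gdb -p <PID>              # or attach gdb for a live backtrace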
On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur wrote:
You could also grep for crash in the client log file and the
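A rough example of that grep, assuming the default client log location under /var/log/glusterfs/ (the file is named after the mount point):

grep -iE 'crash|signal received' /var/log/glusterfs/<mount-point>.log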
Hi Milind
The operating system is Scientific Linux 6 which is based on RHEL6. The cpu
arch is Intel x86_64.
I will send you a separate email with link to core dump.
Thanks for your help.
Kashif
On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire wrote:
Kashif,
Could you share the core dump via Google Drive or something similar?
Also, let me know the CPU arch and OS distribution on which you are running
gluster.
If you've installed the glusterfs-debuginfo package, you'll also get the
source lines in the backtrace via gdb
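A minimal sketch of pulling a full backtrace out of the core with gdb; the core file path here is just a placeholder:

gdb /usr/sbin/glusterfs /path/to/core \
    -batch -ex 'thread apply all bt full' > backtrace.log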
On Tue, Jun 12, 2018, mohammad kashif wrote:
Hi Milind, Vijay
Thanks, I have some more information now as I straced glusterd on the client:
138544 0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE) =
0 <0.26>
138544 0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE) =
0 <0.27>
138544 0.000126 mprotect
Kashif,
You can change the log level by:
$ gluster volume set diagnostics.brick-log-level TRACE
$ gluster volume set diagnostics.client-log-level TRACE
and see how things fare
If you want fewer logs you can change the log-level to DEBUG instead of
TRACE.
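For reference, the full syntax also takes the volume name; assuming the volume atlasglust from this thread, that would be:

gluster volume set atlasglust diagnostics.client-log-level TRACE
gluster volume set atlasglust diagnostics.brick-log-level TRACE
# and to go back to the default once done
gluster volume set atlasglust diagnostics.client-log-level INFO
gluster volume set atlasglust diagnostics.brick-log-level INFO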
On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif wrote:
Hi Vijay
Now it is unmounting every 30 mins!
The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
has only this line:
[2018-06-12 09:53:19.303102] I [MSGID: 115013]
[server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on
/atlas/atlasdata/zgubic/hmumu/histogra
On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif wrote:
Hi
Since I updated our gluster server and clients to the latest version,
3.12.9-1, I have been having this issue of gluster getting unmounted from the
client very regularly. It was not a problem before the update.
It's a distributed file system with no replication. We have seven servers
totaling around 480TB of data