Re: [Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino

2010-04-19 Thread Christos Theodosiou
On Fri, 2010-04-16 at 10:49 -0700, Andreas Dilger wrote: On 2010-04-16, at 01:27, Christos Theodosiou wrote: our lustre installation uses two failover MDSes, which serve 10 file-systems. We recently upgraded from 1.8.1.1 to 1.8.2 version. By monitoring the MDSes I noticed that we get

[Lustre-discuss] On-disk bitmap corrupted

2010-04-19 Thread Lu Wang
Dear all, We have run into the same situation as the problem discussed here: http://lists.lustre.org/pipermail/lustre-discuss/2009-January/009512.html One OST was set read-only the first time it was remounted after a server crash. Apr 16 17:40:31 boss27 kernel: LDISKFS-fs

Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-19 Thread Charles Taylor
On Apr 18, 2010, at 11:46 AM, Bernd Schubert wrote: You don't need to take the filesystem offline for lfsck. You sure about that? Looking at http://wiki.lustre.org/manual/LustreManual18_HTML/LustreRecovery.html#50598012_37365 step 1 says Stop the Lustre File System. Also, I have

Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-19 Thread Charles Taylor
On Apr 18, 2010, at 1:14 PM, Andreas Dilger wrote: On 2010-04-18, at 07:16, Charles Taylor wrote: On Apr 18, 2010, at 9:35 AM, Miguel Afonso Oliveira wrote: You are going to have to use unlink with something like this: for file in lost_files unlink $file Nope. That's really no

[Lustre-discuss] Lustre Client - Memory Issue

2010-04-19 Thread Jagga Soorma
Hi Guys, My users are reporting some issues with memory on our lustre 1.8.1 clients. When they submit a single job at a time, the run time is about 4.5 minutes. However, when they run multiple jobs (10 or fewer) on a single client node with 192GB of memory, the run time for each

Re: [Lustre-discuss] Lustre Client - Memory Issue

2010-04-19 Thread Andreas Dilger
There is a known problem with the DLM LRU size that may be affecting you. It may be something else too. Please check /proc/{slabinfo,meminfo} to see what is using the memory on the client. Cheers, Andreas On 2010-04-19, at 10:43, Jagga Soorma jagg...@gmail.com wrote: Hi Guys, My users
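As a rough illustration of the /proc/slabinfo check suggested above, the sketch below ranks slab caches by approximate memory use. It assumes the usual slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...); the function names `parse_slabinfo` and `top_slabs` are hypothetical and not part of Lustre or of the original thread.

```python
def parse_slabinfo(text):
    """Return {cache_name: approx_bytes} from /proc/slabinfo-style text.

    Assumes the slabinfo 2.x layout, where the third and fourth columns
    are <num_objs> and <objsize>; memory is estimated as their product.
    """
    usage = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column header.
        if not line.strip() or line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def top_slabs(text, n=10):
    """Return the n largest caches as (name, approx_bytes), biggest first."""
    usage = parse_slabinfo(text)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Example use on a client (reading /proc/slabinfo normally requires root):
#   with open("/proc/slabinfo") as fh:
#       for name, size in top_slabs(fh.read()):
#           print(f"{name:30s} {size / 2**20:10.1f} MiB")
```

The estimate ignores per-slab overhead, so it is only a guide to which caches are worth a closer look.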

[Lustre-discuss] LBUG: ost_rw_hpreq_check() ASSERTION(nb != NULL) failed

2010-04-19 Thread Erich Focht
Hi, we saw this LBUG 3 times within the past week, and are puzzled about what's going on and why there is no bugzilla entry for this... What happens is that on an OSS a request (must be read or write) expects (according to the content of the ioobj structure) to find an array of 22 struct

Re: [Lustre-discuss] Lustre Client - Memory Issue

2010-04-19 Thread Jagga Soorma
Thanks for the response Andreas. What is the known problem with the DLM LRU size? Here is what my slabinfo/meminfo look like on one of the clients. I don't see anything out of the ordinary: (then again there are no jobs currently running on this system) Thanks -J -- slabinfo: .. slabinfo -

Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-19 Thread Lundgren, Andrew
I was also going to recommend the unlink. We have had to do this as well, and the unlink worked for us. It did need to be run with privileges for the file (root in our case). -- Andrew -Original Message- From: lustre-discuss-boun...@lists.lustre.org
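The unlink approach recommended in this thread could be scripted roughly as follows. This is a minimal sketch, not the posters' actual procedure: the helper name `unlink_lost_files` and the idea of passing in a prepared list of paths are assumptions for illustration. `os.unlink` issues the unlink(2) system call directly, and the script would need to run with sufficient privileges for the files.

```python
import os


def unlink_lost_files(paths):
    """Try to unlink each path; return (removed, failed).

    `failed` holds (path, exception) pairs so the caller can inspect
    which files could not be removed and why.
    """
    removed, failed = [], []
    for path in paths:
        try:
            os.unlink(path)  # direct unlink(2) on the path
            removed.append(path)
        except OSError as exc:
            failed.append((path, exc))
    return removed, failed


# Example use, with a hypothetical file listing one lost path per line:
#   with open("lost_files.txt") as fh:
#       removed, failed = unlink_lost_files(line.strip() for line in fh if line.strip())
```

Collecting failures instead of stopping at the first error lets the whole list be processed in one pass.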

[Lustre-discuss] Lustre MDS unable to start

2010-04-19 Thread neutron
Hi all, I'm running a small Lustre system (1.8.1.1): 1 MDS, 1 OSS, 2 clients. Each node has 1gig and InfiniBand (mlx4_0) with IPoIB set up. I'm trying to use IB transport. The /etc/modprobe.conf is the same for all nodes: -- alias eth0 e1000e alias eth1 e1000e alias eth2 8139too alias

Re: [Lustre-discuss] LBUG: ost_rw_hpreq_check() ASSERTION(nb != NULL) failed

2010-04-19 Thread Bernd Schubert
Hello Erich, check out my bug report: https://bugzilla.lustre.org/show_bug.cgi?id=19992 It was closed as duplicate of bug 16129, although that is probably not correct, as 16129 is the root cause, but not the solution. As we never observed it with 1.6.7.2 I didn't complain bug 19992 was

Re: [Lustre-discuss] Inactive OST

2010-04-19 Thread Andreas Dilger
On 2010-04-19, at 01:41, x...@xgl.pereslavl.ru wrote: I have 1 OST that appears as an inactive device on the client: [Client] lfs df -h UUID bytes Used Available Use% Mounted on lustre00-MDT_UUID 814.8G 471.8M 767.8G 0% /mnt/lustre00[MDT:0]

Re: [Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino

2010-04-19 Thread Dmitry Zogin
Christos Theodosiou wrote: On Fri, 2010-04-16 at 10:49 -0700, Andreas Dilger wrote: On 2010-04-16, at 01:27, Christos Theodosiou wrote: our lustre installation uses two failover MDSes, which serve 10 file-systems. We recently upgraded from 1.8.1.1 to 1.8.2 version. By monitoring the

Re: [Lustre-discuss] Lustre Client - Memory Issue

2010-04-19 Thread Andreas Dilger
On 2010-04-19, at 11:16, Jagga Soorma wrote: What is the known problem with the DLM LRU size? It is mostly a problem on the server, actually. Here is what my slabinfo/meminfo look like on one of the clients. I don't see anything out of the ordinary: (then again there are no jobs