Re: [Lustre-discuss] slow direct_io, slow journal .. in OST log

2010-01-26 Thread Lex
There was a problem with our RAID controller this morning; the array was degraded (I reinstalled the hard drive and its state was *rebuilding* right after that), and in the messages log I saw many warnings like this: *Jan 26 13:59:13 OST6 kernel: Lustre:
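
A minimal way to correlate the rebuild with those warnings, assuming a Linux software RAID (md) array; a hardware controller would need its own CLI instead:

    # rebuild progress of an md array
    cat /proc/mdstat
    # count the slow-I/O warnings Lustre has logged on this OSS so far
    grep -c 'slow ' /var/log/messages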

Re: [Lustre-discuss] slow direct_io, slow journal .. in OST log

2010-01-26 Thread Brian J. Murrell
On Tue, 2010-01-26 at 14:12 +0700, Lex wrote: There was a problem with our RAID controller this morning; the array was degraded (I reinstalled the hard drive and its state was rebuilding right after that), and in the messages log I saw many warnings like this: Jan 26 13:59:13 OST6

[Lustre-discuss] problem with few partitions

2010-01-26 Thread Giacinto Donvito
Hi all, I have some problems getting the client connected to a few partitions of a Lustre file system. In particular, some days ago we had serious issues on two raidsets. After a reboot the partitions became available again, at least locally to the file server. I tried to mount the
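
A sketch of the usual first checks for a target the clients cannot see (run the first command on the file server, the second on a client):

    # list configured Lustre devices and their state (e.g. UP)
    lctl dl
    # per-OST usage as the client sees it; a missing target shows up here
    lfs df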

Re: [Lustre-discuss] Permanently delete OST

2010-01-26 Thread Lundgren, Andrew
15345 iirc.

Re: [Lustre-discuss] Permanently delete OST

2010-01-26 Thread Brian J. Murrell
On Tue, 2010-01-26 at 08:23 -0700, Lundgren, Andrew wrote: 15345 iirc Yes, I looked at that bug yesterday. I don't see anything in there that provides any sort of --perm argument to completely purge an OST from the configuration. b.
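
For reference, the closest documented route is permanent deactivation on the MGS rather than a true purge; a sketch with a hypothetical fsname and OST index:

    # on the MGS: mark the OST permanently inactive for all clients
    lctl conf_param testfs-OST0001.osc.active=0
    # actually dropping its records means regenerating the config logs,
    # e.g. with tunefs.lustre --writeconf on each target
    tunefs.lustre --writeconf /dev/mdtdev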

[Lustre-discuss] 8T Limit

2010-01-26 Thread Roger Spellman
I recall that there is an 8T limit on the size of an OST. Is that just for Lustre 1.6? Or also 1.8 and 2.0? Now that we are starting to get 2T drives, is there any way to exceed 8T? Thanks. Roger Spellman Staff Engineer Terascala, Inc. 508-588-1501 www.terascala.com

Re: [Lustre-discuss] 8T Limit

2010-01-26 Thread Brian J. Murrell
On Tue, 2010-01-26 at 09:18 -0500, Roger Spellman wrote: I recall that there is an 8T limit on the size of an OST. Is that just for Lustre 1.6? Or also 1.8 and 2.0? Now that we are starting to get 2T drives, is there any way to exceed 8T? I believe 16T is coming in 1.8.2 for distros

Re: [Lustre-discuss] 8T Limit

2010-01-26 Thread Peter Jones
Yes, that is correct. So far we have only tested on RHEL5, but I believe that this should also work in theory on SLES11. However, we have so far been unable to find a volunteer able to help with SLES11 testing to confirm that the practice matches up with the theory. I have been talking with a

Re: [Lustre-discuss] problem with few partitions

2010-01-26 Thread Brian J. Murrell
On Tue, 2010-01-26 at 15:48 +0100, Giacinto Donvito wrote: Jan 26 14:43:59 dot1-se-01 kernel: LustreError: 6542:0:(filter_io_26.c:684:filter_commitrw_write()) error starting transaction: rc = -30 Your target is read-only, typically because ldiskfs has found a critical error and switched the
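
The usual recovery once ldiskfs has flipped a target read-only (rc = -30 is -EROFS) looks roughly like this; device and mount point below are hypothetical:

    # unmount the target, repair the backing filesystem, remount
    umount /mnt/ost0
    e2fsck -f /dev/sdb1
    mount -t lustre /dev/sdb1 /mnt/ost0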

Re: [Lustre-discuss] 8T Limit

2010-01-26 Thread Johann Lombardi
On Tue, Jan 26, 2010 at 09:15:13AM -0800, Peter Jones wrote: Yes, that is correct. So far we have only tested on RHEL5 To be clear, 16TB support is only for rhel5 using ext4-based ldiskfs. The default for rhel5 is still to use ext3. We will actually provide 2 sets of rpms, one with ext3-based
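
With the ext4-based ldiskfs rpms installed, formatting an OST bigger than 8TB is the normal mkfs.lustre invocation; fsname, NID, and device here are examples only:

    # format a large OST; >8TB requires the ext4-based ldiskfs packages
    mkfs.lustre --ost --fsname=testfs --mgsnode=10.0.0.1@tcp0 /dev/sdb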

Re: [Lustre-discuss] problem with few partitions

2010-01-26 Thread Giacinto Donvito
Thanks Brian, but I have already tested that with at least one partition: e2fsck -fy /dev/sdb1. Even so, the client still does not see it after remounting the partition. Could it be that I need to power the entire server off and restart it? What else could I check? Cheers, Giacinto
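
A short checklist before power-cycling the whole server (first two commands on the server, the last one on a client; paths are examples):

    # remount the repaired target and confirm it registered
    mount -t lustre /dev/sdb1 /mnt/ost
    lctl dl
    # from a client, verify it can reach every server
    lfs check servers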

Re: [Lustre-discuss] 8T Limit

2010-01-26 Thread Tommi T
--- On Tue, 1/26/10, Johann Lombardi joh...@sun.com wrote: To be clear, 16TB support is only for rhel5 using ext4-based ldiskfs. The default for rhel5 is still to use ext3. We will actually provide 2 sets of rpms, one with ext3-based ldiskfs (for people who do not need > 8TB support) and one

Re: [Lustre-discuss] 8T Limit

2010-01-26 Thread Johann Lombardi
On Tue, Jan 26, 2010 at 08:30:32PM +0100, Peter Kjellstrom wrote: Why is there a 16TiB limit? Ext4 has no such limit (except for 16TiB _file_ size). Because we have only tested and validated up to 16TB. You are of course free to try with a bigger device (use the force_over_16tb mount option), but at
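
The option mentioned above is passed at mount time; a sketch (device path hypothetical, and explicitly beyond what has been validated):

    # attempt an OST larger than 16TB despite the tested limit
    mount -t lustre -o force_over_16tb /dev/sdb /mnt/ost0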

Re: [Lustre-discuss] Understanding MDS and documentation of MDS/MDT

2010-01-26 Thread Johann Lombardi
On Sun, Jan 24, 2010 at 12:47:32AM -0500, Vilobh Meshram wrote: Hi, I am new to the Lustre file system and wanted to understand its internals. I wanted to know more about the MDS/MDT/OSS/OST components. Please point me to some link. The Lustre manual is available online:

[Lustre-discuss] /etc/init.d/openibd hanging system shutdown

2010-01-26 Thread Jagga Soorma
Hi Guys, my SLES clients are hanging during system shutdown, right after giving me the message Shutting down D-Bus Daemon ... done. In /etc/init.d/rc3.d, ls K08* shows: K08dbus K08openibd. Has anyone noticed this on any of their clients running SLES? I have to kill the power on my client every

Re: [Lustre-discuss] /etc/init.d/openibd hanging system shutdown

2010-01-26 Thread Jagga Soorma
Did some more digging around, and the mlx4_ib module is not unloading: ..snip.. + '[' '!' -z 'ERROR: Removing '\''mlx4_ib'\'': Device or resource busy' ']' ..snip.. # rmmod mlx4_ib ERROR: Removing 'mlx4_ib': Device or resource busy # lctl net down LNET busy How can I shut down the LNET network? I do

Re: [Lustre-discuss] slow direct_io, slow journal .. in OST log

2010-01-26 Thread Lex
Hi all, I heard somewhere about an oversubscription issue related to OST threads, but I just wonder why, when I calculated it with the formula I found in the manual (*thread_number = RAM * CPU cores / 128 MB* - do correct me if there's something wrong with it, please), the oversubscription warning
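
A worked example of that formula with hypothetical hardware (16 GB RAM, 8 cores), expressing RAM in MB:

    # thread_number = RAM (MB) * CPU cores / 128 MB
    echo $(( 16384 * 8 / 128 ))    # prints 1024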

Re: [Lustre-discuss] /etc/init.d/openibd hanging system shutdown

2010-01-26 Thread Erik Froese
Try lctl net down or lctl net unconfigure. If those fail, run lustre_rmmod. Erik On Tue, Jan 26, 2010 at 6:56 PM, Jagga Soorma jagg...@gmail.com wrote: Did some more digging around, and the mlx4_ib module is not unloading: ..snip.. + '[' '!' -z 'ERROR: Removing '\''mlx4_ib'\'': Device or
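
Putting this together with the shutdown hang above, the working order is roughly the following sketch (assuming no Lustre mounts remain in use):

    # unmount Lustre first so LNET has no users
    umount -a -t lustre
    # then tear down LNET and unload the Lustre/LNET modules
    lctl net down
    lustre_rmmod
    # only now can the IB stack (mlx4_ib et al.) unload cleanly
    /etc/init.d/openibd stop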

Re: [Lustre-discuss] /etc/init.d/openibd hanging system shutdown

2010-01-26 Thread Christopher J. Morrone
Here we use an init script that does that. We use it on RHEL-based systems, but it shouldn't take much work to modify it for your needs: http://github.com/morrone/lustre/raw/1.8.2.0-5chaos/lustre/scripts/lnet It is also in a patch attached to bug 20165 (/etc/init.d/lnet) along with heartbeat
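
Installing such a script on a RHEL-style system is the standard init dance; a sketch assuming the file was fetched from the URL above:

    # hook the lnet script in so LNET is stopped before openibd
    cp lnet /etc/init.d/lnet
    chmod +x /etc/init.d/lnet
    chkconfig --add lnet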