Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-19 Thread Andreas Dilger
I've used a SLES kernel on an FC install for a long time on my home system. With newer distros there are also fewer changes to the base kernel, so there shouldn't be as much trouble to use e.g. the SLES 11 SP1 kernel (2.6.32) when it is released. Cheers, Andreas On 2010-05-19, at 6:01, Heik

Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-19 Thread Andreas Dilger
e sun src patches are still missing in the lustre AND > e2fsprogs branches. I'm not sure what you mean. The e2fsprogs patches have always been in a separate repository from the core Lustre code, and all of the Lustre/ldiskfs kernel patches are in the Git repository. Cheers, Andreas --

Re: [Lustre-discuss] Ldiskfs corrupt

2010-05-18 Thread Andreas Dilger
e journal inode with "tune2fs -j /dev/XXX". Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss

Re: [Lustre-discuss] Is using resize2fs on the MDT a supported option?

2010-05-18 Thread Andreas Dilger
On 2010-05-14, at 10:41, Adeyemi Adesanya wrote: > We are about to install a new Lustre 1.8.2 installation with ~1PB > of filesystem space. We have to make a decision regarding the MDT > storage and someone suggested that in the event we run out of inodes > on the MDT, using resize2fs wou

Re: [Lustre-discuss] problem with too many (default) ACLs on a directory

2010-05-13 Thread Andreas Dilger
On 2010-05-13, at 04:38, Frederik Ferner wrote: > Andreas Dilger wrote: >> On 2010-05-12, at 06:15, Frederik Ferner wrote: >>> we are having problems with ACLs at the moment. As far as we understand >>> this is what has happened. >>> >>> We have a dir

Re: [Lustre-discuss] problem with too many (default) ACLs on a directory

2010-05-13 Thread Andreas Dilger
, since the RDMA reply buffers have to be allocated before the client knows how many ACLs are stored on the file. Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc.

Re: [Lustre-discuss] Problems with MDS Crashing

2010-05-12 Thread Andreas Dilger
etdump to get the actual error messages on the console when it hangs. If there are no error messages on the console, try "sysrq-p" or "sysrq-t" to see if it is stuck in some thread. Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc.

Re: [Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration

2010-05-11 Thread Andreas Dilger
r on the server, or both. > Any comments to this? Does it basically work to have 1.8 Clients and 1.6 > Servers? We do test 1.8.latest with 1.6.latest whenever we make a release. > On 5/11/2010 9:56 AM, Andreas Dilger wrote: >> On 2010-05-10, at 15:59, Greg Mason wrote: >>

Re: [Lustre-discuss] Lustre 1.6.6 to 1.8.3 upgrade + MDT hardware migration

2010-05-11 Thread Andreas Dilger
6.7 clients? I know a few sites are currently going through the same process on the servers, and I expect they have to run with clients at 1.6 for at least a short time before they upgrade to 1.8 due to complex environments that don't allo

Re: [Lustre-discuss] Lustre, NFS and mds_getattr_lock operation

2010-05-07 Thread Andreas Dilger
On 2010-05-07, at 05:12, Frederik Ferner wrote: > Andreas Dilger wrote: >> On 2010-05-06, at 11:57, Frederik Ferner wrote: >>> On our Lustre system we are seeing the following error fairly >>> regularly, so far we have not had complaints from users and have >>

Re: [Lustre-discuss] Lustre, NFS and mds_getattr_lock operation

2010-05-06 Thread Andreas Dilger
but I don't know at all. > Does anyone know if we should worry about those messages or if we can > safely ignore them? Or should we assume that some of our users might > have a problem accessing data that they have just not reported? Even > though I find that unlikely. Cheers

Re: [Lustre-discuss] MDS inode allocation question

2010-04-28 Thread Andreas Dilger
On 2010-04-28, at 7:44, Gary Molenkamp wrote: > When I create the MDS, I specified '-i 1024' and I can see (locally) > 800M inodes, but only part of the available space is allocated. This is to be expected. There needs to be free space on the MDS for directories, striping and other internal us

Re: [Lustre-discuss] Newbie w/issues

2010-04-27 Thread Andreas Dilger
This means that your OST is not available. Maybe it is not mounted? Cheers, Andreas On 2010-04-27, at 19:38, Brian Andrus wrote: > On 4/27/2010 6:10 PM, Oleg Drokin wrote: >> Hello! >> >> On Apr 27, 2010, at 7:29 PM, Brian Andrus wrote: >> >>> Apr 27 16:15:19 nas-0-1 kernel: LustreError: 4133:0

Re: [Lustre-discuss] Future of LusterFS?

2010-04-26 Thread Andreas Dilger
ing. >>> >>> Have you had any hardware failures? >>> If yes, how well has the cluster coped with the loss of the machine(s)? >>> >>> >>> Any advice you can share from your initial setup of lustre?

Re: [Lustre-discuss] 1.8/2.6.32 support

2010-04-26 Thread Andreas Dilger
e we can't compile 1.6.7.2 on 2.6.32 and 2.0 is still not > in a production state. There is work going on in bugzilla for b1_8 SLES11 SP1(?) kernel support, which will hopefully also be usable for RHEL6, when it is available. Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corpor

Re: [Lustre-discuss] Lots of "No ctxt" after OST crush

2010-04-26 Thread Andreas Dilger
The missing logfile problem is easily fixed - delete the CATALOGS file on the MDT and restart. There is a bug just opened to handle this better, but it isn't fixed yet. Cheers, Andreas On 2010-04-26, at 7:00, Thomas Roth wrote: > Hi all, > > one of our OSTs crushed - actually we ran into Bu

Re: [Lustre-discuss] Moving files off an OST

2010-04-24 Thread Andreas Dilger
disk. One possibility is that you have open files that are holding this space in use. If you unmount the MDT (use "umount -f", which will evict all of the clients, though this will cause applications to see IO errors, if that is acceptable) and mount it again, does the sp

Re: [Lustre-discuss] MDS inode allocation question

2010-04-24 Thread Andreas Dilger
> MDS/MGS on 880G logical drive: >> mkfs.lustre --fsname gulfwork --mdt --mgs --mkfsoptions='-i 1024' >> --failnode=10.18.12.1 /dev/sda >> >> OSSs on 9.1TB logical drives: >> /usr/sbin/mkfs.lustre --fsname gulfwork --ost --mgsnode=10.18.1...@tcp >> --mgsnode=10.18.1...@t

Re: [Lustre-discuss] Lustre loadgen error

2010-04-22 Thread Andreas Dilger
..@o2ib > Added uuid OSS_UUID: 192.168.11...@o2ib > Target OST name is 'lustre-OST-osc' > loadgen> st 3 > start 0 to 3 > loadgen: running thread #1 > Segmentation fault > > > Meet same error on both OSS-es and client using any number of clients. I believ

Re: [Lustre-discuss] Future of LusterFS?

2010-04-21 Thread Andreas Dilger
into the future. > If using lusterfs in a production environment, it would be good to know > that it won't be discontinued. > > Will there be a long term future for lusterfs? Yes. Cheers, Andreas -- Andreas Dilger Principal Enginee

Re: [Lustre-discuss] Dual Homed Filesystem Issue

2010-04-21 Thread Andreas Dilger
expected some sort of failure message unless it is not reaching it at > all. I suspect you need to rewrite the filesystem configuration to include these new interfaces. I believe there is a section in the manual on how to correctly change network interfaces. Cheers, Andreas -- Andreas

Re: [Lustre-discuss] Lustre Client - Memory Issue

2010-04-19 Thread Andreas Dilger
. They also noticed that the swap space kept climbing even though there was plenty of free memory on the system. Could this possibly be related to the lustre client? Does it reserve any memory that is not accessible by any other process even though it might not be in use? Cheers, Andreas

Re: [Lustre-discuss] Inactive OST

2010-04-19 Thread Andreas Dilger
ystem, one in the "lustre00" filesystem, so it seems you have some sort of a configuration problem. Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracle Corporation Canada Inc.

Re: [Lustre-discuss] Lustre Client - Memory Issue

2010-04-19 Thread Andreas Dilger
There is a known problem with the DLM LRU size that may be affecting you. It may be something else too. Please check /proc/{slabinfo,meminfo} to see what is using the memory on the client. Cheers, Andreas On 2010-04-19, at 10:43, Jagga Soorma wrote: > Hi Guys, > > My users are reporting som
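As a rough illustration of the check suggested above, the following Python sketch parses /proc/meminfo-style text and lists the largest entries. The function names `parse_meminfo` and `largest_consumers` are made up for this example; they are not part of any Lustre tool, and the output only hints at where memory went (a large "Slab" value would be a reason to look at /proc/slabinfo next).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def largest_consumers(meminfo, count=5):
    """Return the `count` largest fields, ignoring MemTotal itself."""
    items = [(k, v) for k, v in meminfo.items() if k != "MemTotal"]
    return sorted(items, key=lambda kv: kv[1], reverse=True)[:count]


if __name__ == "__main__":
    # Hypothetical usage on a Linux client node.
    with open("/proc/meminfo") as f:
        for name, value in largest_consumers(parse_meminfo(f.read())):
            print(f"{name:>16} {value:>12}")
```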

Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-18 Thread Andreas Dilger
d gone anyway. What error messages are posted on the console log (dmesg/syslog)? Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracle Corporation Canada Inc.

Re: [Lustre-discuss] odd kernel crash after a heartbeat failover

2010-04-16 Thread Andreas Dilger
33 to see what line it is. This Oops shouldn't be happening, even if the journal has aborted. Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracle Corporation Canada Inc.

Re: [Lustre-discuss] llapi stripe_size

2010-04-16 Thread Andreas Dilger
t at runtime? The manual is incorrect in this case. The correct limit is 65536 bytes, not 4096. It _used_ to be 4096 bytes, but since Linux supports client PAGE_SIZE up to 65536 bytes, and the VM cannot partially dirty a page, we do not support a stripe size that is smaller than a single
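A minimal sketch of the limit described above, written as a pre-flight check an application might run before asking for a stripe size. The 65536-byte minimum comes from the message; the optional "multiple of 65536" check is an assumption added for this example, and `check_stripe_size` is a hypothetical helper, not an llapi function.

```python
MIN_STRIPE_SIZE = 65536  # bytes; the minimum quoted in the message above


def check_stripe_size(stripe_size, require_multiple=True):
    """Return True if stripe_size looks acceptable.

    require_multiple is an assumption of this sketch: it also rejects
    sizes that are not a whole multiple of MIN_STRIPE_SIZE.
    """
    if stripe_size < MIN_STRIPE_SIZE:
        return False
    if require_multiple and stripe_size % MIN_STRIPE_SIZE != 0:
        return False
    return True


if __name__ == "__main__":
    for size in (4096, 65536, 1 << 20):
        print(size, check_stripe_size(size))
```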

Re: [Lustre-discuss] Frequent appearence of LustreError: no handle for file close ino

2010-04-16 Thread Andreas Dilger
any way to find out more info about it. e.g. filesystem, > filename and lustre client that are related to this error? > c) is there any way to resolve this errors? > Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracle Corporation Canada Inc.

Re: [Lustre-discuss] OST fails to activate

2010-04-15 Thread Andreas Dilger
Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracle Corporation Canada Inc.

Re: [Lustre-discuss] Multiply claimed blocks

2010-04-14 Thread Andreas Dilger
" causes files to be unlinked after cloning so they will be reconnected to /lost+found in pass 3. "delete" skips cloning entirely and simply deletes the files. You probably want to use the "-E shared=delete" option. Cheers, Andreas -

Re: [Lustre-discuss] fseeks on lustre

2010-04-14 Thread Andreas Dilger
2097152) = 2097152 > > As Andreas suspected, your application is doing 2MB reads every time. > Does it really need 2MB of data on each read? If not, can you fix > your > application to only read as much data as it actually wants? Cheers, Andreas -- Andreas Dilger Principal En
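To illustrate the suggestion of reading only as much data as the application wants, here is a small Python sketch. It assumes the oversized reads come from a userspace buffering layer; opening the file with `buffering=0` makes Python issue a read for exactly the requested length. `read_exact` is a hypothetical helper name, and whether this matches the application in the thread is not known from the snippet.

```python
import os


def read_exact(path, offset, length):
    """Seek to `offset` and read at most `length` bytes, unbuffered."""
    # buffering=0 returns a raw file object, so the read() below maps to a
    # single read of `length` bytes instead of filling a larger buffer first.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset, os.SEEK_SET)
        return f.read(length)
```

Running the application under strace before and after such a change would show whether the size of each read() call actually dropped.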

Re: [Lustre-discuss] fseeks on lustre

2010-04-13 Thread Andreas Dilger
data from the file was seeking and reading 2MB of extra data for each seek. It would be worthwhile to strace your application to see if it is doing the same thing. > Andreas Dilger wrote: >> On 2010-04-07, at 14:09, Ronald K Long wrote: >> > I am having an issue with our lustre

Re: [Lustre-discuss] Upgrading our filesystem

2010-04-13 Thread Andreas Dilger
On 2010-04-12, at 15:11, Norberto Meijome wrote: > On 12 April 2010 10:15, Andreas Dilger > wrote: >> I would suggest to keep the OST size uniform that you migrate the >> existing OSTs to the new 600GB drive LUNs then combine pairs of (now >> unused) 300GB LUNs into do

Re: [Lustre-discuss] Un-export filesystem on multi-filesystem MGS

2010-04-12 Thread Andreas Dilger
re, and mount it as type ldiskfs, do a backup of the filesystem, then delete the configuration file for the old filesystem that you want to re-use. This should be in the CONFIGS/ subdirectory, IIRC. Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracl

Re: [Lustre-discuss] Upgrading our filesystem

2010-04-11 Thread Andreas Dilger
I would suggest, to keep the OST size uniform, that you migrate the existing OSTs to the new 600GB drive LUNs, then combine pairs of (now unused) 300GB LUNs into double-sized OSTs to match the new ones. While the MDS will handle different-sized OSTs OK, it isn't the ideal situation. Cheers, An

Re: [Lustre-discuss] fseeks on lustre

2010-04-09 Thread Andreas Dilger
an expensive operation. Using SEEK_CUR or SEEK_SET has no cost at all. > Are there any tunable parameter in lustre that can alleviate this > problem? It depends on what the problem really is. Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Group Oracle Corp

Re: [Lustre-discuss] "Setstripe count -1" what's the happen when one of the osts outs of space disk?

2010-04-05 Thread Andreas Dilger
will stripe across all "available" OSTs, which should skip the full OST. If you specify stripe_count=N, where N = number of OSTs (including the full one) then the allocation will fail. Cheers, Andreas -- Andreas Dilger Principal Engineer, Lustre Gro
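The behaviour described above can be modelled with a toy allocator. This is only a sketch of the rule as stated in the message (a stripe count of -1 uses every OST that still has space, while an explicit count that needs the full OST fails); it is not Lustre's real allocator, and the names `choose_osts` and `AllocationError` are invented for the example.

```python
class AllocationError(Exception):
    """Raised when the requested stripe count cannot be satisfied."""


def choose_osts(free_space, stripe_count):
    """Pick OSTs for a new file.

    free_space maps OST name -> free bytes; an OST with 0 free bytes
    is treated as full and skipped.
    """
    available = [name for name, free in sorted(free_space.items()) if free > 0]
    if stripe_count == -1:
        return available  # all OSTs that still have space
    if stripe_count > len(available):
        raise AllocationError(
            f"need {stripe_count} OSTs, only {len(available)} have free space"
        )
    return available[:stripe_count]


if __name__ == "__main__":
    osts = {"OST0000": 100, "OST0001": 0, "OST0002": 50}
    print(choose_osts(osts, -1))
```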

Re: [Lustre-discuss] [Lustre-devel] Lustre Test systems

2010-03-27 Thread Andreas Dilger
Lustre (for basic functionality) on a laptop running a single virtual machine. No extra hardware required. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] odd mount behavior

2010-03-26 Thread Andreas Dilger
used why it's attempting the o2ib NID repeatedly and never > tries the tcp NID... Ideas? A common cause for newly-installed systems is hosts.deny or firewall rules that are preventing connections on port 988. Cheers, Andreas -- Andreas Dilger Sr. Staf

Re: [Lustre-discuss] programmatic access to parameters

2010-03-26 Thread Andreas Dilger
. We are adding an llapi_get_param() interface for a future release of Lustre, but it wouldn't be too hard for someone to create a wrapper for this in 1.8.x either. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Lustre, automount and EIO

2010-03-25 Thread Andreas Dilger
kernel debug logs for this failure. If there was an RPC timeout during connection (e.g. if the OST is slow to respond) then that should have produced an earlier console error. If the above operation is failing before trying to connect to the OST, then that should be fixed. Cheers, Andreas -- Andrea

Re: [Lustre-discuss] lustre 1.8.2 patchless client on suse 2.6.31 kernel

2010-03-25 Thread Andreas Dilger
heers, >> >> Wojciech > > > On 20 March 2010 05:46, Andreas Dilger wrote: > On 2010-03-19, at 08:56, Wojciech Turek wrote: > Thanks for a quick answer. I have tried to compile lustre from the > b1_8 branch but build process failed at the same place, so I guess

Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Andreas Dilger
On 2010-03-25, at 15:12, Andreas Dilger wrote: >> The llapi_* functions are great, I see how to set the stripe count >> and size. I wasn't sure if there was also a function to query about >> the configuration, eg number of OST's deployed? > > There isn't d

Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Andreas Dilger
Cray, SGI). The MPI hints will only be useful on implementation > that support the particular hint. From a consistency point of view > we need to both make use of MPI hints and direct access via the > llapi so that we run well on all those systems, regardless of which > MPI imp

Re: [Lustre-discuss] programmatic access to parameters

2010-03-24 Thread Andreas Dilger
n optimize these things for you, based on application hints. If you could elaborate on your needs, there may not be any need to make your application more Lustre-aware. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024

2010-03-23 Thread Andreas Dilger
> [] ll_file_aio_read+0xf1a/0x2350 [lustre] > [] ll_file_read+0xb9/0xd0 [lustre] > [] vfs_read+0xaa/0x133 > [] sys_read+0x45/0x6e Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] lustre 1.8.2 patchless client on suse 2.6.31 kernel

2010-03-19 Thread Andreas Dilger
't find anything related to that particular > problem. Do you maybe recall a BUG number? Bug 21500. I found it by searching for blk_queue_hardsect_size in attachments, patches only, for bugs that changed in the last 120 days (to avoid searching very old bugs). > On 18 March 2010 22:55

Re: [Lustre-discuss] Running fsck on disabled ost?

2010-03-18 Thread Andreas Dilger
cale catastrophic events... If the filesystem is damaged and you need to run e2fsck on it, then modifying the filesystem by trying to drain the files from the OST is a bad idea. You should minimize the amount of changes made to the filesystem before you can run e2fsck on it. Cheers, Andreas

Re: [Lustre-discuss] lustre 1.8.2 patchless client on suse 2.6.31 kernel

2010-03-18 Thread Andreas Dilger
c/lustre-1.8.2' > make: *** [all] Error 2 Please try the latest b1_8 Git repo and/or search bugzilla. I believe this is already fixed for 1.8.3, but it may still only be attached to a bug. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, In

Re: [Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024

2010-03-18 Thread Andreas Dilger
preempt kernel. Will this fix make it into > mainline? I've submitted bug 22409 with this patch, though I've updated the comment. I can't say for sure which release it will be in, but I don't see a big barrier to accepting it in short order. Cheers, Andreas -- Andreas

Re: [Lustre-discuss] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024

2010-03-17 Thread Andreas Dilger
> http://bugzilla.kernel.org/show_bug.cgi?id=12518 > > Just need to figure out if the fix can be backported to 2.6.27.39 I think that is a different bug. In lnet/libcfs/tracefile.c::libcfs_debug_vmsg2() you could try moving set_ptldebug_header() after the call to trace_get_tcd(),

Re: [Lustre-discuss] lfs and df

2010-03-12 Thread Andreas Dilger
blem. "lfs df" should behave like "df" in this respect, printing the stats for all of the filesystems. I've filed bug 22327 for this issue, and it already has a patch for the fix. The only affected release is 1.8.2. Cheers, Andreas -- Andreas Dilger Sr. S

Re: [Lustre-discuss] Single client performance

2010-03-12 Thread Andreas Dilger
ats. For single-threaded IO, TCP + user->kernel data copy overhead can saturate a single core, leaving other cores idle. Running with multiple IO threads, and using an RDMA-capable network (IB is the most popular) will definitely avoid the CPU bottleneck. Cheers, And
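As a sketch of the "multiple IO threads" suggestion above, the following Python example writes several files concurrently, one per thread. The helper names and the file naming scheme are assumptions made for illustration; a real benchmark would more likely use an existing tool, and separate processes may scale better than threads depending on the workload.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_file(path, size, chunk=1 << 20):
    """Write `size` zero bytes to `path` in `chunk`-sized pieces."""
    block = b"\0" * chunk
    written = 0
    with open(path, "wb") as f:
        while written < size:
            n = min(chunk, size - written)
            f.write(block[:n])
            written += n
    return written


def parallel_write(directory, n_threads, size_per_file):
    """Write one file per thread; return the total number of bytes written."""
    paths = [os.path.join(directory, f"stream{i}.dat") for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(lambda p: write_file(p, size_per_file), paths))
```

Each thread writes its own file, so the streams do not contend for the same file offset.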

Re: [Lustre-discuss] Trouble compiling lustre sources

2010-03-11 Thread Andreas Dilger
>> I have since abandoned this attempt and looking to down-rev to >>> EL5u3. >>> >> >> Correct, you have to build against a supported kernel. >> >> Nico

Re: [Lustre-discuss] RPC limitation

2010-03-10 Thread Andreas Dilger
al Message- > From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On > Behalf Of Andreas Dilger > Sent: Friday, March 05, 2010 2:05 AM > To: Jeffrey Bennett > Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org > Subject: Re: [Lustre-discuss] One or two OSS, no

Re: [Lustre-discuss] inode size with mkfs

2010-03-10 Thread Andreas Dilger
ext4 by default, so I imagine that they are using 256-byte inodes with GRUB. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Question regarding caution statement in 1.8 manual for the consistent mode flock option

2010-03-05 Thread Andreas Dilger
ch has minimal performance impact, but that was confusing to applications. The "noflock" default now reports an error as you saw and it is up to the administrator to pick either "localflock" (fastest, low impact, not coherent between nodes) or "flock" (slower, per

Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-05 Thread Andreas Dilger
>> random IOPS when using one OSS or two OSS? A quick test with "dd" >>>> also shows the same MB/sec when using one or two OSTs. >>> >>> I wonder if you just don't saturate even one OST (both backend SSD >>> and IB interconnect) with this nu

Re: [Lustre-discuss] Problem with flock and perl on Lustre FS

2010-03-05 Thread Andreas Dilger
> DB<6> flock(FOO, LOCK_EX) || die "SHIE: $!" > SHIE: Function not implemented at (eval > 10)[/usr/lib/perl5/5.10.0/perl5db.pl:638] line 2. Search the list or manual for "-o flock", "-o localflock", and "-o noflock" mount options for the cl
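The "Function not implemented" error in the quoted Perl session corresponds to errno ENOSYS. A hedged Python sketch of how an application might detect that case and report it, rather than dying, is shown below; `lock_exclusive` is a hypothetical helper, and the suggestion to remount with one of the flock options is taken from the reply above.

```python
import errno
import fcntl


def lock_exclusive(fileobj):
    """Try to take an exclusive flock on an open file.

    Returns "locked" on success and "unsupported" if the filesystem
    reports ENOSYS ("Function not implemented"). Other errors, such as
    the lock being held elsewhere, are re-raised.
    """
    try:
        fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return "locked"
    except OSError as e:
        if e.errno == errno.ENOSYS:
            return "unsupported"
        raise
```

An "unsupported" result on a Lustre client would point at the mount options rather than at the application.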

Re: [Lustre-discuss] problems restoring from MDT backup (test file system)

2010-03-05 Thread Andreas Dilger
ery > time I create a new archive it seems to be broken at the same place. > > Other tar files created on the same machine don't have that problem, > but > I'll try creating a new archive with a new executable. Make sure you use "--sparse" so tha

Re: [Lustre-discuss] One or two OSS, no difference?

2010-03-05 Thread Andreas Dilger
>> further? >> Increasing maximum number of in-flight rpcs might help in that case. >> Also are all of your clients writing to the same file or each >> client does io to a separate file (I hope)? >> >> Bye, >> Oleg Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] ext3 and over 8Tb OSTs

2010-03-05 Thread Andreas Dilger
ver_16tb" option). I wonder if it makes sense to just disable this option for the ext3-based ldiskfs and require that anyone using > 8TB OSTs use the ext4-based ldiskfs? That would avoid any confusion/problems as above. Cheers, Andreas -- Andreas Dilger Sr. Sta

Re: [Lustre-discuss] Curious about iozone findings of new Lustre FS

2010-03-03 Thread Andreas Dilger
s from 2 OSS nodes. Also, what is the interconnect on the client? If you are using a single 10GigE then 1GB/s is as fast as you can possibly write large files to the OSTs, regardless of the striping. Cheers, Andreas -- Andreas Dilger Sr. Staff Eng
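The network ceiling mentioned above follows from simple arithmetic, sketched here in Python. The raw figure is the link rate divided by 8 bits per byte; the `efficiency` factor of 0.8 used in the example is an assumption chosen to land near the 1GB/s quoted in the message, not a measured value.

```python
def link_ceiling_bytes_per_sec(gigabits_per_sec, efficiency=1.0):
    """Upper bound on payload throughput for a network link, in bytes/s."""
    return gigabits_per_sec * 1e9 / 8 * efficiency


if __name__ == "__main__":
    raw = link_ceiling_bytes_per_sec(10)             # theoretical 10GigE limit
    practical = link_ceiling_bytes_per_sec(10, 0.8)  # assumed protocol overhead
    print(f"raw: {raw / 1e9:.2f} GB/s, with overhead: {practical / 1e9:.2f} GB/s")
```

No striping layout on the servers can push a single client past this client-side limit.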

Re: [Lustre-discuss] Unbalanced OST--for discussion purposes

2010-03-02 Thread Andreas Dilger
T/crew8-OST0010 > /dev/sdk2 6.3T 3.8T 2.2T 64% /srv/lustre/OST/crew8-OST0011 Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Restricting 'lfs setstripe' command + kernelpanic

2010-02-26 Thread Andreas Dilger
e process thread to OOPS due to the NULL dereference, and that thread will hang, or possibly exit, but it shouldn't cause any serious problems. I've filed bug 22187 for this, thanks for reporting it. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of

Re: [Lustre-discuss] Not updated mtime after editted

2010-02-25 Thread Andreas Dilger
010-02-25 23:34:11.0 You are free to look through the lustre/ChangeLog to see which bug(s) contained fixes for this problem, but the above shows it is at least fixed in 1.8.2. > At 10/02/26(金)13:25, Andreas Dilger wrote: >> On 2010-02-25, at 20:17, Satoshi Isono wrote: >>

Re: [Lustre-discuss] Better friendly Linux dist on Lustre

2010-02-25 Thread Andreas Dilger
kernel versions. > At 10/02/26(金)13:27, Andreas Dilger wrote: >> On 2010-02-25, at 20:26, Satoshi Isono wrote: >>> I have short question to you. When we choose Linux distribution like >>> RHEL, SLES, CentOS or etc, to use Lustre, which one do you >>> recommend?

Re: [Lustre-discuss] Better friendly Linux dist on Lustre

2010-02-25 Thread Andreas Dilger
istribution? The majority of sites use RHEL or CentOS. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Not updated mtime after editted

2010-02-25 Thread Andreas Dilger
est that you upgrade to a newer version of Lustre. There were a number of mtime fixes, along with hundreds of other bug fixes since 1.6.5. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Lustre 1.8.2 OST pool problem

2010-02-25 Thread Andreas Dilger
s ENODEV > Of course I am on a mounted Lustre fs. > I wonder if anyone else has the same problem? Is there any known > solution/workaround for this problem? I haven't heard of anything similar. Can you please file a bug with details (including relevant /var/log/messages outpu

Re: [Lustre-discuss] Unbalanced OSTs

2010-02-18 Thread Andreas Dilger
til 1.8.2 where it goes to 16TB (enough for a tier of 2TB disks). Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] 16T LUNs

2010-02-10 Thread Andreas Dilger
On 2010-02-10, at 17:29, David Simas wrote: > On Wed, Feb 10, 2010 at 02:41:55PM -0700, Andreas Dilger wrote: >> >> - primarily, the upstream e2fsprogs does not yet have full support >> for >> > 16TB filesystems, and while experimental patches exist there >

Re: [Lustre-discuss] 16T LUNs

2010-02-10 Thread Andreas Dilger
f improvements to speed up e2fsck time, there is a limit to what can be done with this. >> -Original Message- >> From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On >> Behalf > Of >> Andreas Dilger >> Sent: Tuesday, February 09, 2010 7:13 PM

Re: [Lustre-discuss] 16T LUNs

2010-02-09 Thread Andreas Dilger
is. I'll let them speak for themselves. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] BAD last_transno problem

2010-02-09 Thread Andreas Dilger
oops, but it won't fix the transno error. You can mount the OST filesystem as ldiskfs and delete the "last_rcvd" file to clear the transno. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Download side down?

2010-02-05 Thread Andreas Dilger
was redirected over to the Oracle webserver some of the links were broken. The URL to use for now is http://www.sun.com/download/index.jsp?tab=2&check_1=on Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Backup of trusted extended attributes

2010-02-05 Thread Andreas Dilger
> Or perhaps it's linked with the MDS backup, in case of full > disaster / recovery. If you are doing a MDT-filesystem backup on an ldiskfs-type mount of the MDT, then it is critical to back up the trusted.lov attributes, or your filesystem will contain no

Re: [Lustre-discuss] Recommended segment size for MDS raid

2010-02-03 Thread Andreas Dilger
r IOs to the disk don't go significantly faster. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Reply High difference in I/O network traffic in lustre client

2010-02-01 Thread Andreas Dilger
d transit > direction > > in our lustre client - web server ? > > > > > > i'm really stressed with poor performance in our storage system > and hope > > anyone here can help me point out some thing > > > > Any help would be highly app

Re: [Lustre-discuss] building example taken from manual page, api, 1.8.1.1

2010-01-31 Thread Andreas Dilger
tre/lustre/include \ >-I/usr/src/modules/lustre/lustre/include/lustre \ >-L/usr/src/modules/lustre/lnet/utils \ >-llustre -llustreapi -lncurses -lreadline \ > -lnetsnmpagent -lnetsnmphelpers -lnetsnmpmibs -lnetsnmp \ >-o lopenex lopenex.c You don't need mos

Re: [Lustre-discuss] full ost

2010-01-31 Thread Andreas Dilger
ctually _reserve_ that space, so if multiple nodes are writing huge files and there isn't enough space in the filesystem, you can still run out of space. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Lustre SLES 11 Clients - How to deal with updates

2010-01-31 Thread Andreas Dilger
s and kernel-ib instead of using the sun > provided rpm's. We will have to compile every time we upgrade our > servers. You don't need to patch the kernel to build clients. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group

Re: [Lustre-discuss] On which OSS are the OST?

2010-01-25 Thread Andreas Dilger
tems mounted on this node?' >return 2 > >if opts.dir: >fs_uuid = path_to_fs_uuid(opts.dir) > if not fs_uuid: >print '"'+opts.dir+'" is not a lustre filesystem' >return 3 >fs_uuids =

Re: [Lustre-discuss] MDS crashes daily at the same hour

2010-01-24 Thread Andreas Dilger
I can't find it > in bugzilla. Would you like /tmp/lustre-log.* too? If they are call traces due to the watchdog timer, then this is somewhat expected for extremely high load. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre

Re: [Lustre-discuss] MDS crashes daily at the same hour

2010-01-22 Thread Andreas Dilger
On 2010-01-06, at 04:25, David Cohen wrote: > On Monday 04 January 2010 20:42:12 Andreas Dilger wrote: >> On 2010-01-04, at 03:02, David Cohen wrote: >>> I'm using a mixed environment of 1.8.0.1 MDS and 1.6.6 OSS's (had a >>> problem with qlogic drivers and

Re: [Lustre-discuss] e2fsck claims OST is clean, then dirty

2010-01-21 Thread Andreas Dilger
rs. Could it be > because the OST is active? (Log attached) > Then I ran it again and e2fsck reported that the OST was clean. Checking a mounted filesystem is always at risk of producing inconsistent results. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun M

Re: [Lustre-discuss] Cluster Status Confusion - lctl dl

2010-01-19 Thread Andreas Dilger
of the OSTs, or read from /proc/fs/lustre/lov/*/target_obds (which contains the same data). Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
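A small Python sketch of reading the target_obds files mentioned above. It assumes each line has the form "index: UUID STATUS" (for example "0: lustre-OST0000_UUID ACTIVE"); that layout is an assumption of this example, so the parser skips any line that does not match. The function names are hypothetical.

```python
import glob


def parse_target_obds(text):
    """Parse lines of the assumed form 'index: UUID STATUS'."""
    targets = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].endswith(":"):
            continue  # not in the assumed format
        index = parts[0][:-1]
        if not index.isdigit():
            continue
        targets.append((int(index), parts[1], parts[2]))
    return targets


def read_all_targets(pattern="/proc/fs/lustre/lov/*/target_obds"):
    """Return {path: parsed targets} for every file matching the pattern."""
    result = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            result[path] = parse_target_obds(f.read())
    return result
```

On a node with no Lustre filesystem mounted, `read_all_targets()` simply returns an empty dictionary.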

Re: [Lustre-discuss] Clustered MDS & OSS Servers

2010-01-19 Thread Andreas Dilger
MDS failover nodes itself. > b) On the oss's there is no need for a virtual IP that would need to > fail over in an outage. I would simply have heartbeat mount the > filesystems on the other OSS node. Cheers, Andreas -- Andreas Dilger Sr. Staff Enginee

Re: [Lustre-discuss] Kernel Panic on MDS

2010-01-18 Thread Andreas Dilger
On 2010-01-18, at 23:09, Wojciech Turek wrote: > Thanks Andreas for quick answer. So upgrading to a newer version of > collectl should fix it? No, it is a Lustre bug, not collectl. I think a newer version of Lustre has fixes in lprocfs to avoid such races. > 2010/1/18 Andreas Dilge

Re: [Lustre-discuss] Kernel Panic on MDS

2010-01-18 Thread Andreas Dilger
{vfs_read+207} >{sys_read+69} >{system_call+126} This looks like collectl reading from a /proc entry after it was cleaned up. I think several such bugs were already fixed. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group S

Re: [Lustre-discuss] Fw: Re: Unable to activate OST

2010-01-14 Thread Andreas Dilger
reading and testing. I found by > naming things uniquely helped me clarify what was actually > required. Try calling your filesystem "Dusty" or > "Mark" and that should make things clearer for you. > > --- On Thu, 1/14/10, Andreas Dilger wrote: >>

Re: [Lustre-discuss] Unable to activate OST

2010-01-14 Thread Andreas Dilger
ut network configuration. I suspect the .0.2 network is not your eth0 network interface, and your modprobe.conf needs to be fixed. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] Unable to activate OST

2010-01-14 Thread Andreas Dilger
Is the MGS running? There is probably an error in /var/log/messages and/or "dmesg" that will tell you what is going wrong. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

Re: [Lustre-discuss] No space left on device for just one file

2010-01-12 Thread Andreas Dilger
6]
>> scratch-OST0011_UUID 366288896 5409560 360879336 1% /lustre/scratch[OST:17]
>> scratch-OST0012_UUID 366288896 5369406 360919490 1% /lustre/scratch[OST:18]
>> scratch-OST0013_UUID 366288896 5502974 360785922 1% /lustre/scratch[OST:19]
>> scrat
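A small sketch for scanning output like the quoted rows for an unusually full target. It assumes each row carries six whitespace-separated fields: UUID, total, used, available, use% and mount point, which is the layout the quoted output appears to follow. The helper names, the 90% threshold and the second sample row are hypothetical. The first sample row is taken from the quoted output with the available and use% columns written as separate fields.

```python
def parse_df_row(row):
    """Split one row into (uuid, total, used, available, use_pct, mount)."""
    uuid, total, used, avail, pct, mount = row.split()
    return uuid, int(total), int(used), int(avail), int(pct.rstrip("%")), mount


def fullest_targets(rows, threshold_pct=90):
    """Return the parsed rows whose use% is at or above threshold_pct."""
    parsed = [parse_df_row(r) for r in rows]
    return [p for p in parsed if p[4] >= threshold_pct]


if __name__ == "__main__":
    rows = [
        # Row from the quoted output, columns separated.
        "scratch-OST0011_UUID 366288896 5409560 360879336 1% /lustre/scratch[OST:17]",
        # Hypothetical nearly full target, for illustration only.
        "example-OST0000_UUID 1000 950 50 95% /mnt/example[OST:0]",
    ]
    for uuid, total, used, avail, pct, mount in fullest_targets(rows):
        print(uuid, pct, mount)
```

In the quoted row, used plus available equals the total (5409560 + 360879336 = 366288896), which is how the fused "available" and "use%" digits can be told apart.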

Re: [Lustre-discuss] No space left on device for just one file

2010-01-11 Thread Andreas Dilger
to know the best way, short of >>> taking the filesystem offline, to fix this problem. >>> >>> Any ideas? Thanks in advance, >>> Mike Robbert >>> ___ >>> Lustre-discuss mailing list >>> Lustre-

Re: [Lustre-discuss] odd ost disconnects during production

2010-01-08 Thread Andreas Dilger
d to say where the "o8" (OST_CONNECT) RPC is being sent, but I suspect the debug message is slightly incorrect (i.e. a minor code bug) because it has no connection from which to get this information. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems

Re: [Lustre-discuss] client I/O

2010-01-07 Thread Andreas Dilger
> last line of the panic reads: > > RIP [ Mike Robbert > > On Dec 4, 2009, at 11:39 PM, Andreas Dilger wrote: > >> On 2009-12-04, at 20:18, Mag Gam wrote: >>> Is it possible to figure out what client is taking up the most I/ >>> O? We >>> have 8

Re: [Lustre-discuss] e2scan wrong file list mtime/ctime

2010-01-07 Thread Andreas Dilger
0 00 f5 cd 0c 00 00 00 00 > 00 00 00 00 00 00 00 00 00 00 > 00 00 00 03 00 00 00 f5 cd 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 02 00 00 00 " (22 > 4) > BLOCKS: > > > Thanks for your kindness > > Andrea > > > > Andreas Dilger wrote: >

Re: [Lustre-discuss] Enqueue wait from MDS log

2010-01-06 Thread Andreas Dilger
but I don't know when that was done, so it might not appear until 1.8.2. > On Wed, Jan 6, 2010 at 5:22 PM, Andreas Dilger > wrote: >> On 2010-01-06, at 01:42, Tung Dam wrote: >> I have an issue with lustre log from our MDS, like this: >> >> Jan 6 14:00:

Re: [Lustre-discuss] Enqueue wait from MDS log

2010-01-06 Thread Andreas Dilger
re. In particular, with FLK (flock) type locks, they can be held indefinitely, so there is no reason to print a message at all. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
