I've used a SLES kernel on an FC install for a long time on my home
system. With newer distros there are also fewer changes to the base
kernel, so it shouldn't be much trouble to use e.g. the SLES 11
SP1 kernel (2.6.32) once it is released.
Cheers, Andreas
On 2010-05-19, at 6:01, Heik
> e sun src patches are still missing in the lustre AND
> e2fsprogs branches.
I'm not sure what you mean. The e2fsprogs patches have always been in a
separate repository from the core Lustre code, and all of the Lustre/ldiskfs
kernel patches are in the Git repository.
Cheers, Andreas
--
e journal inode with "tune2fs -j /dev/XXX".
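For example (a minimal sketch; the device name is a placeholder and the
filesystem should be unmounted first):

  umount /dev/sdb1                          # filesystem must not be in use
  tune2fs -j /dev/sdb1                      # (re)create the journal inode
  tune2fs -l /dev/sdb1 | grep has_journal   # verify the feature is now set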
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
On 2010-05-14, at 10:41, Adeyemi Adesanya wrote:
> We are about to set up a new Lustre 1.8.2 installation with ~1PB
> of filesystem space. We have to make a decision regarding the MDT
> storage and someone suggested that in the event we run out of inodes
> on the MDT, using resize2fs wou
On 2010-05-13, at 04:38, Frederik Ferner wrote:
> Andreas Dilger wrote:
>> On 2010-05-12, at 06:15, Frederik Ferner wrote:
>>> we are having problems with ACLs at the moment. As far as we understand
>>> this is what has happened.
>>>
>>> We have a dir
, since the RDMA
reply buffers have to be allocated before the client knows how many ACLs are
stored on the file.
Cheers, Andreas
netdump to get
the actual error messages on the console when it hangs. If there are no
error messages on the console, use "sysrq-p" or "sysrq-t" to see
whether it is stuck in some thread.
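A minimal sketch of triggering those dumps from a shell (this is the
standard Linux sysrq interface, nothing Lustre-specific):

  echo 1 > /proc/sys/kernel/sysrq    # enable magic SysRq if it is off
  echo p > /proc/sysrq-trigger       # sysrq-p: dump registers
  echo t > /proc/sysrq-trigger       # sysrq-t: dump all task states
  dmesg | tail -100                  # output lands in the kernel log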
Cheers, Andreas
r on the server, or both.
> Any comments to this? Does it basically work to have 1.8 Clients and 1.6
> Servers?
We do test 1.8.latest with 1.6.latest whenever we make a release.
> On 5/11/2010 9:56 AM, Andreas Dilger wrote:
>> On 2010-05-10, at 15:59, Greg Mason wrote:
6.7 clients?
I know a few sites are currently going through the same process on the servers,
and I expect they have to run with clients at 1.6 for at least a short time
before they upgrade to 1.8 due to complex environments that don't allo
On 2010-05-07, at 05:12, Frederik Ferner wrote:
> Andreas Dilger wrote:
>> On 2010-05-06, at 11:57, Frederik Ferner wrote:
>>> On our Lustre system we are seeing the following error fairly
>>> regularly; so far we have not had complaints from users and have
but I don't know at all.
> Does anyone know if we should worry about those messages or if we can
> safely ignore them? Or should we assume that some of our users might
> have a problem accessing data that they have just not reported? Even
> though I find that unlikely.
Cheers, Andreas
On 2010-04-28, at 7:44, Gary Molenkamp wrote:
> When I create the MDS, I specified '-i 1024' and I can see (locally)
> 800M inodes, but only part of the available space is allocated.
This is to be expected. There needs to be free space on the MDS for
directories, striping and other internal use.
This means that your OST is not available. Maybe it is not mounted?
Cheers, Andreas
On 2010-04-27, at 19:38, Brian Andrus wrote:
> On 4/27/2010 6:10 PM, Oleg Drokin wrote:
>> Hello!
>>
>> On Apr 27, 2010, at 7:29 PM, Brian Andrus wrote:
>>
>>> Apr 27 16:15:19 nas-0-1 kernel: LustreError: 4133:0
ing.
>>>
>>> Have you had any hardware failures?
>>> If yes, how well has the cluster coped with the loss of the machine(s)?
>>>
>>>
>>> Any advice you can share from your initial setup of lustre?
>>
e we can't compile 1.6.7.2 on 2.6.32 and 2.0 is still not
> in a production state.
There is work going on in bugzilla for b1_8 SLES11 SP1(?) kernel support, which
will hopefully also be usable for RHEL6 when it is available.
Cheers, Andreas
The missing logfile problem is easily fixed: delete the CATALOGS file
on the MDT and restart. A bug has just been opened to handle this
better, but it isn't fixed yet.
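A minimal sketch of that workaround (device and mountpoint are
placeholders; the MDT must be stopped first):

  umount /mnt/mdt                          # stop the MDT
  mount -t ldiskfs /dev/mdtdev /mnt/mdt    # mount the backing fs directly
  rm /mnt/mdt/CATALOGS                     # drop the stale llog catalog
  umount /mnt/mdt
  mount -t lustre /dev/mdtdev /mnt/mdt     # restart; CATALOGS is recreated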
Cheers, Andreas
On 2010-04-26, at 7:00, Thomas Roth wrote:
> Hi all,
>
> one of our OSTs crashed - actually we ran into Bu
disk.
One possibility is that you have open files that are holding this space in use.
If you unmount the MDT and mount it again, does the space come back? (Use
"umount -f", which will evict all of the clients; note that this will cause
applications to see IO errors, if that is acceptable.)
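A minimal sketch (device, mountpoint, and client path are placeholders):

  umount -f /mnt/mdt                       # forced unmount; evicts all clients
  mount -t lustre /dev/mdtdev /mnt/mdt     # remount the MDT
  lfs df /mnt/lustre                       # from a client: did the space return?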
>> MDS/MGS on 880G logical drive:
>> mkfs.lustre --fsname gulfwork --mdt --mgs --mkfsoptions='-i 1024'
>> --failnode=10.18.12.1 /dev/sda
>>
>> OSSs on 9.1TB logical drives:
>> /usr/sbin/mkfs.lustre --fsname gulfwork --ost --mgsnode=10.18.1...@tcp
>> --mgsnode=10.18.1...@t
..@o2ib
> Added uuid OSS_UUID: 192.168.11...@o2ib
> Target OST name is 'lustre-OST-osc'
> loadgen> st 3
> start 0 to 3
> loadgen: running thread #1
> Segmentation fault
>
>
> The same error occurs on both OSSes and the client, with any number of clients.
I believ
into the future.
> If using Lustre in a production environment, it would be good to know
> that it won't be discontinued.
>
> Will there be a long-term future for Lustre?
Yes.
Cheers, Andreas
expected some sort of failure message unless it is not reaching it at
> all.
I suspect you need to rewrite the filesystem configuration to include these new
interfaces. I believe there is a section in the manual on how to correctly
change network interfaces.
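A minimal sketch of regenerating the configuration logs (device names and
the MGS NID are placeholders; check the manual section first, since
--writeconf must be run with all targets unmounted):

  tunefs.lustre --writeconf /dev/mdtdev
  tunefs.lustre --writeconf --mgsnode=192.168.1.1@tcp /dev/ostdev
  # then mount the MGS/MDT first, then the OSTs, to rewrite the logs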
Cheers, Andreas
. They also noticed that the swap space kept
climbing even though there was plenty of free memory on the
system. Could this possibly be related to the lustre client? Does
it reserve any memory that is not accessible by any other process
even though it might not be in use?
Cheers, Andreas
ystem, one in the "lustre00" filesystem, so it seems you
have some sort of a configuration problem.
Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.
There is a known problem with the DLM LRU size that may be affecting
you. It may be something else too. Please check /proc/
{slabinfo,meminfo} to see what is using the memory on the client.
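A minimal sketch of those checks, plus the DLM LRU tunable (the lru_size
value here is only an example):

  grep -E 'ldlm|lustre' /proc/slabinfo             # Lustre-related slab caches
  cat /proc/meminfo                                # overall memory usage
  lctl set_param ldlm.namespaces.*.lru_size=1200   # clamp the DLM lock LRU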
Cheers, Andreas
On 2010-04-19, at 10:43, Jagga Soorma wrote:
> Hi Guys,
>
> My users are reporting som
d gone anyway.
What error messages are posted on the console log (dmesg/syslog)?
Cheers, Andreas
33
to see what line it is. This Oops shouldn't be happening, even if the
journal has aborted.
Cheers, Andreas
t at runtime?
The manual is incorrect in this case. The correct limit is 65536
bytes, not 4096. It _used_ to be 4096 bytes, but since Linux supports
client PAGE_SIZE up to 65536 bytes, and the VM cannot partially dirty
a page, we do not support a stripe size that is smaller than a single page.
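A minimal sketch (the path is a placeholder; in 1.8 the stripe size flag
is -s, taking bytes):

  lfs setstripe -s 65536 -c 2 /mnt/lustre/newfile   # 64kB stripes, 2 OSTs
  lfs getstripe /mnt/lustre/newfile                 # verify the layout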
> any way to find out more info about it, e.g. filesystem,
> filename and lustre client that are related to this error?
> c) is there any way to resolve this errors?
>
Cheers, Andreas
"lost+found" causes
files to be unlinked after cloning so they will be reconnected to
/lost+found in pass 3. "delete" skips cloning entirely and simply
deletes the files.
You probably want to use the "-E shared=delete" option.
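A minimal sketch (the device is a placeholder; the shared= extended option
comes from the Lustre-patched e2fsprogs, so a stock e2fsck may not accept it):

  e2fsck -fy -E shared=delete /dev/ostdev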
Cheers, Andreas
2097152) = 2097152
>
> As Andreas suspected, your application is doing 2MB reads every time.
> Does it really need 2MB of data on each read? If not, can you fix
> your
> application to only read as much data as it actually wants?
Cheers, Andreas
data from the file was
seeking and reading 2MB of extra data for each seek.
It would be worthwhile to strace your application to see if it is
doing the same thing.
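A minimal sketch of such a check (the PID and output are illustrative):

  strace -f -e trace=lseek,read -p <pid> 2>&1 | head -50
  # a pattern like the following would confirm it:
  #   lseek(3, 1048576, SEEK_SET)   = 1048576
  #   read(3, ..., 2097152)         = 2097152   <- 2MB read per seek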
> Andreas Dilger wrote:
>> On 2010-04-07, at 14:09, Ronald K Long wrote:
>> > I am having an issue with our lustre
On 2010-04-12, at 15:11, Norberto Meijome wrote:
> On 12 April 2010 10:15, Andreas Dilger
> wrote:
>> I would suggest, to keep the OST size uniform, that you migrate the
>> existing OSTs to the new 600GB drive LUNs, then combine pairs of (now
>> unused) 300GB LUNs into double-sized OSTs to match the new ones.
re, and mount it as type ldiskfs, do a backup of the filesystem,
then delete the configuration file for the old filesystem that you
want to re-use. This should be in the CONFIGS/ subdirectory, IIRC.
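A minimal sketch (device, mountpoint, and old fsname are placeholders):

  mount -t ldiskfs /dev/ostdev /mnt/ost
  tar czf /backup/ost.tgz -C /mnt/ost .    # back up the data first
  ls /mnt/ost/CONFIGS/                     # per-filesystem config logs
  rm /mnt/ost/CONFIGS/oldfs-*              # remove only the old fs config
  umount /mnt/ost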
Cheers, Andreas
I would suggest, to keep the OST size uniform, that you migrate the
existing OSTs to the new 600GB drive LUNs, then combine pairs of (now
unused) 300GB LUNs into double-sized OSTs to match the new ones.
While the MDS will handle different-sized OSTs OK, it isn't the ideal
situation.
Cheers, Andreas
an expensive
operation. Using SEEK_CUR or SEEK_SET has no cost at all.
> Are there any tunable parameter in lustre that can alleviate this
> problem?
It depends on what the problem really is.
Cheers, Andreas
will stripe across all "available"
OSTs, which should skip the full OST. If you specify stripe_count=N,
where N = number of OSTs (including the full one) then the allocation
will fail.
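A minimal sketch (the mountpoint is a placeholder):

  lfs setstripe -c -1 /mnt/lustre/dir   # stripe over all available OSTs
  lfs df /mnt/lustre                    # see which OSTs are full or inactive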
Cheers, Andreas
Lustre (for basic functionality) on
a laptop running a single virtual machine. No extra hardware required.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
used why it's attempting the o2ib NID repeatedly and never
> tries the tcp NID... Ideas?
A common cause for newly-installed systems is hosts.deny or firewall
rules that are preventing connections on port 988.
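A minimal sketch of things to check (the NID is a placeholder):

  grep 988 /etc/hosts.deny        # entries here can block Lustre connections
  iptables -L -n | grep 988       # look for rules dropping port 988
  lctl ping 192.168.1.10@tcp      # test basic LNET reachability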
Cheers, Andreas
We are adding an llapi_get_param() interface for a future release of
Lustre, but it wouldn't be too hard for someone to create a wrapper
for this in 1.8.x either.
Cheers, Andreas
kernel debug logs for this
failure. If there was an RPC timeout during connection (e.g. if the
OST is slow to respond) then that should have produced an earlier
console error. If the above operation is failing before trying to
connect to the OST, then that should be fixed.
Cheers, Andreas
>> Cheers,
>>
>> Wojciech
>
>
> On 20 March 2010 05:46, Andreas Dilger wrote:
> On 2010-03-19, at 08:56, Wojciech Turek wrote:
> Thanks for a quick answer. I have tried to compile lustre from the
> b1_8 branch but build process failed at the same place, so I guess
On 2010-03-25, at 15:12, Andreas Dilger wrote:
>> The llapi_* functions are great, I see how to set the stripe count
>> and size. I wasn't sure if there was also a function to query about
>> the configuration, eg number of OST's deployed?
>
> There isn't d
Cray, SGI). The MPI hints will only be useful on implementation
> that support the particular hint. From a consistency point of view
> we need to both make use of MPI hints and direct access via the
> llapi so that we run well on all those systems, regardless of which
> MPI imp
n optimize these
things for you, based on application hints.
If you could elaborate on your needs, there may not be any need to
make your application more Lustre-aware.
Cheers, Andreas
> [] ll_file_aio_read+0xf1a/0x2350 [lustre]
> [] ll_file_read+0xb9/0xd0 [lustre]
> [] vfs_read+0xaa/0x133
> [] sys_read+0x45/0x6e
Cheers, Andreas
't find anything related to that particular
> problem. Do you maybe recall a BUG number?
Bug 21500. I found it by searching for blk_queue_hardsect_size in
attachments, patches only, for bugs that changed in the last 120 days
(to avoid searching very old bugs).
> On 18 March 2010 22:55
cale catastrophic events...
If the filesystem is damaged and you need to run e2fsck on it, then
modifying the filesystem by trying to drain the files from the OST is
a bad idea. You should minimize the amount of changes made to the
filesystem before you can run e2fsck on it.
Cheers, Andreas
c/lustre-1.8.2'
> make: *** [all] Error 2
Please try the latest b1_8 Git repo and/or search bugzilla. I believe
this is already fixed for 1.8.3, but it may still only be attached to
a bug.
Cheers, Andreas
preempt kernel. Will this fix make it into
> mainline?
I've submitted bug 22409 with this patch, though I've updated the
comment. I can't say for sure which release it will be in, but I
don't see a big barrier to accepting it in short order.
Cheers, Andreas
> http://bugzilla.kernel.org/show_bug.cgi?id=12518
>
> Just need to figure out if the fix can be backported to 2.6.27.39
I think that is a different bug.
In lnet/libcfs/tracefile.c::libcfs_debug_vmsg2() you could try moving
set_ptldebug_header() after the call to trace_get_tcd(),
blem.
"lfs df" should behave like "df" in this respect, printing the stats
for all of the filesystems. I've filed bug 22327 for this issue, and
it already has a patch for the fix. The only affected release is 1.8.2.
Cheers, Andreas
ats. For single-threaded IO, TCP + user->kernel
data copy overhead can saturate a single core, leaving other cores
idle. Running with multiple IO threads, and using an RDMA-capable
network (IB is the most popular) will definitely avoid the CPU
bottleneck.
Cheers, Andreas
>>> I have since abandoned this attempt and am looking to down-rev to
>>> EL5u3.
>>>
>>
>> Correct, you have to build against a supported kernel.
>>
>> Nico
>>
> -----Original Message-----
> From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On
> Behalf Of Andreas Dilger
> Sent: Friday, March 05, 2010 2:05 AM
> To: Jeffrey Bennett
> Cc: oleg.dro...@sun.com; lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] One or two OSS, no
ext4 by default, so I imagine that they are using 256-byte inodes with
GRUB.
Cheers, Andreas
ch has
minimal performance impact, but that was confusing to applications.
The "noflock" default now reports an error as you saw and it is up to
the administrator to pick either "localflock" (fastest, low impact,
not coherent between nodes) or "flock" (slower, but coherent across clients).
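A minimal sketch (MGS NID and fsname are placeholders):

  mount -t lustre -o flock 192.168.1.1@tcp:/testfs /mnt/lustre       # coherent
  mount -t lustre -o localflock 192.168.1.1@tcp:/testfs /mnt/lustre  # node-local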
>>>> random IOPS when using one OSS or two OSS? A quick test with "dd"
>>>> also shows the same MB/sec when using one or two OSTs.
>>>
>>> I wonder if you just don't saturate even one OST (both backend SSD
>>> and IB interconnect) with this nu
> DB<6> flock(FOO, LOCK_EX) || die "SHIE: $!"
> SHIE: Function not implemented at (eval
> 10)[/usr/lib/perl5/5.10.0/perl5db.pl:638] line 2.
Search the list or manual for "-o flock", "-o localflock", and "-o
noflock" mount options for the cl
> Every time I create a new archive it seems to be broken at the same place.
>
> Other tar files created on the same machine don't have that problem,
> but
> I'll try creating a new archive with a new executable.
Make sure you use "--sparse" so that tar handles sparse files correctly.
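A minimal sketch (paths are placeholders):

  tar --sparse -cf /tmp/archive.tar bigfile   # detect holes instead of
                                              # storing runs of zeros
  tar -tvf /tmp/archive.tar                   # verify the archive contents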
>> further?
>> Increasing maximum number of in-flight rpcs might help in that case.
>> Also are all of your clients writing to the same file or each
>> client does IO to a separate file (I hope)?
>>
>> Bye,
>> Oleg
>
Cheers, Andreas
(the "force_over_16tb" option).
I wonder if it makes sense to just disable this option for the ext3-
based ldiskfs and require that anyone using > 8TB OSTs use the ext4-
based ldiskfs? That would avoid any confusion/problems as above.
Cheers, Andreas
s from 2 OSS nodes.
Also, what is the interconnect on the client? If you are using a
single 10GigE then 1GB/s is as fast as you can possibly write large
files to the OSTs, regardless of the striping.
Cheers, Andreas
> /srv/lustre/OST/crew8-OST0010
> /dev/sdk2  6.3T  3.8T  2.2T  64%  /srv/lustre/OST/crew8-OST0011
Cheers, Andreas
e process thread to
OOPS due to the NULL dereference, and that thread will hang, or
possibly exit, but it shouldn't cause any serious problems.
I've filed bug 22187 for this, thanks for reporting it.
Cheers, Andreas
2010-02-25 23:34:11.0
You are free to look through the lustre/ChangeLog to see which bug(s)
contained fixes for this problem, but the above shows it is at least
fixed in 1.8.2.
> At 10/02/26(金)13:25, Andreas Dilger wrote:
>> On 2010-02-25, at 20:17, Satoshi Isono wrote:
>>
kernel versions.
> At 10/02/26(金)13:27, Andreas Dilger wrote:
>> On 2010-02-25, at 20:26, Satoshi Isono wrote:
>>> I have a short question for you. When we choose a Linux distribution
>>> like RHEL, SLES, CentOS, etc., to use Lustre, which one do you
>>> recommend?
distribution?
The majority of sites use RHEL or CentOS.
Cheers, Andreas
est that you upgrade to a newer version of Lustre. There were a
number of mtime fixes, along with hundreds of other bug fixes since
1.6.5.
Cheers, Andreas
s ENODEV
> Of course I am on a mounted Lustre fs.
> I wonder if anyone else has the same problem? Is there any known
> solution/workaround for this problem?
I haven't heard of anything similar. Can you please file a bug with
details (including relevant /var/log/messages outpu
til 1.8.2 where it goes to 16TB (enough for a tier
of 2TB disks).
Cheers, Andreas
On 2010-02-10, at 17:29, David Simas wrote:
> On Wed, Feb 10, 2010 at 02:41:55PM -0700, Andreas Dilger wrote:
>>
>> - primarily, the upstream e2fsprogs does not yet have full support
>> for >16TB filesystems, and while experimental patches exist there
>
f improvements to speed up e2fsck time, there is a limit
to what can be done with this.
>> -----Original Message-----
>> From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On
>> Behalf Of Andreas Dilger
>> Sent: Tuesday, February 09, 2010 7:13 PM
is. I'll
let them speak for themselves.
Cheers, Andreas
oops, but it won't fix the transno error. You can mount the OST
filesystem as ldiskfs and delete the "last_rcvd" file to clear the
transno.
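A minimal sketch (device and mountpoint are placeholders; the OST must be
stopped, and last_rcvd is regenerated on the next Lustre mount):

  umount /mnt/ost
  mount -t ldiskfs /dev/ostdev /mnt/ost
  rm /mnt/ost/last_rcvd                    # recreated with a fresh transno
  umount /mnt/ost
  mount -t lustre /dev/ostdev /mnt/ost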
Cheers, Andreas
was redirected over to the Oracle webserver some of the links
were broken. The URL to use for now is
http://www.sun.com/download/index.jsp?tab=2&check_1=on
Cheers, Andreas
> Or perhaps it's linked with the MDS backup, in case of full
> disaster recovery.
If you are doing a MDT-filesystem backup on an ldiskfs-type mount of
the MDT, then it is critical to back up the trusted.lov attributes, or
your filesystem will contain no pointers to the file data.
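A minimal sketch of the EA backup step (paths are placeholders; run inside
the ldiskfs-mounted MDT):

  cd /mnt/mdt
  getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak   # saves trusted.lov too
  # after restoring the tree from tar:
  setfattr --restore=/backup/ea.bak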
r IOs to the disk don't go
significantly faster.
Cheers, Andreas
> > d transit direction in our lustre client - web server?
> >
> > I'm really stressed with poor performance in our storage system and hope
> > anyone here can help me point out something.
> >
> > Any help would be highly app
> -I/usr/src/modules/lustre/lustre/include \
> -I/usr/src/modules/lustre/lustre/include/lustre \
> -L/usr/src/modules/lustre/lnet/utils \
> -llustre -llustreapi -lncurses -lreadline \
> -lnetsnmpagent -lnetsnmphelpers -lnetsnmpmibs -lnetsnmp \
> -o lopenex lopenex.c
You don't need most of these.
ctually _reserve_ that space, so if multiple nodes are
writing huge files and there isn't enough space in the filesystem, you
can still run out of space.
Cheers, Andreas
s and kernel-ib instead of using the sun
> provided rpm's. We will have to compile every time we upgrade our
> servers.
You don't need to patch the kernel to build clients.
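A minimal sketch of a patchless client build (source path and kernel
directory are placeholders):

  cd lustre-1.8.2
  ./configure --with-linux=/usr/src/kernels/$(uname -r) --disable-server
  make rpms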
Cheers, Andreas
tems mounted on this node?'
>         return 2
>
>     if opts.dir:
>         fs_uuid = path_to_fs_uuid(opts.dir)
>         if not fs_uuid:
>             print '"'+opts.dir+'" is not a lustre filesystem'
>             return 3
>     fs_uuids =
> I can't find it
> in bugzilla. Would you like /tmp/lustre-log.* too?
If they are call traces due to the watchdog timer, then this is somewhat
expected for extremely high load.
Cheers, Andreas
On 2010-01-06, at 04:25, David Cohen wrote:
> On Monday 04 January 2010 20:42:12 Andreas Dilger wrote:
>> On 2010-01-04, at 03:02, David Cohen wrote:
>>> I'm using a mixed environment of 1.8.0.1 MDS and 1.6.6 OSS's (had a
>>> problem with qlogic drivers and
rs. Could it be
> because the OST is active? (Log attached)
> Then I ran it again and e2fsck reported that the OST was clean.
Checking a mounted filesystem is always at risk of producing
inconsistent results.
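A minimal sketch of a less risky check (the device is a placeholder; -n
opens the device read-only and answers "no" to every fix, though even
that is only advisory on a live filesystem):

  e2fsck -fn /dev/ostdev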
Cheers, Andreas
of the OSTs, or read from /proc/fs/lustre/lov/*/target_obds
(which contains the same data).
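A minimal sketch (run on a client; the proc path is as given above):

  cat /proc/fs/lustre/lov/*/target_obds   # OST list for each filesystem
  lctl dl                                 # device list: one osc entry per OST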
Cheers, Andreas
MDS failover
nodes itself.
> b) On the oss's there is no need for a virtual IP that would need to
> fail over in an outage. I would simply have heartbeat mount the
> filesystems on the other OSS node.
Cheers, Andreas
On 2010-01-18, at 23:09, Wojciech Turek wrote:
> Thanks Andreas for the quick answer. So upgrading to a newer version of
> collectl should fix it?
No, it is a Lustre bug, not collectl. I think a newer version of
Lustre has fixes in lprocfs to avoid such races.
> 2010/1/18 Andreas Dilge
> {vfs_read+207}
> {sys_read+69}
> {system_call+126}
This looks like collectl reading from a /proc entry after it was cleaned
up. I think several such bugs were already fixed.
Cheers, Andreas
reading and testing. I found by
> naming things uniquely helped me clarify what was actually
> required. Try calling your filesystem "Dusty" or
> "Mark" and that should make things clearer for you.
>
> --- On Thu, 1/14/10, Andreas Dilger wrote:
>>
ut network configuration. I
suspect the .0.2 network is not your eth0 network interface, and your
modprobe.conf needs to be fixed.
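A minimal sketch of the usual fix (interface and network names are
examples; match them to the real topology):

  # /etc/modprobe.conf
  options lnet networks=tcp0(eth1)   # bind LNET network tcp0 to eth1
  # reload modules (lustre_rmmod; modprobe lustre) or reboot to apply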
Cheers, Andreas
Is the MGS running?
There is probably an error in /var/log/messages and/or "dmesg" that
will tell you what is going wrong.
Cheers, Andreas
>> scratch-OST0011_UUID  366288896  5409560  360879336  1% /lustre/scratch[OST:17]
>> scratch-OST0012_UUID  366288896  5369406  360919490  1% /lustre/scratch[OST:18]
>> scratch-OST0013_UUID  366288896  5502974  360785922  1% /lustre/scratch[OST:19]
>> scrat
to know the best way, short of
>>> taking the filesystem offline, to fix this problem.
>>>
>>> Any ideas? Thanks in advance,
>>> Mike Robbert
d to say where the
"o8" (OST_CONNECT) RPC is being sent, but I suspect the debug message
is slightly incorrect (i.e. a minor code bug) because it has no
connection from which to get this information.
Cheers, Andreas
> last line of the panic reads:
>
> RIP [
>
> Mike Robbert
>
> On Dec 4, 2009, at 11:39 PM, Andreas Dilger wrote:
>
>> On 2009-12-04, at 20:18, Mag Gam wrote:
>>> Is it possible to figure out what client is taking up the most I/
>>> O? We
>>> have 8
0 00 f5 cd 0c 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00
> 00 00 00 03 00 00 00 f5 cd 0c 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 02 00 00 00 " (224)
> BLOCKS:
>
>
> Thanks for your kindness
>
> Andrea
>
>
>
> Andreas Dilger wrote:
but I don't know when that was done, so it might not appear until 1.8.2.
> On Wed, Jan 6, 2010 at 5:22 PM, Andreas Dilger
> wrote:
>> On 2010-01-06, at 01:42, Tung Dam wrote:
>> I have an issue with lustre log from our MDS, like this:
>>
>> Jan 6 14:00:
re. In
particular, with FLK (flock) type locks, they can be held
indefinitely, so there is no reason to print a message at all.
Cheers, Andreas