We have a small Lustre setup with two OSTs on two OSS servers, and I'm
curious whether moving to one OST per OSS, with four OSS servers, would
increase performance?
We are still running 1.8.4 from back when Lustre was still hosted by Oracle,
and it's been mostly stable except for a few bugs here and there that I see
have been fixed in the last 1.8.8 release from Whamcloud. I'm wondering, can
I update the server side with Whamcloud's RPMs without updating the
clients?
In my experience, if there is a particular driver for multipathing from the
vendor, go for that. In our setup, we have Oracle/Sun disk arrays, and with
the standard Linux multipathing daemon I would get lots of weird I/O
errors. It turns out the disk arrays had picked their preferred path, but
Linux w
We run a small cluster with a two-node Lustre setup, so it's easy to
see when some program thrashes the file system. Not being a
programmer, what tools or methods could I use to monitor and log data
to help the developer understand their I/O usage on Lustre?
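A rough sketch of one way to sample I/O counters, assuming the 1.8-style
lctl/proc interface (the parameter names are assumptions; verify with
`lctl get_param -N`). Run it on a client, or switch the glob to
obdfilter.*.stats on an OSS:

    #!/bin/bash
    # Sample cumulative Lustre client I/O counters every N seconds.
    INTERVAL=${1:-10}
    while true; do
        date
        # read_bytes/write_bytes lines give call counts and byte totals per mount
        lctl get_param llite.*.stats | grep -E 'read_bytes|write_bytes'
        sleep "$INTERVAL"
    done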
your thread count you'll need to reboot your system.
>
> Thanks,
> JF
>
>
> On Tue, Oct 9, 2012 at 6:00 PM, David Noriega wrote:
>>
>> Will this parameter, ost.OSS.ost_io.threads_max, when set via lctl
>> conf_param, persist between reboots/remounts?
Will this parameter, ost.OSS.ost_io.threads_max, when set via lctl
conf_param, persist between reboots/remounts?
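A minimal sketch of the two common approaches in the 1.8 era; the exact
conf_param path varies between releases, so treat the names below as
assumptions and check `lctl get_param -N ost.*` and the manual for your
version:

    # Temporary: takes effect immediately on the OSS, lost on restart
    lctl set_param ost.OSS.ost_io.threads_max=512

    # Persistent alternative: cap the thread count via the ost module option
    # in /etc/modprobe.conf on each OSS (applied the next time the module loads)
    options ost oss_num_threads=512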
, the clients reconnected and the system is happy
again.
On Mon, Jul 2, 2012 at 12:42 AM, David Noriega wrote:
> Sorry for the rushed email. For some reason the LVM metadata got
> screwed up, managed to restore it, though now running into another
> issue. I've mounted the OSTs yet it
re information about your setup. It sounds more
> like a RAID/disk issue than a Lustre issue.
>
>
> From: "David Noriega"
> To: lustre-discuss@lists.lustre.org
> Sent: Monday, 2 July, 2012 8:51:18 AM
> Subject: [Lustre-discuss] Lustr
Just recently I used Heartbeat to fail over resources so that I could
power down a Lustre node to add more RAM, and then failed back to do the
same to our second Lustre node. Only then did I find that our Lustre
install is now missing a physical volume from LVM; pvscan only
shows three out of four partitio
+0x11/0xb0
RSP
<0>Kernel panic - not syncing: Fatal exception
I don't see anything on the OSS or metadata nodes except for the
"I think it's dead, I'm evicting it" message.
--
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University
On our system we typically have more reading than writing going on,
and I was wondering what the best parameters to tune are.
I have set lnet.debug to 0, and have increased max RPCs in flight as
well as dirty MB. I left lru_size dynamic, as setting it didn't seem to
have any effect.
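For reference, a hedged sketch of the client-side knobs usually touched for
read-heavy loads (values are illustrative assumptions, not recommendations):

    lctl set_param osc.*.max_rpcs_in_flight=32      # more concurrent RPCs per OST
    lctl set_param osc.*.max_dirty_mb=64            # per-OSC dirty cache (mostly helps writes)
    lctl set_param llite.*.max_read_ahead_mb=64     # larger client readahead for streaming reads
    lctl set_param ldlm.namespaces.*.lru_size=2000  # pin the lock LRU instead of leaving it dynamic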
--
increasing at_min(currently 0) or at_max(currently 600) be best?
On Thu, Feb 2, 2012 at 12:07 PM, Andreas Dilger wrote:
> On 2012-02-02, at 8:54 AM, David Noriega wrote:
>> We have two OSSs, each with two quad core AMD Opterons and 8GB of ram
>> and two OSTs each(4.4T and 3.5T). B
On a side note, what about increasing the MDS service threads?
Checking that, it's running at its max of 128.
On Thu, Feb 2, 2012 at 9:54 AM, David Noriega wrote:
> We have two OSSs, each with two quad core AMD Opterons and 8GB of ram
> and two OSTs each(4.4T and 3.5T). Backend storage is
1 (303) 519-0578
> ctho...@ddn.com | Skype ID: carlosthomaz
> DataDirect Networks, Inc.
> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>
>
>
>
HPC Systems Architect
>> Mobile: +1 (303) 519-0578
>> ctho...@ddn.com | Skype ID: carlosthomaz
>> DataDirect Networks, Inc.
>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>> <http://twitt
What do these messages mean?
--
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249
Office: BSE 3.112
Phone: 210-458-7100
http://www.cbi.uts
looks like you're
> timing out with an overloaded network.
>
> -cf
>
>
> On 10/27/2011 10:08 AM, David Noriega wrote:
>> I get these errors, any ideas? Running Lustre 1.8.4. This client is
>> also the server where we nfs export the filesystem.
>>
>> Lus
I get these errors, any ideas? Running Lustre 1.8.4. This client is
also the server from which we NFS-export the filesystem.
LustreError: 4994:0:(dir.c:384:ll_readdir_18()) error reading dir
575283686/935610515 page 0: rc -110
LustreError: 11-0: an error occurred while communicating with
192.168.5.104@
How easy would it be to upgrade from 1.8.6 to 2.1? Would simply
dropping in the new packages be enough? Would it require downtime for
the whole system? Also, could I move the servers to 2.1 while
still having the clients at 1.8.6?
--
Personally, I liked the university. They gave us money and f
I think I'll add the lctl ping to a startup script as a workaround,
but any ideas why this is happening?
On Mon, Aug 29, 2011 at 10:26 AM, David Noriega wrote:
> I've begun to notice this behavior in my clients. Not sure what's going
> on, but when a client reboots, it's unable to
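If it helps anyone, a sketch of the workaround as a boot-time script; the
NIDs and mount point are assumptions based on the addresses mentioned
elsewhere in the thread:

    #!/bin/bash
    # Establish LNET connections to the MGS/MDS pair before mounting Lustre.
    lctl ping 192.168.5.104@tcp0
    lctl ping 192.168.5.105@tcp0
    mount -t lustre 192.168.5.104@tcp0:192.168.5.105@tcp0:/lustre /lustre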
I've begun to notice this behavior in my clients. Not sure what's going
on, but when a client reboots, it's unable to mount Lustre. I have to
use 'lctl ping' to ping one of the Lustre nodes before I'm able to
mount the Lustre filesystem. Any ideas?
Lustre: OBD class driver, http://www.lustre.org/
Lu
ack of my mind.
Thanks,
David
On Wed, Jul 20, 2011 at 4:15 PM, Kevin Van Maren
wrote:
> David Noriega wrote:
>>
>> We already use multipathd in our install already, but this was
>> something I wondered about. We use Sun disk arrays and they mention
>> the use of t
We already use multipathd in our install, but this was
something I wondered about. We use Sun disk arrays, and they mention
using their RDAC driver for multipathing on Linux. Since it's from
the vendor, one would think it would be better. What does the collective
think?
Sun StorageTek RDAC Mul
Just installed a new node on the cluster, imaged just like the rest,
but it was unable to mount lustre on boot. I tried to mount but got
the following from dmesg:
Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.4
Lustre: Build Version:
1.8.4-20100726215630-PRIS
I was checking out zfsonlinux.org to see how things have been going
lately, and I had a question. What's the difference, or which is better:
using hardware RAID 5 (or 6), or using ZFS to create a raidz pool? In terms
of Lustre, is one preferred over the other?
David
--
Personally, I liked the university. Th
We are running Lustre 1.8.4, and I can confirm that I see this message
on one of our clients, the 'file server.' It serves up the Lustre fs
to machines outside our network via Samba and NFS. On other
clients (nodes in our compute cluster), I see the same message on a few
of them, though it says "-19" o
the kernel
> panic and the controller complaining about its BBU.
>
> Cheers,
> Thomas
>
> On 04/06/2011 04:58 PM, David Noriega wrote:
>> Our Adaptec RAID card is a Sun StorageTek RAID INT card, made by Intel
>> of all people. So I installed the RAID manager software, w
data has made it to the disks before the
>> crash seems to be quite sensible. Reboot and never buy Adaptec again.
>>
>> Cheers,
>> Thomas
>>
>> On 04/06/2011 07:03 AM, David Noriega wrote:
>>> Ok I updated the aacraid driver and the raid firmware, yet I sti
here.
>
> The firmware and system drivers usually have a utility that will check the
> current version and upgrade it for you.
>
> Hope this helps (I use different cards, so I can't tell you exactly).
>
> -Jason
>
> -Original Message-
> From: David Noriega [mai
ents, modulo the OST restart needed.
> cliffw
>
> On Tue, Apr 5, 2011 at 11:36 AM, David Noriega wrote:
>>
>> What about this example?
>> http://comments.gmane.org/gmane.comp.file-systems.lustre.user/6687
>>
>> Also to my second question, would these changes
all your nets are TCP, I think using standard networking methods will be
> better for you, simpler and easier to maintain.
> cliffw
>
> On Mon, Apr 4, 2011 at 6:50 PM, David Noriega wrote:
>>
>> The file server does sit on both networks, internal and external. I
>> wou
work as a router?
On Mon, Apr 4, 2011 at 3:43 PM, Cliff White wrote:
>
>
> On Mon, Apr 4, 2011 at 1:32 PM, David Noriega wrote:
>>
>> Reading up on LNET routing and have a question. Currently have nothing
>> special going on, simply specified tcp0(bond0) on the OSSs a
Reading up on LNET routing, I have a question. Currently we have nothing
special going on; we simply specified tcp0(bond0) on the OSSs and MDS,
and the same for all the clients. We have an internal network for our
cluster, 192.168.x.x. How would I go about doing the following?
Data1,Data2 = OSS, Meta1
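For what it's worth, a hedged sketch of the modprobe.conf lines an
LNET-routed setup usually needs; the interface names, the tcp1 label, and
the router NID are assumptions, not taken from this thread:

    # Servers and internal-only clients (internal tcp0 network):
    options lnet networks="tcp0(bond0)" routes="tcp1 192.168.5.250@tcp0"

    # Dual-homed router node (sits on both networks, forwards between them):
    options lnet networks="tcp0(bond0),tcp1(eth1)" forwarding="enabled"

    # External (university-side) clients:
    options lnet networks="tcp1(eth0)" routes="tcp0 <router-external-ip>@tcp1"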
Some craziness happened to our Lustre system. We have two OSSs, both
identical Sun X4140 servers, and on only one of them have I seen
this pop up in the kernel messages, followed by a kernel panic. The panic
seemed to then spread and caused the network to go down and the second
OSS to try to failov
t; that the stock RedHat kernels are compiled with too small of a stack
> size option and that running NFS and lustre on the same node will not
> behave well together. A minimum of a 8k stack size is needed for this
> configuration.
>
> -mb
>
> On Mar 11, 2011, at 12:37 PM
We've been running Lustre happily for a few months now, but we have
one client that can be troublesome at times, and it happens to be the
most important client. It's our "file server" client, as it runs NFS and
Samba. I'm not sure where to start. I've seen this client disconnect
from Lustre nodes, but
Well, we are running Lustre 1.8.4, so that's great to hear. Thanks
On Thu, Mar 10, 2011 at 12:15 PM, Johann Lombardi wrote:
> On Thu, Mar 10, 2011 at 11:51:44AM -0600, David Noriega wrote:
>> I've been reading up on setting up quotas, and it looks like Lustre needs
>> to be shut
I've been reading up on setting up quotas, and it looks like Lustre needs
to be taken offline for that, as it scans the entire filesystem. The thing
is, we already have ours up and running, with quite a bit of data on
it. Any idea how to estimate how long it would take to set up
quotas on Lustre?
Davi
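As a rough sketch of what the scan involves (Lustre 1.8 commands; the
/lustre mount point is an assumption): quotacheck walks the MDT and every
OST, so the runtime scales with the number of files/objects in use rather
than raw capacity.

    lfs quotacheck -ug /lustre   # builds the user and group quota files (the long scan)
    lfs quotaon -ug /lustre      # then enable enforcement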
I know the subject line isn't the best, but I don't know what to say
other than that a Lustre client is acting up while others are fine. This
client is our 'file' server. It runs an NFS and Samba server on top of
the Lustre mount.
/etc/fstab
192.168.5.104@tcp0:192.168.5.105@tcp0:/lustre /lustre lustre
We've got the latest Lustre running (1.8.4) and kernel
2.6.18-194.3.1.el5. I call it our primary client, as it is what exposes
the file system for others to use via NFS/Samba. Today the machine
seemingly rebooted on its own, and checking the logs I see these
messages:
Nov 22 12:25:52 cajal kernel: Lus
So then Samba isn't Lustre-aware in the sense it checks and respects quotas?
On Tue, Oct 5, 2010 at 7:18 AM, Johann Lombardi
wrote:
> Hi David,
>
> On Mon, Oct 04, 2010 at 12:09:21PM -0500, David Noriega wrote:
>> Moved our samba server to use Lustre as its backend file syste
Can I set up quotas after Lustre is active? Or does that require taking
everything offline? Or could I just run "lfs quotaon" and then start
setting quotas for every user? Will running this command on one client
then affect all of them, or do I have to run it everywhere? And is
there a way to notif
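A small sketch, assuming 1.8-era lfs syntax with block limits in KB (the
user name and limits are made-up examples). Quota limits are stored
filesystem-wide, so setting them from any one client is enough:

    lfs setquota -u someuser -b 471859200 -B 524288000 -i 900000 -I 1000000 /lustre
    lfs quota -u someuser /lustre   # verify usage against the limits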
If I'm wrong please let me know, but my understanding of how Lustre
1.8 works is that metadata is only accessible from a single host. So should
there be a lot of activity, the metadata server becomes a bottleneck.
But I've heard that in 2.x we'll be able to set up multiple
machines for metadata j
Moved our samba server to use Lustre as its backend file system and
things look like they are working, but I'm seeing the following
message repeat over and over
[2010/10/04 11:09:40, 0] lib/sysquotas.c:sys_get_quota(421)
sys_path_to_bdev() failed for path [.]!
[2010/10/04 11:09:40, 0] lib/sysquot
This question isn't really about Lustre, but about file system
administration. I was wondering what tools exist, particularly
anything free/open source, that can scan for old files and either
report to the admin or the user that said files are, say, 1 yr old and
should be archived or deleted. Also any tools t
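Not Lustre-specific, but a minimal sketch of the kind of report plain GNU
find can produce (the path and age threshold are assumptions; a full tree
walk on a large Lustre filesystem can take a while):

    # List owner, size and path of files not accessed in ~1 year
    find /lustre -type f -atime +365 -printf '%u %s %p\n' | sort > old_files.txt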
I've read that you can export Lustre via NFS, but I'm running into some
trouble. I tried NFSv3, but when I would check a directory, all the
files were labeled red and ls -al showed no username or permissions,
just "?". This was on the server:
nfsd: non-standard errno: -43
LustreError: 11-0: an error occurr
le client, such as
> the machine running samba. if you have other lustre clients
> also mounting that filesystem, you'll need flock not localflock to provide
> consistency.
>
> -mark
>
>> On Fri, Aug 27, 2010 at 6:15 PM, Oleg Drokin
>> wrote:
>>>
>>>
27, 2010 at 6:15 PM, Oleg Drokin wrote:
> Hello!
>
> On Aug 27, 2010, at 6:41 PM, David Noriega wrote:
>> But I also found out about the flock option for Lustre. Should I set
>> flock on all clients? Or can I just use the localflock option on the
>> fileserver?
>
> I
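For reference, a hedged sketch of a client mount with cluster-coherent
flock enabled (the NIDs and mount point follow the addresses quoted
elsewhere in the thread). -o localflock is only safe if a single client,
such as the Samba/NFS exporter, ever takes these locks:

    mount -t lustre -o flock 192.168.5.104@tcp0:192.168.5.105@tcp0:/lustre /lustre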
Are there issues with Samba and Lustre working together? I remember
something about turning oplocks off in Samba, and while testing Samba
I noticed this:
[2010/08/27 17:30:59, 3] lib/util.c:fcntl_getlock(2064)
fcntl_getlock: lock request failed at offset 75694080 count 65536
type 1 (Function not
OK, our Lustre system is up and running, but currently it's hooked into
our internal network. How do we go about accessing it from the
external (university) network?
It's the basic setup: two OSSs and two MDS/MGS nodes, all set up with
failover, and all mount options are currently set using their internal
IPs (1
I'm curious about the underlying framework of Lustre in regards to failover.
When creating the filesystems, one can provide --failnode=x.x@tcp0,
and even for the OSTs you can provide two NIDs for the MDS/MGS. What
do these options tell Lustre and the clients? Are they required for
use with hea
hardware? An OST can't be put into a general 'pool' for use between
the two?
David
On Wed, Aug 18, 2010 at 12:33 PM, Kevin Van Maren
wrote:
> David Noriega wrote:
>>
>> OK hooray! Lustre setup with failover of all nodes, but now we have
>> this huge lust
OK, hooray! Lustre is set up with failover on all nodes, but now we have
this one huge Lustre mount point. How can I, say, create /lustre/home and
/lustre/groups and mount them on the client?
David
--
Personally, I liked the university. They gave us money and facilities,
we didn't have to produce anything! You'
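One common answer, as far as I know, is that 1.8 cannot mount a
subdirectory of the filesystem directly, so the client mounts the whole
thing once and bind-mounts the subtrees it wants to expose. A sketch with
assumed directory names and the NIDs mentioned elsewhere in the thread:

    mount -t lustre 192.168.5.104@tcp0:192.168.5.105@tcp0:/lustre /lustre
    mkdir -p /lustre/home /lustre/groups
    mount --bind /lustre/home   /home
    mount --bind /lustre/groups /groups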
I've read through the 'More Complicated Configurations' section in the
manual, and it says, as part of setting up failover with
two (active/passive) MDS/MGS nodes and two OSSs (active/active), to use the
following:
mkfs.lustre --fsname=lustre --ost --failnode=192.168.5@tcp0
--mgsnode=192.168.5@tcp0,1
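A hedged, fuller version of those commands, filling in the NIDs listed
later in the thread (MDS .104, standby MDS .105, OSS1 .100, OSS2 .101) and
assumed device names. --failnode gives clients the alternate server NID to
try for that target, and --mgsnode with two NIDs covers the MGS failover
pair:

    # On the primary MDS/MGS (the standby MDS is its failnode):
    mkfs.lustre --fsname=lustre --mgs --mdt \
        --failnode=192.168.5.105@tcp0 /dev/mdtdev

    # On OSS1 (OSS2 is its failnode; both MGS NIDs listed):
    mkfs.lustre --fsname=lustre --ost \
        --failnode=192.168.5.101@tcp0 \
        --mgsnode=192.168.5.104@tcp0,192.168.5.105@tcp0 /dev/ostdev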
is: MGT -> MDT -> OSTs
>
> Best regards,
>
> Wojciech
>
>
>
> On 17 August 2010 18:19, David Noriega wrote:
>>
>> Oops, somehow I changed the target name of all OSTs to lustre-OST,
>> and trying to mount any other OST fails. I've gone and foun
, Aug 17, 2010 at 11:26 AM, David Noriega wrote:
> Some info:
> MDS/MGS 192.168.5.104
> Passive failover MDS/MGS 192.168.5.105
> OSS1 192.168.5.100
> OSS2 192.168.5.101
>
> I've got some more questions about setting up failover. Besides having
> heartbeat setup, what abo
e second MDS?
David
On Mon, Aug 16, 2010 at 2:14 PM, Kevin Van Maren
wrote:
> David Noriega wrote:
>>
>> Ok I've gotten heartbeat setup with the two OSSs, but I do have a
>> question that isn't stated in the documentation. Shouldn't the lustre
>> m
OK, I've gotten Heartbeat set up with the two OSSs, but I do have a
question that isn't covered in the documentation. Shouldn't the Lustre
mounts be removed from fstab once they are handed to Heartbeat, since
Heartbeat will mount the resources when it brings them online?
David
--
Personally, I liked the
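A minimal Heartbeat v1 haresources sketch for one OST (the node name,
device path, and mount point are assumptions). With this in place the OST
stays out of /etc/fstab, since Heartbeat does the mounting:

    oss1 Filesystem::/dev/mpath/ost0::/mnt/ost0::lustre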
Still very new to Lustre, and now I'm going over the failover part. I
use tune2fs to set MMP, but I would get this warning about
needs_recovery: do a journal replay or else the setting will be lost.
With dumpe2fs I could see the needs_recovery flag was set on all of
the OSTs/MDT. Reading over the re
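A sketch of the sequence that usually clears that warning, assuming the
target is unmounted and with the device path as just an example: replay
the journal first so needs_recovery is dropped, then turn on MMP:

    e2fsck -fp /dev/mpath/ost0                  # replays the journal, clears needs_recovery
    tune2fs -O mmp /dev/mpath/ost0              # enable multiple-mount protection
    dumpe2fs -h /dev/mpath/ost0 | grep -i mmp   # confirm the MMP block is set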
installed or configured properly. Sometimes the array
> controllers requires a special driver to be installed on Linux host (for
> example RDAC mpp driver) to properly present and handle configured volumes
> in the OS. What sort of disk raid array are you using?
>
> Best regards,
>
> Woj
We just set up a Lustre system, and all looks good, but there is this
nagging error that's floating about. When I reboot any of the nodes, be
it an OSS or MDS, I will get this:
[r...@meta1 ~]# dmesg | grep sdc
sdc : very big device. try to use READ CAPACITY(16).
SCSI device sdc: 4878622720 512-byte h
We just got our Lustre system online, and as we continue to play with
it, I need some help supporting my argument that we should have two
file servers: one NFS server to host users' home directories, and
then the Lustre file server to host space for their jobs to run. My
manager's concern is c
So your script resets the server so there is no fail-over (i.e. the other
server takes over resources from that server?), or there is failover
but you then manually return resources back to the server that was
reset?
On Tue, Aug 10, 2010 at 1:39 PM, Bernd Schubert
wrote:
> On Tuesday, August 10, 2010
#x27;t _know_ that the
> node is off.
> So it is absolutely not a 1-line script.
>
> Kevin
>
>
> David Noriega wrote:
>>
>> I think I'll go the IPMI route. So reading about STONITH, it's just a
>> script, so all I would need is a script to run ipmi that tells t
title=Clu_Manager no longer exists,
and noticed this too when I found the lustre quick guide is no longer
available.
Thanks
David
On Tue, Aug 10, 2010 at 10:57 AM, Kevin Van Maren
wrote:
> David Noriega wrote:
>>
>> Could you describe this resource fencing in more detail? As for
&
:
> On Aug 9, 2010, at 11:45 AM, David Noriega wrote:
>
>> My understanding of setting up fail-over is you need some control over
>> the power so with a script it can turn off a machine by cutting its
>> power? Is this correct?
>
> It is the recommended configu
My understanding of setting up fail-over is that you need some control over
the power, so that a script can turn off a machine by cutting its
power? Is this correct? Is there a way to do fail-over without having
access to the PDUs (power strips)?
Thanks
David
--
Personally, I liked the university. Th
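If the servers have a BMC, IPMI gives out-of-band power control without
touching the PDU; a sketch with a made-up hostname and credentials
(STONITH plugins such as external/ipmi wrap the same call for Heartbeat):

    ipmitool -I lanplus -H oss2-bmc -U admin -P secret chassis power status
    ipmitool -I lanplus -H oss2-bmc -U admin -P secret chassis power off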
This isn't a question directly about Lustre, but a question for the
admins. What software/hardware do you use to archive data? I've heard
of other places that have an archive server, where users copy
files to it and it will, on its own, move those to tape and, at the user's
request, bring them
What tools do you use to keep track of who is using the filesystem, and how
much? Are there any free tools to keep track of old files, temp files,
large files, etc.? Basically, how do you keep things running in an orderly
fashion and keep users in line, besides adding more space?
--
Personally, I
We are pre-Lustre right now and have some questions. Currently our cluster
uses LDAP+automount to mount users' home directories from our file server.
Once we go Lustre, is there any sort of modification to LDAP or
automount (besides the installation of the Lustre client programs) needed?
--
Person
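A sketch of how little usually has to change, assuming the home directories
end up under a single Lustre mount on every client: the Lustre filesystem
is mounted normally, and the existing LDAP/automount setup just bind-mounts
per-user subtrees (the map entry and path below are assumptions):

    # auto.home map entry: bind each user's directory out of the Lustre mount
    *   -fstype=bind   :/lustre/home/&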
ill need to do some RAID for the 3x2TB OSTs
> and a mirror for the MDS.
> Hope that you have enough memory, CPU power, and network for the MDS and OSTs.
> These are dedicated to Lustre; you need other compute nodes.
> regards
>
>
>
> On 5/19/2010 1:31 PM, David Noriega wrote:
>
> My supe
. What about what comes
with Redhat/CentOS in their cluster packages?
On Wed, May 19, 2010 at 12:47 PM, Brian J. Murrell wrote:
> On Wed, 2010-05-19 at 12:31 -0500, David Noriega wrote:
> >
> > We have 7 workstations, and the idea was to put into them 3 2TB
> > drives, for
My supervisor has this idea, and I would like the input of the
Lustre community, as we are still very new to Lustre.
We have 7 workstations, and the idea was to put 3 2TB drives into each of
them, for a total of 42TB, and set them up as object servers, with another
workstation as a metadata server. How feas