Re: [lustre-discuss] backup zfs MDT or migrate from ZFS back to ldiskfs

2017-07-20 Thread Isaac Huang
On Fri, Jul 21, 2017 at 12:54:15PM +0800, Stu Midgley wrote: > Afternoon > > I have an MDS running on spinning media and wish to migrate it to SSD's. > > Lustre 2.9.52 > ZFS 0.7.0-rc3 This may not be a stable combination - I don't think Lustre officially supports 0.7.0-rc yet. Plus, ther

Re: [lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore

2015-06-09 Thread Isaac Huang
On Tue, Jun 09, 2015 at 11:10:21AM -0400, Kurt Strosahl wrote: > Good Morning, > >That seems to have done the trick. For the benefit of everyone on this > list using zfs... The issue I encountered with zfs is described here: > https://github.com/zfsonlinux/zfs/issues/2523 > >To resolve

Re: [lustre-discuss] zfs -- mds/mdt -- ssd model / type recommendation

2015-05-06 Thread Isaac Huang
The dnodes are stored in data blocks of the meta_dnode, whose block size is a fixed constant: #define DNODE_BLOCK_SHIFT 14 /* 16k */ Again, this is not affected by ZFS recordsize. -Isaac On Wed, May 06, 2015 at 09:02:11PM -0600, Isaac Huang wrote: > On Tue, May 05, 2015 at 05:16:1
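
To make the arithmetic concrete: a shift of 14 gives 16384 bytes per meta_dnode block. The sketch below is not from the post; it assumes a 512-byte on-disk dnode (an assumption, check your ZFS version), and the function names are made up for illustration.

```python
# DNODE_BLOCK_SHIFT is the constant quoted in the post.
DNODE_BLOCK_SHIFT = 14
# ASSUMPTION: 512-byte dnodes; this value is not stated in the post.
DNODE_SIZE = 512


def dnode_block_size(shift: int = DNODE_BLOCK_SHIFT) -> int:
    """Return the meta_dnode block size in bytes for a given shift."""
    return 1 << shift


def dnodes_per_block(shift: int = DNODE_BLOCK_SHIFT,
                     dnode_size: int = DNODE_SIZE) -> int:
    """Return how many dnodes of dnode_size bytes fit in one block."""
    return dnode_block_size(shift) // dnode_size


if __name__ == "__main__":
    print(dnode_block_size(), dnodes_per_block())
```

Under that assumption, one 16k block holds 32 dnodes regardless of the dataset's recordsize.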

Re: [lustre-discuss] zfs -- mds/mdt -- ssd model / type recommendation

2015-05-06 Thread Isaac Huang
On Tue, May 05, 2015 at 05:16:14PM +, Alexander I Kulyavtsev wrote: > .. > Shall we use smaller ZFS record size on MDT, say 8KB or 16KB? If inode is > ~10KB and zfs record 128KB, we are dropping caches and read data we do not > need. The ZFS recordsize does not affect Lustre OST/MDT. The

Re: [lustre-discuss] zfs -- mds/mdt -- ssd model / type recommendation

2015-05-06 Thread Isaac Huang
The dnodes are ditto'ed over whatever redundancy the raidz/mirror already provides, so for 2-way mirrors that'd be multiplied by 4 from the compressed dnode size. BTW, all ZFS meta-data are compressed by default. The recent 0.6.4 release supports LZ4 compression of meta data, which I found in some
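
The factor of 4 can be written out as a small calculation. This is only a sketch of the reasoning in the post: the default of two ditto copies is inferred from the "multiplied by 4" figure for 2-way mirrors rather than stated, and the function name and the 1024-byte example size are hypothetical.

```python
def physical_dnode_bytes(compressed_size: int,
                         ditto_copies: int = 2,
                         mirror_ways: int = 2) -> int:
    """Estimate bytes written to disk for one compressed dnode.

    ASSUMPTION: ditto_copies=2 is inferred from the factor of 4 quoted
    for 2-way mirrors; it is not stated explicitly in the post.
    """
    return compressed_size * ditto_copies * mirror_ways


if __name__ == "__main__":
    # Hypothetical compressed dnode size of 1024 bytes.
    print(physical_dnode_bytes(1024))
```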

Re: [lustre-discuss] zfs -- mds/mdt -- ssd model / type recommendation

2015-05-06 Thread Isaac Huang
Since there's no TRIM support for ZFS on Linux yet, I wonder if someone has data/experience to share about ZFS on SSD performance as the SSDs age. Some believe for modern over-provisioned SSDs, lack of TRIM isn't any big deal but I talked with some SSD developers here and they all disagreed. -Isaa

Re: [Lustre-discuss] [HPDD-discuss] What's the status of liblustre ?

2015-02-25 Thread Isaac Huang
I'm not sure about liblustre, but user space support has already been removed from the Lustre networking stack. I believe that'd eliminate any chance of a FUSE Lustre client. -Isaac On Thu, Feb 26, 2015 at 11:59:23AM +0800, 邓尧 wrote: > The lustre wiki page > (http://wiki.lustre.org/index.php/LibLustre_

Re: [Lustre-discuss] options lnet routes section in lustre.conf

2015-01-13 Thread Isaac Huang
You don't have to wait for Lustre 2.7. The dynamic LNet config feature will enable configuration of LNet interfaces and other parameters without reloading the kernel module, but LNet routes have always been dynamically configurable with "lctl add_route/del_route". -Isaac On Wed, Jan 07, 2015 a

Re: [Lustre-discuss] Network name o2ib0 collision in two discrete filesystems

2014-09-09 Thread Isaac Huang
On Tue, Sep 09, 2014 at 05:04:58AM -0600, James Robnett wrote: > > I'm having difficulty figuring out a solution to an LNET issue I'm having. > > We have two Lustre filesystems separated by about 60 miles, both of > which have o2ib0(ib0) and tcp(eth0) networks defined. Both have IB > and TCP cli

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-18 Thread Isaac Huang
On Wed, Jun 18, 2014 at 06:11:33AM -0400, Anjana Kar wrote: > .. > Instead we have moved to ldiskfs MDT and zfs OSTs, with the same lustre/zfs > versions, and have a lot more inodes available. > > Filesystem Inodes IUsed IFree IUse% Mounted on > x.x.x.x@o2ib:/iconfs >

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-17 Thread Isaac Huang
On Thu, Jun 12, 2014 at 04:41:14PM +, Dilger, Andreas wrote: > It looks like you've already increased arc_meta_limit beyond the default, > which is c_max / 4. That was critical to performance in our testing. > > There is also a patch from Brian that should help performance in your case: > htt

Re: [Lustre-discuss] Understanging LNET routing

2013-08-15 Thread Isaac Huang
On Thu, Aug 15, 2013 at 04:09:45PM +0400, Vsevolod Nikonorov wrote: > .. > Is Lustre routing something to do with TCP/IP routing? Should I set > net.ipv4.ip_forward to 1 in sysctl.conf? Should I do some IP masquerade for > Lustre routing to work properly? No. - Isaac ___

Re: [Lustre-discuss] Multirail IB Configuration Issue

2013-02-26 Thread Isaac Huang
On Tue, Feb 26, 2013 at 01:04:06PM -0500, mages, brian wrote: > Hi, > > It appears that I've resolved the issue and therefore wanted to provide an > update to this list. As I noted in the description of my configuration, the > client only has a single IB interface. After changing the options f

Re: [Lustre-discuss] LNET over multiple NICs

2013-01-28 Thread Isaac Huang
On Mon, Jan 28, 2013 at 04:23:37PM +0100, Alexander Oltu wrote: > On Thu, 24 Jan 2013 17:29:00 +0100 > Sébastien Buisson wrote: > > > > > > In your case, I think it would mean: > > routes="gni0 xxx.xxx.110.xxx@tcp0 \ > > gni1 xxx.xxx.111.xxx@tcp1" > > > > Looks like this can be a work

Re: [Lustre-discuss] LNET over multiple NICs

2013-01-23 Thread Isaac Huang
On Wed, Jan 23, 2013 at 02:10:54PM +0100, Alexander Oltu wrote: > .. > routes="gni0 xxx.xxx.110.xxx@tcp0 xxx.xxx.111.xxx@tcp1" > > And getting: > > LustreError: 5598:0:(router.c:399:lnet_check_routes()) Routes to gni > via xxx.xxx.111.xxx@tcp1 and xxx.xxx.110.xxx@tcp not supported > > I chec

Re: [Lustre-discuss] lctl ping of Pacemaker IP

2012-11-02 Thread Isaac Huang
On Fri, Nov 02, 2012 at 12:04:02AM -0400, Ms. Megan Larko wrote: > .. > What steps should I take to generate a successful "lctl ping a.b.c.d"? There must be an LNet instance running over SOCKLND on a.b.c.d. - Isaac

Re: [Lustre-discuss] is there a way to run Lustre over UDP instead TCP?

2012-04-09 Thread Isaac Huang
You'll have to write a UDP driver for the Lustre networking stack, not an easy task. - Isaac On Mon, Apr 09, 2012 at 10:44:11PM +, Hebenstreit, Michael wrote: > > See title... > > Thanks > Michael > > > Michael Hebens

Re: [Lustre-discuss] EXTERNAL: Re: LNET Performance Issue

2012-02-23 Thread Isaac Huang
Hi all, I'd suggest starting from simple point-to-point tests. There are too many variables involved in a 'dd'. Please: - Do a native IB write test from A to B, of 1M transfers, which is the max payload per Lustre RPC. With the native IB bandwidth test tool, I remember there used to be an ib_write_

Re: [Lustre-discuss] Finding bugs in Lustre with Coccinelle

2012-01-09 Thread Isaac Huang
On Sun, Jan 08, 2012 at 10:20:36AM -0700, Andreas Dilger wrote: > Isaac, > I'm all in favor of using static code analysis tools to find bugs like this. > The first step, as you have done is to find and fix the bugs (though with > proper patches since LASSERT() as a means of error handling is unac

Re: [Lustre-discuss] Finding bugs in Lustre with Coccinelle

2012-01-09 Thread Isaac Huang
On Sun, Jan 08, 2012 at 09:43:00AM +, Nikitas Angelinas wrote: > Hi Isaac, > > Funny, I was planning to have a look at this, this weekend if time > permitted. I was interested in finding out how noticeable the issue of > false positives may be in Coccinelle, but that shouldn't be a big > probl

[Lustre-discuss] Finding bugs in Lustre with Coccinelle

2012-01-07 Thread Isaac Huang
Today I decided to try Coccinelle on the latest Lustre code found on master at git://git.whamcloud.com/fs/lustre-release.git. I came up with a simple Coccinelle script that tries to detect the case where a new object is allocated and dereferenced without checking it against NULL. Eight such bugs were

Re: [Lustre-discuss] Line rate performance for clients

2011-08-02 Thread Isaac Huang
On Mon, Aug 01, 2011 at 02:52:07PM +0200, Peter Kjellström wrote: > > > On 2011-07-29, at 11:33 AM, Brock Palen wrote: > > .. > > Does that make sense? Is it even right for me to expect that I could > > combine the performance together and expect full speed in and full speed > > out if I can c

Re: [Lustre-discuss] LNET o2ib networking and MTU

2011-07-14 Thread Isaac Huang
On Thu, Jul 14, 2011 at 12:43:32PM -0700, Adesanya, Adeyemi wrote: > > Just need some clarification on this: > > We use the o2ib driver for Lustre IB communication. We also use IPoIB to > define IP addresses for the IB interfaces in the network. Does the MTU > configuration parameter impact Lu

Re: [Lustre-discuss] New wc-discuss Lustre Mailing List

2011-07-14 Thread Isaac Huang
On Tue, Jul 12, 2011 at 02:12:38PM -0700, Peter Jones wrote: > Isaac > > If you (or anyone else for that matter) is having trouble joining the > group let me know privately at pjo...@whamcloud.com which email address > that you would like to use and I will add you manually. Thanks Peter, I got

Re: [Lustre-discuss] Client Eviction Preceded by EHOSTUNREACH and then ENOTCONN?

2011-07-12 Thread Isaac Huang
On Tue, Jul 12, 2011 at 11:06:40AM -0700, Rick Wagner wrote: > On Jul 12, 2011, at 11:01 AM, Isaac Huang wrote: > > > On Mon, Jul 11, 2011 at 03:39:34PM -0700, Rick Wagner wrote: > >> Hi, > >> .. > >> I am assuming that -113 is EHOSTUNREACH and -107 is E

Re: [Lustre-discuss] New wc-discuss Lustre Mailing List

2011-07-12 Thread Isaac Huang
On Sun, Jul 03, 2011 at 10:36:46PM +0200, Adrian Ulrich wrote: > > > you can subscribe simply by sending an e-mail to > > wc-discuss+subscr...@googlegroups.com. > > This bounces, but sending an e-mail to > works. > However: The link in the verification mail will take you to a login page - so >

Re: [Lustre-discuss] Client Eviction Preceded by EHOSTUNREACH and then ENOTCONN?

2011-07-12 Thread Isaac Huang
On Mon, Jul 11, 2011 at 03:39:34PM -0700, Rick Wagner wrote: > Hi, > .. > I am assuming that -113 is EHOSTUNREACH and -107 is ENOTCONN, and that the > error codes from errno.h are being used. > > We've been experiencing similar problems for a while, and we've never seen IP > traffic have a p
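
One quick way to check the assumption about -113 and -107 is to look the numbers up on the node itself. The sketch below is not from the thread: `decode_rc` is a made-up helper name, and errno values are platform dependent, so it should be run on the same kind of Linux system that produced the log.

```python
import errno


def decode_rc(rc: int) -> str:
    """Map a negative kernel-style return code to its errno name.

    Returns "UNKNOWN" when the local platform has no name for the value.
    """
    return errno.errorcode.get(abs(rc), "UNKNOWN")


if __name__ == "__main__":
    # The two codes quoted in the post.
    for rc in (-113, -107):
        print(rc, decode_rc(rc))
```

The same lookup works for other negative codes that show up in Lustre console messages.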

Re: [Lustre-discuss] high CPU usage on MDS

2011-07-06 Thread Isaac Huang
I think it's TCP/IP according to the process list. It'd help to find out where the CPU time was spent, e.g. by oprofile. - Isaac On Wed, Jul 06, 2011 at 12:14:54PM -0600, Colin Faber wrote: > Hi, > > More details are needed here. What type of interconnect are you using? > What are your clients

Re: [Lustre-discuss] Using Infiniband QoS with Lustre 1.8.5

2011-02-08 Thread Isaac Huang
On Tue, Feb 08, 2011 at 05:44:35PM +0100, Ramiro Alba wrote: > Hi everybody, > > We have a 128 nodes (8 cores/node) 4x DDR IB cluster with 2:1 > oversubscription and I use the IB net for: > > - OpenMPI > - Lustre > - Admin (may change in future) > > I'am very interested in using IB QoS, as in th

Re: [Lustre-discuss] lnet route tracing

2010-10-18 Thread Isaac Huang
On Tue, Oct 19, 2010 at 11:05:08AM +0800, liang.whamcloud wrote: > .. > to confirm realtime data flow. Of course, It's not difficult to make > LNet record information like forwarded bytes on each router. I think it's already recorded - in the second to last field in "/proc/sys/lnet/stats".
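
A minimal sketch of pulling that counter out, assuming only what the post says: the stats file is a line of whitespace-separated numbers and the forwarded-byte count is the second-to-last field. The helper names are hypothetical and the meaning of the other fields is not assumed here.

```python
def forwarded_bytes(stats_line: str) -> int:
    """Return the second-to-last whitespace-separated field as an int.

    ASSUMPTION: per the post, this field holds the forwarded byte count.
    """
    fields = stats_line.split()
    if len(fields) < 2:
        raise ValueError("expected at least two fields in the stats line")
    return int(fields[-2])


def read_forwarded_bytes(path: str = "/proc/sys/lnet/stats") -> int:
    """Read the first line of the stats file and extract the counter."""
    with open(path) as stats_file:
        return forwarded_bytes(stats_file.readline())


if __name__ == "__main__":
    # Made-up sample line; real values come from the router node.
    print(forwarded_bytes("0 1 2 3 4 5 6 7 4096 9"))
```

Sampling `read_forwarded_bytes()` twice on a router and taking the difference gives a rough forwarded-bytes rate between the two samples.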

[Lustre-discuss] test - pls ignore

2010-03-26 Thread Isaac Huang
Test Oracle SMTP server connectivity issue - bug 22291 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss

Re: [Lustre-discuss] High Load and high system CPU for mds

2010-03-01 Thread Isaac Huang
On Mon, Mar 01, 2010 at 02:35:18PM -0500, Oleg Drokin wrote: > Hello! > > On Feb 28, 2010, at 9:31 PM, huangql wrote: > > We got a problem that the MDS has high load value and the system CPU is up > > to 60% when running chown command on client. It's strange that the load > > value and system CP

Re: [Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping

2010-02-25 Thread Isaac Huang
On Mon, Feb 22, 2010 at 03:22:52AM -0800, Vipul Pandya wrote: > Hello Issac, Hi Vipul, > .. > I lowered the map_on_demand value to 16 and now it works fine. > > However, I had once concern, whether lowering down this map_on_demand > value would impact the performance of Lustre or not? For i

Re: [Lustre-discuss] LNET error help

2010-02-23 Thread Isaac Huang
On Tue, Feb 23, 2010 at 08:27:40PM +0530, Vineet ghatge wrote: >Hi all, >I am trying to get Lustre 1.8 version up and running on fedora >(standalone for the time being) > When I try to run the command "lctl network up" I get The following >error:- >opening /dev/lnet failed:

Re: [Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping

2010-02-16 Thread Isaac Huang
On Mon, Feb 15, 2010 at 09:45:10PM -0800, Vipul Pandya wrote: > .. > -> I tried to load the ko2iblnd module as you have suggested. But still > I am unable to do 'lctl ping'. I am getting the same error as shown > below. > #> modprobe ko2iblnd map_on_demand=64 Please lower it to "map_on_demand=

Re: [Lustre-discuss] Lustre-1.8.1.1 over o2ib gives Input/Output error while executing lctl ping

2010-02-15 Thread Isaac Huang
On Fri, Feb 12, 2010 at 05:53:19AM -0800, Vipul Pandya wrote: >.. >#> lctl network up >LNET configured >Above command gave me following error in dmesg >#> dmesg > >Lustre: Listener bound to eth2:102.88.88.188:987:cxgb3_0 >Lustre: Register global MR array, MR size: 0

Re: [Lustre-discuss] lctl ping between 1.8.1 1.8.2 protocol error

2010-02-11 Thread Isaac Huang
On Thu, Feb 11, 2010 at 03:33:33PM +0100, Sebastian Reitenbach wrote: > Hi, > > in my test system I installed Lustre 1.8.2 from source on a opensuse 10.2 > i386 > (2.6.18.8-0.13-xenpaelustre) as a client. Other clients and the servers are > running 1.8.1 on SLES 11 x86_64 (2.6.27.39-0.3-xen-lus

Re: [Lustre-discuss] QDR Questions

2010-02-09 Thread Isaac Huang
On Wed, Jan 27, 2010 at 08:35:30AM -0800, Frank Leers wrote: > .. > > Thanks Frank. My questions are for QDR and IB-Bonding with Lustre. > > None of this is really QDR-specific, but have a look at : > > https://bugzilla.lustre.org/show_bug.cgi?id=20153 > and > https://bugzilla.lustre.org/sh

Re: [Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA

2009-11-16 Thread Isaac Huang
On Mon, Nov 16, 2009 at 08:01:12PM -0500, Dardo D Kleiner - CONTRACTOR wrote: > So are you suggesting I could just comment out the check in router.c? That's enough for lnet but Lustre changes must also be made. Isaac

Re: [Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA

2009-11-16 Thread Isaac Huang
On Mon, Nov 16, 2009 at 04:38:03PM -0500, Dardo D Kleiner - CONTRACTOR wrote: > Stand down. Don't know what was wrong with my configuration at first, > but it does instantiate the two NIDs on the host with multiple ports > on a single HCA. Unfortunately, > > LustreError: 17771:0:(router.c:464:ln

Re: [Lustre-discuss] Determine addresses of connected clients/servers

2009-11-16 Thread Isaac Huang
On Mon, Nov 16, 2009 at 02:51:01PM -0700, Lundgren, Andrew wrote: >Is there a command to pull the addresses of every device connected to a >cluster? > >I have found: > >lct -net tcp [peer_list | conn_list] This would only show immediate peers, i.e. next-hop peers. In a routed con

Re: [Lustre-discuss] Multi-Rail Configurations on a Multi-Port IB HCA

2009-11-14 Thread Isaac Huang
On Fri, Nov 13, 2009 at 03:34:14PM -0500, Dardo D Kleiner - CONTRACTOR wrote: > Mellanox ConnectX MT25418, two ports, each connected to a separate > IB fabric - ib0 and ib1 have distinct IP subnets, each connected > to a separate Lustre router. > .. > ip ad ls: > 4: ib0: mtu 65520 qdisc pfifo_

Re: [Lustre-discuss] Mount Failure

2009-11-12 Thread Isaac Huang
On Thu, Nov 12, 2009 at 12:47:33PM -0500, Brian J. Murrell wrote: > On Thu, 2009-11-12 at 10:37 +, Chris Exton wrote: > > I am having a few problems with Lustre and I can't seem to find the > > answer to my problem on the web so I wondered if you could help? > > You have networking problems

Re: [Lustre-discuss] Dual NICs issue -- How to enforce Lustre to use the second NIC

2009-11-11 Thread Isaac Huang
On Wed, Nov 11, 2009 at 04:07:39PM -0600, Daneil Goodman wrote: >Hello list, >By searching the archive, I found a similar message dated back in >January 2008 -- How do you make an MGS/OSS listen on 2 NICs? Looks like >there is no final solution and I am facing the similar situation

Re: [Lustre-discuss] poor lustre wan performance

2009-11-10 Thread Isaac Huang
On Tue, Nov 10, 2009 at 08:02:03AM -0500, Dardo D Kleiner - CONTRACTOR wrote: > .. > At this point it clearly doesn't matter if I mess with max_rpcs_in_flight > which used > to be a way to mitigate the high BDP. > > Are there new parameters and/or tunings for ko2iblnd we're supposed to be >

Re: [Lustre-discuss] Help with lustre routing

2009-11-10 Thread Isaac Huang
On Mon, Nov 09, 2009 at 10:55:34PM -0800, Eric Adint wrote: > OK i have read the manual and i have read the boards and done as much > research as i can, but i cant seem to bend my head around this, what i > want to do is create a router so that i can keep my OST and MGS/MDT on > the IB networ

Re: [Lustre-discuss] lnet & client client access

2009-11-09 Thread Isaac Huang
On Fri, Nov 06, 2009 at 12:34:34AM +0100, Piotr Wadas wrote: > .. > -- > options lnet networks=tcp0 > -- When an interface name has been omitted, the lnet would iterate over the list of system IP interfaces (by SIOCGIFCONF) and choose the 1st one whose status is "up" (SIOCGIFFLAGS) and has bee

Re: [Lustre-discuss] Network Package loss

2009-11-09 Thread Isaac Huang
On Mon, Nov 09, 2009 at 02:48:34PM +0100, Heiko Schröter wrote: > Hello, > > we do encounter peaks of upto 30% package loss in our Gigabit Network. It would be helpful if you'd elaborate on where the 30% came from. > This is sporadic, say once every hour remaining for some seconds. We cannot >

Re: [Lustre-discuss] o2ib and tcp(IPoIB) on the same IB interface.

2009-09-07 Thread Isaac Huang
On Mon, Sep 07, 2009 at 06:58:39PM +0100, Wojciech Turek wrote: >Hi, >I am designing lustre file system that will be serving two separate >clusters. One of the clusters is old and uses Ethernet data network. >Second of the clusters is new and uses QDR IB data network. I would >l

Re: [Lustre-discuss] lustre errors when system stressed; bad hardware?

2009-08-27 Thread Isaac Huang
On Wed, Aug 26, 2009 at 06:52:24PM -0700, Abe Ingersoll wrote: >.. >kiblnd_tx_complete()) Tx -> 10.168.22@o2ib cookie 0xc8dd6 sending 1 >waiting 1: failed 12 12 == IB_WC_RETRY_EXC_ERR, which usually indicates faulty links in the network or some other application (like an MPI app
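
For decoding the number after "failed" in messages like this one, a small lookup table helps. The sketch below is not from the post: the names are transcribed from the first entries of the Linux kernel's `enum ib_wc_status`, which is an assumption that should be checked against the `ib_verbs.h` of the running kernel, and `wc_status_name` is a made-up helper.

```python
# ASSUMPTION: order follows the start of the kernel's enum ib_wc_status;
# verify against the ib_verbs.h that matches your kernel/OFED.
IB_WC_STATUS_NAMES = [
    "IB_WC_SUCCESS",
    "IB_WC_LOC_LEN_ERR",
    "IB_WC_LOC_QP_OP_ERR",
    "IB_WC_LOC_EEC_OP_ERR",
    "IB_WC_LOC_PROT_ERR",
    "IB_WC_WR_FLUSH_ERR",
    "IB_WC_MW_BIND_ERR",
    "IB_WC_BAD_RESP_ERR",
    "IB_WC_LOC_ACCESS_ERR",
    "IB_WC_REM_INV_REQ_ERR",
    "IB_WC_REM_ACCESS_ERR",
    "IB_WC_REM_OP_ERR",
    "IB_WC_RETRY_EXC_ERR",
    "IB_WC_RNR_RETRY_EXC_ERR",
]


def wc_status_name(code: int) -> str:
    """Return the symbolic name for a work-completion status code."""
    if 0 <= code < len(IB_WC_STATUS_NAMES):
        return IB_WC_STATUS_NAMES[code]
    return "UNKNOWN({})".format(code)


if __name__ == "__main__":
    print(wc_status_name(12))
```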

Re: [Lustre-discuss] [Fwd: [ofa-general] IPoIB Transmit Timeouts]

2009-08-17 Thread Isaac Huang
On Mon, Aug 17, 2009 at 12:23:35PM -0400, Charles A. Taylor wrote: > FWIW, I posted this to ofa-general a little earlier. Anyone else > seeing this? Suggestions? I think this is an OFED 1.4.1 problem > but they may point the finger at you guys. :) > > We've tried limiting OST threads to n

Re: [Lustre-discuss] Lustre playground in VirtualBox?

2009-08-12 Thread Isaac Huang
On Mon, Aug 10, 2009 at 03:39:52PM +0200, Wolfgang Stief wrote: > Hi out there! > > Before I start installing and fiddling around: Are there any reasons > AGAINST setting up a Lustre playground in a VirtualBox environment? I > just want to play around w/ recovery and debugging situations and > upg

Re: [Lustre-discuss] Help: NIC Changed Error

2009-08-12 Thread Isaac Huang
On Mon, Aug 10, 2009 at 03:56:13PM +0800, Lee Amy wrote: > .. > It seems this method cannot solve my problem. My NID is > 10.0.38@tcp, and furthermore when I add the item > > options lnet network=tcp0(eth1) > > I still encountered the same problem and after this failure I change > this it

Re: [Lustre-discuss] LBUG encountered in 1.8.0

2009-08-05 Thread Isaac Huang
On Fri, Jul 31, 2009 at 10:52:46AM -0600, Daniel Kulinski wrote: >Unmounting lustre when our heartbeat software was misconfigured (IPMI >password changed). > > >tx1oss3-clusternet kernel: LustreError: >19350:0:(quota_context.c:1369:lqs_exit()) >ASSERTION(atomic_read(&q->lqs_re

Re: [Lustre-discuss] Question about changing NIDs

2009-08-04 Thread Isaac Huang
On Tue, Jul 28, 2009 at 02:24:12PM -0600, Daniel Kulinski wrote: >I have read the very brief section on changing NIDs in the Lustre >Manual. In the attachments of bug 18231 you may find more information on changing server NIDs: https://bugzilla.lustre.org/show_bug.cgi?id=18231 I'm not sur

Re: [Lustre-discuss] IRC

2009-07-14 Thread Isaac Huang
Have you tried irc.lustre.org instead of zone.lustre.org? Isaac On Tue, Jul 14, 2009 at 11:11:26AM -0700, Frank Leers wrote: > Anybody in the know about the ETA of the lustre IRC server coming back > up?

Re: [Lustre-discuss] Bonded client interfaces and 10GbE server

2009-07-07 Thread Isaac Huang
On Tue, Jul 07, 2009 at 11:44:39AM -0400, Isaac Huang wrote: > .. > > If I would attach the OSS with a single 10GbE link, could > > a client then use the second link, when striping over targets > > on same OSS? > > There's a rather complex way of static con

Re: [Lustre-discuss] Bonded client interfaces and 10GbE server

2009-07-07 Thread Isaac Huang
On Tue, Jul 07, 2009 at 03:44:32PM +0200, Ralf Utermann wrote: > Dear list, > > we have setup of OSS and some clients with a dual Gigabit > trunk (miimon=100 mode=802.3ad xmit_hash_policy=layer3+4). If I understand it correctly, xmit_hash_policy=layer3+4 would not allow a single TCP connection to

Re: [Lustre-discuss] InfiniBand QoS with Lustre ko2iblnd.

2009-06-30 Thread Isaac Huang
On Wed, Jul 01, 2009 at 02:07:33AM -0400, Isaac Huang wrote: > .. > >> For your current concern of setting up different SLs, I'd believe that > >> it could be achieved via target GUIDs as mentioned in my previous reply. > > > > Unfortunately, configuring I

Re: [Lustre-discuss] InfiniBand QoS with Lustre ko2iblnd.

2009-06-30 Thread Isaac Huang
On Fri, Jun 26, 2009 at 01:42:53PM +0200, Sébastien Buisson wrote: > > Isaac Huang wrote: >> On Wed, Jun 24, 2009 at 09:46:19AM +0200, Sébastien Buisson wrote: >>> .. >>> The peer's port information could be stored in the kib_peer_t >>> struct

Re: [Lustre-discuss] InfiniBand QoS with Lustre ko2iblnd.

2009-06-25 Thread Isaac Huang
On Wed, Jun 24, 2009 at 09:46:19AM +0200, Sébastien Buisson wrote: > .. > The peer's port information could be stored in the kib_peer_t structure. > That way, it would be possible to make clients connect to servers which > listen on different ports. > What do you think? At this point it ca

Re: [Lustre-discuss] InfiniBand QoS with Lustre ko2iblnd.

2009-06-25 Thread Isaac Huang
On Mon, Jun 22, 2009 at 04:49:03PM +0200, Sébastien Buisson wrote: > .. > Let's consider we have two sets of OSSes, each set serving a different > Lustre file system (i.e. all the OSTs of an OSS are part of the same > Lustre file system). The same Lustre clients have access to both > filesys

Re: [Lustre-discuss] lustre using wrong network

2009-06-19 Thread Isaac Huang
On Fri, Jun 19, 2009 at 08:43:11AM -0400, Michael Di Domenico wrote: > > .. > > Have you changed server NIDs without updating configuration logs with > > --writeconf? > > By accident the lnet configs came up with the 192.168.0.x config > because a modprobe setting was wrong. However, i took t

Re: [Lustre-discuss] lustre using wrong network

2009-06-18 Thread Isaac Huang
On Thu, Jun 18, 2009 at 09:51:33PM -0400, Michael Di Domenico wrote: > .. > > But the connection was rejected because the server didn't have > > 192.168.0@tcp as one of its NIDs. > > > > What was your mount command line? What does 'lctl list_nids' say on > > the nodes? > > list_nids show t

Re: [Lustre-discuss] lustre using wrong network

2009-06-18 Thread Isaac Huang
On Thu, Jun 18, 2009 at 09:11:50PM -0400, Michael Di Domenico wrote: > I cannot figure out what exactly has happened here and how to recover from it. > > Jun 18 21:02:52 node0-eth1 kernel: LustreError: > 2722:0:(socklnd_cb.c:2156:ksocknal_recv_hello()) Error -104 reading > HELLO from 192.168.0.248

Re: [Lustre-discuss] Kernel bug in combination with bonding

2009-06-16 Thread Isaac Huang
On Tue, Jun 16, 2009 at 12:57:27PM +0200, Tom Woezel wrote: >.. >Jun 16 04:33:38 sososd1 kernel: BUG: soft lockup - CPU#2 stuck for 10s! >.. >Jun 16 04:33:38 sososd1 kernel: Call Trace: >Jun 16 04:33:38 sososd1 kernel:[] >:bonding:ad_rx_machine+0x20/0x502 >Ju

Re: [Lustre-discuss] Configuring Lustre routring between two tcp networks

2009-06-13 Thread Isaac Huang
On Thu, Jun 11, 2009 at 10:51:01PM -0400, Erik Froese wrote: >OK here's where I am now. > >The public client can ping the routers public address but not the >private address. > >[r...@routed-client lnet]$ cat /etc/modprobe.conf >.. >options lnet accept=all This would

Re: [Lustre-discuss] 2.0-alpha2 MDS out of memory problem

2009-06-09 Thread Isaac Huang
On Tue, Jun 09, 2009 at 02:36:37PM +0200, Arne Wiebalck wrote: > Dear all, > > I set up an 2.0-alpha2 system and planned to populate it with > 100 million files. While populating it however, the MDS ran > out of memory, the OOM kicked in, killed some processes, and > all ended in a kernel panic. >

Re: [Lustre-discuss] Configuring Lustre routring between two tcp networks

2009-06-05 Thread Isaac Huang
On Thu, Jun 04, 2009 at 01:59:48PM -0400, Erik Froese wrote: >Thanks Andreas and Natalie, > >I've made the changes you suggested (setting tcp1 as the external >network) and I'm able to lctl ping the 128.122.x.y address but I still >cannot ping the private address for the MDS. Plea

Re: [Lustre-discuss] Configuring Lustre routring between two tcp networks

2009-06-04 Thread Isaac Huang
On Wed, Jun 03, 2009 at 05:45:10PM -0400, Erik Froese wrote: >.. >I don't see it sending any traffic to the router with tcpdump running >on the router. Alternatively, you may run 'routerstat 1' on the router to see how much data is being forwarded per second. Isaac

Re: [Lustre-discuss] lnet_try_match_md()) Matching packet from 12345-10.5.203....@tcp, match 19154486 length 728 too big

2009-05-26 Thread Isaac Huang
On Sat, May 23, 2009 at 09:18:43PM +0400, Alexey Lyashkov wrote: > Hi Michael, > > > On Fri, 2009-05-22 at 16:38 -0400, Michael D. Seymour wrote: > > Hi all, > > > > One client running CentOS 5.2 re-exports the Lustre filesystem via NFS on a > > different network. > > > > We get the following

Re: [Lustre-discuss] InfiniBand QoS with Lustre ko2iblnd.

2009-05-19 Thread Isaac Huang
On Tue, May 19, 2009 at 05:55:21PM +0200, Sébastien Buisson wrote: > Hi, > > We took a slightly different approach to deal with IB QoS in Lustre. > > We decided to assign a specific service-id to Lustre: in ofa-kernel we > added a new value in the rdma_port_space enum, that we called > RDMA_PS

Re: [Lustre-discuss] InfiniBand QoS with Lustre ko2iblnd.

2009-05-19 Thread Isaac Huang
On Mon, May 18, 2009 at 12:04:37PM +0200, Daniel Kobras wrote: > Hi! > > Does anyone know how to use QoS with Lustre's o2ib LND? The Voltaire IB > LND allowed to #define a service level, but I couldn't find a similar > facility in o2ib. Is there a different way to apply QoS rules? The o2iblnd SL

Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-07 Thread Isaac Huang
On Thu, May 07, 2009 at 03:02:49PM -0700, Klaus Steden wrote: > .. > I didn't even touch Lustre bonding, because as you both remark, it's a > little convoluted. I spent a lot of time experimenting with Lustre over > 802.3ad (LACP) aggregated links using the Linux bonding driver, and my OSS > no

Re: [Lustre-discuss] tcp network load balancing understanding lustre 1.8

2009-05-07 Thread Isaac Huang
On Thu, May 07, 2009 at 02:50:13PM +0200, Michael Ruepp wrote: > Hi there, > .. > I give every NID a IP in the same subnet, eg: 10.111.20.35-38 - oss0 > and 10.111.20.39-42 oss1 > > Do I have to make modprobe.conf.local look like this to force lustre > to use all four interfaces parallel:

Re: [Lustre-discuss] Infiniband hot spot avoidance with LMC>0

2009-04-27 Thread Isaac Huang
On Mon, Apr 27, 2009 at 12:21:41PM -0600, Nathan Dauchy wrote: > Greetings, > > Does Lustre's o2ib LND take advantage of Infiniband's LID Mask Count > (LMC) capability? Might it be included in the future? I'm looking for > something similar to the "MV2_USE_HSAM=1" option for Hot-Spot Avoidance >

Re: [Lustre-discuss] LNET TCP

2009-04-27 Thread Isaac Huang
On Fri, Apr 24, 2009 at 09:38:13AM +1000, Andrew Brooker wrote: > I'm having some difficulty with a slightly more complicated multihomed TCP > based LNET. > Here is what I would like to achieve, have a single MGS/MDT server that > lives on two physically separate IP networks. Be able to add OSTs fr

Re: [Lustre-discuss] Kernel panics while mounting OSTs

2009-03-26 Thread Isaac Huang
On Wed, Mar 25, 2009 at 04:47:21PM -0700, Adam Gandelman wrote: > Hi list- > .. > On all nodes: Linux 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 > 12:05:09 EDT 2008 i686 i686 i386 GNU/Linux > > BUG: soft lockup - CPU#0 stuck for 10s [socknal_cd00:2785] It smells to me like an after

Re: [Lustre-discuss] ksocknal_process_receive() Error -14 / Error -14 on read from ...

2009-03-12 Thread Isaac Huang
On Thu, Mar 12, 2009 at 03:29:40PM +, Gerd wrote: > Hi, > > We have a 1.6.6 installation using InfiniBand attached DDN OST storage > and OSS'es connected to the network with 10GE adapters. When running > iozone with ~40 1GE attached clients we see the following on the clients: > .. > And

Re: [Lustre-discuss] OST external journals & an lbug

2009-02-24 Thread Isaac Huang
On Tue, Feb 24, 2009 at 09:38:42PM -0600, Hendelman, Rob wrote: > .. > I ended up with lots of problems and did end up hitting a few lbug's, > specifically: > > LustreError: 11283:0: (tracefile.c:431:libcfs_assertion_failed()) LBUG > LustreError: 8095:0: (tracefile.c:431:libcfs_aertion_fai

Re: [Lustre-discuss] Lustre with 10GbE or Infiniband?

2009-02-18 Thread Isaac Huang
You might find this interesting: http://www.cse.ohio-state.edu/~panda/temp/ib_10ge_advanced.pdf Isaac On Wed, Feb 11, 2009 at 2:08 PM, Jeffrey Bennett wrote: > Hi, > > Has anybody done any performance comparison between Lustre with 10GbE and > Lustre with Infiniband 4X SDR? I wonder if they per

Re: [Lustre-discuss] Lustre with 10GbE or Infiniband?

2009-02-12 Thread Isaac Huang
On Thu, Feb 12, 2009 at 08:26:09AM -0500, Scott Atchley wrote: >> .. >> One exception is SOCKLND on Chelsio's T3, quote: >> >> "The T3 ASIC uses the mechanism of Direct Data Placement (DDP) that >> provides a flexible zero copy on receive capability for regular TCP >> connections, requiring no

Re: [Lustre-discuss] Lustre with 10GbE or Infiniband?

2009-02-11 Thread Isaac Huang
On Wed, Feb 11, 2009 at 04:35:47PM -0500, Scott Atchley wrote: > .. > SOCKLND is limited by a copy on the receive side. When a client > writes, the server has to copy the data out. When a client reads, it > .. One exception is SOCKLND on Chelsio's T3, quote: "The T3 ASIC uses the mech

Re: [Lustre-discuss] Lustre with 10GbE or Infiniband?

2009-02-11 Thread Isaac Huang
On Wed, Feb 11, 2009 at 06:11:30PM -0500, Charles Taylor wrote: > .. > Just ran a quick IMB (formerly Pallas) between a couple of our SDR > nodes and got 860 MBytes/sec (ping-pong, 4MB). So I don't think > there is anything inherent in SDR IB that limits you to 750 MBytes/ > sec. Howev

Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Isaac Huang
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote: > Hello everyone, > . > My client has this in modprobe.conf: > options lnet networks=o2ib,tcp > I'm trying to mount the remote network with > mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas > and the command just h

Re: [Lustre-discuss] Timeouts and Dumps

2008-12-23 Thread Isaac Huang
On Tue, Dec 23, 2008 at 06:45:09AM -0700, Denise Hummel wrote: > Hi; > > Thanks. I have suspected the network, however have not been able to > pinpoint the problem. I have looked at the ethernet and infiniband > switches - found a few with IGMP turned on and some multicast issues. > Those have b

Re: [Lustre-discuss] Timeouts and Dumps

2008-12-19 Thread Isaac Huang
On Fri, Dec 19, 2008 at 08:42:16AM -0700, Denise Hummel wrote: > Hi; > > I have started getting numerous dump logs, timeouts and client > evictions. Our environment: > .. > Dec 19 04:17:28 oss1 kernel: Lustre: 27065:0:(router.c:167:lnet_notify()) > Ignoring prediction from 172.16.100...@tcp

Re: [Lustre-discuss] Kmod, dkms or make to compile network modules.

2008-12-19 Thread Isaac Huang
On Thu, Dec 18, 2008 at 10:30:42PM -0800, Arden Wiebe wrote: > .. > [r...@lustreone src]# uname -a > Linux lustreone.linuxguru.ca 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue > Aug 26 12:16:17 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux > > [r...@lustreone ~]# rpm -qa kernel\* | sort > kernel-de

Re: [Lustre-discuss] IB Network Failover

2008-12-18 Thread Isaac Huang
We're researching this now, and a design draft should be ready for public review (on lustre-devel) at the beginning of next January. Isaac On Wed, Dec 17, 2008 at 05:51:12PM -0600, Mike Feuerstein wrote: >Is support for network failover to an alternate IB port on the Lustre >roadmap > >

Re: [Lustre-discuss] Use specified NIC on login node

2008-12-14 Thread Isaac Huang
On Mon, Dec 15, 2008 at 10:01:08AM +0800, Lu Wang wrote: > Dear list, > There are two Ethernet cards on our login node, one for the outside > connection (202.122.*.*), one for the inside connection to other servers. The > problem is that Lustre sometimes gets confused by this configuration. > [r...@lxslc09 ~]# netst

Re: [Lustre-discuss] Using ib0 and tcp

2008-11-13 Thread Isaac Huang
On Thu, Nov 13, 2008 at 04:18:02PM -0800, Joseph Farran wrote: > .. > # lctl list_nids > [EMAIL PROTECTED] > [EMAIL PROTECTED] > > How can I get Lustre to use both ib0 and bond0 (eth0 / eth1) for the > data network? Currently it only uses InfiniBand (ib0) and not bond0. You may find this us

Re: [Lustre-discuss] Multiple interfaces in LNet

2008-10-23 Thread Isaac Huang
On Mon, Oct 20, 2008 at 01:20:46PM +0200, Danny Sternkopf wrote: > >> Beside of TCP it is only possible to use multiple interfaces on the same > >> node with o2ib, right? With ko2iblnd one can setup several Lustre > >> networks for each IB interface. In fact you must setup several Lustre > >> netwo

Re: [Lustre-discuss] Multiple interfaces in LNet

2008-10-16 Thread Isaac Huang
On Mon, Oct 13, 2008 at 05:29:14PM +0200, Danny Sternkopf wrote: > .. > Interesting is how to use multiple interfaces on the same server in > Lustre/LNet. My understanding is that TCP(ksocklnd) can manage multiple > physical interfaces as one LNet interface with one unique NID. Is that > still

Re: [Lustre-discuss] Adding IB to tcp only cluster

2008-10-16 Thread Isaac Huang
On Sun, Oct 12, 2008 at 10:15:01AM -0400, Brock Palen wrote: > .. > Currently we don't put any lustre modules in modprobe.conf, lustre > loads the correct modules when mounting the filesystem. We do this > to keep our loads simple as we have several. When nothing has been specified, LNet

Re: [Lustre-discuss] show_route

2008-10-08 Thread Isaac Huang
On Tue, Oct 07, 2008 at 11:00:20PM -0600, Andreas Dilger wrote: > On Oct 07, 2008 22:58 -0400, Mag Gam wrote: > > My intention was I wanted to see if my lustre connection is being > > routed thru other interfaces. I have 4 interfaces on my server: eth0 > > thru eth4. eth0 is used for Lustre but it

Re: [Lustre-discuss] LustreError: server_bulk_callback

2008-09-30 Thread Isaac Huang
On Wed, Sep 24, 2008 at 05:22:55PM -0600, Nathan Dauchy wrote: > Can anyone direct me to documentation to decipher these messages? > What does "server_bulk_callback" do, and does "status -103" indicate a > severe problem for event types 2 and 4? server_bulk_callback signals the completion of bulk

Re: [Lustre-discuss] Typical IB timeout? Or something more?

2008-09-11 Thread Isaac Huang
On Tue, Sep 09, 2008 at 01:55:46PM +0900, Alex Lee wrote: > I've been seeing something that looks like IB timeout errors lately after > upgrading to 1.6.5.1 using the supplied ofed kernel drivers. > .. > Sep 9 00:25:31 lustre-oss-4-1 kernel: LustreError: > 13228:0:(o2iblnd_cb.c:2874:kiblnd_chec

Re: [Lustre-discuss] LNET packets

2008-08-21 Thread Isaac Huang
http://www.mail-archive.com/[EMAIL PROTECTED]/msg00491.html Isaac On Fri, Aug 15, 2008 at 07:16:23AM -0400, Mag Gam wrote: > I am doing a case study at my university and I am trying to analyze > packets for LNET. I want to compare this with other Network based > filesystems, such as NFS and SMB.

Re: [Lustre-discuss] performance issues in simple lustre setup

2008-06-03 Thread Isaac Huang
On Tue, Jun 03, 2008 at 08:32:09AM -0400, Murray Smigel wrote: >Some additional information on the problem. I tried disconnecting the >ethernet connection to >the server machine (192.168.1.94) and tried running a disk test on the >client (192.168.1.156 via ethernet), writing to >

Re: [Lustre-discuss] performance issues in simple lustre setup

2008-06-03 Thread Isaac Huang
Since both the configuration and the IB link bandwidth looked fine, I'd suggest measuring LNet throughput with LNet selftest: 1. On both client and server: modprobe lnet_selftest 2. On the client: export LST_SESSION=$$ lst new_session --timeo 10 test lst add_group s [EMAIL PROTECTED] lst add_gr
