Re: [openib-general] SA cache design

2006-01-16 Thread Sean Hefty
Eitan Zahavi wrote: [EZ] The scalability issues we see today are what I most worry about. I think that we have a couple scalability issues at the core of this problem. I think that a cache can solve part of the problem, but to fully address the issues, we eventually may need to extend our

RE: [openib-general] SA cache design

2006-01-16 Thread Eitan Zahavi
Hi Sean Eitan Zahavi wrote: [EZ] The scalability issues we see today are what I most worry about. One issue that I see is that the CMA, IB CM, and DAPL APIs support only point-to-point connections. Trying to layer a many-to-many connection model over these is leading to the

RE: [openib-general] SA cache design

2006-01-16 Thread Eitan Zahavi
What was I thinking ... for (target = (myRank + 1) % numNodes ; target != myRank; target = (target + 1)% numNodes) { /* establish connection to node target */ } [EZ] I would try and make sure the connections are not done in a manner such that all nodes try to establish connections to a
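A minimal sketch of the loop quoted above, wrapped in a function so the resulting order can be inspected. The function name `connection_order` and the `order` output array are hypothetical and only for illustration; the loop itself is the one from the message. Because each rank starts at its next-higher neighbor and wraps around, different ranks begin with different targets, which is one way to avoid every node connecting to the same target first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[] with the ranks that my_rank connects to, starting at the
 * next-higher rank and wrapping around. Returns the number of targets
 * written (num_nodes - 1). The caller must provide room for
 * num_nodes - 1 entries.
 */
size_t connection_order(unsigned my_rank, unsigned num_nodes,
                        unsigned *order)
{
    size_t count = 0;

    assert(num_nodes > 0 && my_rank < num_nodes);
    assert(order != NULL);

    for (unsigned target = (my_rank + 1) % num_nodes;
         target != my_rank;
         target = (target + 1) % num_nodes) {
        /* a real client would establish the connection to target here */
        order[count++] = target;
    }
    return count;
}
```

With 4 nodes, rank 0 visits 1, 2, 3 while rank 2 visits 3, 0, 1, so the first connection attempts are spread across the fabric rather than aimed at one node.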

RE: [openib-general] SA cache design

2006-01-16 Thread Rimmer, Todd
From: Eitan Zahavi [mailto:[EMAIL PROTECTED] What was I thinking ... for (target = (myRank + 1) % numNodes ; target != myRank; target = (target + 1)% numNodes) { /* establish connection to node target */ } This can be even simpler for MPI. Given some nodes must listen and others must

Re: [openib-general] SA cache design

2006-01-16 Thread Sean Hefty
Eitan Zahavi wrote: [EZ] Having N^2 messages is not a big problem if they do not all go one target... CM is distributed and this is good. Only the PathRecord section of the connection establishment is going today to one node (SA) and you are about to fix it... I expect that we'll start

Re: [openib-general] SA cache design

2006-01-12 Thread Brian Long
On Wed, 2006-01-11 at 14:21 -0800, Sean Hefty wrote: Rimmer, Todd wrote: A relational database is overkill for this function. It will also likely be more complex for end users to set up and debug. The cache setup should be simple. The solution should be such that just an on/off switch

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Brian Long wrote: How much overhead is going to be incurred by using a standard RDBMS instead of not caching anything? I'm not completely familiar with the IB configurations that would benefit from the proposed SA cache, but it seems to me, adding a RDBMS to anything as fast as IB would

Re: [openib-general] SA cache design

2006-01-12 Thread Brian Long
On Thu, 2006-01-12 at 10:16 -0800, Sean Hefty wrote: Brian Long wrote: How much overhead is going to be incurred by using a standard RDBMS instead of not caching anything? I'm not completely familiar with the IB configurations that would benefit from the proposed SA cache, but it seems

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Brian Long wrote: What about SQLite (http://www.sqlite.org/)? This is used by yum 2.4 in Fedora Core and other distributions. SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine. Someone else sent me a link to this same site, and

RE: [openib-general] SA cache design

2006-01-12 Thread Eitan Zahavi
Hi Sean, The issue is that the number of queries grows by N^2. Only a very small subset of queries is used: * PathRecord by SRC-GUID, DST-GUID * PortInfo by capability mask. That is not to say the current implementations are perfect. But RDBMSs are optimized for other requirements, not a simple single key

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Eitan Zahavi wrote: The issue is the number of queries grows by N^2. I understand. On a related note, why does every instance of the application need to query for every other instance? To establish all-to-all communication, couldn't instance X only initiate connections to instances > X?

RE: [openib-general] SA cache design

2006-01-12 Thread Eitan Zahavi
On a related note, why does every instance of the application need to query for every other instance? To establish all-to-all communication, couldn't instance X only initiate connections to instances > X? (I.e. 1 connects to 2 and 3, 2 connects to 3.) [EZ] MPI opens a connection from each
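The rule in the quoted question (1 connects to 2 and 3, 2 connects to 3) can be sketched as a simple predicate on ranks. This is only an illustration of that rule, not how MPI or the CM actually decides; the names `should_initiate` and `initiated_count` are hypothetical, and ranks are assumed to be numbered 1..num_ranks as in the example.

```c
#include <assert.h>
#include <stdbool.h>

/* True if rank self should initiate the connection to rank peer:
 * only the lower-numbered side of each pair connects. */
bool should_initiate(unsigned self, unsigned peer)
{
    return self < peer;
}

/* Number of connections rank self initiates when ranks are 1..num_ranks. */
unsigned initiated_count(unsigned self, unsigned num_ranks)
{
    assert(self >= 1 && self <= num_ranks);
    return num_ranks - self;
}
```

Under this rule each pair is connected exactly once, and the highest rank only listens.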

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Rimmer, Todd wrote: 1 million entry SA database. This is exactly why I think that the SA needs to be backed by a real DBMS. In contrast the replica on each node only needs to handle O(N) entries. And its lookup time could be O(logN). This is still O(NlogN) operations, which made me look at
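As a rough illustration of an O(log N) lookup over the O(N) entries a node-local replica would hold, here is a sketch that keeps entries sorted by destination GUID and uses the standard `bsearch`. The `path_entry` layout and the `replica_sort` / `replica_find` names are assumptions made for this example; a real PathRecord carries many more fields, and the thread does not say which index structure was eventually chosen.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified replica entry. */
struct path_entry {
    uint64_t dst_guid;   /* lookup key */
    uint16_t dlid;       /* example payload */
};

/* Three-way comparison on dst_guid without overflow. */
static int cmp_entry(const void *a, const void *b)
{
    const struct path_entry *x = a;
    const struct path_entry *y = b;
    return (x->dst_guid > y->dst_guid) - (x->dst_guid < y->dst_guid);
}

/* Sort once after the replica is (re)built: O(N log N). */
void replica_sort(struct path_entry *tbl, size_t n)
{
    if (n == 0)
        return;
    assert(tbl != NULL);
    qsort(tbl, n, sizeof(*tbl), cmp_entry);
}

/* Binary search by destination GUID: O(log N). Returns NULL if absent. */
const struct path_entry *replica_find(const struct path_entry *tbl,
                                      size_t n, uint64_t dst_guid)
{
    struct path_entry key = { .dst_guid = dst_guid, .dlid = 0 };

    if (n == 0)
        return NULL;
    assert(tbl != NULL);
    return bsearch(&key, tbl, n, sizeof(*tbl), cmp_entry);
}
```

Resolving all N destinations this way costs N lookups of O(log N) each, which is the O(N log N) total mentioned in the message.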

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Eitan Zahavi wrote: [EZ] MPI opens a connection from each node to every other node. Actually even from every CPU to every other CPU. So this is why we have N^2 connections. I was confusing myself. I think that there are n(n-1)/2 connections, but that's still O(n^2). - Sean
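For reference, the n(n-1)/2 figure follows from counting each pair once, with instance x connecting only to the n - x higher-numbered instances (the numbering 1..n is an assumption carried over from the earlier example in the thread):

```latex
\[
\sum_{x=1}^{n-1} (n - x) \;=\; \sum_{k=1}^{n-1} k \;=\; \frac{n(n-1)}{2},
\qquad \frac{n(n-1)}{2} \in O(n^2).
\]
```

For n = 3 this gives 3 connections (1-2, 1-3, 2-3).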

RE: [openib-general] SA cache design

2006-01-12 Thread Rimmer, Todd
From: Sean Hefty [mailto:[EMAIL PROTECTED] why ask the SA the same question multiple times in a row? I have no idea why the application did this. Are any of the queries in this case actually the same? Each MPI process is independent. However they all need to get pathrecords for all

Re: [openib-general] SA cache design

2006-01-12 Thread Grant Grundler
On Thu, Jan 12, 2006 at 11:58:28AM -0800, Sean Hefty wrote: This is still O(NlogN) operations, which made me look at indexing schemes to improve performance. I strongly associate Indexing schemes with judy: http://docs.hp.com/en/B6841-90001/ix01.html The open source project is here:

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Rimmer, Todd wrote: Each MPI process is independent. However they all need to get pathrecords for all the other processes/nodes in the system. Hence, each process on a node will make the exact same set of queries. That should still only be P queries per node, with P = number of processes on a

RE: [openib-general] SA cache design

2006-01-12 Thread Rimmer, Todd
From: Sean Hefty [mailto:[EMAIL PROTECTED] Rimmer, Todd wrote: Each MPI process is independent. However they all need to get pathrecords for all the other processes/nodes in the system. Hence, each process on a node will make the exact same set of queries. That should still only be

Re: [openib-general] SA cache design

2006-01-12 Thread Sean Hefty
Rimmer, Todd wrote: While each process could do a GET_TABLE for all path records that would be rather inefficient and would provide 1,000,000 path records in the RMPP response, of which only 500 are of interest. Each process could do a GET_TABLE for only those path records with the SGID set

Re: [openib-general] SA cache design

2006-01-11 Thread James Lentini
On Tue, 10 Jan 2006, Sean Hefty wrote: Grant Grundler wrote: I forgot to point out postgres: http://www.postgresql.org/about/ This looks like it would work well. The question that I have for users is: Is it acceptable for the cache to make use of a relational database system?

Re: [openib-general] SA cache design

2006-01-11 Thread Eitan Zahavi
Hi Sean, Now I really lost you: Is the intention to speed up SA queries? Or is it to have persistent storage of them? I think we should focus on the kind of data to cache, how it is made transparently available to any OpenIB client and how/when it is invalidated by the SM. We should only keep

RE: [openib-general] SA cache design

2006-01-11 Thread Rimmer, Todd
On Tue, 10 Jan 2006, Sean Hefty wrote: Grant Grundler wrote: I forgot to point out postgres: http://www.postgresql.org/about/ This looks like it would work well. The question that I have for users is: Is it acceptable for the cache to make use of a relational database

Re: [openib-general] SA cache design

2006-01-11 Thread Sean Hefty
James Lentini wrote: Will it be possible to use the OpenIB stack without setting up the SA cache? Yes. - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit

Re: [openib-general] SA cache design

2006-01-11 Thread Sean Hefty
Eitan Zahavi wrote: Is the intention to speed up SA queries? Or is it to have persistent storage of them? I want both. :) I think we should focus on the kind of data to cache, how it is made transparently available to any OpenIB client and how/when it is invalidated by the SM. We should

RE: [openib-general] SA cache design

2006-01-11 Thread Rimmer, Todd
From: Sean Hefty [mailto:[EMAIL PROTECTED] Eitan Zahavi wrote: Is the intention to speed up SA queries? Or is it to have persistent storage of them? I want both. :) I would clarify that the best bang for the effort will be to focus on the queries which the ULPs themselves will use

Re: [openib-general] SA cache design

2006-01-11 Thread Greg Lindahl
Since no one's really answered this yet: Many sysadmins are not going to want to install a relational database to run an SA cache. So I'd stick to Berkeley DB if I were you. -- greg

Re: [openib-general] SA cache design

2006-01-11 Thread Sean Hefty
Greg Lindahl wrote: Since no one's really answered this yet: Many sysadmins are not going to want to install a relational database to run an SA cache. So I'd stick to Berkeley DB if I were you. Thanks for the response. To be clear, the cache would be an optional component, and likely only

Re: [openib-general] SA cache design

2006-01-11 Thread Sean Hefty
Rimmer, Todd wrote: A relational database is overkill for this function. It will also likely be more complex for end users to set up and debug. The cache setup should be simple. The solution should be such that just an on/off switch needs to be configured (with a default of on) for most users to

Re: [openib-general] SA cache design

2006-01-10 Thread Sean Hefty
Sean Hefty wrote: To keep the design as flexible as possible, my plan is to implement the cache in userspace. The interface to the cache would be via MADs. Clients would send their queries to the sa_cache instead of the SA itself. The format of the MADs would be essentially identical to

Re: [openib-general] SA cache design

2006-01-10 Thread Sean Hefty
Grant Grundler wrote: We already have several databases for different things: makedb (primarily for NSS) updatedb (fast lookup of local files) mandb (man pages) rpmdb (yes, even on debian boxes) sasldbconverter2 (for SASL - linux security/login stuff)

Re: [openib-general] SA cache design

2006-01-10 Thread Grant Grundler
On Tue, Jan 10, 2006 at 03:00:46PM -0800, Sean Hefty wrote: I did find that libdb-4.2 was installed on SuSE and RedHat systems, and a libodbc was on my SuSE system. Libdb-4.2 would help manage some of the SA objects to a file, but is limited in its data storage and retrieval capabilities.

Re: [openib-general] SA cache design

2006-01-10 Thread Sean Hefty
Grant Grundler wrote: I forgot to point out postgres: http://www.postgresql.org/about/ This looks like it would work well. The question that I have for users is: Is it acceptable for the cache to make use of a relational database system? The disadvantage is that an RDBMS would need

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Hi Sean, Please see below. Sean Hefty wrote: * Regarding the sentence: "Clients would send their queries to the sa_cache instead of the SA", I would propose that an SA MAD send switch be implemented in the core: Such a switch will enable plugging in the SA cache (I would prefer calling it SA

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Hi Sean, Todd, Although I like the replica idea for its query performance boost, I suspect it will not actually scale to very large networks: having each node query for the entire database would cause N^2 load on the SA. After any change (which happens with higher probability on large

RE: [openib-general] SA cache design

2006-01-06 Thread Rimmer, Todd
From: Hal Rosenstock [mailto:[EMAIL PROTECTED] On Thu, 2006-01-05 at 18:36, Rimmer, Todd wrote: This of course implies the SA Mux must analyze more than just the attribute ID to determine if the replica can handle the query. But the memory savings is well worth the extra level of

RE: [openib-general] SA cache design

2006-01-06 Thread Hal Rosenstock
On Fri, 2006-01-06 at 09:05, Rimmer, Todd wrote: From: Hal Rosenstock [mailto:[EMAIL PROTECTED] On Thu, 2006-01-05 at 18:36, Rimmer, Todd wrote: This of course implies the SA Mux must analyze more than just the attribute ID to determine if the replica can handle the query. But the

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Sean Hefty wrote: Eitan Zahavi wrote: So if the cache is on another host - a new kind of MAD will have to be sent on behalf of the original request? I was thinking more in terms of redirection. Today none of the clients support redirection. It would take significant duplicated effort

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
I agree with Todd: a key is to keep the client unaware of the mux existence. So the same client can be run on system without the cache. Hal Rosenstock wrote: On Fri, 2006-01-06 at 09:05, Rimmer, Todd wrote: From: Hal Rosenstock [mailto:[EMAIL PROTECTED] On Thu, 2006-01-05 at 18:36, Rimmer,

Re: [openib-general] SA cache design

2006-01-06 Thread Sean Hefty
Eitan Zahavi wrote: Can someone familiar with the opensm code tell me how difficult it would be to extract out the code that tracks the subnet data and responds to queries? I guess you mean the code that is answering to PathRecord queries? Yes - that along with answering other queries. It

Re: [openib-general] SA cache design

2006-01-06 Thread Hal Rosenstock
On Fri, 2006-01-06 at 14:50, Eitan Zahavi wrote: Hal Rosenstock wrote: On Thu, 2006-01-05 at 18:36, Rimmer, Todd wrote: This of course implies the SA Mux must analyze more than just the attribute ID to determine if the replica can handle the query. But the memory savings is well worth

Re: [openib-general] SA cache design

2006-01-06 Thread Hal Rosenstock
On Fri, 2006-01-06 at 15:00, Eitan Zahavi wrote: I agree with Todd: a key is to keep the client unaware of the mux existence. So the same client can be run on system without the cache. Define same client ? I would consider it the same SA client directing requests differently based on how the

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Hi Todd, So you agree we will need to design replica buildup scalability features into the solution (to avoid the bring-up load on the SA)? Why would a caching system not work here, instead of replicating the data? The caching concept allows for the SA to still be in the loop by

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Hal Rosenstock wrote: On Fri, 2006-01-06 at 15:00, Eitan Zahavi wrote: I agree with Todd: a key is to keep the client unaware of the mux existence. So the same client can be run on system without the cache. Define same client ? I would consider it the same SA client directing requests

Re: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Sean Hefty wrote: Eitan Zahavi wrote: Can someone familiar with the opensm code tell me how difficult it would be to extract out the code that tracks the subnet data and responds to queries? I guess you mean the code that is answering to PathRecord queries? Yes - that along with

Re: [openib-general] SA cache design

2006-01-06 Thread Hal Rosenstock
On Fri, 2006-01-06 at 14:55, Eitan Zahavi wrote: I guess you mean the code that is answering to PathRecord queries? It is possible to extract the SMDB objects and duplicate that database. I am not sure it is such a good idea. What if the SM is not OpenSM? I would view that the database is an

Re: [openib-general] SA cache design

2006-01-06 Thread Sean Hefty
Sean Hefty wrote: - The MAD interface will result in additional data copies and userspace to kernel transitions for clients residing on the local system. - Clients require a mechanism to locate the sa_cache, or need to make assumptions about its location. Based on some comments from people, I

Re: [openib-general] SA cache design

2006-01-06 Thread Sean Hefty
Hal Rosenstock wrote: I would view that the database is an SADB with the actual pathrecords as one example rather than the SMDB from which they are calculated. I think Sean is interested in the SA packet query/response code here so avoid recreating this and that the backend would be stripped

RE: [openib-general] SA cache design

2006-01-06 Thread Eitan Zahavi
Hefty [mailto:[EMAIL PROTECTED] Sent: Friday, January 06, 2006 10:40 PM To: Hal Rosenstock Cc: Eitan Zahavi; openib Subject: Re: [openib-general] SA cache design Hal Rosenstock wrote: I would view that the database is an SADB with the actual pathrecords as one example rather than the SMDB

Re: [openib-general] SA cache design

2006-01-05 Thread Eitan Zahavi
Hi Sean, This is a great initiative - tackling an important issue. I am glad you took this on. Please see below. Sean Hefty wrote: I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib

Re: [openib-general] SA cache design

2006-01-05 Thread Hal Rosenstock
Hi Sean, On Tue, 2006-01-03 at 20:15, Sean Hefty wrote: Hal Rosenstock wrote: I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation

Re: [openib-general] SA cache design

2006-01-05 Thread Hal Rosenstock
Hi Eitan, On Thu, 2006-01-05 at 07:27, Eitan Zahavi wrote: Hi Sean, This is a great initiative - tackling an important issue. I am glad you took this on. Please see below. Sean Hefty wrote: I've been given the task of trying to come up with an implementation for an SA cache. The

RE: [openib-general] SA cache design

2006-01-05 Thread Rimmer, Todd
From: Sean Hefty [mailto:[EMAIL PROTECTED] I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation are below. Any feedback is

RE: [openib-general] SA cache design

2006-01-05 Thread Sean Hefty
* Regarding the sentence: "Clients would send their queries to the sa_cache instead of the SA", I would propose that an SA MAD send switch be implemented in the core: Such a switch will enable plugging in the SA cache (I would prefer calling it SA local agent due to its extended

RE: [openib-general] SA cache design

2006-01-05 Thread Sean Hefty
I hadn't fully figured this out yet. I'm not sure if another MAD class is needed or not. My goal is to implement this as transparent to the application as possible without violating the spec, perhaps appearing as an SA on a different LID. The LID for the (real) SA is determined from

RE: [openib-general] SA cache design

2006-01-05 Thread Sean Hefty
Sean, This is great. This is a feature which I find near and dear and is very important to large fabric scalability. If you look in contrib in the infinicon area, you will see a version of a SA replica which we implemented in the linux_discovery tree. The version in SVN is a little dated, but

RE: [openib-general] SA cache design

2006-01-05 Thread Hal Rosenstock
On Thu, 2006-01-05 at 16:51, Sean Hefty wrote: I agree that this is a problem, but my preference would be for a dedicated kernel module to handle multicast join/leave requests. In addition to multicast, it's also service records and event subscriptions too. -- Hal

RE: [openib-general] SA cache design

2006-01-05 Thread Sean Hefty
- It is implemented in kernel mode - while user mode may help during initial debug, it will be important for kernel mode ULPs such as SRP, IPoIB and SDP to also make use of these records Your kernel footprint is smaller than I expected, which is good. Note that with a MAD

RE: [openib-general] SA cache design

2006-01-05 Thread Hal Rosenstock
On Thu, 2006-01-05 at 17:04, Sean Hefty wrote: I hadn't fully figured this out yet. I'm not sure if another MAD class is needed or not. My goal is to implement this as transparent to the application as possible without violating the spec, perhaps appearing as an SA on a different LID.

RE: [openib-general] SA cache design

2006-01-05 Thread Hal Rosenstock
On Thu, 2006-01-05 at 18:24, Sean Hefty wrote: For the precise language, see C15-0.1.24 p. 923 IBA 1.2: C15-0.1.24: It shall be possible to determine the location of SA from any endport by sending a GMP to QP1 (the GSI) of the node identified by the endport's PortInfo:MasterSMLID, using

RE: [openib-general] SA cache design

2006-01-05 Thread Rimmer, Todd
From: Sean Hefty [mailto:[EMAIL PROTECTED] Your kernel footprint is smaller than I expected, which is good. The key is that while there are O(N^2) path records in a fabric, only O(N) are of interest to a given node. Hence if you only replicate entries where this node is the source the size
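A small sketch of the point about replica size: out of all path records in the fabric, keep only those whose source is the local node. The `path_rec` struct (two GUIDs only) and the `replicate_local` function are hypothetical simplifications for illustration and are not taken from the replica implementation mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal path record: source and destination only. */
struct path_rec {
    uint64_t src_guid;
    uint64_t dst_guid;
};

/*
 * Copy into out[] the records whose source is local_guid, stopping when
 * out_cap entries have been written. Returns the number copied.
 */
size_t replicate_local(const struct path_rec *all, size_t n_all,
                       uint64_t local_guid,
                       struct path_rec *out, size_t out_cap)
{
    size_t kept = 0;

    assert(n_all == 0 || all != NULL);
    assert(out_cap == 0 || out != NULL);

    for (size_t i = 0; i < n_all && kept < out_cap; i++) {
        if (all[i].src_guid == local_guid)
            out[kept++] = all[i];
    }
    return kept;
}
```

For a fabric with N endpoints and a record for every (source, destination) pair, the full table has on the order of N^2 entries while the filtered copy has on the order of N.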

RE: [openib-general] SA cache design

2006-01-05 Thread Sean Hefty
Note that with a MAD interface, kernel modules would still have access to any cached data. I also wanted to stick with usermode to allow saving the cache to disk, so that it would be available immediately after a reboot. (My assumption being that changes to the network topology would be

[openib-general] SA cache design

2006-01-03 Thread Sean Hefty
I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation are below. Any feedback is welcome. To keep the design as flexible as possible, my

Re: [openib-general] SA cache design

2006-01-03 Thread Hal Rosenstock
Hi Sean, On Tue, 2006-01-03 at 19:42, Sean Hefty wrote: I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation are below. Any

Re: [openib-general] SA cache design

2006-01-03 Thread Sean Hefty
Hal Rosenstock wrote: I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation are below. Any feedback is welcome. To keep the design as