Eitan Zahavi wrote:
[EZ] The scalability issues we see today are what I most worry about.
I think that we have a couple of scalability issues at the core of this problem. I think that a cache can solve part of the problem, but to fully address the issues, we may eventually need to extend our
Hi Sean
Eitan Zahavi wrote:
[EZ] The scalability issues we see today are what I most worry about.
One issue that I see is that the CMA, IB CM, and DAPL APIs support only point-to-point connections. Trying to layer a many-to-many connection model over these is leading to the
What was I thinking ...
for (target = (myRank + 1) % numNodes; target != myRank;
     target = (target + 1) % numNodes) {
        /* establish connection to node target */
}
[EZ] I would try and make sure the connections are not done in a manner such that all nodes try to establish connections to a
From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
What was I thinking ...
for (target = (myRank + 1) % numNodes; target != myRank;
     target = (target + 1) % numNodes) {
        /* establish connection to node target */
}
This can be even simpler for MPI.
Given some nodes must listen and others must
Eitan Zahavi wrote:
[EZ] Having N^2 messages is not a big problem if they do not all go to one target...
CM is distributed, and this is good. Only the PathRecord section of connection establishment goes today to one node (the SA), and you are about to fix it...
I expect that we'll start
On Wed, 2006-01-11 at 14:21 -0800, Sean Hefty wrote:
Rimmer, Todd wrote:
A relational database is overkill for this function.
It will also likely be more complex for end users to setup and debug.
The cache setup should be simple. The solution should be such that
just an on/off switch
Brian Long wrote:
How much overhead is going to be incurred by using a standard RDBMS
instead of not caching anything? I'm not completely familiar with the
IB configurations that would benefit from the proposed SA cache, but it
seems to me, adding an RDBMS to anything as fast as IB would
On Thu, 2006-01-12 at 10:16 -0800, Sean Hefty wrote:
Brian Long wrote:
How much overhead is going to be incurred by using a standard RDBMS
instead of not caching anything? I'm not completely familiar with the
IB configurations that would benefit from the proposed SA cache, but it
seems
Brian Long wrote:
What about SQLite (http://www.sqlite.org/)? This is used by yum 2.4 in
Fedora Core and other distributions.
SQLite is a small C library that implements a self-contained,
embeddable, zero-configuration SQL database engine.
Someone else sent me a link to this same site, and
Hi Sean,
The issue is that the number of queries grows as N^2.
Only a very small subset of queries is used:
* PathRecord by SRC-GUID,DST-GUID
* PortInfo by capability mask
Not to say the current implementations are perfect.
But RDBMSs are optimized for other requirements, not a simple single-key
Eitan Zahavi wrote:
The issue is that the number of queries grows as N^2.
I understand.
On a related note, why does every instance of the application need to query for every other instance? To establish all-to-all communication, couldn't instance X only initiate connections to instances > X?
On a related note, why does every instance of the application need to query for every other instance? To establish all-to-all communication, couldn't instance X only initiate connections to instances > X? (I.e. 1 connects to 2 and 3, 2 connects to 3.)
[EZ] MPI opens a connection from each
Rimmer, Todd wrote:
1 million entry SA database.
This is exactly why I think that the SA needs to be backed by a real DBMS.
In contrast the replica on each node only needs to handle O(N) entries.
And its lookup time could be O(logN).
This is still O(NlogN) operations, which made me look at
Eitan Zahavi wrote:
[EZ] MPI opens a connection from each node to every other node. Actually
even from every CPU to every other CPU. So this is why we have N^2
connections.
I was confusing myself. I think that there are n(n-1)/2 connections, but that's
still O(n^2).
- Sean
From: Sean Hefty [mailto:[EMAIL PROTECTED]
why ask the SA the same question multiple times in a row?
I have no idea why the application did this. Are any of the
queries in this
case actually the same?
Each MPI process is independent. However they all need to get pathrecords for all
On Thu, Jan 12, 2006 at 11:58:28AM -0800, Sean Hefty wrote:
This is still O(NlogN) operations, which made me look at indexing schemes
to improve performance.
I strongly associate indexing schemes with Judy arrays:
http://docs.hp.com/en/B6841-90001/ix01.html
The open source project is here:
Rimmer, Todd wrote:
Each MPI process is independent. However they all need to get pathrecords
for all the other processes/nodes in the system.
Hence, each process on a node will make the exact same set of queries.
That should still only be P queries per node, with P = number of processes on a
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Rimmer, Todd wrote:
Each MPI process is independent. However they all need to get pathrecords for all the other processes/nodes in the system.
Hence, each process on a node will make the exact same set of queries.
That should still only be
Rimmer, Todd wrote:
While each process could do a GET_TABLE for all path records, that would be rather inefficient and would provide 1,000,000 path records in the RMPP response, of which only 500 are of interest.
Each process could do a GET_TABLE for only those path records with the SGID set
On Tue, 10 Jan 2006, Sean Hefty wrote:
Grant Grundler wrote:
I forgot to point out postgres:
http://www.postgresql.org/about/
This looks like it would work well.
The question that I have for users is: Is it acceptable for the
cache to make use of a relational database system?
Hi Sean,
Now you've really lost me:
Is the intention to speed up SA queries?
Or is it to have persistent storage of them?
I think we should focus on the kind of data to cache,
how it is made transparently available to any OpenIB client
and how/when is it invalidated by the SM.
We should only keep
On Tue, 10 Jan 2006, Sean Hefty wrote:
Grant Grundler wrote:
I forgot to point out postgres:
http://www.postgresql.org/about/
This looks like it would work well.
The question that I have for users is: Is it acceptable for the
cache to make use of a relational database
James Lentini wrote:
Will it be possible to use the OpenIB stack without setting up the SA
cache?
Yes.
- Sean
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit
Eitan Zahavi wrote:
Is the intention to speed up SA queries?
Or is it to have persistent storage of them?
I want both. :)
I think we should focus on the kind of data to cache,
how it is made transparently available to any OpenIB client
and how/when is it invalidated by the SM.
We should
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Eitan Zahavi wrote:
Is the intention to speed up SA queries?
Or is it to have persistent storage of them?
I want both. :)
I would clarify that the best bang for the effort will be to focus on
the queries which the ULPs themselves will use
Since no one's really answered this yet:
Many sysadmins are not going to want to install a relational database
to run an SA cache. So I'd stick to Berkeley DB if I were you.
-- greg
Greg Lindahl wrote:
Since no one's really answered this yet:
Many sysadmins are not going to want to install a relational database
to run an SA cache. So I'd stick to Berkeley DB if I were you.
Thanks for the response. To be clear, the cache would be an optional component,
and likely only
Rimmer, Todd wrote:
A relational database is overkill for this function.
It will also likely be more complex for end users to setup and debug.
The cache setup should be simple. The solution should be such that
just an on/off switch needs to be configured (with a default of on)
for most users to
Sean Hefty wrote:
To keep the design as flexible as possible, my plan is to implement the
cache in userspace. The interface to the cache would be via MADs.
Clients would send their queries to the sa_cache instead of the SA
itself. The format of the MADs would be essentially identical to
Grant Grundler wrote:
We already have several databases for different things:
makedb (primarily for NSS)
updatedb (fast lookup of local files)
mandb (man pages)
rpmdb (yes, even on debian boxes)
sasldbconverter2 (for SASL - linux securty/login stuff)
On Tue, Jan 10, 2006 at 03:00:46PM -0800, Sean Hefty wrote:
I did find that libdb-4.2 was installed on SuSE and RedHat systems, and a
libodbc was on my SuSE system. Libdb-4.2 would help persist some of the SA objects to a file, but is limited in its data storage and retrieval capabilities.
Grant Grundler wrote:
I forgot to point out postgres:
http://www.postgresql.org/about/
This looks like it would work well.
The question that I have for users is: Is it acceptable for the cache to make
use of a relational database system?
The disadvantage is that an RDBMS would need
Hi Sean,
Please see below.
Sean Hefty wrote:
* Regarding the sentence: Clients would send their queries to the sa_cache instead of the SA
I would propose that an SA MAD send switch be implemented in the core: such a switch will enable plugging in the SA cache (I would prefer calling it SA
Hi Sean, Todd,
Although I like the replica idea for its query performance boost, I suspect it will actually not scale for very large networks: each node querying for the entire database would cause N^2 load on the SA.
After any change (which does happen with higher probability on large
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
On Thu, 2006-01-05 at 18:36, Rimmer, Todd wrote:
This of course implies the SA Mux must analyze more than just
the attribute ID to determine if the replica can handle the query.
But the memory savings is well worth the extra level of
On Fri, 2006-01-06 at 09:05, Rimmer, Todd wrote:
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
On Thu, 2006-01-05 at 18:36, Rimmer, Todd wrote:
This of course implies the SA Mux must analyze more than just
the attribute ID to determine if the replica can handle the query.
But the
Sean Hefty wrote:
Eitan Zahavi wrote:
So if the cache is on another host - a new kind of MAD will have to be sent on behalf of the original request?
I was thinking more in terms of redirection.
Today none of the clients support redirection. It would take significant
duplicated effort
I agree with Todd: a key is to keep the client unaware of the mux's existence.
So the same client can be run on a system without the cache.
Hal Rosenstock wrote:
On Fri, 2006-01-06 at 09:05, Rimmer, Todd wrote:
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
On Thu, 2006-01-05 at 18:36, Rimmer,
Eitan Zahavi wrote:
Can someone familiar with the opensm code tell me how difficult it
would be to extract out the code that tracks the subnet data and
responds to queries?
I guess you mean the code that answers PathRecord queries?
Yes - that along with answering other queries.
It
On Fri, 2006-01-06 at 14:50, Eitan Zahavi wrote:
Hal Rosenstock wrote:
On Thu, 2006-01-05 at 18:36, Rimmer, Todd wrote:
This of course implies the SA Mux must analyze more than just
the attribute ID to determine if the replica can handle the query.
But the memory savings is well worth
On Fri, 2006-01-06 at 15:00, Eitan Zahavi wrote:
I agree with Todd: a key is to keep the client unaware of the mux's existence.
So the same client can be run on a system without the cache.
Define same client ? I would consider it the same SA client directing
requests differently based on how the
Hi Todd,
So you agree we will need to design replica buildup scalability features into the solution (to avoid the bring-up load on the SA)?
Why would a caching system not work here, instead of replicating the data? The caching concept allows for the SA to still be in the loop by
Hal Rosenstock wrote:
On Fri, 2006-01-06 at 15:00, Eitan Zahavi wrote:
I agree with Todd: a key is to keep the client unaware of the mux's existence.
So the same client can be run on a system without the cache.
Define same client ? I would consider it the same SA client directing
requests
Sean Hefty wrote:
Eitan Zahavi wrote:
Can someone familiar with the opensm code tell me how difficult it
would be to extract out the code that tracks the subnet data and
responds to queries?
I guess you mean the code that answers PathRecord queries?
Yes - that along with
On Fri, 2006-01-06 at 14:55, Eitan Zahavi wrote:
I guess you mean the code that answers PathRecord queries?
It is possible to extract the SMDB objects and duplicate that database.
I am not sure it is such a good idea. What if the SM is not OpenSM?
I would view that the database is an
Sean Hefty wrote:
- The MAD interface will result in additional data copies and userspace
to kernel transitions for clients residing on the local system.
- Clients require a mechanism to locate the sa_cache, or need to make
assumptions about its location.
Based on some comments from people, I
Hal Rosenstock wrote:
I would view that the database is an SADB with the actual pathrecords as one example, rather than the SMDB from which they are calculated. I think Sean is interested in the SA packet query/response code here, to avoid recreating this, and that the backend would be stripped
Hefty [mailto:[EMAIL PROTECTED]
Sent: Friday, January 06, 2006 10:40 PM
To: Hal Rosenstock
Cc: Eitan Zahavi; openib
Subject: Re: [openib-general] SA cache design
Hal Rosenstock wrote:
I would view that the database is an SADB with the actual
pathrecords as
one example rather than the SMDB
Hi Sean,
This is a great initiative - tackling an important issue.
I am glad you took this on.
Please see below.
Sean Hefty wrote:
I've been given the task of trying to come up with an implementation for
an SA cache. The intent is to increase the scalability and performance
of the openib
Hi Sean,
On Tue, 2006-01-03 at 20:15, Sean Hefty wrote:
Hal Rosenstock wrote:
I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation
Hi Eitan,
On Thu, 2006-01-05 at 07:27, Eitan Zahavi wrote:
Hi Sean,
This is a great initiative - tackling an important issue.
I am glad you took this on.
Please see below.
Sean Hefty wrote:
I've been given the task of trying to come up with an implementation for
an SA cache. The
From: Sean Hefty [mailto:[EMAIL PROTECTED]
I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation are below. Any feedback is
* Regarding the sentence: Clients would send their queries to the sa_cache instead of the SA
I would propose that an SA MAD send switch be implemented in the core: such a switch will enable plugging in the SA cache (I would prefer calling it SA local agent due to its extended
I hadn't fully figured this out yet. I'm not sure if another MAD class is needed or not. My goal is to implement this as transparently to the application as possible without violating the spec, perhaps appearing as an SA on a different LID.
The LID for the (real) SA is determined from
Sean, this is great. This is a feature which I find near and dear and is very important to large fabric scalability. If you look in contrib in the infinicon area, you will see a version of an SA replica which we implemented in the linux_discovery tree. The version in SVN is a little dated, but
On Thu, 2006-01-05 at 16:51, Sean Hefty wrote:
I agree that this is a problem, but my preference would be for a dedicated kernel module to handle multicast join/leave requests.
In addition to multicast, it's also service records and event subscriptions.
-- Hal
- It is implemented in kernel mode
- while user mode may help during initial debug, it will be important for kernel mode ULPs such as SRP, IPoIB and SDP to also make use of these records
Your kernel footprint is smaller than I expected, which is good. Note that with a MAD
On Thu, 2006-01-05 at 17:04, Sean Hefty wrote:
I hadn't fully figured this out yet. I'm not sure if another MAD class is needed or not. My goal is to implement this as transparently to the application as possible without violating the spec, perhaps appearing as an SA on a different LID.
On Thu, 2006-01-05 at 18:24, Sean Hefty wrote:
For the precise language, see C15-0-1.24 p. 923 IBA 1.2:
C15-0.1.24: It shall be possible to determine the location of SA from any endport by sending a GMP to QP1 (the GSI) of the node identified by the endport's PortInfo:MasterSMLID, using
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Your kernel footprint is smaller than I expected, which is good.
The key is that while there are O(N^2) path records in a fabric, only O(N) are of interest to a given node. Hence if you only replicate entries where this node is the source, the size
Note that with a MAD interface, kernel modules would still have access to any cached data. I also wanted to stick with usermode to allow saving the cache to disk, so that it would be available immediately after a reboot. (My assumption being that changes to the network topology would be
I've been given the task of trying to come up with an implementation for an SA
cache. The intent is to increase the scalability and performance of the openib
stack. My current thoughts on the implementation are below. Any feedback is
welcome.
To keep the design as flexible as possible, my
Hi Sean,
On Tue, 2006-01-03 at 19:42, Sean Hefty wrote:
I've been given the task of trying to come up with an implementation for an SA cache. The intent is to increase the scalability and performance of the openib stack. My current thoughts on the implementation are below. Any
Hal Rosenstock wrote:
I've been given the task of trying to come up with an implementation for an SA
cache. The intent is to increase the scalability and performance of the openib
stack. My current thoughts on the implementation are below. Any feedback is
welcome.
To keep the design as