https://bugs.openfabrics.org/show_bug.cgi?id=420
Summary: PKey table reordering caused by SM failover stops ipoib
traffic
Product: OpenFabrics Linux
Version: 1.2alpha1
Platform: All
OS/Version: All
Status: NEW
Hal Rosenstock wrote:
> On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote:
>> On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote:
>>> Hi Hal.
>>>
>>> Converting the C++ code to C.
>> This is actually valid C99 code. This is the method that ISO
>> standardized in C99 to do dynam
Ishai,
I see several new files in /usr/local/ofed/sbin in OFED 1.2 pre-beta,
what are they for?
/usr/local/ofed/sbin/execute_multipath_or_kpartx.sh
/usr/local/ofed/sbin/srp_dm_multipath_daemon
/usr/local/ofed/sbin/srp_dnotify
/usr/local/ofed/sbin/srp_post_multipath
/usr/local/ofed/sbin/start_s
> Quoting Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy
>
> > Yes but here we also must make sure completion events and async events are
> > flushed out: once QP is in reset no events should be generated.
>
> Completion events are fine -
On Tue, Mar 06, 2007 at 08:00:23PM -0800, Shirley Ma wrote:
> Comparing the IPoIB accessibility (letting IPv4 work) with playing a trick with
> carrier on to avoid IPv6 link-local DAD in a small possibility, this patch is
> a better choice for switches with limited MGC resources today in a large c
>So, yes, currently, IPoIB is broken in that DAD for new addresses is
>not synchronized to the SM join. But, DAD for startup addresses is
>OK due to the trick that is played with carrier. Your patch breaks
>both equally :>
As you described, DAD is not a mechanism to prevent duplicated add
What do you think of something like this, plus merging the async event
and command interface EQs?
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c
b/drivers/infiniband/hw/mthca/mthca_cq.c
index efd79ef..6e247e2 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mth
Jason,
The whole purpose of this patch is to address network
accessibility when MLIDs are limited in the fabric. We have a customer
who hit this problem in a large cluster. Basically IPv4 doesn't work at all
(since the interface can't be up and running because of IPv6 soliciate
Jason,
>So, yes, currently, IPoIB is broken in that DAD for new addresses is
>not synchronized to the SM join. But, DAD for startup addresses is
>OK due to the trick that is played with carrier. Your patch breaks
>both equally :>
My patch doesn't break DAD; that's what I am trying to explain. The p
On Tue, Mar 06, 2007 at 04:22:45PM -0800, Shirley Ma wrote:
> So we could have same IPv6 addresses even without IPoIB if the NS doesn't
> respond on time for any reason, right?
Right. An example would be if you connect two ethernet networks
together that had duplicate addresses. The startup DAD m
BTW, I have tested IPv4 and IPv6 DAD; a duplicate address doesn't prevent the
interface from being UP and RUNNING for ethernet. But this was not on a recent
kernel.
Thanks
Shirley Ma
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabric
Jason Gunthorpe <[EMAIL PROTECTED]> wrote on 03/06/2007
02:17:40 PM:
> On Tue, Mar 06, 2007 at 12:52:27PM -0800, Shirley Ma wrote:
>
> > The IPv6 stack will generate ND and router solicitation messages
> > when sending a packet. A duplicate address can be detected
> > anytime. Am I right? So
https://bugs.openfabrics.org/show_bug.cgi?id=417
[EMAIL PROTECTED] changed:
What|Removed |Added
Summary|can't unload drivers on |can't unload OFED 1.2
https://bugs.openfabrics.org/show_bug.cgi?id=417
Summary: can't unload drivers on SLES10 x86_64
Product: OpenFabrics Linux
Version: 1.2
Platform: X86-64
OS/Version: SLES 10
Status: NEW
Severity: major
Priority: P2
I've been testing OFED-1.2-20070306-0807 today, and here's a current
list of bugs I'd like fixed for OFED 1.2 beta.
bug_id assigned_to short_desc
397 [EMAIL PROTECTED] OFED 1.2 alpha1 Open MPI "InfiniBand
retry count" errors
381 [EMAIL
>Really?
No - not really. I skipped the git push step... It should be there now.
- Sean
On Tue, Mar 06, 2007 at 12:52:27PM -0800, Shirley Ma wrote:
> The IPv6 stack will generate ND and router solicitation messages
> when sending a packet. A duplicate address can be detected
> anytime. Am I right? So if the multicast join completes later, the
> duplicate address will be detect la
Roland Dreier <[EMAIL PROTECTED]> wrote on 03/06/2007 12:20:55 PM:
> > I believe even IPv6 with ethernet, the interface will be UP and RUNNING
> > even if they have a duplicated IPv6 address so IPv4 can work. I don't know why
> > we do things differently here.
>
> That's not the point. The poi
rdma_cm: initialize rdma_bind_list in cma_alloc_any_port
I applied this one at least...
> Roland, I added another minor fix to my git tree:
Really?
git ls-remote git://git.openfabrics.org/~shefty/rdma-dev.git
shows the head of for-roland is
53ecd8bead5d6d9a28f26447f309b636aa361c82 refs/heads/for-roland
and
git show --pretty=short -M 53ecd8b
commit 53ecd8bead5d6d9a28f2644
Roland, I added another minor fix to my git tree:
git://git.openfabrics.org/~shefty/rdma-dev.git for-roland
rdma_ucm: avoid sending reject if backlog is full
I don't have any other pending changes for 2.6.21, but we're continuing
scalability testing. This fix is not critical, bu
> I believe even IPv6 with ethernet, the interface will be UP and RUNNING
> even if they have a duplicated IPv6 address so IPv4 can work. I don't know why
> we do things differently here.
That's not the point. The point is that if we bring the interface up
before the multicast groups are joined, t
I believe even IPv6 with ethernet, the interface will be UP and RUNNING
even if they have a duplicated IPv6 address so IPv4 can work. I don't know why
we do things differently here.
Thanks
Shirley Ma
Jason Gunthorpe <[EMAIL PROTECTED]> wrote on 03/03/2007
02:37:02 PM:
> On Thu, Mar 01, 2007 at 05:04:43PM -0800, Shirley Ma wrote:
>
> > IPv6 ND doesn't prevent the duplicate IPv6 link-local address in the network.
> > It only saves a warning in /var/log/messages to indicate that this address
while looking at some ipoib performance i had a chance to graph the
tcp flow in xplot (see http://www.xplot.org/). the graph appears very
strange and is attached to this message.
the lower solid line represents acks coming back from the tcp server, the
upper line represents the window size (i disabled
> Yes but here we also must make sure completion events and async events are
> flushed out: once QP is in reset no events should be generated.
Completion events are fine -- at worst the consumer gets a spurious
event but doesn't find any CQEs. I see the point about async events.
It does make se
Is there a list? They don't seem to be reading this one ...
Quoting Jeff Squyres <[EMAIL PROTECTED]>:
Subject: Re: Re: OFA web page needs updating
All I'm saying is that the promoters group is the "owner" of the web
site. Before you touch it, you should coordinate with them.
On Mar 6, 2007,
> Quoting Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCHv3 for-2.6.21] IB/mthca: fix race in QP destroy
>
> > With current code, when we destroy a QP, we remove it from table first,
> > and move QP to reset. This is clearly wrong, and this patch fixes this.
>
> I guess so, but it stil
All I'm saying is that the promoters group is the "owner" of the web
site. Before you touch it, you should coordinate with them.
On Mar 6, 2007, at 2:39 PM, Michael S. Tsirkin wrote:
Well, for now, I can't imagine fixing a couple of typos
will be much of a problem.
Quoting Jeff Squyres <[E
> With current code, when we destroy a QP, we remove it from table first,
> and move QP to reset. This is clearly wrong, and this patch fixes this.
I guess so, but it still leaves some other obvious races. First, the
QP is removed from the table before its CQEs are cleaned -- to fix
this, we sh
Thanks, applied.
Thanks, applied.
> Quoting Moni Shoua <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
> to IPoIB
>
> Michael S. Tsirkin wrote:
> +if (ipoib_at_exit)
> +nn->neighbour->parms->neigh_destructor
> >>
Well, for now, I can't imagine fixing a couple of typos
will be much of a problem.
Quoting Jeff Squyres <[EMAIL PROTECTED]>:
Subject: Re: [ofa-general] Re: OFA web page needs updating
www.openfabrics.org has been on the new server for quite a long time.
Be aware that the promoters group runs the
www.openfabrics.org has been on the new server for quite a long time.
Be aware that the promoters group runs the main content on
www.openfabrics.org and they're just about to launch a new version of
the site (I heard 2nd hand -- I'm not part of the promoters group).
On Mar 6, 2007, at 2:00
Moni Shoua wrote:
Andrew Friedley wrote:
Moni Shoua wrote:
Andrew Friedley wrote:
The chelsio build errors from yesterday appear to be gone, though now
I'm seeing errors building the IB bonding code with the 3/2 alpha
tarball -- error below. I'm wondering, is there a way to selectively
avoid
> Quoting Sean Hefty <[EMAIL PROTECTED]>:
> Subject: OFA web page needs updating
>
> Can we get the ofa web page updated? Specifically:
>
> Development Tools - link to git rather than svn
> Contact - update the mailing list information, including link to archives
> Downloads - link to developer
> Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] osm: Converting the the C++ code to C in osm_ucast_lash.c
>
> On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote:
> > On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote:
> > > Hi Hal.
> > >
> > > Converting the th
> Hi Hal.
>
> Converting the C++ code to C.
>
> Please apply both to trunk and to 1.2
>
> Thanks.
>
> Signed-off-by: Yevgeny Kliteynik <[EMAIL PROTECTED]>
NAK.
1. I don't see any C++ here.
2. Why do we need this on ofed branch?
Only bugfixes should go there. What bug does it fix?
> --
> Quoting Moni Shoua <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
> to IPoIB
>
> Michael S. Tsirkin wrote:
> +if (ipoib_at_exit)
> +nn->neighbour->parms->neigh_destructor
> >>
See
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
ipoib-napi
for the NAPI work.
> Quoting Moni Shoua <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
> to IPoIB
>
> Michael S. Tsirkin wrote:
> >> Quoting Moni Shoua <[EMAIL PROTECTED]>:
> >> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding
> >> support to
> Quoting Steve Wise <[EMAIL PROTECTED]>:
> Subject: [PATCH ofed_1_2] perftest: asprintf usage error in rdma_bw.c
>
>
> asprintf usage error in rdma_bw.c
>
> Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
Applied to both ofed_1_2 and master in perftests.
Vlad, pls pull.
--
MST
On Tue, 2007-03-06 at 12:51, Jason Gunthorpe wrote:
> On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote:
> > Hi Hal.
> >
> > Converting the C++ code to C.
>
> This is actually valid C99 code. This is the method that ISO
> standardized in C99 to do dynamic stack allocations (al
Can we get the ofa web page updated? Specifically:
Development Tools - link to git rather than svn
Contact - update the mailing list information, including link to archives
Downloads - link to developer public_html download areas
- Sean
https://bugs.openfabrics.org/show_bug.cgi?id=316
[EMAIL PROTECTED] changed:
What|Removed |Added
AssignedTo|[EMAIL PROTECTED] |[EMAIL PROTECTED]
--
Configure bugma
+#define DRV_NAME "ib_cm"
+#define PFX DRV_NAME ": "
Just define PFX.
+
+/*
+ * Limit CM msg timeouts to something reasonable.
+ * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min.
+ */
+#define IB_CM_MAX_TIMEOUT 21
Thinking out loud... maybe we should make
On Tue, Mar 06, 2007 at 06:03:16PM +0200, Yevgeny Kliteynik wrote:
> Hi Hal.
>
> Converting the C++ code to C.
This is actually valid C99 code. This is the method that ISO
standardized in C99 to do dynamic stack allocations (alloca is not
an ISO C function).
Since it is now 2007 is there rea
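For readers unfamiliar with the construct under discussion, the following is a minimal sketch of a C99 variable-length array, the ISO-standardized form of dynamic stack allocation mentioned above. It is not the code from osm_ucast_lash.c, which this excerpt does not show; the function name sum_of_squares is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from osm_ucast_lash.c.
 * Sums 1^2 + 2^2 + ... + n^2 using a C99 variable-length array (VLA)
 * as scratch space. Returns 0 for n == 0 because a VLA must have a
 * nonzero size.
 */
unsigned long sum_of_squares(size_t n)
{
	if (n == 0)
		return 0;

	unsigned long squares[n];	/* VLA: size chosen at run time, automatic storage */
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		squares[i] = (unsigned long)(i + 1) * (i + 1);
	for (size_t i = 0; i < n; i++)
		total += squares[i];

	return total;	/* squares[] is released when the function returns */
}
```

Calling sum_of_squares(4) adds 1 + 4 + 9 + 16. A very large n can exhaust the stack with no error return, which is one reason some projects prefer malloc; C11 later made VLAs an optional feature.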
Michael S. Tsirkin wrote:
+ if (ipoib_at_exit)
+ nn->neighbour->parms->neigh_destructor = NULL;
>>> Is it safe to do this without locking?
>>> Could the destructor be in progress when we do this?
>> I think you're right. Maybe I need to attack the
Michael S. Tsirkin wrote:
>> Quoting Moni Shoua <[EMAIL PROTECTED]>:
>> Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
>> to IPoIB
>>
>> Michael S. Tsirkin wrote:
Quoting Moni Shoua <[EMAIL PROTECTED]>:
Subject: Re: [openib-general] [RFC] [PATCH v2] IB/ipoib
I made the following change to infiniband/hw/ehca/ehca_mrmw.c to have
ibv_reg_mr return -ENOMEM instead of -EINVAL. Shouldn't we define
and document what errno codes ibv_reg_mr is expected to return so
that applications have some idea if there is a permanent failure and
they need to exit,
asprintf usage error in rdma_bw.c
Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
---
rdma_bw.c |    2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/rdma_bw.c b/rdma_bw.c
index e55b82d..28cee43 100644
--- a/rdma_bw.c
+++ b/rdma_bw.c
@@ -327,7 +327,7 @@ static struct pingpong_
https://bugs.openfabrics.org/show_bug.cgi?id=316
[EMAIL PROTECTED] changed:
What|Removed |Added
CC||[EMAIL PROTECTED]
--- Comment #
Hi Hal.
Converting the C++ code to C.
Please apply both to trunk and to 1.2
Thanks.
Signed-off-by: Yevgeny Kliteynik <[EMAIL PROTECTED]>
---
osm/opensm/osm_ucast_lash.c | 23 ---
1 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/osm/opensm/osm_ucast_lash
Hi,
There are several blocker/critical bugs for the beta release (if you are
in the "To" section, it means you are an owner of one of these bugs).
I updated Bugzilla with their priority. Owners, please fix or update the
bug status (you can also change the priority if you think that it's not
blocker/
On Mar 6, 2007, at 3:16 AM, Guy German wrote:
I am running kernel 2.6.19.2, and I'm in the RDMA group, and can open
/dev/infiniband/uverbs0 as a user, but can't register memory as a
user.
try increasing your "max locked memory" limitation.
you can try setting as root "ulimit -l unlimited" a
> Quoting Moni Levy <[EMAIL PROTECTED]>:
> Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port
> parameter.
>
> On 3/6/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> > Can you open a bugzilla issue pls?
>
> Bug 413 was opened to track that issue.
Thanks.
--
MST
https://bugs.openfabrics.org/show_bug.cgi?id=413
[EMAIL PROTECTED] changed:
What|Removed |Added
AssignedTo|[EMAIL PROTECTED] |[EMAIL PROTECTED]
https://bugs.openfabrics.org/show_bug.cgi?id=390
[EMAIL PROTECTED] changed:
What|Removed |Added
AssignedTo|[EMAIL PROTECTED] |[EMAIL PROTECTED]
Status|REO
On 3/6/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
Can you open a bugzilla issue pls?
Bug 413 was opened to track that issue.
-- Moni
"Erez Strauss" <[EMAIL PROTECTED]> wrote on 03/06/2007 12:51:52 AM:
> Hi Bernie,
>
> Thank you for your reply.
>
> In this case I'm using an application which is using very small
> messages (300Bytes) and I can not change it.
> The focus of the testing is to reduce the latency to minimum during
https://bugs.openfabrics.org/show_bug.cgi?id=413
Summary: IPoIB passes async events to an unrelated devices.
Product: OpenFabrics Linux
Version: 1.2alpha1
Platform: All
OS/Version: All
Status: NEW
Severity: normal
P
Can you open a bugzilla issue pls?
Quoting Moni Levy <[EMAIL PROTECTED]>:
Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port
parameter.
Michael, can you please add that patch to OFED 1.2.
Thanks,
Moni
On 2/28/07, Moni Levy <[EMAIL PROTECTED]> wrote:
> -- Forwarded
Roland
On 3/4/07, Moni Levy <[EMAIL PROTECTED]> wrote:
On 3/1/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> > SM reconfiguration or failover possibly causes a shuffling of the values in
the port pkey table. The current implementation only queries for the index of the
pkey once, when it cr
Hi Arlin,
Please check https://bugs.openfabrics.org/show_bug.cgi?id=408 :
I have 32 bit server (Intel(R) Xeon(TM)) with SLES10.
Linux sw050 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386
GNU/Linux
I got the following compilation error:
Is dapltest supported on this system?
M
On Mon, 2007-03-05 at 17:34 -0600, Steve Wise wrote:
> Start ep timer on a MPA reject.
>
> If the consumer rejects the connection we end up under-referencing the
> endpoint structure. The fix is to call iwch_ep_disconnect() instead of
> the low level disconnect functions so that the endpoint clos
This email was generated automatically, please do not reply
Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-
> I am running kernel 2.6.19.2, and I'm in the RDMA group, and can open
> /dev/infiniband/uverbs0 as a user, but can't register memory as a user.
try increasing your "max locked memory" limitation.
you can try setting as root "ulimit -l unlimited" and switch user.
if it works for you - set in /etc
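The limit that "ulimit -l" adjusts is RLIMIT_MEMLOCK, and a program can read it before it tries to register memory. A minimal sketch, assuming Linux; the helper name get_memlock_limit is hypothetical and nothing here comes from libibverbs:

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: read the current "max locked memory" limit.
 * On success returns 0 and stores the soft and hard limits in bytes;
 * RLIM_INFINITY means "unlimited". Returns -1 if getrlimit() fails.
 */
int get_memlock_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;

	*soft = rl.rlim_cur;	/* limit enforced right now */
	*hard = rl.rlim_max;	/* ceiling for raising the soft limit without privilege */
	return 0;
}
```

A soft limit that is small compared with the buffer being registered would be consistent with the failure described above.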
Hi Roland, Sean,
How about the attached fix to Sean's patch?
Ishai
Roland Dreier wrote:
This all looks rather fishy:
> +/*
> + * Limit CM msg timeouts to something reasonable.
> + * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min.
> + */
> +#define IB_CM_MAX_TIMEOUT 21
OK...
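The arithmetic in the quoted comment can be checked with a short sketch. It assumes the usual InfiniBand timeout encoding of 4.096 us * 2^value, which this excerpt does not spell out; the helper names are hypothetical and are not part of ib_cm.

```c
#include <stdint.h>

#define IB_CM_MAX_TIMEOUT 21	/* value from the quoted patch */

/* Hypothetical helper: 4.096 us * 2^exp, expressed in nanoseconds (exp < 52). */
uint64_t ib_timeout_ns(unsigned int exp)
{
	return (uint64_t)4096 << exp;
}

/* Hypothetical helper: total time in ns spent on `retries` timeouts of this length. */
uint64_t ib_total_timeout_ns(unsigned int exp, unsigned int retries)
{
	return ib_timeout_ns(exp) * retries;
}
```

Under these assumptions, ib_timeout_ns(IB_CM_MAX_TIMEOUT) is 8589934592 ns, about 8.6 seconds, and ib_total_timeout_ns(IB_CM_MAX_TIMEOUT, 15) is roughly 129 seconds, in line with the "2 min" in the comment.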
Tang, Changqing wrote:
I want to understand what exact feature you need.
I want our MPI code to survive connection loss or a peer
process/machine crash. This process can detect any IB error, and then
clean up that connection, use healthy connections only, and possibly make
new connect