>
> I managed to find the above missing files under the gen1 branch. I am
> not quite sure about installing packages from gen1 and gen2 on the same
> machine. Should I install IBGOLD instead of OFED ?
>
The srp target sw runs only on gen1 (ie. the IBGD stack),
and yes, you need to install/run the IBGD stack.
>>*srp target* is still on gen1 code base - IBGD
>>
>>*nfs-rdma server* is on gen2 code base
>
>
> Any chance the MTD2000 runs openfiler?
>
We have never installed openfiler. You can try
-vu
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
Tomoaki,
>
> Can anybody tell me about the mellanox "SRP target" implementation code which
> is included in MTD2000 with NFS-RDMA server ?
> Is this gen2 base ?
>
*srp target* is still on gen1 code base - IBGD
*nfs-rdma server* is on gen2 code base
___
Michael S. Tsirkin wrote:
> Quoting r. Vu Pham <[EMAIL PROTECTED]>:
>
>>Either you can use multiple channels or derive different
>>initiator_port_ID in the login req to have multiple paths on
>>the same physical port
>
>
> So how about we just stick a p
Michael S. Tsirkin wrote:
> Quoting r. Vu Pham <[EMAIL PROTECTED]>:
>
>>Most of srp targets that I tested don't support multiple
>>channels.
>
>
> Which are these?
Mellanox reference srp target, Texas Memory System's SSD,
Engenio.
> And what
Ishai Rabinovitz wrote:
> Hi Roland,
>
> SRP High Availability needs an initiator to connect to the same target
> several times, e.g., once from each IB port of the target (this way we can use
> device mapper multipath for failover). Note that both connections are actually
> active, e.g. multipath
>
> By default, we found (via stats from the DDN) that we were only seeing reads
> and writes in the 0-32Kbyte range. Comparing IBGold and OFED, we found that
> the srp_sg_tablesize defaulted to 256, but in OFED it defaulted to 12. So,
> changing this (via modprobe.conf) to 256 in OFED, we were
Hi Paul,
You can load the ib_srp module with the module parameter
max_xfer_sectors_per_io=<512, 1024, 2048> to support 256KB,
512KB, and 1MB direct IOs.
To make it persistent, add the following line to /etc/modprobe.conf:
options ib_srp max_xfer_sectors_per_io=1024
Vu
> Hi,
> I was running some performance tests to an srp ta
Michael,
> +static const u8 mellanox_oui[3] = { 0x02, 0xc9, 0x02 };
Should it be {0x00, 0x02, 0xc9}?
___
Roland Dreier wrote:
> > > Why does userspace need to be able to disconnect a connection?
>
> > There are two options on who will initiate the disconnection: the userspace
> > daemon or the ib_srp module. I considered both options and I was not sure
> > which one is better. I choose to do it
Michael S. Tsirkin wrote:
> Quoting r. Vu Pham <[EMAIL PROTECTED]>:
>>> Right now this workaround affects all targets unconditionally.
>>>
>> Can we rework the patch to have mellanox_workarounds=0 by
>> default?
>
> Hmm ... since this is a data cor
Roland Dreier wrote:
> BTW -- how are you seeing this corruption?
Host machine: EM64T cpu, RHEL4 u1/2/3 x86 32-bit version
Create/mount ext2 or ext3 file system on srp devices using
1K block size (ie. mkfs -t ext2 -b 1024). Copy file, umount,
remount and compare to the original file - looping
Roland Dreier wrote:
> > Until there's a better understanding of the issue, I've come up with the
> > following simple patch that will use indirect mode for this case.
>
> Is it possible to check the target's OUI to detect Mellanox targets?
Mellanox target ioc_guid is derived from node_guid;
@@ -1914,6 +1920,11 @@ static int __init srp_init_module(void)
{
int ret;
Thanks. Should we add a check and put some cap on the
srp_sg_tablesize value, ie.
+	srp_sg_tablesize = max(1, srp_sg_tablesize);
+	srp_sg_tablesize = min(srp_sg_tablesize, SRP_MAX_SG_TABLESIZE);
+
BTW, does Mellanox (or anyone else) have any numbers showing that
using FMRs makes any difference in performance on a semi-realistic benchmark?
I'm using xdd to test the performance
www.ioperformance.com/products.htm
The target is Mellanox srp target reference implementation
with 14 SATA
Hi Roland,
This patch:
+ introduces srp_sg_tablesize as module parameter - default value is 16
+ adjusts SRP_MAX_IU_LEN, SRP_MAX_INDIRECT from srp_sg_tablesize
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Index: infiniband/ulp/srp/ib
Patch to throttle commands per lun when adding a target.
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Index: infiniband/ulp/srp/ib_srp.c
===
--- infiniband/ulp/srp/ib_srp.c (revision 7075)
+++ infiniband/ulp/srp/ib_srp.c (workin
Roland Dreier wrote:
I finally looked this over.
First, this should be two patches: making srp_sg_tablesize tunable
should be a separate change from making it possible to specify
max_cmd_per_lun for a target.
OK, I'll break it into two patches
The srp_sg_tablesize change makes the default num
Roland Dreier wrote:
Vu> This fmr patch does not work for ia64 system because this
Vu> fmr_page_mask is defined as unsigned int.
Great catch!
Vu> We should type cast it to u64 or define it as unsigned long
Casting it won't help because it will just get zero-extended. I think
we ne
+ dma_pages[page_cnt++] =
+ (sg_dma_address(&scat[i]) & dev->fmr_page_mask)
+ j;
+
This fmr patch does not work on ia64 systems because
dev->fmr_page_mask is defined as unsigned int.
We should type cast it to u64 or define it as unsigned long.
Roland Dreier wrote:
Vu> Have you read scsi_eh_try_stu(scmnd) and scsi_eh_tur(scmnd)?
Vu> These functions use the same scmnd and reformat it with new
Vu> cdb and call srp_queuecommand() which uses new req and put
Vu> this new req in request queue for this same scmnd with
Vu> d
Roland Dreier wrote:
Vu> scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer
Vu> tried to abort stu or tur command as well. Since we delay to
Vu> clean in srp_reset_device(), srp's request queue is still not
Vu> empty. This stu or tur command is freed by scsi midlayer. T
Roland Dreier wrote:
> 1st scsi_try_host_reset() --> srp_host_reset() -->
> srp_reconnect_target() return SUCCESS. Then scsi_eh_try_stu() or
> scsi_eh_tur() is called right after
>
> scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() -->
> srp_queuecommand()
But after srp_reconnect_t
So after srp_reconnect_target() returns, SRP has no requests in its
queue. The only way that a command could be put in the queue is if
the SCSI midlayer passes it back into the queuecommand functions.
Yes, this is exactly what is happening.
static int scsi_eh_host_reset(struct list_head *work_
apply on top of your patch will fix the problem.
IB/srp: Fix tracking of pending requests during error handling
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
diff -Naur infiniband/ulp/srp.roland-eh/ib_srp.c infiniband/ulp/srp/ib_srp.c
--- infiniband/ulp/srp.roland-eh/ib_srp.c 2006-05-
Roland Dreier wrote:
> reading scsi_error.c again, I find this logic for our case (please
> correct me if I'm wrong)
> 1. eh_abort_handler and eh_device_reset_handler fail with timeout;
> eh_host_reset_handler succeeds
But you're crashing inside the call to srp_reconnect_target() in
srp_res
reading scsi_error.c again, I find this logic for our case (please
correct me if I'm wrong)
1. eh_abort_handler and eh_device_reset_handler fail with timeout;
eh_host_reset_handler succeeds
2. scsi_eh_host_reset goes on with scsi_eh_try_stu & scsi_eh_tur
3. either scsi_eh_try_stu or scsi_eh
Roland Dreier wrote:
> 1. srp_unmap_data() and srp_remove_req() for .eh_abort_handler(scmnd)
> a. abort gets a timeout, or
> b. req->cmd_done or
> c. !req->tsk_status
> 2. we should do step (1) for .eh_abort_handler(scmnd) only and don't
> do step 1 for .eh_device_reset_handler(scmn
Scott Weitzenkamp (sweitzen) wrote:
I see mention of an SRP daemon coming in 1.0 RC4, does this daemon
handle target addition/removal?
Yes
___
Thanks. Can you explain what the bug causing the crash is? I'd like
to understand the "why" of this patch.
1. srp_unmap_data() and srp_remove_req() for
.eh_abort_handler(scmnd)
a. abort gets a timeout, or
b. req->cmd_done or
c. !req->tsk_status
2. we should do step (1) for .eh_abort_
With the following patch applied my ia64 system does not crash anymore
I prepared this patch by diffing srp revision 6455 with the
srp-params.patch that I sent you last week applied
Please let me know if you want this patch generated from current srp
(revision 6550)
What is the status for s
Hi Roland,
I reported the error from my original email responding to your fmr
patch. For an ia64 system with a PCI-X HCA I got the async event
IB_EVENT_QP_ACCESS_ERR at the initiator (and I got cqe with
IB_COMPLETION_STATUS_REMOTE_ACCESS_ERROR status at my target)
I still have not had an IB analyzer t
Roland Dreier wrote:
> > And what if you comment out the line
> > .eh_device_reset_handler= srp_reset_device,
> > does that fix it?
> No
Now I'm really confused.
Me too.
It seems we lose the connection to the target (BTW -- do you know why
the connection is getting killed?)
Roland Dreier wrote:
Roland> Hmm, you may be right -- scsi_eh_bus_device_reset() in
Roland> scsi_error.c does seem to flush all commands. But do you
Roland> see srp_reset_device() being called? I didn't think I saw
Roland> it in your trace.
It was in my trace. I probably left
Roland Dreier wrote:
> The return happened before reaching the code above. It happened at:
> if (!wait_for_completion_timeout(&req->done,
>msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS)))
> return FAILED;
>
> because the qp was in fatal state. Theref
Roland Dreier wrote:
Hmm, I don't understand what could be going on. srp_send_tsk_mgmt()
currently has:
if (req->cmd_done) {
srp_remove_req(target, req, req_index);
scmnd->scsi_done(scmnd);
} else if (!req->tsk_status) {
srp_remove
Roland Dreier wrote:
Hmm, it's clearly a use-after-free bug. Based on
ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp]
can you guess where it is in the SRP driver or what it's accessing?
Also this is happening because the connection is being reconnected,
because SCSI commands are timing
Hi Roland,
Apr 7 18:17:17 lab105 kernel: Unable to handle kernel paging request at
virtual address 6b6b6b6b6b6b6b6b
I think I fixed the bug causing this oops (I was able to reproduce it,
and I don't see it any more). I checked the following patch in and
queued it for kernel 2.6.17:
My ia
Hi Roland,
Please review this patch
+ introducing srp_sg_tablesize as a module parameter
+ adjusting SRP_MAX_IU_LEN and SRP_MAX_INDIRECT from srp_sg_tablesize
+ throttling commands per lun, ie. max_cmd_per_lun can be passed in when
adding a target (same as max_sect)
Signed-off-by: Vu Pham <[EM
Vu> Here is my status of testing this patch. On x86-64 system I
Vu> got data corruption problem reported after ~4 hrs of running
Vu> Engenio's Smash test tool when I tested with Engenio storage
Vu> On ia64 system I got multiple async event 3
Vu> (IB_EVENT_QP_ACCESS_ERR) and
Hi Roland,
Sorry to take this long to respond. Thanks for all the enhancements.
I cc'ed some Engenio engineers who can help send the latest FW to you.
This mostly works for me, but I still see one weird problem. If I
make an FMR to cover IO of size more than 58 * 4096 bytes, the IO
never comp
Here is the patch (this also does not have the
reformatting issue from the previous patch)
Thanks,
Vu
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Index: ib_srp.c
===
--- ib_srp.c (revision 5918)
+++ ib_srp.c (working copy)
@@ -500,1
more code.
Thanks for cleaning this up
Anyway, I should be able to work on adding FMRs to this cleaner
framework now.
Here is the new fmr patch applied to your current framework
Thanks,
Vu
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Index: ib
As to using FMRs to create virtually contiguous regions, the last data
I saw about this related to SRP (not on OpenIB), and resulted in a
gain of ~25% in throughput when using FMRs vs the "full frontal" DMA
MR. So there is definitely something to be gained by creating
virtually contiguous regions.
Vu> I don't have a good test to measure performance; however, I
Vu> ran multiple dd streams on raw devices (no filesystem or
Vu> buffer cache involved) to measure the impact of using
Vu> FMRs. With Mellanox srp target reference implementation SW +
Vu> 14 SATA drives + PCI-E
Roland Dreier wrote:
Vu> Hi Roland, Any progress/update on your review and merging up
Vu> this patch
Sorry, I've been distracted by the huge mass of other code that has
been submitted at the same time.
By the way, do you have a test that you use to measure the performance
impact of usin
Thanks, I have some other SRP changes I'm testing but once that's done
I'll merge this up.
Hi Roland,
Any progress/update on your review and merging up this patch
Thanks,
Vu
___
Another attempt to implement fmr for srp
+ moving dev_list, pd, dma mr, and fmr resource to srp_device per ib_device
+ implementing fmr - try to build a single fmr per scsi_cmd if fail then
falling back to dma mr
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Index: ib
so, it might be good to try and add some more comments explaining
srp_map_fmr() -- it would definitely help me review.
I added some comments - Hope they help your review (instead
of confusing you more :))
Signed-off-by: Vu Pham <[EMAIL PROTE
per device and keep it in srp_device_data struct
I put back fmr + your patch and it works well with my setup.
Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Have you got time to review this SRP's FMR patch?
Thanks,
vu
___
Roland Dreier wrote:
Vu, you pointed out that the current SRP code might look at an IU that
it sent after that IU has been reused for a different command. I
realized that a simple fix for this is just to keep the DMA address
(the only thing we look at in the IU) in the request structure.
Just
Christoph Hellwig wrote:
+ if ((dma_addr & (PAGE_SIZE - 1)) ||
+ ((dma_addr + dma_len) & (PAGE_SIZE - 1)) ||
+ ((i == (sg_cnt - 1)) && !unaligned)) {
+ srp_fmr->io_addr = dma_addr & PAGE_MASK;
+ ++unalig
Christoph Hellwig wrote:
I think it makes more sense to handle this the same way I handled
max_sectors: make it a per-target parameter passed in when connecting
to the target. We could make cmds_per_lun a similar parameter, but
are there likely to be any SRP targets that need this to be limited
Does max_id matter at all for SRP? We only use scsi_scan_target() to
find targets, so I'm not sure where the SCSI midlayer even looks at
max_id for our case.
Our current srp implementation does not support the SCSI naming
convention, which requires max_id and max_channel.
For example if ther
> - iu = kmalloc(sizeof *iu, gfp_mask);
> + iu = kzalloc(sizeof *iu, gfp_mask);
We only do this once at init time
By the way, why do we want this? I think we always clear out the
contents of our IUs before we send them.
I don't think that we clear out the contents of
I think it makes more sense to handle this the same way I handled
max_sectors: make it a per-target parameter passed in when connecting
to the target. We could make cmds_per_lun a similar parameter, but
are there likely to be any SRP targets that need this to be limited?
Also, what is max_
target->state other than PORT(DLID)_REDIRECT
+ fix the bug of reusing the iu while it's still in_use
+ support FMR - srp_map_fmr (if map_fmr failed then fall back to normal
indirect mode using global r_key)
I tested this patch with Mellanox IB storage
Signed-off-by: Vu Pham <[EMAI