Re: [openib-general] SRP Target Installation problem

2006-11-06 Thread Vu Pham
> > I managed to find the above missing files under the gen1 branch. I am > not quite sure about installing packages from gen1 and gen2 on the same > machine. Should I install IBGOLD instead of OFED ? > srp target sw only runs on gen1 (i.e. the IBGD stack), and yes, you need to install/run the IBGD stack

Re: [openib-general] Mellanox SRP target implementation

2006-11-02 Thread Vu Pham
>>*srp target* is still on gen1 code base - IBGD >> >>*nfs-rdma server* is on gen2 code base > > > Any chance the MTD2000 runs openfiler? > We have never installed openfiler. You can try -vu ___ openib-general mailing list openib-general@openib.o

Re: [openib-general] Mellanox SRP target implementation

2006-11-02 Thread Vu Pham
Tomoaki, > > Can anybody tell me about the mellanox "SRP target" implementation code which > is included in MTD2000 with NFS-RDMA server ? > Is this gen2 base ? > *srp target* is still on gen1 code base - IBGD *nfs-rdma server* is on gen2 code base ___

Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-09-27 Thread Vu Pham
Vu Pham wrote: > Michael S. Tsirkin wrote: > >>Quoting r. Vu Pham <[EMAIL PROTECTED]>: >> >> >>>Either you can use multiple channels or derive different >>>initiator_port_ID in the login req to have multiple paths on >>>the same physical

Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-09-27 Thread Vu Pham
Michael S. Tsirkin wrote: > Quoting r. Vu Pham <[EMAIL PROTECTED]>: > >>Either you can use multiple channels or derive different >>initiator_port_ID in the login req to have multiple paths on >>the same physical port > > > So how about we just stick a p

Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-09-26 Thread Vu Pham
Michael S. Tsirkin wrote: > Quoting r. Vu Pham <[EMAIL PROTECTED]>: > >>Most of srp targets that I tested don't support multiple >>channels. > > > Which are these? Mellanox referenced srp target, Texas Memory System's SSD, Engenio. > And what

Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-09-26 Thread Vu Pham
Ishai Rabinovitz wrote: > Hi Roland, > > SRP High Availability needs an initiator to connect to the same target > several times, e.g., once from each IB port of the target (this way we can use > device mapper multipath for failover). Note that both connections are actually > active, e.g. multipat

Re: [openib-general] Srp question

2006-09-01 Thread Vu Pham
> > By default, we found (via stats from the DDN) that we were only seeing reads > and writes in the 0-32Kbyte range. Comparing IBGold and OFED, we found that > the srp_sg_tablesize defaulted to 256, but in OFED it defaulted to 12. So, > changing this (via modprobe.conf) to 256 in OFED, we were

Re: [openib-general] SRP IO Size

2006-08-08 Thread Vu Pham
Hi Paul, You can load the ib_srp module with the module parameter max_xfer_sectors_per_io=<512, 1024, 2048> to support 256KB, 512KB and 1MB direct IOs. Add the following line to /etc/modprobe.conf: "options ib_srp max_xfer_sectors_per_io=1024" Vu > Hi, > I was running some performance tests to an srp ta
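The mapping from the parameter value to the I/O size quoted above is plain arithmetic. A minimal sketch, assuming 512-byte sectors (the message does not state the sector size, but it is the only value consistent with the quoted 256KB/512KB/1MB figures); `max_io_bytes` is a hypothetical helper, not part of ib_srp:

```c
#include <stdint.h>

/* Assumption: one sector is 512 bytes. */
#define EXAMPLE_SECTOR_SIZE_BYTES 512u

/*
 * Hypothetical helper: largest single direct I/O, in bytes, for a given
 * max_xfer_sectors_per_io value.
 */
static uint64_t max_io_bytes(uint32_t max_xfer_sectors_per_io)
{
    /* Widen before multiplying so large sector counts cannot overflow. */
    return (uint64_t)max_xfer_sectors_per_io * EXAMPLE_SECTOR_SIZE_BYTES;
}
```

With this helper, the values 512, 1024 and 2048 give 262144, 524288 and 1048576 bytes respectively.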

Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around

2006-08-02 Thread Vu Pham
Michael, > +static const u8 mellanox_oui[3] = { 0x02, 0xc9, 0x02 }; Should it be {0x00, 0x02, 0xc9}?
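The check under discussion can be sketched as follows, assuming the OUI is 00:02:c9 as suggested in the reply and that it occupies the first three bytes of the 8-byte GUID in network byte order. `guid_has_mellanox_oui` is a hypothetical helper, not code from the patch:

```c
#include <stdint.h>
#include <string.h>

/* OUI as proposed in the reply above. */
static const uint8_t mellanox_oui[3] = { 0x00, 0x02, 0xc9 };

/*
 * Hypothetical helper: returns nonzero when the GUID starts with the OUI.
 * Assumption: guid holds 8 bytes in network (big-endian) order, so the
 * OUI is in guid[0..2].
 */
static int guid_has_mellanox_oui(const uint8_t guid[8])
{
    return memcmp(guid, mellanox_oui, sizeof mellanox_oui) == 0;
}
```

Note that with the byte order originally posted in the patch ({ 0x02, 0xc9, 0x02 }) the same comparison would only match if the GUID bytes were read starting one byte later.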

Re: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers

2006-07-26 Thread Vu Pham
Roland Dreier wrote: > > > Why does userspace need to be able to disconnect a connection? > > > There are two options on who will initiate the disconnection: the userspace > > daemon or the ib_srp module. I considered both options and I was not sure > > which one is better. I choose to do it

Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around

2006-07-26 Thread Vu Pham
Michael S. Tsirkin wrote: > Quoting r. Vu Pham <[EMAIL PROTECTED]>: >>> Right now this workaround affects all targets unconditionally. >>> >> Can we rework the patch to have mellanox_workarounds=0 by >> default? > > Hmm ... since this is a data cor

Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around

2006-07-25 Thread Vu Pham
Roland Dreier wrote: > BTW -- how are you seeing this corruption? Host machine: EM64T cpu, RHEL4 u1/2/3 x86 32-bit version Create/mount ext2 or ext3 file system on srp devices using 1K block size (ie. mkfs -t ext2 -b 1024). Copy file, umount, remount and compare to the original file - looping

Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around

2006-07-25 Thread Vu Pham
Roland Dreier wrote: > > Until there's a better understanding of the issue, I've come up with the > > following simple patch that will use indirect mode for this case. > > Is it possible to check the target's OUI to detect Mellanox targets? Mellanox target ioc_guid is derived from node_guid;

Re: [openib-general][PATCH] srp: param sg_tablesize,

2006-05-17 Thread Vu Pham
@@ -1914,6 +1920,11 @@ static int __init srp_init_module(void) { int ret; Thanks, should we do a check and put some cap on srp_sg_tablesize value ie. + srp_sg_tablesize = max(1, srp_sg_tablesize); + srp_sg_tablesize = min(srp_sg_tablesize, SRP_MAX_SG_TABLESIZE); +
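The proposed max()/min() pair is a clamp to a closed range. A standalone sketch of the same idea, with `EXAMPLE_MAX_SG_TABLESIZE` as a made-up cap (the real value of SRP_MAX_SG_TABLESIZE is not shown in this thread) and `clamp_sg_tablesize` as a hypothetical helper:

```c
/* Assumption: illustrative upper bound only, not the driver's real limit. */
#define EXAMPLE_MAX_SG_TABLESIZE 255

/*
 * Hypothetical helper: force a user-supplied scatter/gather table size
 * into the range [1, EXAMPLE_MAX_SG_TABLESIZE].
 */
static int clamp_sg_tablesize(int requested)
{
    if (requested < 1)
        return 1;                        /* same effect as max(1, requested) */
    if (requested > EXAMPLE_MAX_SG_TABLESIZE)
        return EXAMPLE_MAX_SG_TABLESIZE; /* same effect as min(requested, cap) */
    return requested;
}
```

Doing the check once at module init, as the message suggests, means the rest of the driver can rely on the value being in range.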

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-10 Thread Vu Pham
BTW, does Mellanox (or anyone else) have any numbers showing that using FMRs makes any difference in performance on a semi-realistic benchmark? I'm using xdd to test the performance www.ioperformance.com/products.htm The target is Mellanox srp target reference implementation with 14 SATA

Re: [openib-general][PATCH] srp: param sg_tablesize,

2006-05-10 Thread Vu Pham
Hi Roland, This patch: + introduces srp_sg_tablesize as module parameter - default value is 16 + adjusts SRP_MAX_IU_LEN, SRP_MAX_INDIRECT from srp_sg_tablesize Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Index: infiniband/ulp/srp/ib

Re: [openib-general][PATCH] srp: throttle command per lun,

2006-05-10 Thread Vu Pham
Patch to throttle command per lun when adding target. Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Index: infiniband/ulp/srp/ib_srp.c === --- infiniband/ulp/srp/ib_srp.c (revision 7075) +++ infiniband/ulp/srp/ib_srp.c (workin

Re: [openib-general][PATCH] srp: tuned parameters,

2006-05-10 Thread Vu Pham
Roland Dreier wrote: I finally looked this over. First, this should be two patches: making srp_sg_tablesize tunable should be a separate change from making it possible to specify max_cmd_per_lun for a target. OK, I'll break it to two patches The srp_sg_tablesize change makes the default num

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-10 Thread Vu Pham
Roland Dreier wrote: BTW, does Mellanox (or anyone else) have any numbers showing that using FMRs makes any difference in performance on a semi-realistic benchmark? I'm using xdd to test the performance www.ioperformance.com/products.htm The target is Mellanox srp target reference implemena

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-08 Thread Vu Pham
Roland Dreier wrote: Vu> This fmr patch does not work for ia64 system because this Vu> fmr_page_mask is defined as unsigned int. Great catch! Vu> We should type cast it to u64 or define it as unsigned long Casting it won't help because it will just get zero-extended. I think we ne
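The zero-extension problem described in this exchange can be reproduced in user space. A minimal sketch, assuming a 4096-byte page and a 32-bit unsigned int; all names here are hypothetical and none of this is the driver's code:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, for illustration only. */
#define EXAMPLE_PAGE_SIZE 4096u

/* Mask stored in 32 bits, as in the reported bug. */
static uint64_t mask_with_u32(uint64_t dma_addr)
{
    uint32_t page_mask = ~(EXAMPLE_PAGE_SIZE - 1u); /* 0xfffff000 */
    /* page_mask is zero-extended to 0x00000000fffff000 before the AND. */
    return dma_addr & page_mask;
}

/* Casting the 32-bit mask at the point of use changes nothing. */
static uint64_t mask_with_cast(uint64_t dma_addr)
{
    uint32_t page_mask = ~(EXAMPLE_PAGE_SIZE - 1u);
    return dma_addr & (uint64_t)page_mask; /* still zero-extended */
}

/* Mask computed in 64 bits keeps the upper address bits. */
static uint64_t mask_with_u64(uint64_t dma_addr)
{
    uint64_t page_mask = ~((uint64_t)EXAMPLE_PAGE_SIZE - 1u); /* 0xfffffffffffff000 */
    return dma_addr & page_mask;
}
```

For a DMA address above 4GB, the first two variants silently drop the upper 32 bits, which is consistent with the failure being seen on a 64-bit ia64 system.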

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-08 Thread Vu Pham
+ dma_pages[page_cnt++] = + (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; + This fmr patch does not work for ia64 system because this dev->fmr_page_mask is defined as unsigned int. We should type cast it to u64 or define it as uns

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-08 Thread Vu Pham
Roland Dreier wrote: Vu> Have you read scsi_eh_try_stu(scmnd) and scsi_eh_tur(scmnd)? Vu> These functions use the same scmnd and reformat it with new Vu> cdb and call srp_queuecommand() which uses new req and put Vu> this new req in request queue for this same scmnd with Vu> d

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-08 Thread Vu Pham
Roland Dreier wrote: Vu> scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer Vu> tried to abort stu or tur command as well. Since we delay to Vu> clean in srp_reset_device(), srp's request queue is still not Vu> empty. This stu or tur command is freed by scsi midlayer. T

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-08 Thread Vu Pham
Roland Dreier wrote: > 1st scsi_try_host_reset() --> srp_host_reset() --> > srp_reconnect_target() return SUCCESS. Then scsi_eh_try_stu() or > scsi_eh_tur() is called right after > > scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() --> > srp_queuecommand() But after srp_reconnect_t

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-05 Thread Vu Pham
So after srp_reconnect_target() returns, SRP has no requests in its queue. The only way that a command could be put in the queue is if the SCSI midlayer passes it back into the queuecommand functions. Yes this is exactly what happening. static int scsi_eh_host_reset(struct list_head *work_

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-05 Thread Vu Pham
apply on top of your patch will fix the problem. IB/srp: Fix tracking of pending requests during error handling Signed-off-by: Vu Pham <[EMAIL PROTECTED]> diff -Naur infiniband/ulp/srp.roland-eh/ib_srp.c infiniband/ulp/srp/ib_srp.c --- infiniband/ulp/srp.roland-eh/ib_srp.c 2006-05-

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-05 Thread Vu Pham
Roland Dreier wrote: > reading scsi_error.c again, I find this logic for our case (please > correct me if I'm wrong) > 1. eh_abort_handler and eh_device_reset_handler fail with timeout; > eh_host_reset_handler succeeds But you're crashing inside the call to srp_reconnect_target() in srp_res

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-05 Thread Vu Pham
reading scsi_error.c again, I find this logic for our case (please correct me if I'm wrong) 1. eh_abort_handler and eh_device_reset_handler fail with timeout; eh_host_reset_handler succeeds 2. scsi_eh_host_reset goes on with scsi_eh_try_stu & scsi_eh_tur 3. either scsi_eh_try_stu or scsi_eh

Re: [openib-general][patch review] srp: fmr implementation,

2006-05-05 Thread Vu Pham
Roland Dreier wrote: > 1. srp_unmap_data() and srp_remove_req() for .eh_abort_handler(scmnd) > a. abort get timeout or > b. req->cmd_done or > c. !req->tsk_status > 2. we should do step (1) for .eh_abort_handler(scmnd) only and don't > do step 1 for .eh_device_reset_handler(scmn

Re: [openib-general] ib_srp

2006-04-27 Thread Vu Pham
Scott Weitzenkamp (sweitzen) wrote: I see mention of an SRP daemon coming in 1.0 RC4, does this daemon handle target addition/removal? Yes

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-21 Thread Vu Pham
Thanks. Can you explain what the bug causing the crash is? I'd like to understand the "why" of this patch. 1. srp_unmap_data() and srp_remove_req() for .eh_abort_handler(scmnd) a. abort get timeout or b. req->cmd_done or c. !req->tsk_status 2. we should do step (1) for .eh_abort_

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-21 Thread Vu Pham
With the following patch applied my ia64 system does not crash anymore I prepare this patch diffing from srp revision 6455 applied with srp-params.patch that I sent you last week Please let me know if you want this patch generated from current srp (revision 6550) What is the status for s

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-21 Thread Vu Pham
Hi Roland, I reported the error from my original email responding to your fmr patch. For ia64 system with pcix hca I got async event IB_EVENT_QP_ACCESS_ERR at the initiator (and I got cqe with IB_COMPLETION_STATUS_REMOTE_ACCESS_ERROR status at my target) I still have not had an IB analyzer t

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-19 Thread Vu Pham
Roland Dreier wrote: > > And what if you comment out the line > > .eh_device_reset_handler= srp_reset_device, > > does that fix it? > No Now I'm really confused. Me too. It seems we lose the connection to the target (BTW -- do you know why the connection is getting killed)

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-18 Thread Vu Pham
Roland Dreier wrote: Roland> Hmm, you may be right -- scsi_eh_bus_device_reset() in Roland> scsi_error.c does seem to flush all commands. But do you Roland> see srp_reset_device() being called? I didn't think I saw Roland> it in your trace. It was in my trace. I probably left

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-18 Thread Vu Pham
Roland Dreier wrote: > The return happened before reaching the code above. It happened at: > if (!wait_for_completion_timeout(&req->done, >msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) > return FAILED; > > because the qp was in fatal state. Theref

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-18 Thread Vu Pham
Roland Dreier wrote: Hmm, I don't understand what could be going on. srp_send_tsk_mgmt() currently has: if (req->cmd_done) { srp_remove_req(target, req, req_index); scmnd->scsi_done(scmnd); } else if (!req->tsk_status) { srp_remove

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Vu Pham
Roland Dreier wrote: Hmm, I don't understand what could be going on. srp_send_tsk_mgmt() currently has: if (req->cmd_done) { srp_remove_req(target, req, req_index); scmnd->scsi_done(scmnd); } else if (!req->tsk_status) { srp_remove

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-14 Thread Vu Pham
Roland Dreier wrote: Hmm, it's clearly a use-after-free bug. Based on ip is at srp_reconnect_target+0x2b1/0x5c0 [ib_srp] can you guess where it is in the SRP driver or what it's accessing? Also this is happening because the connection is being reconnected, because SCSI commands are timing

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-13 Thread Vu Pham
Hi Roland, Apr 7 18:17:17 lab105 kernel: Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b I think I fixed the bug causing this oops (I was able to reproduce it, and I don't see it any more). I checked the following patch in and queued it for kernel 2.6.17: My ia

[openib-general][PATCH] srp: tuned parameters,

2006-04-13 Thread Vu Pham
Hi Roland, Please review this patch + introducing srp_sg_tablesize as module parameter + adjusting SRP_MAX_IU_LEN, SRP_MAX_INDIRECT from srp_sg_tablesize + throttling command per lun ie. max_cmd_per_lun can be passed in when adding target (same as max_sect) Signed-off-by: Vu Pham <[EM

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-12 Thread Vu Pham
Roland Dreier wrote: Apr 7 18:17:17 lab105 kernel: Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b I think I fixed the bug causing this oops (I was able to reproduce it, and I don't see it any more). I checked the following patch in and queued it for kernel 2.6.17:

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-12 Thread Vu Pham
Vu> Here is my status of testing this patch. On x86-64 system I Vu> got data corruption problem reported after ~4 hrs of running Vu> Engenio's Smash test tool when I tested with Engenio storage Vu> On ia64 system I got multiple async event 3 Vu> (IB_EVENT_QP_ACCESS_ERR) and

Re: [openib-general][patch review] srp: fmr implementation,

2006-04-11 Thread Vu Pham
Hi Roland, Sorry to take this long to respond. Thanks for all the enhancements. I cced some Engenio's engineer who can help to send latest FW to you. This mostly works for me, but I still see one weird problem. If I make an FMR to cover IO of size more than 58 * 4096 bytes, the IO never comp

Re: [openib-general][patch review] srp: fmr implementation,

2006-03-22 Thread Vu Pham
ere is the patch (this also does not have reformatting issue from previous patch) Thanks, Vu Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Index: ib_srp.c === --- ib_srp.c (revision 5918) +++ ib_srp.c (working copy) @@ -500,1

Re: [openib-general][patch review] srp: fmr implementation,

2006-03-21 Thread Vu Pham
more code. Thanks for cleaning this up Anyway, I should be able to work on adding FMRs to this cleaner framework now. Here is the new fmr patch applied to your current framework Thanks, Vu Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Index: ib

Re: [openib-general] mthca FMR correctness (and memory windows)

2006-03-20 Thread Vu Pham
As to using FMRs to create virtually contiguous regions, the last data I saw about this related to SRP (not on OpenIB), and resulted in a gain of ~25% in throughput when using FMRs vs the "full frontal" DMA MR. So there is definitely something to be gained by creating virtually contiguous regio

Re: [openib-general][patch review] srp: fmr implementation,

2006-03-14 Thread Vu Pham
Vu> I don't have a good test to measure performance; however, I Vu> ran multiple dd streams on raw devices (no filesystem or Vu> buffer cache involved) to measure the impact of using Vu> FMRs. With Mellanox srp target referenced implementation SW + Vu> 14 SATA drives + PCI-E

Re: [openib-general][patch review] srp: fmr implementation,

2006-03-13 Thread Vu Pham
Roland Dreier wrote: Vu> Hi Roland, Any progress/update on your review and merging up Vu> this patch Sorry, I've been distracted by the huge mass of other code that has been submitted at the same time. By the way, do you have a test that you use to measure the performance impact of usin

Re: [openib-general][patch review] srp: fmr implementation,

2006-03-10 Thread Vu Pham
Thanks, I have some other SRP changes I'm testing but once that's done I'll merge this up. Hi Roland, Any progress/update on your review and merging up this patch Thanks, Vu

[openib-general][patch review] srp: fmr implementation,

2006-02-27 Thread Vu Pham
Another attempt to implement fmr for srp + moving dev_list, pd, dma mr, and fmr resource to srp_device per ib_device + implementing fmr - try to build a single fmr per scsi_cmd if fail then falling back to dma mr Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Index: ib

[openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it

2005-10-11 Thread Vu Pham
so, it might be good to try and add some more comments explaining srp_map_fmr() -- it would definitely help me review. I added some comments - Hope they help your review (instead of confusing you more :)) Signed-off-by: Vu Pham <[EMAIL PROTE

Re: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it

2005-10-10 Thread Vu Pham
per device and keep it in srp_device_data struct I put back fmr + your patch and it works well with my setup. Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Have you got time to review this SRP's FMR patch? Thanks, vu

[openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it

2005-09-29 Thread Vu Pham
ource per device and keep it in srp_device_data struct I put back fmr + your patch and it works well with my setup. Signed-off-by: Vu Pham <[EMAIL PROTECTED]>   Index: ulp/srp/ib_srp.c === --- ulp/srp/ib_srp.c(revi

[openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it

2005-09-29 Thread Vu Pham
Roland Dreier wrote: Vu, you pointed out that the current SRP code might look at an IU that it sent after that IU has been reused for a different command. I realized that a simple fix for this is just to keep the DMA address (the only thing we look at in the IU) in the request structure. Just

Re: [openib-general][PATCH][SRP] bug fixes & fmr supported,

2005-09-21 Thread Vu Pham
Christoph Hellwig wrote: + if ((dma_addr & (PAGE_SIZE - 1)) || + ((dma_addr + dma_len) & (PAGE_SIZE - 1)) || + ((i == (sg_cnt - 1)) && !unaligned)) { + srp_fmr->io_addr = dma_addr & PAGE_MASK; + ++unalig
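The first two conditions in the quoted hunk test whether a scatter/gather entry starts and ends on a page boundary. A standalone sketch of just that part, assuming 4096-byte pages; `sg_entry_is_unaligned` and `page_base` are hypothetical helpers, and the third condition from the patch (about the last entry) is not reproduced because the quoted code is cut off:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; kept 64-bit wide so masks cover the full address. */
#define EXAMPLE_PAGE_SIZE UINT64_C(4096)

/*
 * Hypothetical helper: nonzero when the entry does not start or does not
 * end on a page boundary.
 */
static int sg_entry_is_unaligned(uint64_t dma_addr, uint64_t dma_len)
{
    return (dma_addr & (EXAMPLE_PAGE_SIZE - 1)) != 0 ||
           ((dma_addr + dma_len) & (EXAMPLE_PAGE_SIZE - 1)) != 0;
}

/* Hypothetical helper: page-aligned base of an address (dma_addr & PAGE_MASK). */
static uint64_t page_base(uint64_t dma_addr)
{
    return dma_addr & ~(EXAMPLE_PAGE_SIZE - 1);
}
```

For example, an entry at 0x1000 of length 0x1000 is aligned at both ends, while one at 0x1200 of the same length is not, and `page_base(0x1234)` yields 0x1000.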

Re: [openib-general][PATCH][SRP] bug fixes & fmr supported,

2005-09-21 Thread Vu Pham
Christoph Hellwig wrote: I think it makes more sense to handle this the same way I handled max_sectors: make it a per-target parameter passed in when connecting to the target. We could make cmds_per_lun a similar parameter, but are there likely to be any SRP targets that need this to be limited

Re: [openib-general][PATCH][SRP] bug fixes & fmr supported,

2005-09-20 Thread Vu Pham
Does max_id matter at all for SRP? We only use scsi_scan_target() to find targets, so I'm not sure where the SCSI midlayer even looks at max_id for our case. Our current srp implementation does not support scsi naming convention which requires max_id and max_channel For example if ther

Re: [openib-general][PATCH][SRP] bug fixes & fmr supported,

2005-09-20 Thread Vu Pham
> - iu = kmalloc(sizeof *iu, gfp_mask); > + iu = kzalloc(sizeof *iu, gfp_mask); We only do this once at init time By the way, why do we want this? I think we always clear out the contents of our IUs before we send them. I don't think that we clear out the contents of

Re: [openib-general][PATCH][SRP] bug fixes & fmr supported,

2005-09-20 Thread Vu Pham
I think it makes more sense to handle this the same way I handled max_sectors: make it a per-target parameter passed in when connecting to the target. We could make cmds_per_lun a similar parameter, but are there likely to be any SRP targets that need this to be limited? Also, what is max_

[openib-general][PATCH][SRP] bug fixes & fmr supported,

2005-09-20 Thread Vu Pham
target->state other than PORT(DLID)_REDIRECT + fix the bug of reuse the iu while it's still in_use + support FMR - srp_map_fmr (if map_fmr failed then fall back to normal indirect mode using global r_key) I tested this patch with Mellanox IB storage Signed-off-by: Vu Pham <[EMAI