Justin,

Rewriting a btl is only worth it for intra-node performance. To get things working, you can force TCP connections for intra-node communication:
mpirun --mca btl tcp,self ...

Cheers,

Gilles

Justin Cinkelj <justin.cink...@xlab.si> wrote:

> Vader is for intra-node communication only, right? So for inter-node
> communication some other mechanism will be used anyway.
> Why would it even be better to write a new btl? To avoid the memcpy (knem
> would use it, if I understand you correctly; I guess the code assumes that
> multiple processes on the same node have isolated address spaces).
>
> Fork + execve was one of the first problems, yes. I replaced that with
> OSv-specific calls (ignore fork, and instead of execve start the given
> binary in a new thread). The global variables required OSv modifications -
> the guys from http://osv.io/ took care of that (I was surprised that in the
> end the patches were really small and elegant). So while there are no real
> processes, a new binary / ELF file is loaded at a different address than the
> rest of the OS - so it has separate global variables, and a separate environ
> too. Other resources like file descriptors are still shared.
>
> BR Justin
>
> On 15. 12. 2015 14:55, Gilles Gouaillardet wrote:
>
> Justin,
>
> at first glance, vader should be symmetric (e.g. call
> opal_shmem_segment_detach() instead of munmap()).
>
> Nathan, can you please comment?
>
> using tid instead of pid should also do the trick.
>
> that being said, a more elegant approach would be to create a new module in
> the shmem framework:
> basically, create = malloc, attach = return the malloc'ed address,
> detach = noop, destroy = free.
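A minimal sketch of such a module, assuming a single address space in which every MPI "process" can dereference the same pointer. The names and the descriptor layout below are illustrative only; they do not follow the actual opal/mca/shmem module interface or its function-pointer table.

#include <stdlib.h>
#include <string.h>

/* illustrative segment descriptor */
typedef struct {
    void   *base;   /* address returned by malloc */
    size_t  size;
} heap_seg_ds_t;

/* create = malloc */
static int heap_segment_create (heap_seg_ds_t *ds, const char *file_name, size_t size)
{
    (void) file_name;                 /* no backing file needed */
    ds->base = calloc (1, size);
    if (NULL == ds->base) return -1;
    ds->size = size;
    return 0;
}

/* attach = return the malloc'ed address */
static void *heap_segment_attach (heap_seg_ds_t *ds)
{
    return ds->base;
}

/* detach = noop (nothing was mapped) */
static int heap_segment_detach (heap_seg_ds_t *ds)
{
    (void) ds;
    return 0;
}

/* destroy = free */
static int heap_segment_destroy (heap_seg_ds_t *ds)
{
    free (ds->base);
    ds->base = NULL;
    return 0;
}

With this approach, detach and destroy never touch munmap, so the double-munmap problem described further down the thread should not arise.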
> and an even better approach would be to write your own btl that replaces
> vader.
> basically, vader can use the knem module to write into another process'
> address space.
> since your OS is thread only, the knem invocation would become a simple
> memcpy.
>
> makes sense?
>
> as a side note,
> ompi uses global variables, and orted forks and execs MPI tasks after
> setting some environment variables. it seems porting ompi to this new OS was
> not so painful, and I would have expected some issues with the global
> variables, and some race conditions with the environment.
> did you already solve these issues?
>
> Cheers,
>
> Gilles
>
> On Tuesday, December 15, 2015, Justin Cinkelj <justin.cink...@xlab.si> wrote:
>
> I'm trying to port Open MPI to an OS with threads instead of processes.
> Currently, during MPI_Finalize, I get an attempt to call munmap, first with
> address 0x200000c00000 and later 0x200000c00008.
>
> mca_btl_vader_component_close():
>     munmap (mca_btl_vader_component.my_segment,
>             mca_btl_vader_component.segment_size)
>
> mca_btl_vader_component_init():
>     if (MCA_BTL_VADER_XPMEM != mca_btl_vader_component.single_copy_mechanism) {
>         opal_shmem_segment_create (&component->seg_ds, sm_file,
>                                    component->segment_size);
>         component->my_segment = opal_shmem_segment_attach (&component->seg_ds);
>     } else {
>         mmap (NULL, component->segment_size, PROT_READ | PROT_WRITE,
>               MAP_ANONYMOUS | MAP_SHARED, -1, 0);
>     }
>
> But opal_shmem_segment_attach (from the mmap module) ends with:
>
>     /* update returned base pointer with an offset that hides our stuff */
>     return (ds_buf->seg_base_addr + sizeof(opal_shmem_seg_hdr_t));
>
> So mca_btl_vader_component_close() should in that case call
> opal_shmem_segment_detach() instead of munmap.
> Or actually, since at that point the shmem_mmap module cleanup code has
> already run, could/should vader just skip its cleanup part?
>
> Maybe I should ask first how this setup/cleanup works on a normal Linux
> system? Is mmap called twice, and do vader and the shmem_mmap module each
> use a different address (so vader's munmap is indeed required in that case)?
>
> Second question.
> With two threads in one process, I got an attempt to
> opal_shmem_segment_detach() and munmap() the same mmap-ed address from both
> threads. I 'fixed' that by replacing "ds_buf->seg_cpid = getpid()" with
> gettid(), so that each thread munmap-s only the address it allocated itself.
> Is that correct? Or is it possible that the second thread might still try to
> access data at that address?
>
> BR Justin
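For reference, the getpid() -> gettid() change described in the second question above would look roughly like the sketch below. It assumes a Linux-style thread id obtained via syscall(SYS_gettid); the struct and helper names are illustrative (modelled on the seg_cpid field mentioned in the thread), not the exact shmem_mmap code.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* illustrative stand-in for the shmem_mmap segment descriptor */
typedef struct {
    pid_t   seg_cpid;        /* id of the thread that attached the segment */
    void   *seg_base_addr;
    size_t  seg_size;
} seg_ds_t;

static pid_t my_tid (void)
{
    /* older glibc has no gettid() wrapper, so call the syscall directly */
    return (pid_t) syscall (SYS_gettid);
}

/* at attach time: was getpid(), which returns the same value in every thread */
static void seg_record_owner (seg_ds_t *ds, void *addr, size_t size)
{
    ds->seg_cpid      = my_tid ();
    ds->seg_base_addr = addr;
    ds->seg_size      = size;
}

/* at cleanup time: only the thread that attached the segment unmaps it */
static void seg_cleanup (seg_ds_t *ds)
{
    if (my_tid () == ds->seg_cpid) {
        munmap (ds->seg_base_addr, ds->seg_size);
    }
}

Note that this only avoids the double munmap; whether another thread may still legitimately access data at that address afterwards is exactly the open question raised above.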