Justin,

Rewriting a btl is for intra-node performance purposes only.
To get things working, you can force TCP connections for intra-node
communication (this leaves vader out of the btl selection entirely):

mpirun --mca btl tcp,self ...

Cheers,

Gilles

Justin Cinkelj <justin.cink...@xlab.si> wrote:
>Vader is for intra-node communication only, right? So for inter-node
>communication some other mechanism will be used anyway.
>Why would it be even better to write a new btl? To avoid a memcpy (knem
>would use one, if I understand you correctly; I guess the code assumes
>that multiple processes on the same node have isolated address spaces).
>
>Fork + execve was one of the first problems, yes. I replaced that with
>OSv-specific calls (ignore fork, and instead of execve start the given
>binary in a new thread). The global variables required an OSv modification -
>the guys from http://osv.io/ took care of that (I was surprised that in the
>end the patches were really small and elegant). So while there are no real
>processes, the new binary / ELF file is loaded at a different address than
>the rest of the OS - so it has separate global variables, and a separate
>environ too. Other resources, like file descriptors, are still shared.
>
>BR Justin
>
>On 15. 12. 2015 14:55, Gilles Gouaillardet wrote:
>
>Justin, 
>
>
>at first glance, vader should be symmetric (e.g. call
>opal_shmem_segment_dettach() instead of munmap()).
>
>Nathan, can you please comment ?
>
>
>using tid instead of pid should also do the trick
>
>
>that being said, a more elegant approach would be to create a new module in 
>the shmem framework
>
>basically, create = malloc, attach = return the malloc'ed address, detach = 
>noop, destroy = free
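>
>A minimal sketch of the idea (hypothetical names and simplified signatures -
>the real opal shmem module API works on an opal_shmem_ds_t descriptor):
>
>#include <stdlib.h>
>
>/* create = malloc: the "segment" is plain heap memory, which every
> * thread in a single address space can already reach */
>static void *heap_segment_create (size_t size) {
>    return calloc (1, size);   /* calloc to mimic zero-filled pages */
>}
>
>/* attach = return the malloc'ed address: no mapping needed */
>static void *heap_segment_attach (void *segment) {
>    return segment;
>}
>
>/* detach = noop: nothing was mapped, so there is nothing to unmap */
>static int heap_segment_detach (void *segment) {
>    (void) segment;
>    return 0;
>}
>
>/* destroy = free */
>static int heap_segment_destroy (void *segment) {
>    free (segment);
>    return 0;
>}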
>
>
>and an even better approach would be to write your own btl that replaces vader.
>
>basically, vader can use the knem module to write into another process's
>address space.
>
>since your os is thread-only, the knem invocation would become a simple
>memcpy.
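>
>For illustration, roughly what the single-copy path would reduce to (the
>function name and signature below are hypothetical, not the actual btl
>module API):
>
>#include <string.h>
>
>/* in a thread-only OS the "remote" address is directly dereferenceable,
> * so a single-copy put needs no knem ioctl at all */
>static int thread_btl_put (void *remote_addr, const void *local_addr,
>                           size_t size) {
>    memcpy (remote_addr, local_addr, size);
>    return 0;
>}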
>
>
>makes sense?
>
>
>
>as a side note,
>
>ompi uses global variables, and orted forks and execs MPI tasks after
>setting some environment variables. it seems porting ompi to this new OS was
>not so painful, yet I would have expected some issues with the global
>variables, and some race conditions with the environment.
>
>did you already solve these issues?
>
>
>Cheers,
>
>
>Gilles
>
>On Tuesday, December 15, 2015, Justin Cinkelj <justin.cink...@xlab.si> wrote:
>
>I'm trying to port Open MPI to an OS with threads instead of processes.
>Currently, during MPI_Finalize, I get attempts to call munmap, first with
>address 0x200000c00000 and later 0x200000c00008.
>
>mca_btl_vader_component_close():
>  munmap (mca_btl_vader_component.my_segment,
>          mca_btl_vader_component.segment_size);
>
>mca_btl_vader_component_init():
>  if (MCA_BTL_VADER_XPMEM != mca_btl_vader_component.single_copy_mechanism) {
>      /* non-XPMEM case: the segment comes from the shmem framework */
>      opal_shmem_segment_create (&component->seg_ds, sm_file,
>                                 component->segment_size);
>      component->my_segment = opal_shmem_segment_attach (&component->seg_ds);
>  } else {
>      /* XPMEM case: the segment is mmap-ed directly */
>      component->my_segment = mmap (NULL, component->segment_size,
>                                    PROT_READ | PROT_WRITE,
>                                    MAP_ANONYMOUS | MAP_SHARED, -1, 0);
>  }
>
>But opal_shmem_segment_attach (from the mmap module) ends with:
>
>    /* update returned base pointer with an offset that hides our stuff */
>    return (ds_buf->seg_base_addr + sizeof(opal_shmem_seg_hdr_t));
>
>So in that case mca_btl_vader_component_close() should call
>opal_shmem_segment_dettach() instead of munmap: the pointer it holds is
>offset past the segment header, so it is not the address mmap returned.
>Or actually, as the shmem_mmap module's cleanup code has already run by
>that point, could/should vader just skip its cleanup part?
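>
>To make the mismatch concrete (a sketch, not actual vader code):
>
>char *base = (char *) ds_buf->seg_base_addr;        /* what mmap returned */
>char *mine = base + sizeof (opal_shmem_seg_hdr_t);  /* what attach returned,
>                                                       and what vader holds */
>munmap (mine, segment_size);  /* fails: mine is not page aligned */
>munmap (base, segment_size);  /* what the shmem_mmap cleanup would unmap */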
>
>Maybe I should first ask how that setup/cleanup works on a normal Linux
>system.
>Is mmap called twice, with vader and the shmem_mmap module each using a
>different address (so vader's munmap is indeed required in that case)?
>
>Second question.
>With two threads in one process, I got attempts to call
>opal_shmem_segment_dettach() and munmap() on the same mmap-ed address from
>both threads. I 'fixed' that by replacing "ds_buf->seg_cpid = getpid()" with
>gettid(), so that each thread munmap-s only the address it allocated itself
>(see the sketch below). Is that correct? Or is it possible that the second
>thread might still try to access data at that address?
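>
>The change amounts to something like this (seg_cpid, seg_base_addr and
>seg_size are opal_shmem_ds_t fields; the ownership check is a paraphrase
>of the idea, not the exact upstream code):
>
>/* tag the segment with the creating thread rather than the process */
>ds_buf->seg_cpid = gettid ();                      /* was: getpid() */
>
>/* ... so that later only the creating thread tears it down */
>if (gettid () == ds_buf->seg_cpid) {
>    munmap (ds_buf->seg_base_addr, ds_buf->seg_size);
>}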
>
>BR Justin
>