I allocate an array that is calculated across the pools with

  ALLOCATE(A(3,nbnd,nkf))

where nkf is the number of k-points per pool, and initialize it with A(:,:,:) = 0.
Once A is calculated, I allocate A_tot as ALLOCATE(A_tot(3,nbnd,nkf_tot)), where nkf_tot is the total number of k-points, and initialize it with A_tot(:,:,:) = 0. Then I call poolgather1 as below:

#if defined(__MPI)
  CALL poolgather1(3, nbnd, nkf_tot, nkf, A, A_tot)
#else
  A_tot = A
#endif

I have tried allocating A_tot both outside and inside the #if defined block. poolgather1 is similar to the other EPW poolgather and QE poolcollect routines and is given below:

  !--------------------------------------------------------------------
  SUBROUTINE poolgather1(nsize1, nsize2, nkstot, nks, f_in, f_out)
  !--------------------------------------------------------------------
  USE kinds,     ONLY : DP
  USE mp_global, ONLY : my_pool_id, inter_pool_comm, npool, kunit
  USE mp,        ONLY : mp_barrier, mp_bcast, mp_sum
  USE mp_world,  ONLY : mpime
  IMPLICIT NONE
  !
  INTEGER, INTENT(in) :: nsize1
  !! first dimension of the arrays f_in and f_out
  INTEGER, INTENT(in) :: nsize2
  !! second dimension of the arrays f_in and f_out
  INTEGER, INTENT(in) :: nks
  !! number of k-points per pool
  INTEGER, INTENT(in) :: nkstot
  !! total number of k-points
  REAL(KIND=DP), INTENT(in)  :: f_in(nsize1, nsize2, nks)
  !! input (only the k-points of my pool)
  REAL(KIND=DP), INTENT(out) :: f_out(nsize1, nsize2, nkstot)
  !! output (contains the values for all k-points)
  !
#if defined(__MPI)
  INTEGER :: rest
  !! the rest of the integer division nkstot / npool
  INTEGER :: nbase
  !! the position of this pool's block in the original list
  !
  rest = nkstot / kunit - ( nkstot / kunit / npool ) * npool
  !
  nbase = nks * my_pool_id
  !
  IF ( ( my_pool_id + 1 ) > rest ) nbase = nbase + rest * kunit
  !
  f_out = 0.d0
  f_out(1:nsize1, 1:nsize2, (nbase+1):(nbase+nks)) = f_in(1:nsize1, 1:nsize2, 1:nks)
  !
  ! ... reduce across the pools
  !
  CALL mp_sum(f_out, inter_pool_comm)
  !
#else
  f_out(:,:,:) = f_in(:,:,:)
#endif
  !
  END SUBROUTINE poolgather1

For my test case, nsize1 = 3, nsize2 = 4, nkstot = 47.

Thanks,
Vahid

On Oct 30, 2016, at 8:19 PM, Ye Luo <xw111lu...@gmail.com> wrote:

mp_sum_rt is interfaced with mp_sum for 3-D arrays; when you mp_sum a 3-D array, it is called automatically, so USE mp, ONLY : mp_sum_rt is not necessary. If you want to check that mp_sum_rt is implicitly called via the interface, print something before and after your mp_sum call and also print something inside mp_sum_rt; you should see the expected output. If you compare mp_sum_rt with the 2-D and 4-D implementations mp_sum_rm and mp_sum_r4d, you will find that they all flatten the dimensionality and call the same 1-D routine reduce_base_real.

As I asked earlier, how did you allocate your 3-D input and output arrays? Are you allocating sufficient size for the output? Did you deallocate it by mistake? How large is the output array? If its size exceeds 2^32, you might hit the 32-bit integer bug of the variable msglen.

Ye
===================
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory

2016-10-30 14:25 GMT-05:00 Vahid Askarpour <vh261...@dal.ca>:

Hi Ye,

The changes I am making are part of the EPW/QE code, so the parallelization depends on the QE MPI routines. I have no problem using mp_sum for a 2-D array such as energy(iband,ik): I can use mp_sum to add the contributions from the various nodes and print a total that agrees with the output of the serial run. As far as the allocation goes, I use the same approach for my 3-D array as for the 2-D one. The problem arises when I use the 3-D array with mp_sum.
I noticed that in Modules/mp.f90 there is a subroutine as follows:

  SUBROUTINE mp_sum_rt( msg, gid )
    IMPLICIT NONE
    REAL(DP), INTENT(INOUT) :: msg(:,:,:)
    INTEGER,  INTENT(IN)    :: gid
#if defined(__MPI)
    INTEGER :: msglen
    msglen = size(msg)
    CALL reduce_base_real( msglen, msg, gid, -1 )
#endif
  END SUBROUTINE mp_sum_rt

Here msg has three dimensions, and there are analogous subroutines for 1, 2, and 4 dimensions. If I could get EPW to use the subroutine above, it might work. However, adding a statement like USE mp, ONLY : mp_sum_rt at the beginning of my file does not work: the compilation fails because EPW does not see the subroutine. Interestingly, it does see mp_sum, which is in the same mp.f90 file. So is there a way to get EPW to see mp_sum_rt? That might solve my problem.

Since I know that mp_sum works with 2-D arrays, one option is to rewrite my arrays as 2-D ones, one for each of the x, y, z directions. I am trying to avoid this.

Thanks,
Vahid

On Oct 30, 2016, at 3:59 PM, Ye Luo <xw111lu...@gmail.com> wrote:

Hi Vahid,

A segfault in mp_sum does not necessarily mean the problem is there. Probably you wrote to the output array outside a valid place before mp_sum. Check your allocation of the output array and the copy, and make sure they are correct.

Ye
===================
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory

2016-10-28 14:33 GMT-05:00 Vahid Askarpour <vh261...@dal.ca>:

Hi Ye,

Thank you for your suggestion. I tried it, and when I ran the code it seg-faulted. I put flags in the code to see where the segmentation fault occurs: it happens when the code calls mp_sum. It seems that mp_sum may not be able to handle this reduction.

Cheers,
Vahid

On Oct 28, 2016, at 2:51 PM, Ye Luo <xw111lu...@gmail.com> wrote:

In Fortran, an array of any rank is stored as a 1-D array, so mp_sum should be fine. I saw something strange in your code: you were not copying what you expected. How about the following?

  output(1:3, 1:nbnds, k_pool*pool_id+1 : k_pool*pool_id+k_pool) = input(1:3, 1:nbnds, 1:k_pool)

Ye
===================
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory

2016-10-28 12:29 GMT-05:00 Vahid Askarpour <vh261...@dal.ca>:

Dear QE Users,

I am working on some modifications to the QE-6.0 code using symmetry. When I try to combine a 3-D array scattered across nodes, I use the following:

  output(3, nbnds, k_pool*pool_id+1 : k_pool*pool_id+k_pool) = input(3, nbnds, 1:k_pool)

Here nbnds is the number of bands, k_pool is the number of k-points per pool, and pool_id is the id of the pool; I am assuming that the number of k-points is divisible by the number of pools. Then I call mp_sum(output, inter_pool_comm) to collect all the segments of input across the nodes into one output array. When I run the modified QE code in parallel, the output differs from that of the serial run. Does QE's mp_sum allow the above operation for a 3-D array? Any hints or suggestions would be greatly appreciated.
Vahid

Vahid Askarpour
Department of Physics and Atmospheric Science
Dalhousie University, Halifax, NS, Canada
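
A minimal standalone sketch (not EPW/QE code) of the copy-then-reduce pattern that poolgather1 uses: every pool zero-fills the global array, writes its own slice at its global offset, and an all-reduce sums the non-overlapping copies, which is essentially what mp_sum over inter_pool_comm does via reduce_base_real. The sketch assumes one pool per MPI rank and an nkstot divisible by the number of ranks (the rest/kunit arithmetic in poolgather1 handles the general case); the program name and the test sizes are made up for illustration.

  program poolgather_sketch
    ! Standalone illustration: gather a 3-D array distributed over "pools"
    ! by zero-padding each rank's slice and summing with MPI_ALLREDUCE.
    ! Assumes one pool per MPI rank and nkstot divisible by the rank count.
    use mpi
    implicit none
    integer, parameter :: dp = kind(1.d0)
    integer, parameter :: nsize1 = 3, nsize2 = 4, nkstot = 48   ! made-up test sizes
    integer :: ierr, nranks, myrank, nks, nbase, ik
    real(dp), allocatable :: f_in(:,:,:), f_out(:,:,:)

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

    nks   = nkstot / nranks          ! k-points held by this "pool"
    nbase = nks * myrank             ! offset of this pool's block in the global k list

    allocate(f_in(nsize1, nsize2, nks), f_out(nsize1, nsize2, nkstot))
    do ik = 1, nks                   ! fill the local slice with its global k index
      f_in(:, :, ik) = real(nbase + ik, dp)
    end do

    f_out = 0.0_dp                                 ! zero-pad the global array ...
    f_out(:, :, nbase+1:nbase+nks) = f_in          ! ... and drop the local slice at its offset
    ! element-wise sum over ranks: the slices do not overlap, so this is a gather
    call MPI_Allreduce(MPI_IN_PLACE, f_out, size(f_out), &
                       MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)

    if (myrank == 0) print *, 'f_out(1,1,:) = ', f_out(1, 1, :)
    call MPI_Finalize(ierr)
  end program poolgather_sketch

Run with, e.g., mpirun -np 4: rank 0 should print the global k indices 1 through 48 in order, confirming that the zero-padded slices were gathered correctly.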
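
A second small sketch, using the variable names from the original post (nbnds, k_pool, with made-up sizes), spells out the indexing point Ye raised: output(3,nbnds,1:k_pool) uses scalar indices in the first two dimensions and therefore copies only the single line of elements at (3,nbnds,:), leaving the rest of output at zero, whereas output(1:3,1:nbnds,1:k_pool) (or simply output(:,:,1:k_pool)) copies the whole block.

  program section_vs_element
    ! Standalone illustration: scalar indices versus array sections in the
    ! copy that feeds mp_sum. Sizes are made up; no MPI is needed here.
    implicit none
    integer, parameter :: dp = kind(1.d0)
    integer, parameter :: nbnds = 4, k_pool = 2
    real(dp) :: input(3, nbnds, k_pool), output(3, nbnds, k_pool)

    input = 1.0_dp

    output = 0.0_dp
    output(3, nbnds, 1:k_pool) = input(3, nbnds, 1:k_pool)            ! scalar indices: only k_pool elements
    print *, 'copied with scalar indices :', count(output /= 0.0_dp), ' of ', size(output)

    output = 0.0_dp
    output(1:3, 1:nbnds, 1:k_pool) = input(1:3, 1:nbnds, 1:k_pool)    ! array sections: the full block
    print *, 'copied with array sections :', count(output /= 0.0_dp), ' of ', size(output)
  end program section_vs_element

With these sizes the first assignment touches only 2 of the 24 elements while the second touches all 24; a partially filled output of the first kind would explain a parallel result that differs from the serial run.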