The qq2rr code is serial, right? After running it in /scratch (because of the memory issue) it shows a bus error:
/var/spool/slurm/slurmd.spool/job3829394/slurm_script: line 38:
 179584 Done          ls anh*
 179585 Bus error   | /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1
Job finished
Any specific reason for this error?
It is parallelized with OpenMP (not MPI), although I have not tested it
in a while. I do not know what causes a bus error; it is not something I
had seen since the nineties. Maybe out of memory? If you are running it
on a cluster, it may be better to submit it as a batch job even if it is serial.
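For what it's worth, a minimal sketch of such a Slurm job script (the memory, time, thread count, and scratch path are placeholders to adapt to your cluster; the binary path and the pipeline are the ones from your log, and since d3_qq2rr.x is OpenMP-only, the script asks for a single task with several threads):

```shell
#!/bin/bash
#SBATCH --job-name=qq2rr
#SBATCH --nodes=1
#SBATCH --ntasks=1               # serial/OpenMP code: one task only
#SBATCH --cpus-per-task=8        # OpenMP threads
#SBATCH --mem=64G                # be generous, in case the bus error is out-of-memory
#SBATCH --time=24:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
cd /scratch/$USER/d3q_run        # placeholder: wherever the anh* files are
ls anh* | /gpfs/home/kghosh/kanka/qe-6.5/bin/d3_qq2rr.x 1 1 1
```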
Regards,
Kanka
Kanka Ghosh
Postdoctoral Researcher
I2M-Bordeaux
University of Bordeaux, CNRS UMR 5295
Site: Ecole Nationale Supérieure des Arts et Métiers
Bordeaux-Talence 33400
------------------------------------------------------------------------
*From: *"Lorenzo Paulatto" <[email protected]>
*To: *"users" <[email protected]>
*Sent: *Friday, December 4, 2020 1:12:09 PM
*Subject: *Re: [QE-users] D3Q code stopped due to davcio error
Yes, it took a little more than 5 days to compute only the first
q-point. Anyway, it seems that I should use a 1x1x1 grid instead of
2x2x2. But are you suggesting doing the single-mode calculation
with the 1x1x1 grid, or the "mode=full" run using the 1x1x1 grid?
Yes, but there is no need to do it: you have done it already. You can just call
d3_qq2rr.x and specify "1 1 1" as the grid size:
ls anh* | d3_qq2rr.x 1 1 1
and it will automatically compute the force constants from the
calculation at (0,0,0). This way you can immediately test how it works.
If you want to try the 2x2x2 grid, I would use 10 pools and maybe try
with *fewer* CPUs per pool: at the moment you are using 128, which
requires a lot of communication. If the calculation fits in RAM, I
would recommend keeping each pool on a single compute node.
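Something along these lines is what I mean (the node and task counts are just an example to adapt to your machine; d3q.x accepts the usual QE parallelization flags):

```shell
# Hypothetical layout: 10 pools, one pool per node,
# 32 MPI tasks per pool instead of 128 (less communication inside each pool)
mpirun -np 320 d3q.x -npool 10 -in d3q.in > d3q.out
```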
You may try to use some local scratch in order to avoid running out of
disk space (ask the cluster managers what to use).
Finally, if you manage to get everything running, you can run all the
q-point triplets simultaneously as different batch jobs by setting
"first" and "last". You can use the same outdir and prefix: as long
as the jobs work on different triplets, they will not interfere (this is
true for d3q, but not in general for other linear response codes).
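The input fragment for one such job would look something like this (a sketch only: I am assuming the &d3_input namelist of d3q, so check the d3q examples for the exact keyword names; the prefix and outdir values are placeholders):

```
&d3_input
  mode   = 'full'
  prefix = 'mysystem'   ! same prefix in every job
  outdir = './tmp'      ! same outdir in every job
  first  = 2            ! first q-point triplet this job computes
  last   = 2            ! last q-point triplet this job computes
/
```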
hth
Regards,
Kanka
------------------------------------------------------------------------
*From: *"Lorenzo Paulatto" <[email protected]>
*To: *"users" <[email protected]>
*Sent: *Friday, December 4, 2020 9:09:50 AM
*Subject: *Re: [QE-users] D3Q code stopped due to davcio error
Thanks for pointing out the storage issue. Yes, I am running
it at a French computing centre: the University of Bordeaux's
cluster system (Curta, MCIA). I am attaching the d3q
output file here. Indeed, it was in the process of computing the
second q-point triplet.
I do not have access to the Bordeaux cluster, but I could ask for it if
you need me to look at the run. That said, I see that computing
the first q-point took about 5 days; it will take at least a
month to do the second point! Because it has less symmetry, the
code needs to compute 2x more k-points and 3x more perturbations.
"Maybe for such a large system you can get some decent
force-constants already from (0,0,0) alone"
In that case, do you mean using the "mode=gamma-only" tag?
Not really: the triplet (0,0,0) is in itself the 1x1x1 grid, and
you can treat it as such. Thanks to some Fourier interpolation
trickery, you can use it to get the D3 matrices at any point.
Also, the d3_qq2rr code is not particularly optimized and is not
parallelized; I'm not sure you would manage to compute the Fourier
transform of the 2x2x2 grid anyway.
You have to keep in mind that the three-body force constants become
huge very quickly with the number of atoms and the size of the
grid: each D3 matrix has (3*nat)^3 complex elements, and an
n x n x n grid contains n^6 triplets.
In your case, the 2x2x2 grid would use about 2.2 GB of RAM, which
is probably still feasible, but I would try the 1x1x1 grid first.
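For the record, the estimate can be reproduced in a couple of lines (nat = 44 from your system, and I am assuming 16 bytes per double-precision complex element):

```shell
# (3*nat)^3 complex elements per D3 matrix, n^6 triplets for an n x n x n grid
nat=44; n=2
per_matrix=$(( (3*nat)*(3*nat)*(3*nat)*16 ))   # bytes per D3 matrix
triplets=$(( n*n*n*n*n*n ))                    # n^6 = 64 triplets for 2x2x2
total=$(( per_matrix * triplets ))             # bytes for the whole grid
awk -v b="$total" 'BEGIN{printf "%.2f GiB\n", b/2^30}'   # prints 2.19 GiB
```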
cheers
Regards,
Kanka
------------------------------------------------------------------------
*From: *"Lorenzo Paulatto" <[email protected]>
*To: *"users" <[email protected]>
*Sent: *Thursday, December 3, 2020 11:23:58 PM
*Subject: *Re: [QE-users] D3Q code stopped due to davcio error
task # 71
from davcio : error # 5011
error while writing from file
".//D3_Q1.0_0_0_Q2.0_0_-1o2_Q3.0_0_1o2/scf.d1.dq1pq1.72"
I guess it may have run out of space; d3q uses a ton of disk
space and there is no easy way to avoid this. If you are
running on any of the French computing centres I can try to
have a look directly.
I do not think the change in the number of CPUs could cause this
problem, but if you provide the full output I can check. Also,
44 atoms is a lot for the d3q code. It seems like you're
running the second q-point triplet, which is of kind (0,q,-q);
it takes much more time and disk space than the triplet
(0,0,0). Maybe for such a large system you can get some decent
force constants already from (0,0,0) alone.
cheers
_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing [email protected]
https://lists.quantum-espresso.org/mailman/listinfo/users