Dear Steve,

Thank you for your reply. I tried the same simulation with a finer grid, and it started working, albeit very slowly (apparently due to slow inter-node communication), but it did work out. Judging from the wall time, I could see a few iterations complete over the final couple of hours.
It turns out that in such cases (where the grid needs to be finer), a simulation with GRHydro ends with the error "*the grid structure inconsistent. Impossible to continue*". A simulation with IllinoisGRMHD, on the other hand, stops abruptly during thorn setup (somewhere around the SpaceMask and AHFinderDirect setup). I later tried to speed up the simulation, but the inter-node communication on the HPC appears to be very slow, which may be an inherent limitation, since the machine is quite old.

Regards
Shamim Haque
Senior Research Fellow (SRF)
Department of Physics
IISER Bhopal

On Tue, May 23, 2023 at 10:08 PM Steven R. Brandt <sbra...@cct.lsu.edu> wrote:

> Sorry that no one has replied to you in a while. Are you still
> experiencing this difficulty?
>
> --Steve
>
> On 4/4/2023 3:08 AM, Shamim Haque 1910511 wrote:
>
> Dear Steven,
>
> I assure you that I submitted the simulation for the first time only. I
> used "sim create-submit" to submit the simulation, which would not have
> submitted the job if the same name had been used earlier.
>
> Secondly, I found the same message in the output files from the debug
> queue (1 node, with GRHydro) and the high-memory node (3 nodes, with
> IllinoisGRMHD), where the simulation ran successfully. I have attached
> the output files for reference.
>
> Regards
> Shamim Haque
> Senior Research Fellow (SRF)
> Department of Physics
> IISER Bhopal
>
> On Tue, Apr 4, 2023 at 12:35 AM Steven R. Brandt <sbra...@cct.lsu.edu>
> wrote:
>
>> I see this error message in your output:
>>
>> -> No HDF5 checkpoint files with basefilename 'checkpoint.chkpt'
>> and file extension '.h5' found in recovery directory
>> 'nsns_toy1.2_DDME2BPS_quark_1.2vs1.6M_40km_g25'
>>
>> I suspect you did a "sim submit" for a job, got a failure, and did a
>> second "sim submit" without purging. That immediately triggered the error.
>> Then, for some reason, MPI didn't shut down cleanly, and the processes
>> hung doing nothing until they used up the walltime.
>>
>> --Steve
>>
>> On 4/2/2023 5:16 AM, Shamim Haque 1910511 wrote:
>>
>> Hello,
>>
>> I am trying to run a binary neutron star merger (BNSM) using
>> IllinoisGRMHD on the HPC Kanad at IISER Bhopal. While I have verified
>> that the parfile runs fine on the debug queue (1 node) and the
>> high-memory queue (3 nodes), I am unable to run the simulation in a
>> queue with 9 nodes (144 cores).
>>
>> The output file suggests that the setup of the listed thorns does not
>> complete within 24 hours, which is the maximum walltime for this queue.
>>
>> Is there a way to sort out this issue? I have attached the parfile and
>> output file for reference.
>>
>> Regards
>> Shamim Haque
>> Senior Research Fellow (SRF)
>> Department of Physics
>> IISER Bhopal
>>
>> _______________________________________________
>> Users mailing list
>> Users@einsteintoolkit.org
>> http://lists.einsteintoolkit.org/mailman/listinfo/users
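For context, the "No HDF5 checkpoint files ... found in recovery directory" message quoted above is emitted during Cactus checkpoint recovery; with autoprobe recovery it is expected (and harmless) on a fresh run with no prior checkpoints, which matches its appearance in the successful runs as well. A minimal sketch of the relevant parameter-file settings, assuming the standard IOUtil and HDF5 I/O thorns (the values shown are illustrative, not taken from the attached parfile):

```
# Checkpointing/recovery sketch -- illustrative values only
IO::checkpoint_every        = 1024         # write a checkpoint every N iterations
IO::checkpoint_on_terminate = "yes"        # checkpoint before walltime runs out
IO::checkpoint_dir          = "checkpoints"
IO::recover                 = "autoprobe"  # recover if checkpoints exist, else warn and start fresh
IO::recover_dir             = "checkpoints"
IOHDF5::checkpoint          = "yes"        # HDF5 checkpoints (default basename checkpoint.chkpt)
```

With `IO::recover = "autoprobe"`, the quoted message is only a warning; a resubmission after a failure would still need a purge (or a fresh simulation name) to avoid the stale-restart problem Steve describes.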