Dear all,

I had a simulation that ran for nearly 5 days and then stopped today for no apparent reason: no errors and no termination message. The first thing I need help with is finding out why the simulation stopped. The last lines of the simulation output are attached as a text file.

The second problem is that I cannot restart from the checkpoint. The submit command fails with an error:

    ./simfactory/bin/sim submit the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par --procs=56
    Error: job id is negative
    Aborting Simfactory.

I searched the email archives and did what Roland suggested there, adding a line for the job id (jobid = 999999) to the properties.ini file, but I am still getting errors:

    ./simfactory/bin/sim submit the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par --procs=56
    Warning: job status is U
    Warning: job status is U
    Assigned restart id: 1
    Warning: Too many used cores per node specified: specified ppn-used=56 (ppn is 28)
    Executing submit command: exec nohup /home/cosmo/simulations/the-last-one/output-0001/SIMFACTORY/SubmitScript < /dev/null > /dev/null 2> /dev/null & echo $!
    Submit finished, job id is 8907

I then changed the lines for procs in the properties.ini file, and I am again getting an error:

    ./simfactory/bin/sim submit the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par
    Assigned restart id: 1
    Executing submit command: exec nohup /home/cosmo/simulations/the-last-one/output-0001/SIMFACTORY/SubmitScript < /dev/null > /dev/null 2> /dev/null & echo $!
    Submit finished, job id is 10517

Finally, I am confused about the ppn, procs, and related options in Simfactory. I have attached my CPU information. The machine has two 14-core Xeon(R) E5-2680 CPUs, with 2 threads per core. My command to create and run the simulation was:

    ./simfactory/bin/sim create-run the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par --procs=56 --ppn-used=56

but the properties.ini file contains:

    numprocs = 4
    nodeprocs = 4
    numthreads = 14

I have also attached the properties.ini file. Is it using only 4 cores? I have looked through the Simfactory documentation and the ET wiki, but I cannot get a clear picture of how the options for the number of processors work.
However, with the command line mentioned above (--procs=56 --ppn-used=56), the simulation was performing well, so I would like to know whether it is using all of the processors on my system.

I would be grateful if anyone could help me with each of these issues. The attachments are: the parameter file, properties.ini, simulation-last-lines, the CPU info, and the log.txt file.

Sincerely,
Hassan

--
Hassan Khalvati
Sharif University of Technology
[email protected]

[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Creating
simulation the-last-one
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Simulation
directory: /home/cosmo/simulations/the-last-one
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Simulation
Properties:
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::[properties]
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::machine
= cosmo-Super-Server
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::simulationid
=
simulation-the-last-one-cosmo-Super-Server-cosmo-Super-Server-cosmo-2019.09.08-13.49.21-40523
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::sourcedir
= /home/cosmo/ET/Cactus
[LOG:2019-09-08 13:49:21] restart.create(simulationName,
parfile)::configuration = sim
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::configid
= config-sim-cosmo-Super-Server-home-cosmo-ET-Cactus
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::buildid
= build-sim-cosmo-Super-Server-cosmo-2019.05.07-09.43.14-31763
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::testsuite
= False
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::executable
= /home/cosmo/simulations/the-last-one/SIMFACTORY/exe/cactus_sim
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::optionlist
= /home/cosmo/simulations/the-last-one/SIMFACTORY/cfg/OptionList
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::submitscript
= /home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::runscript
= /home/cosmo/simulations/the-last-one/SIMFACTORY/run/RunScript
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::parfile
=
/home/cosmo/simulations/the-last-one/SIMFACTORY/par/bbh-2res-1mass-10sep-final.par
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Simulation
the-last-one created
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::Creating new
properties because this is an independant run, not a run following a submit
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::Determined the
following properties
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::[properties]
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::machine =
cosmo-Super-Server
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::simulationid =
simulation-the-last-one-cosmo-Super-Server-cosmo-Super-Server-cosmo-2019.09.08-13.49.21-40523
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::sourcedir =
/home/cosmo/ET/Cactus
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::configuration = sim
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::configid =
config-sim-cosmo-Super-Server-home-cosmo-ET-Cactus
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::buildid =
build-sim-cosmo-Super-Server-cosmo-2019.05.07-09.43.14-31763
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::testsuite =
False
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::executable =
/home/cosmo/simulations/the-last-one/SIMFACTORY/exe/cactus_sim
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::optionlist =
/home/cosmo/simulations/the-last-one/SIMFACTORY/cfg/OptionList
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::submitscript =
/home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::runscript =
/home/cosmo/simulations/the-last-one/SIMFACTORY/run/RunScript
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::parfile =
/home/cosmo/simulations/the-last-one/SIMFACTORY/par/bbh-2res-1mass-10sep-final.par
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::numprocs = 4
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::nodeprocs = 4
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::numthreads = 14
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::hostname =
cosmo-Super-Server
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::ppn = 28
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::ppnused = 56
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::procsrequested = 28
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::pbsSimulationName=
the-last-one-00
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::cpufreq =
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::user =
cosmo
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::memory = 0
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::nodes = 1
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::procs = 56
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::numsmt = 1
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::
[LOG:2019-09-08 13:49:21] self.makeActive()::Simulation the-last-one with
restart-id 0 has been made active
[LOG:2019-09-08 13:49:21] self.run(debug)::Prepping for execution/run
[LOG:2019-09-08 13:49:21] checkpointing =
self.PrepareCheckpointing(recover_id)::PrepareCheckpointing: max_restart_id: -1
[LOG:2019-09-08 13:49:21] self.run(debug)::Defined substitution properties for
execution/run
[LOG:2019-09-08 13:49:21] self.run(debug)::{'SIMULATION_ID':
'simulation-the-last-one-cosmo-Super-Server-cosmo-Super-Server-cosmo-2019.09.08-13.49.21-40523',
'NODE_PROCS': 4, 'PPN_USED': 56, 'PPN': 28, 'CPUFREQ': None, 'USER': 'cosmo',
'RUNDIR': '/home/cosmo/simulations/the-last-one/output-0000', 'NODES': 1,
'SIMULATION_NAME': 'the-last-one', 'NUM_THREADS': 14, 'EXECUTABLE':
'/home/cosmo/simulations/the-last-one/SIMFACTORY/exe/cactus_sim',
'PROCS_REQUESTED': 28, 'RESTART_ID': 0, 'NUM_SMT': 1, 'CONFIGURATION': 'sim',
'PROCS': 56, 'SUBMITSCRIPT':
'/home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript', 'MACHINE':
'cosmo-Super-Server', 'PARFILE':
'/home/cosmo/simulations/the-last-one/output-0000/bbh-2res-1mass-10sep-final.par',
'SOURCEDIR': '/home/cosmo/ET/Cactus', 'HOSTNAME': 'cosmo-Super-Server',
'RUNDEBUG': 0, 'NUM_PROCS': 4, 'SCRIPTFILE':
'/home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript', 'MEMORY':
'0', 'SHORT_SIMULATION_NAME': 'the-last-one-00'}
[LOG:2019-09-08 13:49:21] self.run(debug)::Executing run command:
/home/cosmo/simulations/the-last-one/output-0000/SIMFACTORY/RunScript
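
For reference, the values in the log above are consistent with the arithmetic sketched below. This is only a sketch, assuming that Simfactory treats procs as the total number of threads and obtains the number of MPI processes by dividing by the threads per process. The helper derive_layout is hypothetical and is not part of Simfactory; only the input and output numbers are taken from the log.

```python
def derive_layout(procs, num_threads, ppn, ppn_used):
    """Hypothetical helper (not Simfactory code) reproducing the logged values.

    Assumption: procs and ppn_used count threads, and each MPI process
    runs num_threads OpenMP threads.
    """
    if procs % num_threads != 0 or ppn_used % num_threads != 0:
        raise ValueError("procs and ppn_used must be multiples of num_threads")
    # Integer ceiling division: nodes needed to hold all threads.
    nodes = -(-procs // ppn_used)
    return {
        "numprocs": procs // num_threads,      # MPI processes in total
        "nodeprocs": ppn_used // num_threads,  # MPI processes per node
        "numthreads": num_threads,             # threads per MPI process
        "nodes": nodes,
        "procsrequested": nodes * ppn,         # cores requested from the machine
    }


# Inputs as they appear in the log: procs = 56, numthreads = 14,
# ppn = 28, ppnused = 56.
layout = derive_layout(procs=56, num_threads=14, ppn=28, ppn_used=56)
total_threads = layout["numprocs"] * layout["numthreads"]

# Hardware described in the message: 2 sockets x 14 cores x 2 threads per core.
logical_cpus = 2 * 14 * 2

print(layout)
print("threads started:", total_threads, "logical CPUs:", logical_cpus)
```

If this reading is right, numprocs = 4 would mean 4 MPI processes with 14 threads each, that is 56 threads in total, rather than 4 cores. It would also explain the "Too many used cores per node" warning, since ppn-used=56 exceeds the configured ppn of 28.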

Attachments:
properties.ini
bbh-2res-1mass-10sep-final.par
simulation-last-lines
cpu
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
