Dear All,
I had a simulation running for nearly 5 days, and today it stopped for no apparent reason: there was no error message and no normal termination.
The first thing I need help with is finding out why the simulation stopped. I have attached the last lines of the simulation output as a text file.
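A minimal sketch of one way to start narrowing this down: scan the end of the run's standard output and error files for common failure markers. The keyword list is a hypothetical, non-exhaustive guess, and the file names in the usage comment assume Simfactory's usual layout (<simulation>.out and <simulation>.err inside output-NNNN); check your own run directory.

```python
from collections import deque
import sys

# Hypothetical, non-exhaustive list of strings that often appear when a run dies
# abnormally; adjust it to whatever your own output files contain.
FAILURE_MARKERS = ("error", "killed", "segmentation fault", "out of memory",
                   "terminated", "abort")


def tail(lines, n=200):
    """Return the last n items of an iterable of lines, keeping only n in memory."""
    return list(deque(lines, maxlen=n))


def find_failure_markers(lines):
    """Return (index, text) pairs for lines containing any failure marker."""
    hits = []
    for i, line in enumerate(lines):
        lowered = line.lower()
        if any(marker in lowered for marker in FAILURE_MARKERS):
            hits.append((i, line.rstrip("\n")))
    return hits


def scan_file(path, n=200):
    # errors="replace" so stray binary bytes in a log do not abort the scan.
    with open(path, "r", errors="replace") as f:
        return find_failure_markers(tail(f, n))


if __name__ == "__main__":
    # Example (assumed paths):
    #   python scan_tail.py output-0000/the-last-one.out output-0000/the-last-one.err
    for name in sys.argv[1:]:
        for _, text in scan_file(name):
            print(f"{name}: {text}")
```

An empty result does not prove anything: a process killed from outside (for example by the operating system) may leave no message in these files at all.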


The second problem is that I cannot restart from the checkpoint. I get the following error:

./simfactory/bin/sim submit the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par --procs=56
Error: job id is negative
Aborting Simfactory.


I searched the email archives and did what Roland had suggested there: I added a line for the job id (jobid = 999999) to the properties.ini file. However, I am still getting errors:

./simfactory/bin/sim submit the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par --procs=56
Warning: job status is U
Warning: job status is U
Assigned restart id: 1
Warning: Too many used cores per node specified: specified ppn-used=56 (ppn is 28)
Executing submit command: exec nohup /home/cosmo/simulations/the-last-one/output-0001/SIMFACTORY/SubmitScript < /dev/null > /dev/null 2> /dev/null & echo $!
Submit finished, job id is 8907
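To double-check an edited properties.ini, a minimal sketch that reads its [properties] section (the section name is taken from the log below) and checks whether a jobid entry is present and non-negative. The check only mirrors the wording of the "job id is negative" message; it is an assumption, not Simfactory's actual logic, and the function names are hypothetical.

```python
import configparser
import sys


def parse_properties(text):
    """Parse Simfactory-style properties.ini text; return the [properties] section."""
    # interpolation=None so '%' characters in paths are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written (default lower-cases keys)
    parser.read_string(text)
    return dict(parser["properties"])


def read_properties(path):
    with open(path) as f:
        return parse_properties(f.read())


def job_id_status(props):
    """Hypothetical check that mirrors the 'job id is negative' message."""
    raw = props.get("jobid")
    if raw is None:
        return "missing"
    try:
        job_id = int(raw)
    except ValueError:
        return "not an integer"
    return "negative" if job_id < 0 else "ok"


if __name__ == "__main__":
    # Example: python check_props.py path/to/properties.ini
    for path in sys.argv[1:]:
        props = read_properties(path)
        print(path, "jobid:", job_id_status(props),
              "numprocs:", props.get("numprocs"), "numthreads:", props.get("numthreads"))
```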



I then changed the procs-related lines in the properties.ini file and again got an error:


./simfactory/bin/sim submit the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par
Assigned restart id: 1
Executing submit command: exec nohup /home/cosmo/simulations/the-last-one/output-0001/SIMFACTORY/SubmitScript < /dev/null > /dev/null 2> /dev/null & echo $!
Submit finished, job id is 10517

Finally, I am confused about the "ppn", "procs", and related options in Simfactory. I have attached my CPU information: it is a dual-socket machine with two 14-core Xeon(R) CPU E5-2680 processors and 2 threads per core. My submission command was:
./simfactory/bin/sim create-run the-last-one --parfile=par/bbh-2res-1mass-10sep-final.par --procs=56 --ppn-used=56
but the properties.ini file says:
numprocs        = 4
nodeprocs       = 4
numthreads      = 14
I have also attached the properties.ini file. Is it using only 4 cores? I have read the Simfactory documentation and the ET wiki, but I still cannot get a clear picture of how the options for the number of processors work. With the command line above (--procs=56 --ppn-used=56) the simulation was running well, and I would like to know whether it is actually using all of the processors on my system. I would be grateful if anyone could help me with each of these issues.
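A minimal sketch of how these numbers may fit together, assuming the Simfactory convention that --procs is the total number of cores (threads) requested, split into numprocs MPI processes of numthreads OpenMP threads each. The formulas are inferred from the values in the log below, not taken from Simfactory's source, and the class, function and field names are hypothetical. The socket and core counts come from the CPU description above.

```python
from dataclasses import dataclass
import math


@dataclass
class Layout:
    procs: int        # total cores/threads requested (--procs)
    ppn_used: int     # cores used per node (--ppn-used)
    num_threads: int  # OpenMP threads per MPI process (numthreads)
    ppn: int          # cores per node from the machine description


def derive(layout):
    """Derive process counts, assuming procs = numprocs * numthreads (inferred)."""
    if layout.procs % layout.num_threads or layout.ppn_used % layout.num_threads:
        raise ValueError("procs and ppn-used should be multiples of num-threads")
    numprocs = layout.procs // layout.num_threads       # MPI processes in total
    nodeprocs = layout.ppn_used // layout.num_threads   # MPI processes per node
    nodes = math.ceil(layout.procs / layout.ppn_used)
    return {
        "numprocs": numprocs,
        "nodeprocs": nodeprocs,
        "nodes": nodes,
        "procsrequested": nodes * layout.ppn,
        "total_threads": numprocs * layout.num_threads,
    }


def hardware_threads(sockets, cores_per_socket, threads_per_core):
    """Return (physical cores, logical CPUs) for one node."""
    cores = sockets * cores_per_socket
    return cores, cores * threads_per_core
```

With derive(Layout(procs=56, ppn_used=56, num_threads=14, ppn=28)) this gives numprocs = 4, nodeprocs = 4, nodes = 1 and procsrequested = 28, matching the log, and 4 × 14 = 56 threads in total. hardware_threads(2, 14, 2) gives (28, 56), i.e. 28 physical cores and 56 logical CPUs. If the assumption holds, the run occupies all 56 logical CPUs through 4 MPI processes with 14 threads each, rather than only 4 cores.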

Attachments are:
parameter file,
properties.ini,
simulation-last-lines,
CPU info,
and the log.txt file.



Sincerely,
Hassan


-- 




Hassan Khalvati
Sharif University of Technology
[email protected]
[email protected]
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Creating simulation the-last-one
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Simulation directory: /home/cosmo/simulations/the-last-one
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Simulation Properties:
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::[properties]
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::machine         = cosmo-Super-Server
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::simulationid    = simulation-the-last-one-cosmo-Super-Server-cosmo-Super-Server-cosmo-2019.09.08-13.49.21-40523
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::sourcedir       = /home/cosmo/ET/Cactus
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::configuration   = sim
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::configid        = config-sim-cosmo-Super-Server-home-cosmo-ET-Cactus
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::buildid         = build-sim-cosmo-Super-Server-cosmo-2019.05.07-09.43.14-31763
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::testsuite       = False
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::executable      = /home/cosmo/simulations/the-last-one/SIMFACTORY/exe/cactus_sim
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::optionlist      = /home/cosmo/simulations/the-last-one/SIMFACTORY/cfg/OptionList
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::submitscript    = /home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::runscript       = /home/cosmo/simulations/the-last-one/SIMFACTORY/run/RunScript
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::parfile         = /home/cosmo/simulations/the-last-one/SIMFACTORY/par/bbh-2res-1mass-10sep-final.par
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::
[LOG:2019-09-08 13:49:21] restart.create(simulationName, parfile)::Simulation the-last-one created
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::Creating new properties because this is an independant run, not a run following a submit
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::Determined the following properties
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::[properties]
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::machine         = cosmo-Super-Server
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::simulationid    = simulation-the-last-one-cosmo-Super-Server-cosmo-Super-Server-cosmo-2019.09.08-13.49.21-40523
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::sourcedir       = /home/cosmo/ET/Cactus
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::configuration   = sim
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::configid        = config-sim-cosmo-Super-Server-home-cosmo-ET-Cactus
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::buildid         = build-sim-cosmo-Super-Server-cosmo-2019.05.07-09.43.14-31763
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::testsuite       = False
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::executable      = /home/cosmo/simulations/the-last-one/SIMFACTORY/exe/cactus_sim
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::optionlist      = /home/cosmo/simulations/the-last-one/SIMFACTORY/cfg/OptionList
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::submitscript    = /home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::runscript       = /home/cosmo/simulations/the-last-one/SIMFACTORY/run/RunScript
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::parfile         = /home/cosmo/simulations/the-last-one/SIMFACTORY/par/bbh-2res-1mass-10sep-final.par
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::numprocs        = 4
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::nodeprocs       = 4
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::numthreads      = 14
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::hostname        = cosmo-Super-Server
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::ppn             = 28
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::ppnused         = 56
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::procsrequested  = 28
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::pbsSimulationName= the-last-one-00
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::cpufreq         = 
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::user            = cosmo
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::memory          = 0
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::nodes           = 1
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::procs           = 56
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::numsmt          = 1
[LOG:2019-09-08 13:49:21] restart.userRun(simulationName)::
[LOG:2019-09-08 13:49:21] self.makeActive()::Simulation the-last-one with restart-id 0 has been made active
[LOG:2019-09-08 13:49:21] self.run(debug)::Prepping for execution/run
[LOG:2019-09-08 13:49:21] checkpointing = self.PrepareCheckpointing(recover_id)::PrepareCheckpointing: max_restart_id: -1
[LOG:2019-09-08 13:49:21] self.run(debug)::Defined substitution properties for execution/run
[LOG:2019-09-08 13:49:21] self.run(debug)::{'SIMULATION_ID': 'simulation-the-last-one-cosmo-Super-Server-cosmo-Super-Server-cosmo-2019.09.08-13.49.21-40523', 'NODE_PROCS': 4, 'PPN_USED': 56, 'PPN': 28, 'CPUFREQ': None, 'USER': 'cosmo', 'RUNDIR': '/home/cosmo/simulations/the-last-one/output-0000', 'NODES': 1, 'SIMULATION_NAME': 'the-last-one', 'NUM_THREADS': 14, 'EXECUTABLE': '/home/cosmo/simulations/the-last-one/SIMFACTORY/exe/cactus_sim', 'PROCS_REQUESTED': 28, 'RESTART_ID': 0, 'NUM_SMT': 1, 'CONFIGURATION': 'sim', 'PROCS': 56, 'SUBMITSCRIPT': '/home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript', 'MACHINE': 'cosmo-Super-Server', 'PARFILE': '/home/cosmo/simulations/the-last-one/output-0000/bbh-2res-1mass-10sep-final.par', 'SOURCEDIR': '/home/cosmo/ET/Cactus', 'HOSTNAME': 'cosmo-Super-Server', 'RUNDEBUG': 0, 'NUM_PROCS': 4, 'SCRIPTFILE': '/home/cosmo/simulations/the-last-one/SIMFACTORY/run/SubmitScript', 'MEMORY': '0', 'SHORT_SIMULATION_NAME': 'the-last-one-00'}
[LOG:2019-09-08 13:49:21] self.run(debug)::Executing run command: /home/cosmo/simulations/the-last-one/output-0000/SIMFACTORY/RunScript

Attachment: properties.ini

Attachment: bbh-2res-1mass-10sep-final.par

Attachment: simulation-last-lines

Attachment: cpu

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
