Hello,
first of all, thanks for your reply. I tried specifying the --slot-list option as you proposed. Unfortunately, the job still runs on 5 cores. Adding a second --slot-list option for the second program, e.g.
mpirun -np 4 --slot-list 0-3 prog_1 : -np 1 --slot-list 0 prog_2
does run on only 4 cores, but it now takes more than three times as long as it did on 5 cores. I expected some overhead from the oversubscription, but that definitely seems too much. Any suggestions?