On 02/11/2013 12:20 AM, Christopher Samuel wrote:
> One of the nice things in Torque that a number of our users use is
> interactive jobs with X11 forwarding so they can run various codes with
> graphical interfaces on the compute nodes (such as VMD, or MATLAB, etc).
Hi Chris,
To solve this when
I think we've seen this problem as well. When using ThreadsPerCore along
with DefMemPerCpu or --mem-per-cpu, the worker node doesn't set the
memory limit correctly. Setting --mem instead should work OK.
I have a lightly tested patch that appears to address the issue, but
I've only tried it on
On 06/18/2013 05:53 PM, Eva Hocks wrote:
> any sacct command returns:
>
> slurmctld.log:[2013-06-18T14:45:57-07:00] error: Association database
> appears down, reading from state file.
In slurm.conf, are you using
"AccountingStorageType=accounting_storage/slurmdbd" ?
If so, check that slurmdbd
On 06/19/2013 10:36 AM, Paul Edmon wrote:
> I have a group here that wants to submit a ton of jobs to the queue, but
> want to restrict how many they have running at any given time so that
> they don't torch their fileserver.
The licenses feature might work OK for this. Create a license for the
On 07/10/2013 06:16 PM, Eva Hocks wrote:
> The entry in partition.conf:
> PartitionName=CLUSTER Default=yes State=UP
> nodes=gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]
>
>
> causes slurmctld to crash:
>
> [2013-07-10T16:03:22.923] error: find_node_record: lookup failure for
> gpu-[2]-[4]
> [2013-0
p) and 6-16] (which is
>
> actually no node name at all but a wrong parsing after the failure)
>
>
>
> Thanks
>
> Eva
>
>
>
> On Wed, 10 Jul 2013, John Thiltges wrote:
>
>
>
>> On 07/10/2013 06:16 PM, Eva Hocks wrote:
>>> The ent
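
For reference, here is a rough sketch of how the node expression quoted above would expand if every bracket group were treated as an independent list of values and ranges. This is only an illustration in Python, not Slurm's hostlist parser; the function names (split_top_level, expand_bracket, expand_nodes) are made up for this example, and it assumes the intended node names are of the form gpu-1-4, gpu-2-6, and so on. Zero-padded ranges are not handled.

```python
import itertools
import re


def split_top_level(expr):
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_bracket(body):
    """Expand a bracket body such as '4,6-16' into ['4', '6', ..., '16']."""
    values = []
    for item in body.split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            values.extend(str(n) for n in range(int(lo), int(hi) + 1))
        else:
            values.append(item)
    return values


def expand_nodes(expr):
    """Expand every comma-separated piece, one bracket group at a time."""
    names = []
    for part in split_top_level(expr):
        # With a capture group, re.split alternates literal text (even
        # indexes) and bracket bodies (odd indexes).
        tokens = re.split(r"\[([^\]]*)\]", part)
        choices = [
            expand_bracket(tok) if i % 2 else [tok]
            for i, tok in enumerate(tokens)
        ]
        names.extend("".join(combo) for combo in itertools.product(*choices))
    return names


if __name__ == "__main__":
    nodes = expand_nodes("gpu-[1]-[4-17],gpu-[2]-[4,6-16],gpu-[3]-[9]")
    print(len(nodes), nodes[0], nodes[-1])
```

Under that reading the expression names 27 nodes. Comparing such a list against the NodeName entries in the configuration may help tell a genuinely missing node apart from a parsing problem with multiple bracket groups.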
On 08/05/2013 06:52 PM, Kevin Abbey wrote:
I started using cgroups to control memory usage last week. One user
reported that his application now takes 4 times longer to complete. I read
elsewhere that cgroup memory control can reduce performance. Is a slowdown
of this size realistic?
Is there a more efficient met
On 08/18/2013 11:51 PM, Christopher Samuel wrote:
On 19/08/13 12:02, Christopher Samuel wrote:
Chasing through the code it looks like getgrnam_r() fails in
get_group_members() in src/slurmctld/groups.c on our Slurm 2.6 boxes
(on RHEL 6.4)
Scratch that, restarting slurmctld doesn't provoke the
Hi folks,
We're seeing what looks like incorrect step accounting in the MySQL
database when we're using preemption and PreemptMode=REQUEUE. When a job
is requeued, a new job_table row is created with a new job_db_inx. Next,
a step_table record is created and uses the new job_db_inx, instead of