Very tempting, but the user would have my head :-)
Regarding my previous post, I think I may have found the issue. I turned
max_reservation down from 100 to 10, and sge_qmaster seems to be taking a
breather and no longer staying at 100% CPU utilization. Time will tell.
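
For anyone finding this thread later, here is a minimal sketch of how a change
like this could be scripted instead of made in the qconf -msconf editor. It
assumes the scheduler configuration was saved with qconf -ssconf and is loaded
back with qconf -Msconf. The helper name set_max_reservation and the file names
are made up for illustration, and the qconf calls are left commented out.

```shell
#!/bin/sh
# Rewrite the max_reservation line in a saved scheduler configuration.
# set_max_reservation is a hypothetical helper, not part of Grid Engine.
set_max_reservation() {
    # $1 = new value, $2 = file holding the output of `qconf -ssconf`
    sed "s/^max_reservation[[:space:]].*/max_reservation                   $1/" "$2"
}

# Intended use on the qmaster host (assumed workflow, not run here):
#   qconf -ssconf > sconf.txt
#   set_max_reservation 10 sconf.txt > sconf.new
#   qconf -Msconf sconf.new
```

Keeping the old sconf.txt around gives an easy way back if the lower value
turns out not to help.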
Thank you all,
Joseph
On 1/26/2019 1
Hi Ian.
I have BLCR checkpointing implemented, and even though the context files are
large, the FS keeps up.
I've been playing with adjusting the scheduler configuration values
(qconf -msconf) and I think I just hit gold. I set max_reservation really low and
max_reservation 10
And this is the first time sge_qmaster h
I/O issues? Is an NFS server providing the data, with jobs possibly running
over NFS shares as opposed to running on local disk?
On Sat, Jan 26, 2019, 11:23 AM Joseph Farran wrote:
> Hi Daniel.
>
> Yes I do have large job-arrays around 7k tasks BUT I have had larger job
> arrays of 500k without seeing this kind of slo
It may depend on specific features of those large job arrays. You could
try deleting them and see if the problem disappears.
On Sat, Jan 26, 2019 at 2:23 PM Joseph Farran wrote:
> Hi Daniel.
>
> Yes I do have large job-arrays around 7k tasks BUT I have had larger job
> arrays of 500k without se
Hi Daniel.
Yes, I do have large job arrays of around 7k tasks, BUT I have had larger job
arrays of 500k tasks without seeing this kind of slowdown.
Joseph
On 1/26/2019 10:16 AM, Daniel Povey wrote:
Check if there are any huge jobs in the queue. Sometimes very large task
ranges, or large numbers of job
Hi Reuti.
Yes - several times
with no success.
Joseph
On 1/26/2019 4:03 AM, Reuti wrote:
Hi,
On 26.01.2019 at 10:20, Joseph Farran wrote:
Hi.
Our Grid Engine is running very sluggish all of a sudden. sge_qmaster stays at 100% a
Check if there are any huge jobs in the queue. Sometimes very large task
ranges, or large numbers of jobs, can make it slow.
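
A rough sketch of one way to spot such jobs: sum up the task range shown in
the last column of the pending-job listing. It assumes the usual qstat layout
where a pending array job ends with a ja-task-ID field of the form
first-last:step. Comma-separated ranges are ignored here, and
count_array_tasks is a made-up helper name.

```shell
#!/bin/sh
# Print "job-ID task-count" for each pending array job read from stdin.
# count_array_tasks is a hypothetical helper, not part of Grid Engine.
count_array_tasks() {
    awk 'NR > 2 && $NF ~ /^[0-9]+-[0-9]+:[0-9]+$/ {
        split($NF, r, /[-:]/)               # r[1]=first, r[2]=last, r[3]=step
        n = int((r[2] - r[1]) / r[3]) + 1   # number of tasks in the range
        print $1, n
    }'
}

# Intended use (assumed, not run here):
#   qstat -u '*' -s p | count_array_tasks | sort -k2,2nr | head
```

The first two lines of input are skipped as the qstat header and separator.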
On Sat, Jan 26, 2019 at 7:05 AM Reuti wrote:
> Hi,
>
> > On 26.01.2019 at 10:20, Joseph Farran wrote:
> >
> > Hi.
> > Our Grid Engine is running very sluggish all of a
Hi,
> On 26.01.2019 at 10:20, Joseph Farran wrote:
>
> Hi.
> Our Grid Engine is running very sluggish all of a sudden. sge_qmaster stays
> at 100% all the time, where it used to be at 100% for a few seconds every 30
> seconds or so.
> I ran the qping command but not sure how to read it. Any hel
Hi.
Our Grid Engine is running very sluggish all of a sudden. sge_qmaster stays at
100% CPU all the time, where it used to be at 100% for a few seconds every 30
seconds or so.
I ran the qping command but am not sure how to read it. Any helpful insight is
much appreciated.
qping -i 5 -info hpc-s 6444 qmaste