With a 500k machine cluster I suggest getting professional Go support - someone
experienced in troubleshooting who can sit with you and review the code and
configuration to diagnose the issue.
Personally, it sounds like over-allocated machines causing thrashing delays in
context switching.
Sorry, a correction: I wrote 'hyperthread closed', but hyperthreading is actually on.
On Tuesday, June 22, 2021 at 10:01:48 PM UTC+8, … wrote:
> I just checked the monitor data and found that the machine suffered from a
> high load average (about 30+) at approximately the time the agent got
> stuck.
> A 24-core (2 CPUs * 14 cores) …
I just checked the monitor data and found that the machine suffered from a
high load average (about 30+) at approximately the time the agent got
stuck.
A 24-core (2 CPUs * 14 cores), 'hyperthread closed' machine with a load average
over 30 seems bad. But after the load average got down to below 1, th…
>
> He is stating he has a cloud cluster consisting of 500k machines - each
> machine runs an agent process - each agent has 7000 Go routines.
>
Aha. Yes, this is what I mean.
He is stating he has a cloud cluster consisting of 500k machines - each machine
runs an agent process - each agent has 7000 Go routines.
> On Jun 22, 2021, at 7:07 AM, jake...@gmail.com wrote:
>
>
> Sorry, now I am completely confused.
>
>>> So, you have about 500,000 processes running this agent on each
>>> machine, and each process has around 7,000 goroutines? Is that correct?
Sorry, now I am completely confused.

>> So, you have about 500,000 *processes* running this agent on each machine,
>> and each process has around 7,000 goroutines? Is that correct?
>
> Yes, that's exactly what I mean.

but then you say: "Only one process per machine".
Is there a language ba…
Only one process per machine. We use '*taskset -c
$last_2nd_core,$last_3rd_core,$last_4th_core ./agent -c ../conf/agent.toml*'
to start the agent. I wonder whether that is related to this problem?
On Tuesday, June 22, 2021 at 12:56:13 AM UTC+8, … wrote:
> How many processes per machine? It seems like scheduling latency to me.
How many processes per machine? It seems like scheduling latency to me.
> On Jun 21, 2021, at 6:31 AM, Peter Z wrote:
>
>
>> So, you have about 500,000 processes running this agent on each machine,
>> and each process has around 7,000 goroutines? Is that correct?
>
> Yes, that's exactly what I mean.
>
> So, you have about 500,000 *processes* running this agent on each
> machine, and each process has around 7,000 goroutines? Is that correct?
>
Yes, that's exactly what I mean.
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe …
Could you clarify something? You say:
" We have about half a million agents running on each of our machines"
in your initial message. I thought maybe it was a language thing, and you
meant 500,000 goroutines. But then you said:
"There are 7000 goroutines total"
So, you have about 500,000 *processes* running this agent on each machine,
and each process has around 7,000 goroutines? Is that correct?
>
> On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote:
> >
> > The original post is on stackoverflow
> > https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
> >
> > Golang ENV:
> > go1.14.3 linux/amd64
> >
> > Description:
> > We have about half a million agents running on each of our machines. …
You’re right. Inspecting the code, it is internally partitioned by P.
I agree that it looks like the pool is being continually created.
> On Jun 17, 2021, at 12:18 PM, Ian Lance Taylor wrote:
>
> On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote:
>> The original post is on stackoverflow
>> https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
On Thu, Jun 17, 2021 at 10:11 AM Robert Engels wrote:
>
> You probably need multiple pools in and partition them. 500k accessors of a
> shared lock is going to have contention.
That might well help, but note that sync.Pool does not have a shared
lock in general use. The shared lock is only used …
On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote:
> …
You probably need multiple pools and partition them. 500k accessors of a
shared lock is going to have contention.
github.com/robaho/go-concurrency-test might be helpful.
> On Jun 17, 2021, at 11:19 AM, Peter Z wrote:
>
>
> The original post is on stackoverflow
> https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
The original post is on stackoverflow
https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
Golang ENV:
go1.14.3 linux/amd64
Description:
We have about half a million agents running on each of our machines. The
agent is written in Go. Recently we found that the agent …