When trying to deploy my App Engine Flex app today, I am getting this
error after the build has completed.

`ERROR: (gcloud.app.deploy) Error Response: [9] An internal error occurred while processing task /app-engine-flex/flex_await_healthy/flex_await_healthy>2022-03-12T14:09:32.742Z6575.hg.2:
The region us-west2 does not have enough resources available to fulfill the request. Please try again later.`

After about 10 attempts over roughly 2 hours, the deployment killed off the
two instances of my previously running version, so now my company's app is
down and I cannot bring it back up.

I get this error pretty regularly when deploying new versions (almost every
time), but it usually succeeds after a couple of attempts, so I've just
lived with it. But now my site is down, it's 4 hours later, and I'd really
like to know whether there's something I can do to fix it, or whether it's
just a case of waiting for the zone to have more capacity.

A newly deployed version shows up in the console (and in the output of
`gcloud app versions list`) for a while, but then disappears on its own.

I have checked my quotas under IAM & Admin, and nothing is above 30%
allocation. Besides, now that my site is not running, I don't have many
resources in use.

I noticed the previous version had 2 instances that seemed to have been
stuck in the "restarting" state for a couple of weeks. I killed them off
manually in the console, thinking they might have been consuming resources.
I wonder if this has somehow skewed the autoscaler? I was hoping it would
eventually repair itself, as GCP sometimes seems to take a while to do
things in the background.

I have tried restarting the previous version in the console, but it just 
sits at 0 instances. It's autoscaled, but the autoscaling has stopped 
working. I tried stopping it, waiting, then restarting it. 

I have checked the Stackdriver logs, and it's definitely a
ZONE_RESOURCE_POOL_EXHAUSTED error, e.g.:
    serviceName: "compute.googleapis.com"
    status: {
      code: 8
      details: [
        0: {
          @type: "type.googleapis.com/google.protobuf.Struct"
          value: {
            zoneResourcePoolExhausted: {
              resource: {
                project: {
                  canonicalProjectId: "XXX"
                }
                resourceName: "us-west2-b"
                resourceType: "ZONE"
                scope: {
                  scopeName: "global"
                  scopeType: "GLOBAL"
                }
              }
            }
          }
        }
      ]
      message: "ZONE_RESOURCE_POOL_EXHAUSTED"
    }
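If you collect many of these entries (for example, across all ten deploy attempts), a small helper can pull out the exhausted zone from each one so you can see whether it is always the same zone. This is a minimal sketch, assuming the `status` object above has been exported as JSON with the same nesting, where `details` becomes a plain list; the function name `exhausted_zone` and the `SAMPLE` data are hypothetical, with the values copied from the excerpt.

```python
import json
from typing import Optional


def exhausted_zone(status: dict) -> Optional[str]:
    """Return the zone named in a ZONE_RESOURCE_POOL_EXHAUSTED status, if any.

    Assumes `status` mirrors the `status` object in the log excerpt:
    a `message` string plus a `details` list whose items carry a `value` dict.
    """
    if status.get("message") != "ZONE_RESOURCE_POOL_EXHAUSTED":
        return None
    for detail in status.get("details", []):
        # Walk value -> zoneResourcePoolExhausted -> resource, tolerating gaps.
        resource = (
            detail.get("value", {})
            .get("zoneResourcePoolExhausted", {})
            .get("resource", {})
        )
        if resource.get("resourceType") == "ZONE":
            return resource.get("resourceName")
    return None


# Hypothetical JSON export of the excerpt above (project ID redacted as in the log).
SAMPLE = json.loads("""
{
  "code": 8,
  "details": [
    {
      "@type": "type.googleapis.com/google.protobuf.Struct",
      "value": {
        "zoneResourcePoolExhausted": {
          "resource": {
            "project": {"canonicalProjectId": "XXX"},
            "resourceName": "us-west2-b",
            "resourceType": "ZONE",
            "scope": {"scopeName": "global", "scopeType": "GLOBAL"}
          }
        }
      }
    }
  ],
  "message": "ZONE_RESOURCE_POOL_EXHAUSTED"
}
""")

print(exhausted_zone(SAMPLE))  # us-west2-b
```

Running it over every failed attempt's entry would show whether only us-west2-b is affected or whether other zones in us-west2 are exhausted too.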


I have tried increasing readiness_check: app_start_timeout_sec, as well as
failure_threshold, timeouts, etc., in case this was on the edge, but judging
by the logs, the instance doesn't even begin to boot (because the VM is
never allocated).

I also tried redeploying the previous version.

I tried stopping the current version (which was previously "SERVING" but
with 0 instances) and then deploying, but this doesn't help. So at this
point I'm deploying with nothing running at all in my project, which
confirms it cannot be quotas.

I did notice, though, seemingly inconsistent reports of the number of
instances in my service logs. This doesn't make sense, because there are NO
instances running either before or after deployment.

2022-03-12 13:09:55.422 GMT
The number of running VMs for version 20220211t141828 changed from 2 to 1
2022-03-12 13:10:19.089 GMT
The number of running VMs for version 20220211t141828 changed from 1 to 3
2022-03-12 13:10:25.919 GMT
The number of running VMs for version 20220211t141828 changed from 3 to 4
2022-03-12 13:10:41.160 GMT
The number of running VMs for version 20220211t141828 changed from 4 to 2
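To check these reports over a longer log export, a short parser can tabulate each transition and test whether each "from" count matches the previous "to" count. This is a sketch, assuming the log is pasted as plain text in exactly the two-line format above; the names `parse_changes`, `chain_is_continuous`, and `VmCountChange` are hypothetical.

```python
import re
from typing import List, NamedTuple

# The four entries quoted above, pasted verbatim.
LOG = """\
2022-03-12 13:09:55.422 GMT
The number of running VMs for version 20220211t141828 changed from 2 to 1
2022-03-12 13:10:19.089 GMT
The number of running VMs for version 20220211t141828 changed from 1 to 3
2022-03-12 13:10:25.919 GMT
The number of running VMs for version 20220211t141828 changed from 3 to 4
2022-03-12 13:10:41.160 GMT
The number of running VMs for version 20220211t141828 changed from 4 to 2
"""

# One match per timestamp line followed by its "changed from X to Y" line.
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) GMT\n"
    r"The number of running VMs for version (?P<version>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)$",
    re.MULTILINE,
)


class VmCountChange(NamedTuple):
    timestamp: str
    version: str
    old: int
    new: int


def parse_changes(text: str) -> List[VmCountChange]:
    """Extract every VM-count transition from the pasted log text."""
    return [
        VmCountChange(m["ts"], m["version"], int(m["old"]), int(m["new"]))
        for m in PATTERN.finditer(text)
    ]


def chain_is_continuous(changes: List[VmCountChange]) -> bool:
    """True if each entry's 'from' count equals the previous entry's 'to' count."""
    return all(a.new == b.old for a, b in zip(changes, changes[1:]))


changes = parse_changes(LOG)
print(chain_is_continuous(changes))    # True
print(max(c.new for c in changes))     # 4
```

On these four entries, the log is consistent with itself (every "from" matches the preceding "to"), yet it reports up to 4 VMs for a version that had no instances serving at the time.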

I tried deploying the app to a different service name (and was going to
change my dispatch rules to route to it), but that service deployment
failed with the same error.

The status pages look OK. 

I've tried --verbosity=debug, which didn't reveal any extra info.

I've read every post I can find (including my own previous post in this
group, where it was a "prerequisite" error caused by quotas), and the only
option I seem to have left is migrating my app to a new project in a more
reliable region like us-central. However, this would be a lot of work, as
I'm using GCS, Functions, and networking to other providers, which would
all have to be migrated.

Is there any way to get more detailed information on the resource problem?

thanks
Pip Jones

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/7aa4e573-364e-4555-9d7c-80dd5f7c43c1n%40googlegroups.com.
