This post was originally written for an internal audience, so it may refer to 
some of our projects, but the background information is useful so I have left it in.

We are having an issue where Google App Engine is preventing us from 
making new deployments.
The error message is:

ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to 
become healthy in the allotted time and therefore was rolled back. If you 
believe this was an error, try adjusting the 'app_start_timeout_sec' setting in 
the 'readiness_check' section.

This is a surprising error, especially as we haven't had issues with this 
until recently. It appears our changes earlier this year to prepare for the 
new Google App Engine split health checks didn't actually work, so when the 
legacy health checks were deprecated on September 15th (mentioned here 
https://cloud.google.com/appengine/docs/flexible/custom-runtimes/migrating-to-split-health-checks), 
no deployments worked from that point on. The health check specification is 
listed here: 
https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#liveness_path. 
The error message references the app_start_timeout_sec setting; more 
details about it can be found here: 
https://cloud.google.com/endpoints/docs/openapi/troubleshoot-aeflex-deployment.

I didn’t think it was a timeout issue, since our system boots fairly 
quickly (in less than the 5 minutes the setting defaults to), so I 
investigated the logs of a version of the app. From now on I'm referring to 
one of our projects, codeWOF (production system), unless specified 
otherwise. The Versions page only listed the ‘working’ versions, but when I 
looked in the Logs Viewer, all the different versions were listed, 
including those that had failed. With the following app.yaml the logs were 
showing this error:

liveness_check:
    path: "/gae/liveness_check"
readiness_check:
    path: "/gae/readiness_check"

Ready for new connections
Compiling message files
Starting gunicorn 19.9.0
Listening at: http://0.0.0.0:8080 (13)
Using worker: gevent
Booting worker with pid: 16
Booting worker with pid: 17
Booting worker with pid: 18
GET 301 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 301 0 B 3 ms GoogleHC/1.0 /liveness_check
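With more than a handful of health check lines, it is easier to filter them 
than to read them. The sketch below is a hypothetical helper, not part of 
codeWOF: it assumes exported log lines look exactly like the excerpt above 
(method, status, size, latency, user agent, path), pulls out the GoogleHC 
requests, lists the non-200 responses, and reports which configured paths 
were never requested at all.

```python
import re
from typing import NamedTuple

# Assumed line shape, copied from the excerpt above:
#   "GET 301 0 B 2 ms GoogleHC/1.0 /readiness_check"
HC_LINE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<status>\d{3}) .*?GoogleHC/\S+ (?P<path>\S+)$"
)


class HealthCheckHit(NamedTuple):
    status: int
    path: str


def parse_health_checks(lines):
    """Yield one HealthCheckHit per GoogleHC request line; skip everything else."""
    for line in lines:
        match = HC_LINE.match(line.strip())
        if match:
            yield HealthCheckHit(int(match["status"]), match["path"])


def report(lines, configured_paths):
    """Return (non-200 hits, configured paths that were never requested)."""
    hits = list(parse_health_checks(lines))
    failing = [hit for hit in hits if hit.status != 200]
    seen = {hit.path for hit in hits}
    never_hit = sorted(set(configured_paths) - seen)
    return failing, never_hit


log = """\
Booting worker with pid: 18
GET 301 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 301 0 B 3 ms GoogleHC/1.0 /liveness_check
"""
failing, never_hit = report(
    log.splitlines(), ["/gae/liveness_check", "/gae/readiness_check"]
)
print(failing)
print(never_hit)
```

On this excerpt both checks come back as 301s, and neither of the /gae/ 
paths from app.yaml shows up as requested.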

This confirmed that the system had booted successfully and the checks were 
getting through, but were returning the wrong code: a 301 redirect instead 
of a 200. It also showed that the checks were going to the wrong URL, since 
the /gae prefix was missing. I believed the redirect was caused by either 
the APPEND_SLASH setting or the HTTP to HTTPS redirect. I tried the 
following configuration and got the following result:

liveness_check:
    path: "/liveness_check/"
readiness_check:
    path: "/readiness_check/"

GET 301 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 301 0 B 3 ms GoogleHC/1.0 /liveness_check
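The APPEND_SLASH suspicion can be made concrete. As I understand Django's 
CommonMiddleware, it answers with a 301 only when the slash-less path does 
not resolve but the same path with a trailing slash does. The sketch below 
is a simplified model of that rule, not Django's code: the route set is a 
hypothetical stand-in for the URLconf, a resolving path is assumed to 
return 200, and the HTTPS redirect (which also uses a 301) is not modelled.

```python
# Simplified model of Django's APPEND_SLASH rule (CommonMiddleware).
# `routes` is a hypothetical set of exact paths standing in for the URLconf.


def append_slash_outcome(path, routes, append_slash=True):
    """Return (status, location) for a GET on `path` under the simplified model."""
    if path in routes:
        # Assumption: a path that resolves is served by a view returning 200.
        return 200, None
    if append_slash and not path.endswith("/") and path + "/" in routes:
        # Slash-less path fails to resolve, slashed path resolves: 301 redirect.
        return 301, path + "/"
    return 404, None


# Hypothetical URLconf with trailing-slash health check URLs.
routes = {"/liveness_check/", "/readiness_check/"}

for probe in ["/liveness_check", "/readiness_check", "/liveness_check/"]:
    print(probe, append_slash_outcome(probe, routes))
```

Under this model, GoogleHC's requests to /liveness_check and 
/readiness_check would only be redirected if the URLconf defines the 
trailing-slash versions; a path that resolves neither way would give a 404, 
not a 301.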

The result is the same as with the first configuration, so it appears that 
setting the custom path does not affect where the health check is sent. 
Searching for the custom path in all logging messages returns exactly one 
message (summary below):

2019-11-06 16:24:14.288 NZDT App Engine Create Version default:20191106t032141
livenessCheck: { path: "/liveness_check/" }
readinessCheck: { path: "/readiness_check/" }
Resources: { cpu: 1 memoryGb: 3.75 }

So the first thing to look into is setting the custom path correctly; I 
couldn’t get it to change. I read every StackOverflow post about App Engine 
and split health checks (there were fewer than 10) and tried all the 
suggested fixes. These included:

   - Checking the split health check was set correctly using gcloud app 
   describe --project codewof.
   - Setting the split health checks (again) with gcloud app update 
   --split-health-checks --project codewof.
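The first of these checks can also be scripted. This sketch assumes gcloud 
is installed and authenticated, and that the JSON output of gcloud app 
describe exposes the setting as featureSettings.splitHealthChecks, which is 
how I understand the App Engine Admin API's Application resource names it; 
check it against your own output before relying on it.

```python
import json
import subprocess


def split_health_checks_enabled(describe_json):
    """Read featureSettings.splitHealthChecks from `gcloud app describe` JSON.

    Assumption: the field path follows the App Engine Admin API Application
    resource; a missing field is treated as "not enabled".
    """
    app = json.loads(describe_json)
    return bool(app.get("featureSettings", {}).get("splitHealthChecks", False))


def describe_app(project):
    """Run `gcloud app describe` for `project` and return its raw JSON output."""
    result = subprocess.run(
        ["gcloud", "app", "describe", "--project", project, "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Usage, on a machine with gcloud set up: 
split_health_checks_enabled(describe_app("codewof")). Keeping the parsing 
separate from the subprocess call means it can be tested without gcloud.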

The last thing I tried resulted in something quite interesting: I deleted 
all health check settings from the app.yaml files. The documentation 
(https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml#updated_health_checks) 
states the following:

By default, HTTP requests from health checks are not forwarded to your 
application container. If you want to extend health checks to your 
application, then specify a path for liveness checks or readiness checks. A 
customized health check to your application is considered successful if it 
returns a 200 OK response code.

This suggests that only the VM itself is checked, rather than the Docker 
container running inside it. And the deployment worked!

GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 200 0 B 3 ms GoogleHC/1.0 /liveness_check

But if the Docker container fails for some reason, Google App Engine 
wouldn’t know there is an issue. We need to look into this scenario and see 
what it actually means; I couldn’t find anything that specifies it exactly. 
However, this does allow us to make urgent deployments.

I also tested exempting the health check paths from HTTPS redirects, with 
the following in settings/production.py and app.yaml:

SECURE_REDIRECT_EXEMPT = [
    r'^/?cron/.*',
    r'^/?liveness_check/?$',
    r'^/?readiness_check/?$',
]

liveness_check:
    path: "/liveness_check/"
readiness_check:
    path: "/readiness_check/"

GET 301 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 301 0 B 3 ms GoogleHC/1.0 /liveness_check
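To check that these patterns cover the paths GoogleHC actually requests, 
the sketch below restates the exemption test as I understand Django's 
SecurityMiddleware performs it: strip the leading '/' from the request 
path, then re.search each compiled pattern. It is a simplified model, not 
Django's code; is_secure and ssl_redirect are stand-ins for 
request.is_secure() and SECURE_SSL_REDIRECT.

```python
import re

# The patterns from settings/production.py above.
SECURE_REDIRECT_EXEMPT = [
    r'^/?cron/.*',
    r'^/?liveness_check/?$',
    r'^/?readiness_check/?$',
]
EXEMPT = [re.compile(pattern) for pattern in SECURE_REDIRECT_EXEMPT]


def https_redirect_status(path, is_secure, ssl_redirect=True):
    """Return 301 if the modelled HTTPS redirect fires, else None.

    Mirrors the exemption test: leading '/' stripped, then re.search per pattern.
    """
    stripped = path.lstrip("/")
    exempt = any(pattern.search(stripped) for pattern in EXEMPT)
    if ssl_redirect and not is_secure and not exempt:
        return 301
    return None  # request passes on to the next middleware / view


for probe in ["/readiness_check", "/liveness_check", "/liveness_check/", "/about/"]:
    print(probe, https_redirect_status(probe, is_secure=False))
```

If the model is right, the slash-less paths in the logs are exempt, which 
suggests the remaining 301 comes from somewhere else, such as the 
APPEND_SLASH behaviour suspected earlier.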

The last confusing thing I discovered was that the codewof-dev website’s 
behaviour conflicts with documentation I had read. I can’t find the 
documentation again, but I’m pretty sure it said that an App Engine 
instance will run either the legacy health checks or the new split health 
checks. But the codewof-dev website is running both! Please help :(

GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health
GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check
GET 200 0 B 3 ms GoogleHC/1.0 /liveness_check
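Tallying the codewof-dev log by check style makes the overlap explicit. 
This is a hypothetical helper; it assumes the line format shown above, 
treats /_ah/health as the legacy check and /liveness_check and 
/readiness_check as the split checks, and counts anything else separately.

```python
import re
from collections import Counter

# Assumed classification, based on the paths seen in the log above.
LEGACY_PATHS = {"/_ah/health"}
SPLIT_PATHS = {"/liveness_check", "/readiness_check"}

# The requested path is the last field, right after the GoogleHC user agent.
HC_PATH = re.compile(r"GoogleHC/\S+ (\S+)\s*$")


def classify_checks(lines):
    """Count GoogleHC requests as 'legacy', 'split' or 'other'."""
    counts = Counter()
    for line in lines:
        match = HC_PATH.search(line)
        if match is None:
            continue  # not a health check line
        path = match.group(1)
        if path in LEGACY_PATHS:
            counts["legacy"] += 1
        elif path in SPLIT_PATHS:
            counts["split"] += 1
        else:
            counts["other"] += 1
    return counts


sample = [
    "GET 200 0 B 2 ms GoogleHC/1.0 /readiness_check",
    "GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health",
    "GET 200 2 B 2 ms GoogleHC/1.0 /_ah/health",
    "GET 200 0 B 3 ms GoogleHC/1.0 /liveness_check",
]
counts = classify_checks(sample)
both_running = counts["legacy"] > 0 and counts["split"] > 0
print(dict(counts), both_running)
```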

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/a85526d9-21a5-42d4-8e26-ad2b052f4abd%40googlegroups.com.
