Hey,

It looks like the resumable upload session that the Google Python library uses for large files disappeared partway through the upload, which is what the HTTP 410 is telling us. I don't know how common that is or why it would happen, but unfortunately the Google library no longer seems to retry those errors itself now that we've stopped passing the deprecated num_retries parameter directly.
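To make the retry idea concrete, here is a rough sketch of retrying an upload when the session is gone. This is not wal-e's actual code: upload_with_retry, is_gone, do_upload and rewind are hypothetical names, and I'm assuming the 410 shows up as the code attribute on GoogleAPICallError, as the traceback suggests.

```python
import time

try:
    from google.api_core.exceptions import GoogleAPICallError
except ImportError:
    # Stand-in so the sketch runs without the Google libraries installed.
    class GoogleAPICallError(Exception):
        code = None


def is_gone(exc):
    """True if exc looks like the HTTP 410 from a lost resumable session."""
    # Assumption: the HTTP status is exposed as `code` on the exception.
    return isinstance(exc, GoogleAPICallError) and getattr(exc, "code", None) == 410


def upload_with_retry(do_upload, rewind, attempts=3, delay=1.0):
    """Call do_upload(), restarting from scratch when the session is gone.

    do_upload and rewind are hypothetical callables: do_upload performs the
    whole upload (which starts a fresh resumable session), and rewind resets
    the source file to its beginning before the next attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return do_upload()
        except GoogleAPICallError as exc:
            # Anything other than a 410, or a 410 on the last attempt,
            # is passed on to the caller unchanged.
            if not is_gone(exc) or attempt == attempts:
                raise
            rewind()
            time.sleep(delay * attempt)  # simple linear backoff
```

With a blob and an open file fp, the call would be something like upload_with_retry(lambda: blob.upload_from_file(fp), lambda: fp.seek(0)). The rewind matters because the failed attempt has already consumed part of the file.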
Are you open to hacking your wal-e installation a bit to see whether simply checking for GoogleAPICallError (and maybe specifically 410) and retrying fixes this? We can think about cleaner solutions afterwards. Checking for the exception somewhere in this if-branch (https://github.com/wal-e/wal-e/blob/master/wal_e/worker/upload.py#L119) and making sure it doesn't reach the else block and get re-raised should force wal-e to retry the upload. Is this something you could try in your local installation to see if it fixes the situation for you?

Cheers,
Samuel

On Fri, Jun 21, 2019 at 9:43 AM 'Yun Guo' via wal-e <[email protected]> wrote:
>
> Hi,
>
> We are using wal-e v1.1 to backup GCS. The total backup is around 3.2T.
> We noticed the wal-e processed failed HTTP/410 sporadically and below is
> the log.
>
> Jun 21 02:30:57 wal_e.worker.upload INFO MSG: beginning volume
> compression#012 DETAIL: Building volume 1142.#012 STRUCTURED:
> time=2019-06-21T02:30:57.666929-00 pid=37373
> Jun 21 02:30:58 wal_e.worker.upload INFO MSG: beginning volume compression#012
> DETAIL: Building volume 1143.#012 STRUCTURED:
> time=2019-06-21T02:30:58.958880-00 pid=37373
> Jun 21 02:31:13 wal_e.worker.upload INFO MSG: beginning volume compression#012
> DETAIL: Building volume 1144.#012 STRUCTURED:
> time=2019-06-21T02:31:13.820819-00 pid=37373
> Jun 21 02:31:14 wal_e.operator.backup WARNING MSG: blocking on sending WAL segments#012
> DETAIL: The backup was not completed successfully, but we have to wait
> anyway.
> See README: TODO about pg_cancel_backup#012 STRUCTURED:
> time=2019-06-21T02:31:14.716392-00 pid=37373
> Jun 21 02:31:17 wal_e.main
> CRITICAL MSG: An unprocessed exception has avoided all error handling#012
> DETAIL: Traceback (most recent call last):#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line
> 1041, in upload_from_file#012 size, num_retries,
> predefined_acl)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line
> 957, in _do_upload#012 num_retries, predefined_acl)#012
> File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py",
> line 904, in _do_resumable_upload#012 response =
> upload.transmit_next_chunk(transport)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/requests/upload.py",
> line 396, in transmit_next_chunk#012
> self._process_response(result, len(payload))#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/_upload.py",
> line 574, in _process_response#012 self._get_status_code,
> callback=self._make_invalid)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/_helpers.py",
> line 93, in require_status_code#012 status_code, u'Expected one
> of', *status_codes)#012 google.resumable_media.common.InvalidResponse:
> ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK:
> 200>, 308)#012 #012 During handling of the above exception,
> another exception occurred:#012 #012 Traceback (most recent
> call last):#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 87, in
> shim#012 return f(*args, **kwargs)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 140, in
> put_file_helper#012 return self.blobstore.uri_put_file(self.creds,
> url, tf)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/gs/utils.py", line
> 38, in uri_put_file#012 blob.upload_from_file(fp, size=size,
> content_type=content_type)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line
> 1044, in upload_from_file#012
> _raise_from_invalid_response(exc)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line
> 1914, in _raise_from_invalid_response#012 response.status_code,
> message, response=response)#012
> google.api_core.exceptions.GoogleAPICallError: 410 PUT
> https://www.googleapis.com/upload/storage/v1/b/gitlab-gprd-postgres-backup/o?uploadType=resumable&upload_id=AEnB2UrKU4zHPqzF4fGPeEvhoxJ-2qeIK5xY9SI8O1NIhtOaDn1GC7Q_D4XQVFFXvMVVzuhCLJvUmzTkkKui6M8mpb3BedH15g:
> ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK:
> 200>, 308)#012 #012 During handling of the above exception,
> another exception occurred:#012 #012 Traceback (most recent
> call last):#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/cmd.py", line 652, in main#012
> pool_size=args.pool_size)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 197,
> in database_backup#012 **kwargs)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 500,
> in _upload_pg_cluster_dir#012 pool.put(tpart)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line
> 108, in put#012 self._wait()#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line
> 65, in _wait#012 raise val#012 File
> "src/gevent/greenlet.py", line 716, in gevent._greenlet.Greenlet.run#012
> File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py",
> line 145, in __call__#012 k = put_file_helper()#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 101, in
> shim#012 exc_processor_cxt=exc_processor_cxt)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 139, in
> retry_with_count_internal#012 side_effect_func(exc_tup,
> exc_processor_cxt)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 135, in
> log_volume_failures_on_error#012 raise
> typ(value).with_traceback(tb)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 87, in
> shim#012 return f(*args, **kwargs)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 140, in
> put_file_helper#012 return self.blobstore.uri_put_file(self.creds,
> url, tf)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/gs/utils.py", line
> 38, in uri_put_file#012 blob.upload_from_file(fp, size=size,
> content_type=content_type)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line
> 1044, in upload_from_file#012
> _raise_from_invalid_response(exc)#012 File
> "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line
> 1914, in _raise_from_invalid_response#012 response.status_code,
> message, response=response)#012
> google.api_core.exceptions.GoogleAPICallError: None 410 PUT
> https://www.googleapis.com/upload/storage/v1/b/gitlab-gprd-postgres-backup/o?uploadType=resumable&upload_id=AEnB2UrKU4zHPqzF4fGPeEvhoxJ-2qeIK5xY9SI8O1NIhtOaDn1GC7Q_D4XQVFFXvMVVzuhCLJvUmzTkkKui6M8mpb3BedH15g:
> ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK:
> 200>, 308)#012 #012 STRUCTURED:
> time=2019-06-21T02:31:17.960909-00 pid=37373
>
> Any idea what we can do to fix it?
>
> Thanks
>
> --
> Yun Guo
> Senior Database Engineer | GitLab
>
> --
> You received this message because you are subscribed to the Google Groups
> "wal-e" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/wal-e/CAJsFAOz8wwdwGRgV9u4CFnrdQ0QKYMcArpVLOQ%3D%3DvVynT5q-Pw%40mail.gmail.com
> <https://groups.google.com/d/msgid/wal-e/CAJsFAOz8wwdwGRgV9u4CFnrdQ0QKYMcArpVLOQ%3D%3DvVynT5q-Pw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
