Hey,

It looks like the resumable upload session that the Google Python library
uses for large files disappeared. I don't know why that happens or how
common it is, but unfortunately the Google library no longer seems to retry
those errors itself now that we have stopped using the deprecated
num_retries parameter directly.

Are you open to hacking your wal-e installation a bit to see whether
checking for GoogleAPICallError (and maybe specifically status 410) and
retrying fixes this? We can think about a cleaner solution afterwards.
Checking for the exception somewhere in this if-branch (
https://github.com/wal-e/wal-e/blob/master/wal_e/worker/upload.py#L119) and
making sure it does not reach the else block, where it is re-raised, should
force wal-e to retry the upload. Is this something you could try adding to
your local installation to see if it fixes the situation for you?
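
To make the idea concrete, here is a minimal standalone sketch of "retry
when the upload fails with 410". It is not wal-e's actual code and does not
touch wal_e/retries.py. The names is_gone_error, upload_with_retry and the
upload callable are hypothetical; the only thing assumed about the Google
library is that GoogleAPICallError carries the HTTP status in its code
attribute. The sketch also assumes that rewinding the file and calling the
upload again starts a fresh resumable session.

```python
import time


def is_gone_error(exc):
    """Return True if exc carries HTTP status 410 in a `code` attribute.

    Assumption: google.api_core.exceptions.GoogleAPICallError exposes the
    HTTP status as `.code`. Duck typing keeps this sketch import-free.
    """
    return getattr(exc, "code", None) == 410


def upload_with_retry(upload, fileobj, attempts=3, delay=1.0, sleep=time.sleep):
    """Call upload(fileobj), retrying only when it fails with a 410.

    `upload` is a hypothetical callable standing in for something like
    blob.upload_from_file. The file is rewound before every attempt so
    each retry sends the whole payload again.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        fileobj.seek(0)
        try:
            return upload(fileobj)
        except Exception as exc:
            # Anything that is not a 410, or a 410 on the last attempt,
            # propagates unchanged to the caller.
            if not is_gone_error(exc) or attempt == attempts:
                raise
            sleep(delay * attempt)  # simple linear backoff
```

In wal-e itself the equivalent check would sit in the exception handler
linked above rather than in a separate wrapper; the wrapper form is only
easier to try out in isolation.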

Cheers,
Samuel

On Fri, Jun 21, 2019 at 9:43 AM 'Yun Guo' via wal-e <[email protected]>
wrote:

>
> Hi,
>
> We are using wal-e v1.1 to back up to GCS. The total backup is around 3.2T.
> We noticed that the wal-e process fails sporadically with HTTP 410; the log
> is below.
>
> Jun 21 02:30:57  wal_e.worker.upload INFO     MSG: beginning volume compression
>         DETAIL: Building volume 1142.
>         STRUCTURED: time=2019-06-21T02:30:57.666929-00 pid=37373
> Jun 21 02:30:58  wal_e.worker.upload INFO     MSG: beginning volume compression
>         DETAIL: Building volume 1143.
>         STRUCTURED: time=2019-06-21T02:30:58.958880-00 pid=37373
> Jun 21 02:31:13  wal_e.worker.upload INFO     MSG: beginning volume compression
>         DETAIL: Building volume 1144.
>         STRUCTURED: time=2019-06-21T02:31:13.820819-00 pid=37373
> Jun 21 02:31:14  wal_e.operator.backup WARNING  MSG: blocking on sending WAL segments
>         DETAIL: The backup was not completed successfully, but we have to wait anyway.  See README: TODO about pg_cancel_backup
>         STRUCTURED: time=2019-06-21T02:31:14.716392-00 pid=37373
> Jun 21 02:31:17  wal_e.main   CRITICAL MSG: An unprocessed exception has avoided all error handling
>         DETAIL: Traceback (most recent call last):
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1041, in upload_from_file
>             size, num_retries, predefined_acl)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 957, in _do_upload
>             num_retries, predefined_acl)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 904, in _do_resumable_upload
>             response = upload.transmit_next_chunk(transport)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/requests/upload.py", line 396, in transmit_next_chunk
>             self._process_response(result, len(payload))
>           File "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/_upload.py", line 574, in _process_response
>             self._get_status_code, callback=self._make_invalid)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/resumable_media/_helpers.py", line 93, in require_status_code
>             status_code, u'Expected one of', *status_codes)
>         google.resumable_media.common.InvalidResponse: ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK: 200>, 308)
>
>         During handling of the above exception, another exception occurred:
>
>         Traceback (most recent call last):
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 87, in shim
>             return f(*args, **kwargs)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 140, in put_file_helper
>             return self.blobstore.uri_put_file(self.creds, url, tf)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/gs/utils.py", line 38, in uri_put_file
>             blob.upload_from_file(fp, size=size, content_type=content_type)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1044, in upload_from_file
>             _raise_from_invalid_response(exc)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1914, in _raise_from_invalid_response
>             response.status_code, message, response=response)
>         google.api_core.exceptions.GoogleAPICallError: 410 PUT https://www.googleapis.com/upload/storage/v1/b/gitlab-gprd-postgres-backup/o?uploadType=resumable&upload_id=AEnB2UrKU4zHPqzF4fGPeEvhoxJ-2qeIK5xY9SI8O1NIhtOaDn1GC7Q_D4XQVFFXvMVVzuhCLJvUmzTkkKui6M8mpb3BedH15g: ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK: 200>, 308)
>
>         During handling of the above exception, another exception occurred:
>
>         Traceback (most recent call last):
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/cmd.py", line 652, in main
>             pool_size=args.pool_size)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 197, in database_backup
>             **kwargs)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/operator/backup.py", line 500, in _upload_pg_cluster_dir
>             pool.put(tpart)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line 108, in put
>             self._wait()
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload_pool.py", line 65, in _wait
>             raise val
>           File "src/gevent/greenlet.py", line 716, in gevent._greenlet.Greenlet.run
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 145, in __call__
>             k = put_file_helper()
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 101, in shim
>             exc_processor_cxt=exc_processor_cxt)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 139, in retry_with_count_internal
>             side_effect_func(exc_tup, exc_processor_cxt)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 135, in log_volume_failures_on_error
>             raise typ(value).with_traceback(tb)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/retries.py", line 87, in shim
>             return f(*args, **kwargs)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/worker/upload.py", line 140, in put_file_helper
>             return self.blobstore.uri_put_file(self.creds, url, tf)
>           File "/opt/wal-e/lib/python3.5/site-packages/wal_e/blobstore/gs/utils.py", line 38, in uri_put_file
>             blob.upload_from_file(fp, size=size, content_type=content_type)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1044, in upload_from_file
>             _raise_from_invalid_response(exc)
>           File "/opt/wal-e/lib/python3.5/site-packages/google/cloud/storage/blob.py", line 1914, in _raise_from_invalid_response
>             response.status_code, message, response=response)
>         google.api_core.exceptions.GoogleAPICallError: None 410 PUT https://www.googleapis.com/upload/storage/v1/b/gitlab-gprd-postgres-backup/o?uploadType=resumable&upload_id=AEnB2UrKU4zHPqzF4fGPeEvhoxJ-2qeIK5xY9SI8O1NIhtOaDn1GC7Q_D4XQVFFXvMVVzuhCLJvUmzTkkKui6M8mpb3BedH15g: ('Request failed with status code', 410, 'Expected one of', <HTTPStatus.OK: 200>, 308)
>
>         STRUCTURED: time=2019-06-21T02:31:17.960909-00 pid=37373
>
>
> Any idea what we can do to fix it?
>
> Thanks
>
>
> --
>
> Yun Guo
> Senior Database Engineer | GitLab
>
> --
> You received this message because you are subscribed to the Google Groups
> "wal-e" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/wal-e/CAJsFAOz8wwdwGRgV9u4CFnrdQ0QKYMcArpVLOQ%3D%3DvVynT5q-Pw%40mail.gmail.com
> <https://groups.google.com/d/msgid/wal-e/CAJsFAOz8wwdwGRgV9u4CFnrdQ0QKYMcArpVLOQ%3D%3DvVynT5q-Pw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
