Not sure if it's related but I'll leave this here.

We have a bunch of EC2 instances running Ubuntu 18.04 LTS and GitLab
Runner, with the latter configured to execute build jobs inside Docker
containers.  Our EC2 instances are of the t2.medium, t2.xlarge and
t3.medium sizes and run in the eu-west-1 region (in different AZs).

After upgrading from kernel 4.15.0-1021-aws to 4.15.0-1023-aws we've
seen a lot of build failures across all of our GitLab Runner instances
on Ubuntu 18.04 LTS.  The problems happen intermittently.

Specifically the build jobs fail during the startup phase where GitLab
Runner clones the git repository.  The git clone operation fails with:

```
Cloning repository...
Cloning into '/builds/my-group/my-project'...
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
```

According to the documentation at
https://curl.haxx.se/libcurl/c/libcurl-errors.html this means:

> CURLE_PARTIAL_FILE (18)
> 
> A file transfer was shorter or larger than expected. This happens when the 
> server first reports an expected transfer size, and then delivers data that 
> doesn't match the previously given size.

When this problem happened, the bandwidth used by the `git clone`
operation appeared to drop eventually to a few kByte/s or to zero (the
reporting wasn't granular enough to determine which).  I assume that the
GitLab server later closed or timed out the connection before the git
client had downloaded all of the data it expected, leaving us with a
`CURLE_PARTIAL_FILE` error.
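
To tell "a few kByte/s" apart from "zero", a download could be sampled per chunk instead of relying on coarse reporting. Below is a minimal sketch of that idea, not something we actually ran: the function names, the 1 kByte/s threshold and the 10 second read timeout are all assumptions chosen for illustration, and the URL is whatever large file you want to test against.

```python
import socket
import time
import urllib.request


def throughput_kbps(samples):
    """Turn (timestamp_seconds, cumulative_bytes) samples into kByte/s per interval."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        # A zero-length interval carries no rate information; treat it as "fast".
        rates.append((b1 - b0) / 1000.0 / dt if dt > 0 else float("inf"))
    return rates


def first_stall(samples, threshold_kbps=1.0):
    """Return the index of the first interval slower than threshold_kbps, or None."""
    for i, rate in enumerate(throughput_kbps(samples)):
        if rate < threshold_kbps:
            return i
    return None


def sample_download(url, read_timeout=10.0, chunk_size=64 * 1024):
    """Download url and record (timestamp, cumulative_bytes) after each chunk.

    If a read blocks for longer than read_timeout seconds, a final sample
    with an unchanged byte count is recorded, so a complete stall shows up
    as an interval with exactly 0 kByte/s. Other errors propagate.
    """
    samples = [(time.monotonic(), 0)]
    total = 0
    with urllib.request.urlopen(url, timeout=read_timeout) as resp:
        while True:
            try:
                chunk = resp.read(chunk_size)
            except socket.timeout:
                samples.append((time.monotonic(), total))
                break
            if not chunk:
                break
            total += len(chunk)
            samples.append((time.monotonic(), total))
    return samples
```

Running `first_stall(sample_download(url))` on an affected instance and on a downgraded one would show whether the transfer trickles or stops outright, assuming the stall reproduces with a plain HTTP(S) download.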

We've also seen larger file downloads (tens to hundreds of MB) hang
indefinitely, e.g. during `apt install`.

We rolled back the changes made during the last few weeks one by one and
eventually found that downgrading to 4.15.0-1021-aws solved the issue on
all of our GitLab Runner instances (running versions 11.2.0, 11.2.1 and
11.3.1).
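
For anyone wanting to check a fleet for the kernel in question, here is a rough sketch. The two version sets come only from this report (1023 bad for us, 1021 good for us); other builds are untested here, and `classify_kernel` is a hypothetical helper name.

```python
import platform

# Assumption: based solely on what we observed in this report.
SUSPECT_KERNELS = {"4.15.0-1023-aws"}
KNOWN_GOOD_KERNELS = {"4.15.0-1021-aws"}


def classify_kernel(release):
    """Classify a kernel release string (as printed by `uname -r`)."""
    if release in SUSPECT_KERNELS:
        return "suspect"
    if release in KNOWN_GOOD_KERNELS:
        return "known-good"
    return "unknown"


if __name__ == "__main__":
    # platform.release() returns the running kernel release.
    release = platform.release()
    print(f"{release}: {classify_kernel(release)}")
```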

I couldn't spot anything obvious that seemed to be related to the issue in the 
changelog here:
https://changelogs.ubuntu.com/changelogs/pool/main/l/linux-aws/linux-aws_4.15.0-1023.23/changelog

FWIW, this is what `docker version` reports:

```
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false
```

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1796469

Title:
  aws s3 cp --recursive hangs on the last file on a large file transfer
  to instance
