After some investigation, it seems we were able to narrow down the issue.

* tl;dr: After the upgrade of the PPA builder to Bionic, the memlock limit 
(ulimit -l) was bumped from a ridiculously low value (64) to something bigger 
(16M). It happens that cryptsetup then succeeds in its call to mlockall(), so all 
subsequent allocations become restricted by that limit, which is still a bit low 
and ends up leading to allocation failures.
When the limit is very low (as in Xenial), the lock procedure fails, cryptsetup's 
allocations are not subject to this restriction, and everything just works.

See the "Conclusion" section below for the alternatives on how to fix this.
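To make the mechanism concrete, here is a minimal sketch (not cryptsetup code; 
just an illustration of the syscall behaviour, under the assumption of default 
glibc malloc) of how a successful mlockall(MCL_CURRENT | MCL_FUTURE) makes 
later large allocations count against RLIMIT_MEMLOCK:

/*
 * Minimal sketch, not cryptsetup code: once mlockall(MCL_CURRENT | MCL_FUTURE)
 * succeeds, every later anonymous mapping is locked at creation time and is
 * therefore charged against RLIMIT_MEMLOCK.  With a 16M limit that is already
 * partly consumed, a 4 MiB allocation can fail; with a tiny limit (Xenial),
 * mlockall() itself fails and nothing is charged.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl;

        getrlimit(RLIMIT_MEMLOCK, &rl);
        printf("RLIMIT_MEMLOCK: soft=%llu bytes\n", (unsigned long long)rl.rlim_cur);

        if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
                /* Xenial-like builders: limit too low, the lock is skipped */
                perror("mlockall");
                return 0;
        }

        /* Bionic-like builders: the lock succeeded, so this large request
         * (serviced by an anonymous mmap) may now hit the memlock limit. */
        void *json_area = malloc(4 * 1024 * 1024);
        if (!json_area) {
                perror("malloc");  /* the underlying mmap() failed (EAGAIN) */
                return 1;
        }

        memset(json_area, 0xff, 4 * 1024 * 1024);
        free(json_area);
        return 0;
}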

 
* Details: 
I managed to reproduce this by collecting the luks2-validation images in a local 
environment running a Bionic VM + LXD (a Focal container). By collecting the 
strace of luksDump in both environments, we got the following:

### LXD - NOT working
...
openat(AT_FDCWD, "./luks2-metadata-size-4m.img", O_RDONLY|O_DIRECT) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
lseek(5, 0, SEEK_SET)                   = 0
read(5, "LUKS\272\276\0\2\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 
4096) = 4096
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
-1 EAGAIN (Resource temporarily unavailable)
brk(0x55e789c38000)                     = 0x55e78982d000
mmap(NULL, 4325376, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
-1 EAGAIN (Resource temporarily unavailable)
lseek(5, 16384, SEEK_SET)               = 16384
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
4096) = 4096
lseek(5, 32768, SEEK_SET)               = 32768
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
4096) = 4096
lseek(5, 65536, SEEK_SET)               = 65536
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
4096) = 4096
...

### VM - working
...
openat(AT_FDCWD, "./luks2-metadata-size-4m.img", O_RDONLY|O_DIRECT) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
lseek(5, 0, SEEK_SET)                   = 0
read(5, "LUKS\272\276\0\2\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 
4096) = 4096
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f6b06031000
lseek(5, 4096, SEEK_SET)                = 4096
mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f6b05c30000

So: as the mmap()s fail, lseek()s start being attempted at the other known
header offsets, 2^N KB with N = 4, 5, ... (16384, 32768, 65536, ... above).

In the cryptsetup code: in lib/luks2/luks2_disk_metadata.c, function
LUKS2_disk_hdr_read(), we fail and then try all known offsets, as per the
code below:

[...]
        /*
         * No header size, check all known offsets.
         */
        for (r = -EINVAL,i = 0; r < 0 && i < ARRAY_SIZE(hdr2_offsets); i++)
[...]

This explains why we see so many lseek()s in the failing LXD case, probing
multiple offsets.
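For reference, the lseek() targets in the failing trace follow exactly that
doubling pattern; a tiny illustration (values are only the ones visible in the
strace above, the authoritative list is the hdr2_offsets[] table in cryptsetup):

/* Illustration only: the probed offsets are the known LUKS2 secondary-header
 * locations, doubling each time (16 KiB, 32 KiB, 64 KiB, ...). */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        const uint64_t offsets[] = { 0x4000, 0x8000, 0x10000 }; /* from the strace above */

        for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++)
                printf("probing secondary header at offset %llu (2^%zu KiB)\n",
                       (unsigned long long)offsets[i], i + 4);

        return 0;
}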

But then, why do we fail?  In the failing case, in function
LUKS2_disk_hdr_read(), we fail right at the first header read, as per the
code in lib/luks2/luks2_disk_metadata.c:

[...]
        /*
         * Read primary LUKS2 header (offset 0).
         */
        state_hdr1 = HDR_FAIL;
        r = hdr_read_disk(cd, device, &hdr_disk1, &json_area1, 0, 0);
[...]

The failure comes from a malloc(), specifically in hdr_read_disk():

[...]
        r = hdr_disk_sanity_check_pre(cd, hdr_disk, &hdr_json_size, secondary, offset);
        if (r < 0) {
                return r;
        }
        /*
         * Allocate and read JSON area. Always the whole area must be read.
         */
        *json_area = malloc(hdr_json_size);
        if (!*json_area) {
                return -ENOMEM;
        }
[...]

Without the json_area allocated, we end up looping in search of the proper
header size, and the test fails. This malloc() is the one generating the
following strace entry:

mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
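The reason a failing malloc() shows up as an mmap() in the trace is the
allocator's handling of large requests; a small sketch, assuming the default
glibc malloc and its ~128 KiB mmap threshold:

/*
 * Sketch, assuming default glibc malloc: requests far above the mmap threshold
 * are serviced with a dedicated anonymous mmap() rather than the heap.  That
 * is the ~4 MiB mapping seen failing with EAGAIN above, once mlockall()
 * made new mappings count against RLIMIT_MEMLOCK.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        size_t hdr_json_size = 4 * 1024 * 1024; /* illustrative size, roughly
                                                   what the "4m" test image implies */
        char *json_area = malloc(hdr_json_size);

        if (!json_area) {
                /* this is the -ENOMEM path hdr_read_disk() returns on */
                perror("malloc");
                return 1;
        }
        free(json_area);
        return 0;
}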


* Conclusion: we have two avenues for fixing this; I personally consider (a) 
below the more correct one.

(a) We could increase the builders' memlock limit to 64M - Focal has that
as its default now. This seems to me the proper approach, given that in
real life cryptsetup does perform the memory lock, so we should exercise
it that way during the build tests.

(b) It's possible to fall back to the same scenario as the Xenial builders by
_reducing_ the memlock limit, so that cryptsetup does not set the memory
lock at all during the build. The bonus of this approach is its
simplicity - we can decrease the limit from the package itself - but at
the same time we would no longer exercise the real-life usage during the
build tests. A rough sketch of what this amounts to follows below.
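For illustration, approach (b) boils down to something like the following
(a hypothetical sketch of the syscall-level effect, not the actual packaging
change):

/*
 * Hypothetical sketch of approach (b): lower RLIMIT_MEMLOCK before the test
 * suite runs, so cryptsetup's mlockall() fails and its allocations are no
 * longer charged against the locked-memory limit (the Xenial behaviour).
 */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl = { .rlim_cur = 64 * 1024, .rlim_max = 64 * 1024 };

        /* a process may always lower its own limits */
        if (setrlimit(RLIMIT_MEMLOCK, &rl) < 0)
                perror("setrlimit");

        /* with only 64 KiB lockable, locking all current and future pages
         * cannot succeed, so cryptsetup would skip the lock and its
         * allocations would behave as on the old Xenial builders */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
                perror("mlockall (expected to fail)");

        return 0;
}

In the actual package this would presumably be done from the build/test
harness before invoking cryptsetup.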

By following approach (b) above, I've managed to make the build work:
https://launchpad.net/~gpiccoli/+archive/ubuntu/crypt-groovy/+build/19913720

I'll spin up a mailing-list discussion on top of Colin's PPA builder update 
message to discuss the possibility of approach (a).
Cheers,
Cheers,


Guilherme
