Hi all,

the xz backdoor (CVE-2024-3094) luckily did not target Gentoo, but it could easily have done so. One step in this sophisticated attack involved injecting concealed code into the build process by means of some kind of homebrew steganography.

I asked myself how many high-entropy files I could find in distfiles. All these gif|png|jpg|jpeg|wav|der|xz|gz|p12 files might actually be low entropy, but checking this would require a more sophisticated approach. As a naive approach, I just checked how well bzip2 is able to compress the files.

But I also found some really unnecessary and, IMHO, high-risk stuff in distfiles. tpm-tools, for example, ships the /.git/ subdir with all those blobs. Python has some audio test files.

In an ideal world, upstream would instead include some low-entropy generators for this stuff. Gentoo should address the problem even if upstream is not responsive.

I wonder if we should have some functionality in eclasses to

a) let src_unpack() filter/drop distfile content, controlled by an ebuild variable (to deal, for example, with /.git/)
b) let src_unpack() warn on high-entropy content (except files whitelisted in the ebuild)

This would at least make it easy to identify high-risk stuff that warrants more scrutiny.

Greets,
Andreas

BTW, this is my naive test script; sort its output with "sort -r -k3":

#!/bin/bash

DISTDIR=/var/cache/distfiles
TMPDIR=/tmp/distfiles-entropy.$(date +"%Y%m%d%H%M%S")
trap 'rm -rf "${TMPDIR}"' EXIT

mkdir "${TMPDIR}"
cd "${TMPDIR}" || exit 1

# Only top-level files: %f prints the bare file name, which is then
# looked up directly under ${DISTDIR}.
find "${DISTDIR}" -maxdepth 1 -type f -printf '%f\n' |
while IFS= read -r DISTFILE
do
    mkdir "${DISTFILE}"
    case ${DISTFILE} in
        *.tar.gz|*.tgz)
            gzip -dc "${DISTDIR}/${DISTFILE}" | tar -C "${TMPDIR}/${DISTFILE}" -xf -;;
        *.tar.xz|*.txz)
            xzcat "${DISTDIR}/${DISTFILE}" | tar -C "${TMPDIR}/${DISTFILE}" -xf -;;
        *.tar.bz2|*.tbz)
            bzcat "${DISTDIR}/${DISTFILE}" | tar -C "${TMPDIR}/${DISTFILE}" -xf -;;
        *.gz)
            gzip -dc "${DISTDIR}/${DISTFILE}" > "${TMPDIR}/${DISTFILE}/file";;
        *)
            cat "${DISTDIR}/${DISTFILE}" > "${TMPDIR}/${DISTFILE}/file";;
    esac
    # bzip2 -v reports the compression ratio on stderr; "2>&1 >/dev/null"
    # keeps that report on stdout and discards the compressed data.
    find "${DISTFILE}" -type f -print0 | xargs -0 -r bzip2 -cv 2>&1 >/dev/null
    rm -rf "${TMPDIR}/${DISTFILE}"
done
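
To make idea b) a bit more concrete, here is a rough sketch of what such a check might look like. It reuses the same naive bzip2 proxy as the script above. The function names bzip2_saved_percent and warn_high_entropy, their arguments, and the threshold are all made up for illustration; nothing like this exists in Portage or any eclass today, as far as I know.

```shell
#!/bin/bash
# Hypothetical sketch: these function names do not exist in Portage or any eclass.

# Print the percentage of bytes that bzip2 saves on FILE (0 = incompressible).
bzip2_saved_percent() {
    local file="$1" in_bytes out_bytes
    in_bytes=$(wc -c < "${file}")
    if [ "${in_bytes}" -eq 0 ]; then
        echo 100    # an empty file cannot hide a payload
        return
    fi
    out_bytes=$(bzip2 -c < "${file}" | wc -c)
    if [ "${out_bytes}" -ge "${in_bytes}" ]; then
        echo 0
    else
        echo $(( (in_bytes - out_bytes) * 100 / in_bytes ))
    fi
}

# Usage: warn_high_entropy DIR MIN_SAVED [ALLOWED_GLOB...]
# Warn about every file below DIR on which bzip2 saves less than MIN_SAVED
# percent, skipping paths that match one of the allowed glob patterns.
warn_high_entropy() {
    local dir="$1" min_saved="$2" file pattern saved skip
    shift 2
    # Note: paths containing newlines are not handled by this simple loop.
    find "${dir}" -type f | while IFS= read -r file; do
        skip=0
        for pattern in "$@"; do
            # ${pattern} is left unquoted so that it is matched as a glob.
            case ${file} in
                ${pattern}) skip=1; break ;;
            esac
        done
        [ "${skip}" -eq 1 ] && continue
        saved=$(bzip2_saved_percent "${file}")
        if [ "${saved}" -lt "${min_saved}" ]; then
            echo "WARNING: ${file}: only ${saved}% saved by bzip2"
        fi
    done
}
```

An ebuild could then call something like: warn_high_entropy "${S}" 10 '*/tests/data/*.wav'. The threshold of 10 is arbitrary. Very small files will also trigger the warning, because the fixed overhead of bzip2 makes them look incompressible, so a real implementation would probably want a minimum file size as well.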