Hi all,

The xz backdoor (CVE-2024-3094) luckily did not target Gentoo, but it easily
could have. One step in this sophisticated attack involved injecting
concealed code into the build process through a kind of homebrew steganography.

I asked myself how many high-entropy files I could find in distfiles. Many of
the gif|png|jpg|jpeg|wav|der|xz|gz|p12 files might actually carry low-entropy
content, but checking that would require a more sophisticated approach. As a
naive first pass, I just checked how well bzip2 is able to compress each file.
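
To illustrate the naive measure, here is a minimal sketch of a per-file check.
The function name compressed_percent is made up for this example; it only
relies on bzip2 -c and wc -c. A result close to (or above) 100 means the file
hardly compresses at all, which is what high-entropy data looks like.

```bash
#!/bin/bash
# Hypothetical helper: print the bzip2-compressed size of a file as a
# percentage of its original size. Values near or above 100 mean the file
# barely compresses, i.e. it looks like high-entropy data.
compressed_percent() {
        local file=$1 orig comp
        orig=$(wc -c < "$file")
        # An empty file carries no information; report 0 and avoid
        # dividing by zero.
        if [ "$orig" -eq 0 ]; then
                echo 0
                return
        fi
        comp=$(bzip2 -c < "$file" | wc -c)
        echo $(( comp * 100 / orig ))
}
```

For example, compressed_percent on a file of zeros gives a value close to 0,
while a file read from /dev/urandom gives roughly 100. Very small files also
score high because of the fixed bzip2 header, so the number is only meaningful
for files of some size.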

But I also found some really unnecessary and, IMHO, high-risk stuff in
distfiles. tpm-tools, for example, ships the /.git/ subdir with all those
blobs. Python has some audio test files.

In an ideal world, upstream would instead include some low entropy generators 
for this stuff. Gentoo should address the problem even if upstream is not 
responsive. 

I wonder if we should have some functionality in eclasses to
a) let src_unpack() filter/drop distfile content, controlled by an
ebuild variable (to deal with /.git/, for example)
b) let src_unpack() warn on high-entropy content (except files whitelisted in
the ebuild)
This would at least make it easy to identify high-risk stuff that warrants
more scrutiny.
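
To make a) and b) more concrete, here is a rough sketch in plain bash. All
names in it (UNPACK_DROP, ENTROPY_WHITELIST, ENTROPY_THRESHOLD,
ENTROPY_MIN_SIZE, drop_unwanted, warn_high_entropy) are hypothetical and not
part of any existing eclass; the threshold and minimum size are arbitrary
assumptions. A real eclass would presumably report through ewarn rather than
echo.

```bash
#!/bin/bash
# Sketch only: every variable and function name here is hypothetical.

# Names an ebuild would ask to drop from the unpacked tree.
UNPACK_DROP=( ".git" )
# Shell patterns for files the ebuild accepts as legitimately high-entropy.
ENTROPY_WHITELIST=( "*.png" )
# Warn when bzip2 output is at least this percentage of the input size.
ENTROPY_THRESHOLD=90
# Skip tiny files, where the fixed bzip2 header dominates the result.
ENTROPY_MIN_SIZE=1024

# a) remove unwanted content from an unpacked tree
drop_unwanted() {
        local root=$1 name
        for name in "${UNPACK_DROP[@]}"; do
                # -prune keeps find from descending into what gets deleted
                find "$root" -name "$name" -prune -exec rm -rf -- {} +
        done
}

# b) warn about every non-whitelisted file that barely compresses
warn_high_entropy() {
        local root=$1 file pattern orig comp skip
        while IFS= read -r -d '' file; do
                skip=0
                for pattern in "${ENTROPY_WHITELIST[@]}"; do
                        # unquoted right-hand side: glob match on the basename
                        [[ ${file##*/} == $pattern ]] && skip=1
                done
                (( skip )) && continue
                orig=$(wc -c < "$file")
                (( orig < ENTROPY_MIN_SIZE )) && continue
                comp=$(bzip2 -c < "$file" | wc -c)
                if (( comp * 100 / orig >= ENTROPY_THRESHOLD )); then
                        echo "WARNING: high-entropy file: ${file#"$root"/}" >&2
                fi
        done < <(find "$root" -type f -print0)
}
```

Calling drop_unwanted and then warn_high_entropy on the unpacked source
directory would remove any .git directory and print one warning per
suspicious file on stderr. Skipping small files is a trade-off made only to
keep the sketch quiet; a real implementation would need a better answer for
them.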

  
Greets,
Andreas

BTW, this is my naive test script. Sort its output with "sort -r -k3" to get
the least compressible files first.

#!/bin/bash

DISTDIR=/var/cache/distfiles
# mktemp avoids a predictable, pre-creatable path in /tmp
TMPDIR=$(mktemp -d /tmp/distfiles-entropy.XXXXXX) || exit 1
trap 'rm -rf "${TMPDIR}"' EXIT
cd "${TMPDIR}" || exit 1

# Only top-level files: %f prints the basename, which is looked up in DISTDIR.
# NUL-delimited names keep distfiles with whitespace intact.
find "${DISTDIR}" -maxdepth 1 -type f -printf '%f\0' |
while IFS= read -r -d '' DISTFILE
do
        mkdir "${DISTFILE}"
        SRC=${DISTDIR}/${DISTFILE}
        DST=${TMPDIR}/${DISTFILE}
        case ${DISTFILE} in
                *.tar.gz|*.tgz)   gzip -dc "${SRC}" | tar -C "${DST}" -xf - ;;
                *.tar.xz|*.txz)   xzcat    "${SRC}" | tar -C "${DST}" -xf - ;;
                *.tar.bz2|*.tbz)  bzcat    "${SRC}" | tar -C "${DST}" -xf - ;;
                *.gz)             gzip -dc "${SRC}" > "${DST}/file" ;;
                *)                cat      "${SRC}" > "${DST}/file" ;;
        esac
        # bzip2 -v reports per-file statistics on stderr; keep those on
        # stdout and discard the compressed data itself.
        find "${DISTFILE}" -type f -print0 |
                xargs -0 -r bzip2 -cv 2>&1 >/dev/null
        rm -rf "${TMPDIR:?}/${DISTFILE}"
done
