Hi,

On Mon, 09 May 2016 21:07:40 +0200 Johannes Schauer <jo...@debian.org> wrote:
> The main disadvantage of the current srebuild implementation is, that it will
> only make use of a single snapshot.d.o timestamp. This makes it impossible to
> reproduce situations where packages are not built in a clean chroot, in a
> partially updated chroot or in a chroot mixing different suites. To assemble
> a chroot with the right package versions, sbuild could retrieve the exact
> right debs from snapshot.d.o.

I was thinking about this issue again and thought that, instead of creating a
wrapper for sbuild which then uses a chroot-setup hook to install the
dependencies, I should let sbuild itself accept .buildinfo files and then do
the right thing, namely:

 - use snapshot.d.o to retrieve the right timestamps needed to gather all
   packages
 - mangle the build dependencies such that the source package now depends on
   the exact right package versions and let the resolver figure out the rest
   (thanks Benjamin for that idea)
 - check whether the generated binaries produce the same checksum as given in
   the supplied buildinfo file
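Gathering packages from several snapshot.d.o timestamps can be sketched as a greedy loop: take the newest first_seen timestamp that is still needed, add that snapshot, drop the timestamps of all packages that can now be found, and repeat. This is a minimal Perl sketch under assumptions. `pick_timestamps` is a hypothetical helper, and the `$available` callback stands in for running `apt-get update` and looking the package up in the resulting Packages files. The packages, timestamps and removal dates are made up.

```perl
use strict;
use warnings;

# Greedily choose snapshot.d.o timestamps until every package is available.
# $first_seen: hashref mapping "name version arch" => first_seen timestamp in
#   %Y%m%dT%H%M%SZ form (fixed width, so string order is chronological).
# $available: hypothetical callback ($pkg, \@chosen) returning true if $pkg
#   can be found once the snapshots in @chosen are in sources.list.
sub pick_timestamps {
    my ($first_seen, $available) = @_;
    my %pending = map { ($_ => 1) } values %$first_seen;
    my @chosen;
    while (%pending) {
        # add the newest timestamp that is still needed
        my ($newest) = sort { $b cmp $a } keys %pending;
        delete $pending{$newest};
        push @chosen, $newest;
        # packages found now no longer need their own snapshot
        foreach my $pkg (keys %$first_seen) {
            delete $pending{ $first_seen->{$pkg} }
                if $available->($pkg, \@chosen);
        }
    }
    return @chosen;
}

# Illustrative data only: when each version appeared and left unstable.
my %first_seen = (
    'a 1.0 all'   => '20160101T000000Z',
    'b 2.0 amd64' => '20160301T000000Z',
    'c 3.0 amd64' => '20160201T000000Z',
);
my %gone = (
    'a 1.0 all'   => '20160401T000000Z',
    'b 2.0 amd64' => '99991231T000000Z',
    'c 3.0 amd64' => '20160215T000000Z',
);
my $available = sub {
    my ($pkg, $chosen) = @_;
    return scalar grep { $_ ge $first_seen{$pkg} && $_ lt $gone{$pkg} } @$chosen;
};

my @snapshots = pick_timestamps(\%first_seen, $available);
print "$_\n" foreach @snapshots;
```

Here three packages need only two snapshots, because package a is still present in the newest one.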

But then, on IRC, HW42 suggested approaching this problem differently: instead
of integrating into sbuild the functionality of figuring out the right
repositories to reproduce the contents of a buildinfo file, write a tool that
can drive any package builder (like pbuilder).

I have now written such a script. It currently supports sbuild as well as
manual installation by showing the correct sources.list, and for both it prints
the right command line invocations. Advantages over the old sbuild hook-based
script attached to my initial post are:

 - package versions can come from multiple snapshot timestamps
 - theoretically works for more package builders than just sbuild (pbuilder
   support is missing because I don't know enough about pbuilder)
 - uses Dpkg::Checksums to parse and verify the hash and size fields instead of
   doing it manually
 - uses apt to download and manage Packages files instead of doing it manually
 - allows adding additional repositories like reproducible.alioth.debian.org
   (which is hardcoded so far)
 - uses the base-files and dpkg versions to estimate a base Debian release
 - drastically reduces the number of snapshot.d.o API queries by only querying
   for missing packages
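Estimating the base release amounts to mapping the base-files major version and the dpkg 1.x minor version to a release name and requiring both to agree. A minimal sketch using the same tables as the attached script; `estimate_release` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Same mapping tables as the attached script.
my %base_files_map = (6 => 'squeeze', 7 => 'wheezy', 8 => 'jessie',
                      9 => 'stretch', 10 => 'buster');
my %dpkg_map = (15 => 'squeeze', 16 => 'wheezy', 17 => 'jessie',
                18 => 'stretch', 19 => 'buster');

# Hypothetical helper: returns the release that both the base-files major
# version and the dpkg 1.x minor version point to, or undef if they disagree
# or are unknown.
sub estimate_release {
    my ($base_files_version, $dpkg_version) = @_;
    my ($major) = $base_files_version =~ /^(\d+)/ or return;
    my ($minor) = $dpkg_version =~ /^1\.(\d+)\./ or return;
    my $from_base_files = $base_files_map{$major};
    my $from_dpkg       = $dpkg_map{$minor};
    return unless defined $from_base_files && defined $from_dpkg;
    return $from_base_files eq $from_dpkg ? $from_base_files : undef;
}

print estimate_release('9.6', '1.18.4') // 'unknown', "\n";         # stretch
print estimate_release('8+deb8u4', '1.18.7') // 'unknown', "\n";    # unknown
```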

Limitations:

 - There is no nice command line interface with options and switches yet
 - You cannot yet supply additional initial archives
   (reproducible.alioth.debian.org is hardcoded)
 - It only considers Debian main
 - It only considers official Debian (and not ports)
 - It only considers Debian unstable from snapshot.d.o
 - You have to manually run sbuild/pbuilder with the displayed command and then
   manually verify whether the resulting .buildinfo file stayed the same
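The manual verification step can be sketched as recomputing the SHA-256 digest and size of each rebuilt file and comparing them with the Checksums-Sha256 field of the original .buildinfo. The script itself uses Dpkg::Checksums; this minimal sketch instead assumes the field value has already been extracted and uses only the core Digest::SHA module. `verify_sha256` is a hypothetical helper.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: $field is the value of a Checksums-Sha256 field (one
# "sha256 size filename" entry per line), $dir the directory holding the
# rebuilt files. Returns the files that are missing or whose size or SHA-256
# digest differs from the recorded one.
sub verify_sha256 {
    my ($field, $dir) = @_;
    my @mismatches;
    foreach my $line (split /\n/, $field) {
        my ($sha256, $size, $name) = split ' ', $line;
        next unless defined $name;    # e.g. the empty first line of the field
        my $path = File::Spec->catfile($dir, $name);
        if (!-f $path
            || (-s $path || 0) != $size
            || Digest::SHA->new(256)->addfile($path, 'b')->hexdigest ne $sha256) {
            push @mismatches, $name;
        }
    }
    return @mismatches;
}

# Usage with a dummy file standing in for a rebuilt .deb.
my $dir = tempdir(CLEANUP => 1);
my $deb = File::Spec->catfile($dir, 'foo_1.0_all.deb');
open(my $out, '>:raw', $deb) or die "cannot write $deb: $!";
print $out "not really a deb\n";
close $out;
my $digest = Digest::SHA->new(256)->addfile($deb, 'b')->hexdigest;
my $field  = "\n $digest " . (-s $deb) . " foo_1.0_all.deb";
my @bad = verify_sha256($field, $dir);
print(@bad ? "mismatch: @bad\n" : "all checksums match\n");
```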

What is still needed:

 - a good name (I named it debrebuild for now because it is Debian-centric and
   rebuilds a package that was built before to check whether the checksums can
   be reproduced locally. This is the main difference from reprotest, which does
   not require an existing build but checks for reproducibility by building the
   same software twice in different environments)
 - a nice home for the script to live
 - somebody to maintain the software and make it more user friendly by adding a
   nice command line interface and writing a README file and/or man page
 - maybe let the script also execute the sbuild/pbuilder command it suggests.
   This would allow the script to check the output for plausibility.
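Letting the script run the builder itself could start from an argument list rather than a printed command line. A minimal sketch, assuming a hypothetical `sbuild_argv` helper; the sbuild options are the ones the script already prints, while the repository, versions and .dsc name are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble the sbuild command line that the script
# prints, as a list suitable for system().
sub sbuild_argv {
    my (%args) = @_;
    my @argv = ('sbuild');
    push @argv, "--extra-repository=$_" foreach @{ $args{repositories} };
    push @argv, '--add-depends=' . join(',', @{ $args{depends} });
    push @argv, '--build-dep-resolver=aptitude', '-d', $args{dist}, $args{dsc};
    return @argv;
}

# Placeholder values for illustration only.
my @argv = sbuild_argv(
    repositories => ['deb http://snapshot.debian.org/archive/debian/20160509T000000Z/ unstable main'],
    depends      => ['dpkg (= 1.18.7)', 'base-files (= 9.6)'],
    dist         => 'stretch',
    dsc          => 'hello_2.10-1.dsc',
);

# The list form of system() bypasses the shell, so the spaces and parentheses
# in the arguments need no quoting. Uncomment to actually run sbuild:
# system(@argv) == 0 or die "sbuild failed with exit status " . ($? >> 8) . "\n";
print "$_\n" foreach @argv;
```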

Usage:

   debrebuild.pl package.buildinfo

Depends:

   apt-get install --no-install-recommends libdpkg-perl libwww-perl libdatetime-format-strptime-perl

Have fun!

cheers, josch

#!/usr/bin/perl
#
# Copyright 2016 Johannes Schauer
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

use strict;
use warnings;

use Dpkg::Control;
use Dpkg::Index;
use Dpkg::Checksums;
use Dpkg::Compression::FileHandle;
use Dpkg::Deps;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use JSON::PP;

eval {
    require LWP::Simple;
    require LWP::UserAgent;
    no warnings;
    $LWP::Simple::ua = LWP::UserAgent->new(agent => 'LWP::UserAgent/debrebuild');
    $LWP::Simple::ua->env_proxy();
};
if ($@) {
    if ($@ =~ m/Can\'t locate LWP/) {
	die "Unable to run: the libwww-perl package is not installed";
    } else {
	die "Unable to run: Couldn't load LWP::Simple: $@";
    }
}

eval {
    require DateTime::Format::Strptime;
};
if ($@) {
    if ($@ =~ m/Can\'t locate DateTime/) {
	die "Unable to run: the libdatetime-format-strptime-perl package is not installed";
    } else {
	die "Unable to run: Couldn't load DateTime::Format::Strptime: $@";
    }
}

my $buildinfo = shift @ARGV;
if (not defined($buildinfo)) {
    die "need buildinfo filename";
}

# FIXME: replace CTRL_INDEX_SRC by the proper value once dpkg supports
# buildinfo files
my $cdata = Dpkg::Control->new(type => CTRL_INDEX_SRC);

my $fh = Dpkg::Compression::FileHandle->new(filename => $buildinfo);

if (not $cdata->parse($fh, $buildinfo)) {
    die "cannot parse"
}
close $fh;

my $build_arch = $cdata->{"Build-Architecture"};
if (not defined($build_arch)) {
    die "need Build-Architecture field";
}
my $environ = $cdata->{"Installed-Build-Depends"};
if (not defined($environ)) {
    die "need Installed-Build-Depends field";
}

my $checksums = Dpkg::Checksums->new();
$checksums->add_from_control($cdata);
my @files = $checksums->get_files();

# gather all installed build-depends and figure out the version of base-files
# and dpkg
my $base_files_version;
my $dpkg_version;
my @environ = ();
foreach my $dep (split(/\s*,\s*/m, $environ)) {
    my $pkg = Dpkg::Deps::Simple->new($dep);
    if (not defined($pkg->{package})) {
	die "name undefined";
    }
    if (defined($pkg->{relation})) {
	if ($pkg->{relation} ne "=") {
	    die "wrong relation";
	}
	if (not defined($pkg->{version})) {
	    die "version undefined"
	}
    } else {
	die "no version";
    }
    if ($pkg->{package} eq "dpkg") {
	if (defined($dpkg_version)) {
	    die "more than one dpkg\n";
	}
	$dpkg_version = $pkg->{version};
    }
    if ($pkg->{package} eq "base-files") {
	if (defined($base_files_version)) {
	    die "more than one base-files\n";
	}
	$base_files_version = $pkg->{version};
    }
    push @environ, { name => $pkg->{package},
	architecture => $pkg->{archqual},
	version => $pkg->{version}
    };
}

if (!defined($base_files_version)) {
    die "no base-files\n";
}
if (!defined($dpkg_version)) {
    die "no dpkg\n";
}

# figure out the debian release from the version of base-files and dpkg
my $base_dist;

my %base_files_map = (
    "6" => "squeeze",
    "7" => "wheezy",
    "8" => "jessie",
    "9" => "stretch",
    "10" => "buster",
);
my %dpkg_map = (
    "15" => "squeeze",
    "16" => "wheezy",
    "17" => "jessie",
    "18" => "stretch",
    "19" => "buster",
);

$base_files_version =~ s/^(\d+).*/$1/;
$dpkg_version =~ s/1\.(\d+)\..*/$1/;

$base_dist = $base_files_map{$base_files_version};

if (! defined $base_dist) {
    die "base-files version didn't map to any Debian release"
}

my $dpkg_dist = $dpkg_map{$dpkg_version};
if (! defined $dpkg_dist || $base_dist ne $dpkg_dist) {
    die "base-files and dpkg versions point to different Debian releases\n";
}

# test if all checksums in the buildinfo file check out and figure out the
# name of the dsc

my $dsc_fname;
foreach my $fname ($checksums->get_files()) {
    # there is no way to ask Dpkg::Checksums to check file size and hashes
    # directly, so as a workaround we re-add the same files, which does the
    # check
    $checksums->add_from_file($fname);

    if ($fname =~ /\.dsc$/) {
	if (defined($dsc_fname)) {
	    die "more than one dsc\n";
	}
	$dsc_fname = $fname;
    }
}

if (not defined($dsc_fname)) {
    die "no dsc found\n";
}

# setup a temporary apt directory

my $tempdir = tempdir(CLEANUP => 1);

foreach my $d (('/etc/apt', '/etc/apt/apt.conf.d', '/etc/apt/preferences.d',
	'/etc/apt/trusted.gpg.d', '/etc/apt/sources.list.d',
	'/var/lib/apt/lists/partial',
	'/var/cache/apt/archives/partial', '/var/lib/dpkg')) {
    make_path("$tempdir/$d");
}

open(FH, '>', "$tempdir/etc/apt/sources.list") or die "cannot open sources.list: $!\n";
# FIXME avoid trusted=yes by somehow securely obtaining the right keyring
print FH <<EOF;
deb [trusted=yes] http://reproducible.alioth.debian.org/debian/ ./
deb http://httpredir.debian.org/debian/ $base_dist main
EOF
close FH;
# Create dpkg status
open(FH, '>', "$tempdir/var/lib/dpkg/status") or die "cannot create dpkg status: $!\n";
close FH; #empty file
# Create apt.conf
my $aptconf = "$tempdir/etc/apt/apt.conf";
open(FH, '>', $aptconf) or die "cannot open $aptconf: $!\n";

# We create an apt.conf and pass it to apt via the APT_CONFIG environment
# variable instead of passing all options via the command line because
# otherwise apt will read the system's config first and might get unwanted
# configuration options from there. See apt.conf(5) for the order in which
# configuration options are read.
#
# While we are at it, we also set all other options through our custom
# apt.conf.
#
# Apt::Architecture has to be set because otherwise apt will default to the
# architecture apt was compiled for.
#
# Apt::Architectures has to be set or otherwise apt will use dpkg to find all
# foreign architectures of the system running apt.
#
# Dir::State::status has to be set even though Dir is set because Dir::State
# is set to var/lib/apt, so Dir::State::status would be below that but really
# isn't and without an absolute path, Dir::State::status would be constructed
# from Dir + Dir::State + Dir::State::status. This has been fixed in apt
# commit 475f75506db48a7fa90711fce4ed129f6a14cc9a.
#
# Acquire::Check-Valid-Until has to be set to false because the snapshot
# timestamps might be too far in the past to still be valid.
#
# Acquire::Languages has to be set to prevent downloading of translations from
# the mirrors.

print FH <<EOF;
Apt {
   Architecture "$build_arch";
   Architectures "$build_arch";
};

Dir "$tempdir";
Dir::State::status "$tempdir/var/lib/dpkg/status";
Acquire::Check-Valid-Until "false";
Acquire::Languages "none";
EOF
close FH;
foreach my $keyring (qw(debian-archive-keyring.gpg
    debian-archive-removed-keys.gpg
    ubuntu-archive-keyring.gpg
    ubuntu-archive-removed-keys.gpg)) {
    my $src = "/usr/share/keyrings/$keyring";
    if (-f $src) {
	symlink $src, "$tempdir/etc/apt/trusted.gpg.d/$keyring";
    }
}

$ENV{'APT_CONFIG'} = $aptconf;

0 == system 'apt-get', 'update' or die "apt-get update failed\n";

my $key_func = sub {
    return $_[0]->{Package} . ' ' . $_[0]->{Version} . ' ' . $_[0]->{Architecture};
};
my $index = Dpkg::Index->new(get_key_func=>$key_func);

open(my $fd, '-|', 'apt-get', 'indextargets', '--format', '$(FILENAME)', 'Created-By: Packages');
while (my $fname = <$fd>) {
    chomp $fname;
    print "parsing $fname...\n";
    open(my $fd2, '-|', '/usr/lib/apt/apt-helper', 'cat-file', $fname);
    $index->parse($fd2, "pipe") or die "cannot parse Packages file\n";
    close($fd2);
}
close($fd);

# go through all packages in the Installed-Build-Depends field and find out
# the timestamps at which they were first seen each
my %notfound_timestamps;

foreach my $pkg (@environ) {
    my $pkg_name = $pkg->{name};
    my $pkg_ver = $pkg->{version};
    my $pkg_arch = $pkg->{architecture};

    # check if we really need to acquire this package from snapshot.d.o or if
    # it already exists in the cache
    if (defined $pkg->{architecture}) {
	if ($index->get_by_key("$pkg_name $pkg_ver $pkg_arch")) {
	    print "skipping $pkg_name $pkg_ver\n";
	    next;
	}
    } else {
	if ($index->get_by_key("$pkg_name $pkg_ver $build_arch")) {
	    $pkg->{architecture} = $build_arch;
	    print "skipping $pkg_name $pkg_ver\n";
	    next;
	}
	if ($index->get_by_key("$pkg_name $pkg_ver all")) {
	    $pkg->{architecture} = "all";
	    print "skipping $pkg_name $pkg_ver\n";
	    next;
	}
    }

    print "retrieving snapshot.d.o data for $pkg_name $pkg_ver\n";
    my $json_url = "http://snapshot.debian.org/mr/binary/$pkg_name/$pkg_ver/binfiles?fileinfo=1";
    my $content = LWP::Simple::get($json_url);
    die "cannot retrieve $json_url" unless defined $content;
    my $json = JSON::PP->new();
    # json options taken from debsnap
    my $json_text = $json->allow_nonref->utf8->relaxed->decode($content);
    die "cannot decode json" unless defined $json_text;
    my $pkg_hash;
    if (scalar @{$json_text->{result}} == 1) {
	# if there is only a single result, then the package must either be
	# Architecture:all, be the build architecture or match the requested
	# architecture
	$pkg_hash = ${$json_text->{result}}[0]->{hash};
	$pkg->{architecture} = ${$json_text->{result}}[0]->{architecture};
	# if a specific architecture was requested, it should match
	if (defined $pkg_arch && $pkg_arch ne $pkg->{architecture}) {
	    die "package $pkg_name was explicitly requested for $pkg_arch but only $pkg->{architecture} was found\n";
	}
	# if no specific architecture was requested, it should be the build
	# architecture
	if (! defined $pkg_arch && $build_arch ne $pkg->{architecture} && "all" ne $pkg->{architecture}) {
	    die "package $pkg_name was implicitly requested for $build_arch but only $pkg->{architecture} was found\n";
	}
    } else {
	# Since the package occurs more than once, we expect it to be of
	# Architecture:any
	#
	# If no specific architecture was requested, look for the build
	# architecture
	if (! defined $pkg_arch) {
	    $pkg_arch = $build_arch;
	}
	foreach my $result (@{$json_text->{result}}) {
	    if ($result->{architecture} eq $pkg_arch) {
		$pkg_hash = $result->{hash};
		last;
	    }
	}
	if (! defined($pkg_hash)) {
	    die "cannot find package in architecture $pkg_arch\n";
	}
	# we now know that this package is not architecture:all but has a
	# concrete architecture
	$pkg->{architecture} = $pkg_arch;
    }
    # assumption: package is from Debian official (and not ports)
    my @package_from_main = grep { $_->{archive_name} eq "debian" } @{$json_text->{fileinfo}->{$pkg_hash}};
    if (scalar @package_from_main > 1) {
        die "more than one package with the same hash in Debian official\n";
    }
    if (scalar @package_from_main == 0) {
        die "no package with the right hash in Debian official\n";
    }
    my $date = $package_from_main[0]->{first_seen};
    $pkg->{first_seen} = $date;
    $notfound_timestamps{$date} = 1;
}

# feed apt with timestamped snapshot.debian.org URLs until apt is able to find
# all the required package versions. We start with the most recent timestamp,
# check which packages cannot be found at that timestamp, add the timestamp of
# the most recent not-found package and continue doing this iteratively until
# all versions can be found.

my $dtparser = DateTime::Format::Strptime->new(
  pattern => '%Y%m%dT%H%M%SZ',
  on_error => 'croak',
);

while (0 < scalar keys %notfound_timestamps) {
    print "left to check: " . (scalar keys %notfound_timestamps) . "\n";
    my @timestamps = sort { DateTime->compare($a, $b) }
	map { $dtparser->parse_datetime($_) } keys %notfound_timestamps;
    my $newest = $timestamps[$#timestamps];
    $newest = $newest->strftime("%Y%m%dT%H%M%SZ");
    delete $notfound_timestamps{$newest};

    my $snapshot_url = "http://snapshot.debian.org/archive/debian/$newest/";

    open(FH, '>>', "$tempdir/etc/apt/sources.list") or die "cannot open sources.list: $!\n";
    print FH "deb $snapshot_url unstable main\n";
    close FH;

    0 == system 'apt-get', 'update' or die "apt-get update failed";

    my $index = Dpkg::Index->new(get_key_func=>$key_func);
    open(my $fd, '-|', 'apt-get', 'indextargets', '--format', '$(FILENAME)', 'Created-By: Packages');
    while (my $fname = <$fd>) {
	chomp $fname;
	print "parsing $fname...\n";
	open(my $fd2, '-|', '/usr/lib/apt/apt-helper', 'cat-file', $fname);
	$index->parse($fd2, "pipe") or die "cannot parse Packages file\n";
	close($fd2);
    }
    close($fd);
    foreach my $pkg (@environ) {
	my $pkg_name = $pkg->{name};
	my $pkg_ver = $pkg->{version};
	my $pkg_arch = $pkg->{architecture};
	my $first_seen = $pkg->{first_seen};
	my $cdata = $index->get_by_key("$pkg_name $pkg_ver $pkg_arch");
	# a package missing from the snapshots added so far keeps its timestamp
	# in %notfound_timestamps so that a later iteration adds that snapshot
	next if not defined($cdata);
	if (defined $first_seen) {
	    delete $notfound_timestamps{$first_seen};
	}
    }
}

print "\n";
print "Manual installation and build\n";
print "-----------------------------\n";
print "\n";
print "The following sources.list contains all the required repositories:\n";
print "\n";
0 == system 'cat', "$tempdir/etc/apt/sources.list" or die "cannot cat $tempdir/etc/apt/sources.list";
print "\n";
print "You can manually install the right dependencies like this:\n";
print "\n";
print "apt-get install --no-install-recommends";
foreach my $pkg (@environ) {
    my $pkg_name = $pkg->{name};
    my $pkg_ver = $pkg->{version};
    my $pkg_arch = $pkg->{architecture};
    if ($pkg_arch eq "all" || $pkg_arch eq $build_arch) {
	print " $pkg_name=$pkg_ver";
    } else {
	print " $pkg_name:$pkg_arch=$pkg_ver";
    }
}
print "\n";
print "\n";
print "And then build your package:\n";
print "\n";
print "dpkg-source -x $dsc_fname\n";
print "cd packagedirectory\n";
print "dpkg-buildpackage\n";
print "\n";
print "Using sbuild\n";
print "------------\n";
print "\n";
print "You can try to build the package with sbuild like this:\n";
print "\n";
print "sbuild";
open(FH, '<', "$tempdir/etc/apt/sources.list") or die "cannot open sources.list: $!\n";
while( my $line = <FH>)  {
    chomp $line;
    print " --extra-repository=\"$line\"";
}
close FH;
my @add_depends = ();
foreach my $pkg (@environ) {
    my $pkg_name = $pkg->{name};
    my $pkg_ver = $pkg->{version};
    my $pkg_arch = $pkg->{architecture};
    if ($pkg_arch eq "all" || $pkg_arch eq $build_arch) {
	push @add_depends, "$pkg_name (= $pkg_ver)";
    } else {
	push @add_depends, "$pkg_name:$pkg_arch (= $pkg_ver)";
    }
}
print " --add-depends=\"" . (join ",", @add_depends) . "\"";
print " --build-dep-resolver=aptitude";
print " -d $base_dist";
print " $dsc_fname\n";


_______________________________________________
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds
