Hi,

It is time for another "major" lintian branch.
This time I am proposing that we target making lintian.d.o-like setups
easier. To my knowledge, only two such setups currently exist, and
both of them were done by (or with the help of) the Lintian
Maintainers. Research indicates that Ubuntu considered improving the
situation in 2008, but to my knowledge their work never really got
further than the blueprint (the spec has a link to the blueprint).

As written in "big fat letters", the below is a draft and it can be
changed. It is based on a private TODO list, the Ubuntu Blueprint and
what I picked up from discussions here and there.

I hope we can get a good discussion on the topic. After a bit of
discussion I will create a wiki page for the specification. I am also
hoping that some of you are willing to invest a bit of time coding
and/or testing the solution as it develops. Should you be interested,
doc/README.developers in the source code can hopefully get you started.

A few of the included parts are mostly comments for the Lintian
Maintainers on some internal parts (like the part about
refactoring/removing "unpack/"). Should there be parts you do not
understand, please do ask about them. I may have unwittingly assumed
you know the internals or the workflow of the current code. :)

~Niels


THIS SPECIFICATION IS A DRAFT and is subject to change.

This specification aims to provide official support for setting up
"lintian.debian.org"-like instances. This is based on an existing
Ubuntu Blueprint[UB] and the private/TODO-file[TODO] in the Lintian
source code.

[UB] https://wiki.ubuntu.com/LintianHarness

[TODO] http://anonscm.debian.org/gitweb/?p=lintian/lintian.git;a=blob;f=private/TODO;h=3936b340b730175ef98bd902785403b69d5437e5;hb=9505109c3006edf6f360da8da2d530c42337ee4f#l92

Requirements
============

Some of these are already supported in the current implementation;
they are listed here again for completeness.

 * A new frontend called "lintian-harness"
   - Must have a well-defined purpose and workflow.
   - Must be well documented.
     - It should not take a Lintian Maintainer to use it.
   - Must be cron-friendly.
     - Expected to be the primary use method of the frontend.
   - Must (continue to) support incremental runs.
 * The resulting reports must be (re-)brandable.
   - (e.g.) Lintian may not be checking against the Debian Policy
     Manual, etc.
 * The lintian-harness will be shipped in a separate package that
   depends on the lintian package.
 * Must support fetching from http(s):// mirrors.

Ideas, Issues and Extensions
============================

 * The remaining scripts in unpack/ could be replaced by making the
   existing Laboratory code smarter.
   - reporting/ is one of the last consumers of unpack/
   - Zach suggested "sync'ing from a mirror" would be useful if
     Lintian was turned into a Static Analysis Framework.
 * Use locks when running.
   - Currently you have to manually disable the cron job if you are
     doing an "out of band" lintian run.
 * Hooks:
   - Allow local system-specific code to be run, e.g. after the html
     site has been updated.
   - (hopefully) "everyone uses the same frontend"
 * Migrate to template-toolkit?
   - There was some talk about it; it should be done before this spec
     is implemented.
 * Add support for displaying override comments.
   - This would probably be a good time.
 * Testing
   - The lintian frontend itself is used by 300+ tests, so we are
     fairly certain it is not obviously broken if there are no test
     failures. We can unit test some of the code used by
     lintian-harness, but can we do better and actually test the
     lintian-harness frontend (in some sane manner)?

Proposed Solution
=================

File System Layout
------------------

The website setup currently uses the templates directly from the
LINTIAN_ROOT. This complicates updating templates, since LINTIAN_ROOT
will be overwritten on upgrades.

This can be solved by splitting the setup into four distinct major
components: LINTIAN_ROOT, SITE_ROOT, WORK_ROOT and HTML_DIR.
 * LINTIAN_ROOT is the base of the Lintian installation. Usually this
   will be /usr/share/lintian.
   - This is read-only for the lintian processes.
 * SITE_ROOT holds the configuration/setup rules for the site. This is
   not (by design at least) publicly available via the HTML site.
   - The local admin/user can deploy site-specific templates and
     configuration here.
   - This is read-only for the lintian processes.
 * WORK_ROOT is the root dir for lintian to write its cache and its
   logs.
   - This needs to be readable and writable by the lintian processes.
 * HTML_DIR is where the html site is written on the machine.
   lintian-harness will generate all the data presented here.
   - This is "write-only" for the lintian processes.
   - lintian-harness may delete HTML_DIR and its entire contents.
     HTML_DIR is not allowed to be a symlink.
   - lintian will need to be able to create a directory in the parent
     of HTML_DIR (see below on HTML_DIR).
   - Should lintian-harness have configuration options to modify
     permissions (etc.) on HTML_DIR? Not needed if the proper hook
     exists.

Lintian will ship a base SITE_ROOT in LINTIAN_ROOT/reporting, and can
create a SITE_ROOT based on this. The local admin can then modify the
SITE_ROOT to fit their needs and set up the cron job, after which the
setup is complete.

The SITE_ROOT should have the following structure:

  SITE_ROOT/
    bin/
      ...
    config
    hooks/
      ...
    images/
      loco-small.png
      ...
    lintianrc
    lintian.css
    templates/
      index.tmpl
      ....

Any of the files or directories in SITE_ROOT may be a symlink, in
which case it is followed (regardless of where it points to).

"config" shall contain all the relevant configuration for
lintian-harness and "lintianrc" will be the configuration file for
lintian (if any). templates/ will contain the relevant templates used
by lintian-harness to write the html output to HTML_DIR. The contents
of "images/" and "lintian.css" will be copied (as is) to HTML_DIR.
hooks/ would contain executable scripts that will be run by
lintian-harness at the relevant point of the execution. bin/ will be
prepended to PATH by lintian-harness and can be used to override some
system commands. In particular, symlinking SITE_ROOT/bin/gpg to
/bin/true can be used to disable gpg signature checks (as done by
dpkg-source when extracting a source package).

Can we do something to assist the local admins in upgrading their
existing SITE_ROOT?

The WORK_ROOT has the following (default) layout:

  WORK_ROOT/
    laboratory/
      ...
    logs/
      lintian.log
      ...
    ...

Unless otherwise specified in SITE_ROOT/config, the laboratory will be
placed in WORK_ROOT/laboratory. The logs directory will store the logs
and some statistical data collected by lintian and lintian-harness.
"savelog" shall be used to maintain some past logs. The lintian.log
file will be copied to the HTML_DIR and is also used by
lintian-harness to create the incremental runs (see "Incremental runs"
below).

By default WORK_ROOT may be used for other temporary / auxiliary files
(or directories) that can be used in a subsequent run. In particular,
see "Fetching packages" below on having a package cache.

WORK_ROOT and SITE_ROOT may point to the same directory, but lintian
will need to create and edit files in WORK_ROOT, so it may complicate
making SITE_ROOT read-only.

HTML_DIR is where lintian-harness will produce its final output to be
served by a webserver. When replacing the existing HTML_DIR,
lintian-harness will create a temporary directory and populate it with
the new contents. It will then swap the HTML_DIR and the temporary
directory, followed by a removal of the old (renamed) HTML_DIR.

The problem with this is that there is a short window in which
HTML_DIR is absent ("mv HTML_DIR old && mv new HTML_DIR"). Can we use
some other approach that ensures that (the content in) HTML_DIR is
always present and consistent (without a ton of
"mv -f new/.../file HTML_DIR/.../file")?
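
The HTML_DIR replacement described above could look roughly like the
following Python sketch. It is only an illustration under stated
assumptions: the function name replace_html_dir and the populate
callback are hypothetical and not part of Lintian, and the 0755 mode is
a guess at what a webserver would need. The sketch keeps the short
window in which HTML_DIR is absent. On Linux, renameat2(2) with
RENAME_EXCHANGE can exchange two directories atomically and might close
that window, but it is Linux-specific and is not used here.

```python
import os
import shutil
import tempfile


def replace_html_dir(html_dir, populate):
    """Build a fresh site next to html_dir, then swap it into place.

    populate is a hypothetical callback that writes the new site into
    the directory it is given.
    """
    html_dir = os.path.abspath(html_dir)
    if os.path.islink(html_dir):
        # The draft forbids HTML_DIR from being a symlink.
        raise ValueError("HTML_DIR must not be a symlink: %s" % html_dir)

    # Create the staging directory in the parent of HTML_DIR so that
    # both renames stay on one file system.
    parent = os.path.dirname(html_dir)
    new_dir = tempfile.mkdtemp(prefix=".html-new-", dir=parent)
    try:
        populate(new_dir)
        # mkdtemp creates the directory with mode 0700; assume the
        # webserver needs to read it.
        os.chmod(new_dir, 0o755)
    except BaseException:
        shutil.rmtree(new_dir, ignore_errors=True)
        raise

    old_dir = new_dir + ".old"
    had_old = os.path.isdir(html_dir)
    if had_old:
        os.rename(html_dir, old_dir)   # HTML_DIR is absent from here ...
    os.rename(new_dir, html_dir)       # ... until here.
    if had_old:
        shutil.rmtree(old_dir)
```

If populate fails, the staging directory is removed and the existing
HTML_DIR is left untouched, so a broken run never replaces a working
site.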

Fetching packages
-----------------

On lintian.debian.org there is a local mirror available on the file
system. Other setups may not want to, or have the capacity to, have
the mirror locally (even as an NFS mount). The harness frontend should
therefore support more than one method for fetching the packages to be
processed.

Having a local cache may be useful to avoid unnecessary bandwidth
usage when doing a full run. If such a cache is implemented, the
layout may need another directory to ensure that LINTIAN_ROOT and
SITE_ROOT do not need to be writable by the user running
lintian-harness.

Fetching packages via HTTP sounds a lot like something APT or aptitude
can do already, so perhaps this solution should use APT (possibly via
libapt-pkg-perl). The old code needs access to the Sources file, the
Packages file and the downloaded packages. The first two can most
likely be replaced by using APT's API to access the package metadata.

A second advantage of using APT as a backend for pulling packages is
that lintian-harness would automatically support fetching from any
protocol (or setup) that APT supports.

It seems to be a fair assumption that anyone wanting to set up a
"lintian.d.o"-like machine will have basic knowledge of APT. That
being said, lintian-harness should ship with some basic APT
configuration templates to be used by lintian-harness's APT module.

Incremental runs
----------------

The incremental runs work by lintian-harness analysing which packages
have changed, been removed or have appeared since the last run. It
then filters out all tags for the changed and removed packages from
the previous lintian.log. Finally it instructs lintian to test the
changed and new packages, appending its output to the new lintian.log.

Once lintian has terminated, lintian-harness will use the lintian.log
to generate the website.

Testing
-------

The lintian frontend itself is used by 300+ tests, so we are fairly
certain it is not obviously broken if there are no test failures.
We can unit test some of the code used by lintian-harness, but can we
do better and actually test the lintian-harness frontend (in some sane
manner)?
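
As one illustration of the kind of code that can be unit tested, the
filtering step from "Incremental runs" could be isolated into pure
functions along the lines of the Python sketch below. Everything in it
is an assumption made for illustration: the function names are
hypothetical, package lists are modelled as plain name-to-version
dicts, and log lines are assumed to look like
"CODE: package[ type]: tag ...", which simplifies the real lintian.log.

```python
def diff_packages(previous, current):
    """Compare two {name: version} dicts from consecutive runs.

    Returns (removed, added, changed) as sets of package names.
    """
    removed = set(previous) - set(current)
    added = set(current) - set(previous)
    changed = {
        name
        for name in set(previous) & set(current)
        if previous[name] != current[name]
    }
    return removed, added, changed


def filter_log(lines, stale):
    """Drop log lines that belong to packages named in stale.

    Lines that do not match the assumed tag format are kept unchanged.
    """
    kept = []
    for line in lines:
        parts = line.split(": ", 2)
        if len(parts) == 3 and parts[1].split():
            package = parts[1].split()[0]
            if package in stale:
                continue
        kept.append(line)
    return kept


def plan_incremental_run(previous, current, old_log_lines):
    """Return (log lines to carry over, sorted packages to re-check)."""
    removed, added, changed = diff_packages(previous, current)
    carried_over = filter_log(old_log_lines, removed | changed)
    return carried_over, sorted(added | changed)
```

Because none of these functions touches the file system or runs
lintian, they can be exercised with small in-memory fixtures; testing
the frontend end to end would still need something more.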