This was a giant copy&paste, was disabled four months ago, and the
infrastructure for this ceased to exist.
If this comes back, the AutoPackageTest class should be generalized to also
issue phone boot tests (exposed as new architectures, which should then be
called "platforms"), to avoid all this duplicated code.
Generate https://autopkgtest.ubuntu.com/retry.cgi links for re-running tests
that regressed.
Change Excuse.html() back to usual % string formatting to be consistent with
the rest of the code.
If we have a result, directly link to the log file on swift in excuses.html.
The architecture name still leads to the package history as before.
If result is still pending, link to the "running tests" page instead.
- Invert the map to go from triggers to tested packages, instead of the other
way around. This is the lookup and update mode that we usually want, which
simplifies the code and speeds up lookups. The one exception is for fetching
results (as that is per tested source package, not per trigger), but there
is a FIXME to get rid of the "triggers" argument completely.
- Stop tracking tested package versions. We don't actually care about it
anywhere, as the important piece of data is the trigger.
- Drop our home-grown pending.txt format and write pending.json instead.
ATTENTION: This changes the on-disk cache format for pending tests, so
pending.txt needs to be cleaned up manually and any pending tests at the time
of upgrading to this revision will be re-run.
- Invert the map to go from triggers to tested versions, instead of from
tested versions to triggers. This is the lookup and update mode that we
usually want (except for determining "ever passed"), thus this simplifies
the code and speeds up lookups.
- Drop "latest_stamp" in favor of tracking individual run IDs for every
result. This allows us in the future to directly expose the run IDs on
excuses.{yaml,html}, e. g. by providing direct links to result logs.
- Drop "ever_passed" flag as we can compute this from the individual results.
- Don't track multiple package versions for a package and a particular
trigger. We are only interested in the latest (passing) version and don't
otherwise use the tested version except for displaying.
This requires adjusting the test_dkms_results_per_kernel_old_results test, as
we now consistently ignore "ever passed" for kernel tests also for "RUNNING"
vs. "RUNNING-ALWAYSFAILED", not just for "PASS" vs. "ALWAYSFAIL".
Also fix a bug in results() when checking if a test for which we don't have
results yet is currently running: Check for correct trigger, not for the
current source package version. This most probably already fixes LP: #1494786.
Also upgrade the warning about "result is neither known nor pending" to a grave
error, for making it more obvious to debug remaining errors with this.
ATTENTION: This changes the on-disk format of results.cache, and thus this
needs to be dropped and rebuilt when rolling this out.
We don't want to accept a result for an older package version than what the
trigger says.
Also drop two repeated (and thus unnecessary) results in set_results().
For using britney on PPAs we need to add the "ppas" test parameter to AMQP
autopkgtest requests. Add ADT_PPAS britney.conf option which gets passed
through to test requests.
In situations where we don't have an up to date existing results.cache, the
fallback for handling test results would attribute old test results to new
requests, as we don't have a solid way to map them. This is detrimental for ad
hoc britney instances, like for testing PPAs, and also when we need to rebuild
our cache.
Ignore test results without a package trigger, and drop the code for handling
those.
The main risk here is that if we need to rebuild the cache from scratch we
might miss historic "PASS" results which haven't run since we switched to
recording triggers two months ago. But in these two months most of the
interesting packages have run, in particular for the development series and for
stable kernels, and non-kernel SRUs are not auto-promoted anyway.
It does not make sense to have first a passing and then a failing result for
the same package version and trigger. We should prefer old passing results over
new failed ones for the same version (this isn't done yet, but will be soon).
We're going to modify britney so that RUNNING tests don't block
promotion if they have never failed - for this we will need to change a
few tests so that the packages being tested have passed before, meaning
that there could potentially be a regression.
Stop special-casing the kernel and move to one test run per trigger. This
allows us to only install the triggering package from unstable and run the rest
out of testing, which gives much better isolation.
For linux* themselves we don't want to trigger tests -- these should all come
from linux-meta*. A new kernel ABI without a corresponding -meta won't be
installed and thus we can't sensibly run tests against it. This caused
unnecessary and wrong regressions, and unnecessary runs (like linux-goldfish
being triggered by linux).
We want to treat linux-$flavor and linux-meta-$flavor as one set in britney
which goes in together or not at all. We never want to promote linux-$flavor
without the accompanying linux-meta-$flavor.
Introduce a synthetic linux* → linux-meta* dependency to enforce this grouping.
Test requesting: Don't re-request a test if we already have a result for it for
that trigger (for a relevant version), but there is a new version of the tested
package. First this unnecessarily delays propagation as the test will go back
to "in progress", and second if it fails in the next run this isn't the fault
of the original trigger, but the new version of the tested package.
Result finding: Don't limit acceptable results to the latest version of the
tested package. It's perfectly fine if an earlier version (like the one in
testing, or an earlier upload) was ran and gave a result for the requesting
trigger. If it's PASS, then we are definitively done, and if it's a failure
there is the "Checking for new results for failed test..." logic in collect()
logic which will also accept newer results.
This was a confusing inconsistency: libgreen1 and green binaries are both built
from the "green" source, so they should consistently declare that they have
"Testsuite: autopkgtest". Adjust the tests accordingly.
We trigger independent tests for every linux/linux-meta* reverse dependencies,
as they run under the triggering kernel. Thus "ever passed" is rather
meaningless for these as we don't want to track this on a per-trigger basis (as
it would be wrong for everything else but kernels). This led to a lot of false
regressions, as some DKMS modules only work on some kernel flavours.
The kernel team is doing per-kernel regression analysis of the test results, so
we don't need to duplicate this logic in britney. Thus effectively disable the
"Regression" state for kernel reverse dependencies, and rely on the kernel
test machinery to untag the tracking bug only if there are no actual
regressions.
When fetching a result with explicit triggers, always update self.results, not
just when we have a pending trigger for it. Otherwise satisfied_triggers will
be empty after reading the first result, and we clobber test results for all
triggers with the latest result.
When we need to blow away and rebuild results.cache we want to avoid
re-triggering all tests. Thus collect already existing results for requested
tests before submitting new requests.
This is rather hackish now, as fetch_one_result() now has to deal with both
self.requested_tests and self.pending_tests. The code should be refactored to
eliminate one of these maps.
When downloading results, check the ADT_TEST_TRIGGERS var in testinfo.json to
get back the original trigger that previous britney runs requested the test run
for. This allows us to much more precisely map a test result to an original
test request.
In order to tell apart test results which pass/fail depending on the package
which triggers them (in particular, if that is a kernel), we must keep track of
pass/fail results on a per-trigger granularity. Rearrange the results map
accordingly.
Keep the old "latest test result applies to all triggers of this pkg/version"
logic around as long as we still have existing test results without
testinfo.json.
ATTENTION: This breaks the format of results.cache, so this needs to be removed
when rolling this out.
We often get "tmpfail" results (repeated failure to start cloud instance, etc.)
with no package/version at all. Stop attributing them to the latest pending
request for that package, as that has already messed up some results. With
moving to tracking test triggers in testinfo.jar and running multiple test
requests for each triggering kernel version it becomes completely impossible to
interpret anything into a tmpfail result without testpkg-version, so just
ignore them.
This will leave some orphaned entries in pending.txt and thus require manual
retries after fixing the tmpfail reason. But this needs to happen anyway, so
this does not complicate operation but instead shows those as "in progress"
instead of "regression".
So far we only added the triggering test name. Add the version as well, so that
we'll retain the complete trigger information in result.tar's testinfo.json in
swift. This will allow us to completely reconstruct our results.cache from
scratch without losing any trigger information.
This isn't significantly harder to parse from shell either (in tests): You can
still iterate over $ADT_TEST_TRIGGERS with a "for" loop and split package and
version on '/'.
So far we've only calculated the reverse dependencies on amd64. This breaks
when triggering packages which do not exist on some architectures, like
bcmwl-kernel-source. It also makes it impossible to e. g. trigger DKMS tests on
armhf only for an ARM-only kernel like linux-ti-omap4.
Through the usual reverse dependency triggering, gcc-* usually triggers many
hundreds of (mostly universe) tests via libgccN. But:
- This does not help to prevent compiler regressions: as all packages are
built in -proposed anyway, the new compiler is being used immediately, so we
can't hold it back in -proposed.
- It does not trigger toolchain tests which actually are affected, most
importantly binutils and linux.
- This puts enormous stress onto our test infrastructure.
So special case gcc by triggering binutils and linux, and fglrx-installer as a
typical (and important) example of a DKMS package which also needs a compiler,
and libreoffice as our favourite tool chain stress test to cover libgccN.
If a package gets triggered by several sources, we can ordinarily run just one
test for all triggers. But for proposed kernels we want to run a separate test
for each, so that the test can run under that particular kernel.