Unlink self.fake_amqp in do_test() instead of individually in test cases, as we
always want to verify the requests from the last run only, not the accumulated
requests.
If a result.tar does not contain a testpkg-version, we must still match it
against pending.txt, but we must not add it to the results cache. Otherwise it
ends up as a "null" version key (JSON's serialization of None), which becomes
an actual version string once the cache is read back.
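
The round trip can be reproduced with the json module alone. This is a
minimal sketch; the cache layout shown here is hypothetical and only serves
to demonstrate the key coercion:

```python
import json

# Hypothetical cache layout: source -> version -> arch -> result.
# One result arrived without a testpkg-version, so its version is None.
cache = {"green": {None: {"amd64": "PASS"}}}

# json.dumps() coerces the None key into the JSON string "null" ...
serialized = json.dumps(cache)

# ... and json.loads() returns it as the *string* "null", not as None.
restored = json.loads(serialized)
versions = list(restored["green"])
```

After the round trip, "null" is indistinguishable from a real version string,
which is why such results must not be cached.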
There are scenarios when britney requests a package test for a particular
version but we actually get a result for a later version:
* When britney runs, the later version is not built yet and is thus in the
excludes; but by the time the test actually runs, the package is built.
* We don't support running tests for a given older (source) version yet, tests
always get run from the latest unstable source even if that isn't built yet.
Thus we need to consider results >= the requested version. However, we prefer a
successful result for the originally requested version so that we can continue
to remove a broken version from unstable. This is already covered by
TestAutoPkgTest.test_remove_from_unstable.
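
A rough sketch of that selection rule follows. It is not the actual
implementation: pick_result() and version_key() are hypothetical names, the
dotted-integer ordering is a stand-in for real Debian version comparison, and
picking the newest of several later results is an assumption:

```python
def version_key(version):
    # Toy ordering: "1.2.3" -> (1, 2, 3). Real Debian versions need a proper
    # comparison function; this stand-in is for illustration only.
    return tuple(int(part) for part in version.split("."))


def pick_result(requested, results):
    """Return (version, passed) for the result to use, or None.

    results maps a version string to True (pass) or False (fail).
    """
    # Prefer a successful result for the originally requested version.
    if results.get(requested):
        return (requested, True)
    # Otherwise consider any result for the requested or a later version.
    candidates = [v for v in results
                  if version_key(v) >= version_key(requested)]
    if not candidates:
        return None
    # Assumption: among several candidates, use the newest one.
    newest = max(candidates, key=version_key)
    return (newest, results[newest])
```

Results for versions older than the requested one are never used, so a
request with only older results stays pending.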
Disabling AMQP requests with "ADT_ENABLE = yes" but ADT_AMQP unset made sense
while we still supported adt-britney. But as that's gone now, let's use the
ADT_ENABLE switch only, and if it's on, require ADT_AMQP and ADT_SWIFT_URL be
set.
This simplifies the code a bit and is less confusing.
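
As a sketch of the resulting check (autopkgtest_enabled() and the plain dict
of options are assumptions made for illustration, not the real config
handling):

```python
def autopkgtest_enabled(options):
    """Return True if autopkgtests are enabled and fully configured.

    options is a plain dict of config keys to string values.
    """
    if options.get("ADT_ENABLE") != "yes":
        return False
    # With ADT_ENABLE on, both remaining options are mandatory.
    missing = [key for key in ("ADT_AMQP", "ADT_SWIFT_URL")
               if not options.get(key)]
    if missing:
        raise ValueError("ADT_ENABLE is set, but missing: "
                         + ", ".join(missing))
    return True
```

Failing loudly on a half-configured setup avoids silently running without
requests or results.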
We already handle the exclusions in tests_for_source() (and run the testing
version if appropriate), so don't unconditionally skip requests for those.
Adjust the TestAutoPkgTest.test_rdepends_unbuilt case to catch that: The "run
britney once to pick up previous results" was a thinko as this already
satisfies all tests for green 2.
The previous commit introduced a KeyError crash in tests_for_source() for
packages which are unbuilt/uninstallable and only present in unstable.
Ignore these in tests_for_source() as they can't possibly be a regression for
their dependencies, and there is no sensible way to run a test for them.
Commit 463 ("Don't promote packages with unbuilt reverse dependencies") turned
out to be too strict: This holds up too many innocent packages in -proposed.
If unstable has an unbuilt/uninstallable reverse dependency D of a package P,
trigger a test anyway (which will then most likely run against the testing
version of D). If that succeeds, the unstable P did not break D and can be
accepted. If it fails, D needs to be fixed.
Ideally we would set up some clever apt pinning to force installation of
testing-D, to avoid running into the uninstallability of unstable-D, but this
is tricky and error prone.
Drop the temporary "UNINST" state from commit 466 again. Instead, excuses.html
will now show a test against the testing version of D together with a note that
the unstable version is unbuilt/uninstallable.
This should ideally clear up all cases where a requested result is neither
present nor pending. Log an error if that still happens (will be checked in the
next couple of runs), and ensure in the tests that we don't trigger any
outstanding "FIXME" log messages.
Commit 463 introduced waiting on reverse dependencies which are not built or
installable yet, but set their status as "RUNNING". This is confusing as there
is no actual test in progress yet.
Instead, set their status to a new UNINST value, displaying as
"Unbuilt/uninstallable".
If a reverse dependency D of a package P is not built yet, then D will be in
"exclusions" as we can't sensibly run D's tests at that time. In that case,
don't just ignore the missing test result but consider D's test as "in
progress".
Note that this might lead to stalling an innocent P if a broken (FTBFS) D gets
uploaded at the same time. This can/should be handled by overrides if fixing
D isn't appropriate, but this is better than allowing P to break D in that
situation.
- Change AutoPackageTest.results() to evaluate the Swift results instead of
the adt-britney ones.
- Drop TestAdtBritney tests which now fail as we switched results evaluation
to swift. Port relevant tests to TestAutoPkgTest.
- Drop obsolete adt-britney autopkgtest code.
- Adjust TestBoottestEnd2End.test_with_adt() for cloud results.
Swift results were considered for older versions of triggers instead of waiting
for results for the actual package/version that triggered a new test.
This broke for two reasons:
* When evaluating the test results we need to check whether we have a result
for the tested package/version that got triggered by the current excuse, not
just for any older excuse.
* AutoPackageTest.fetch_swift_results() re-downloaded all results for a
package due to a wrong "marker" value: The marker needs to be the
complete object path, not just the timestamp suffix. This caused old test
results to be considered as "newer than the given marker".
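
The marker problem can be illustrated as follows. Swift compares the marker
against full object names and lists only the names greater than it. The
container URL, the object layout and listing_url() itself are hypothetical
and only show the principle:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def listing_url(container_url, result_prefix, last_run=None):
    """Build a Swift object-listing URL for one package's results."""
    query = {"prefix": result_prefix}
    if last_run is not None:
        # The marker is compared against full object names, so it has to be
        # the complete path of the last seen result, not only its timestamp.
        query["marker"] = result_prefix + last_run
    return container_url + "?" + urlencode(query)


# Hypothetical container URL and object layout, for illustration only.
prefix = "series/amd64/g/green/"
url = listing_url("https://swift.example/v1/AUTH_x/autopkgtest", prefix,
                  last_run="20150101_100000")
params = parse_qs(urlsplit(url).query)

# Why a bare timestamp fails as a marker: every full path sorts after it,
# so every old result looks "newer than the marker".
old_result = prefix + "20140101_000000"
bare_timestamp_lets_old_result_through = old_result > "20150101_100000"
full_path_lets_old_result_through = old_result > params["marker"][0]
```

With the full path as marker, the 2014 result sorts before the marker and is
no longer listed.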
Now that we look at autopkgtest results from swift we can drop the
adt-britney/lp:auto-package-testing code from autopkgtest.py.
Note that we still need it for boottest.py.
Adjust TestBoottestEnd2End.test_with_adt() for cloud results.
Change AutoPackageTest.results() to evaluate the Swift results instead of the
adt-britney ones.
TODO:
- Add more tests (like for adt-britney)
- Drop triggering of adt-britney tests
- Drop adt-britney tests (which fail now)
- Adjust TestBoottestEnd2End.test_with_adt
Add bool whether there is any successful test of src/arch of any version. This
will be used for detecting "regression" vs. "always failed".
WARNING: This changes the results.cache format, so results.cache has to be
removed and recreated before deploying this.
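
A minimal sketch of how such a flag can be kept and used. The cache layout,
the function names and the REGRESSION/ALWAYSFAIL labels are assumptions for
illustration, not the actual results.cache format:

```python
def add_result(results, src, arch, version, passed):
    """Record one test result and update the "ever passed" flag."""
    entry = results.setdefault(src, {}).setdefault(
        arch, {"versions": {}, "ever_passed": False})
    entry["versions"][version] = passed
    # Once any version has passed on this arch, the flag stays set.
    entry["ever_passed"] = entry["ever_passed"] or passed


def classify_failure(results, src, arch):
    """Tell a regression apart from a test that has always failed."""
    if results[src][arch]["ever_passed"]:
        return "REGRESSION"
    return "ALWAYSFAIL"
```

The flag is tracked per (src, arch), so a test that only ever passed on one
architecture still counts as "always failed" on the others.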
Commit 446 only considered a package's own tests. But we also need to check for
newer results of failed reverse dependency tests. Introduce a new
failed_tests_for_trigger() helper which computes the failed (src, arch) tests
for a given package, and fetch new results for all of them.
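
What such a helper might look like, as a sketch only. The signature and the
shape of the results map (src -> arch -> {(trigger_src, trigger_ver):
passed}) are assumptions, not the actual code:

```python
def failed_tests_for_trigger(results, trigger_src, trigger_ver):
    """Return the set of (src, arch) tests that failed for a trigger.

    results maps src -> arch -> {(trigger_src, trigger_ver): passed}.
    """
    failed = set()
    for src, arches in results.items():
        for arch, by_trigger in arches.items():
            # Only an explicit failure counts; a missing entry means that
            # no test was run for this trigger.
            if by_trigger.get((trigger_src, trigger_ver)) is False:
                failed.add((src, arch))
    return failed
```

The returned set covers the package's own tests as well as those of its
reverse dependencies, as both are stored under the same trigger.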
When collecting results, not only check pending tests, but also new results for
failed tests. This picks up new test results from manual retries which might
now have succeeded.
These usually stem from repeatedly tmpfailing runs where we did not even get as
far as unpacking the source (e.g. repeatedly hitting the ceiling of max
allowed instances/CPUs/etc.). In that case, consider this run a tmpfail result,
instead of ignoring it, as otherwise we end up with that entry being orphaned
in pending.txt.
Their default values are invalid and must be set locally. But as
britney1-ubuntu copies these into production, we would run with an invalid
config with an unmodified config file.
Until now, autopkgtest results were triggered via an external "adt-britney"
command from lp:auto-package-testing. This required a lot of state files and
duplicated effort, used hardcoded absolute paths to these external tools, and
was quite hard to understand and maintain. We also want to move away from
Jenkins and rsyncing state files.
Directly retrieve autopkgtest results from a publicly readable and browsable
Swift container, with a debci-compatible layout
(https://wiki.debian.org/debci/DistributedSpec). This now tracks both requests
and results on a per-architecture granularity, so that we can track
per-architecture regressions/always-failed.
Introduce a new ADT_SWIFT_URL config option that sets the swift base URL. If
this key is not set, the behaviour does not change compared to previous
versions, and no results will be retrieved from the cloud.
This still keeps the old adt-britney requests/results as the authoritative
data and for now merely shows the swift results in addition. With that we can
compare the results and run the cloud testing in parallel to find/fix problems
until we switch over. Due to that, the code added to britney.py is temporary, does
*not* use AutoPackageTest.results(), and instead just reads the internal
results map.
This is necessary so that we can properly match requested to received results
when the latter arrive in different runs for different architectures.
This also opens up the possibility of per-arch blacklisting later.
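
A sketch of per-architecture request tracking under these assumptions: the
pending map is keyed by (src, version) and holds the set of architectures
still outstanding, and both function names are hypothetical:

```python
def add_request(pending, src, version, arch):
    """Remember that a test was requested for src/version on arch."""
    pending.setdefault((src, version), set()).add(arch)


def receive_result(pending, src, version, arch):
    """Clear one arch; drop the request once every arch has reported."""
    arches = pending.get((src, version))
    if arches is None:
        return
    arches.discard(arch)
    if not arches:
        del pending[(src, version)]
```

A result for amd64 arriving in one run and for i386 in a later run then
leaves the request pending in between, instead of being dropped early.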