This separates them from Ubuntu and upstream test requests, prevents any one of
the three from completely starving the other two, and makes queues easier to
manage.
Some tests are known-broken on a particular architecture only.
force-badtest'ing the entire version is overzealous as it hides regressions on
the other architectures that we expect to work. It's also hard to maintain as
the version has to be bumped constantly.
Support hints of the form "force-badtest srcpkg/architecture/all" (e. g.
"force-badtest chromium-browser/armhf/all"). The special version "all" will
match any version.
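A minimal sketch of how such a three-part hint could be matched. The helper
name and the plain equality check for versions other than "all" are
assumptions for illustration, not britney's actual hint parser:

```python
def arch_hint_matches(hint, src, arch, version):
    """Check a "srcpkg/architecture/version" hint (hypothetical helper)."""
    parts = hint.split("/")
    if len(parts) != 3:
        return False  # other hint forms are out of scope for this sketch
    hint_src, hint_arch, hint_version = parts
    if hint_src != src or hint_arch != arch:
        return False
    # The special version "all" matches any version; comparing any other
    # version for plain equality is an assumption of this sketch.
    return hint_version == "all" or hint_version == version
```

With this, "chromium-browser/armhf/all" applies to every chromium-browser
version on armhf, while the other architectures stay unaffected.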
Add new state "IGNORE-FAIL" for regressions which have a 'force' or
'force-badtest' hint. In the HTML, show them as yellow "Ignored failure"
(without a retry link) instead of "Regression", and drop the separate
"Should wait for ..." reason, as that is hard to read for packages with a long
list of tests.
This also makes retry-autopkgtest-regressions more useful, as it will now only
re-run the "real" regressions.
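The state selection can be sketched roughly as follows. The function names,
the boolean inputs and the state name "REGRESSION" are assumptions of this
sketch; only "IGNORE-FAIL" is introduced by this change:

```python
def result_state(passed, ever_passed, hinted):
    """Map one test result to a display state (illustrative only)."""
    if passed:
        return "PASS"
    if not ever_passed:
        # A test that never passed cannot regress.
        return "ALWAYSFAIL"
    # A failure of a previously passing test is a regression, unless a
    # 'force' or 'force-badtest' hint tells us to ignore it.
    return "IGNORE-FAIL" if hinted else "REGRESSION"


def needs_retry(state):
    """Only unhinted regressions are worth retrying."""
    return state == "REGRESSION"
```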
When using a shared results cache with PPAs (silos) we cannot rely on the
latest time stamp from the distro's results.cache. As soon as there is a new
run for a package in Ubuntu proper, that updated time stamp hides all previous
results for the PPA, and causes tests to be re-requested unnecessarily.
In the CI train we sometimes run into transient "HTTP Error 502: Proxy Error".
As we don't keep a results.cache there, this leads to retrying tests for which
we already have a result in swift, but can't download it. Treat this as a hard
failure now, to let the next britney run try again. This will also tell us if
we need to handle any other status code than 200, 204 (empty result), or 401
(container does not exist).
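A rough sketch of that status handling, assuming the caller passes in the
HTTP status code of the swift query. The function name, the return values and
the exception type are made up for illustration:

```python
def handle_swift_status(code):
    """Decide what to do with an HTTP status from swift (illustrative)."""
    if code == 200:
        return "result"      # there is a result to download and parse
    if code in (204, 401):
        return "no-result"   # empty result, or container does not exist
    # Anything else (e.g. a transient 502) is a hard failure, so that the
    # next britney run tries again instead of re-requesting the test.
    raise RuntimeError("unexpected HTTP status %i from swift" % code)
```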
This is needed by the CI train, where we
 (1) don't want to cache intermediate results for PPA runs, as they might
     "accidentally" pass in between and fail again for the final silo, and
 (2) want to seed britney with the Ubuntu results.cache, to detect regressions
     relative to Ubuntu.
Introduce ADT_SHARED_RESULTS_CACHE option which can point to a path to
results.cache. This will then not be updated by britney.
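The intended behaviour can be sketched like this. The class, its attributes
and the assumption that the cache file is JSON are illustrative and not taken
from britney's code:

```python
import json
import os


class ResultsCache:
    """Load results from our own or a shared cache file (illustrative)."""

    def __init__(self, own_path, shared_path=None):
        # shared_path stands in for the ADT_SHARED_RESULTS_CACHE option
        self.shared = shared_path is not None
        self.path = shared_path if self.shared else own_path
        self.results = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.results = json.load(f)

    def save(self):
        """Write the cache back, unless it is a shared one."""
        if self.shared:
            return False  # never update somebody else's cache
        with open(self.path + ".new", "w") as f:
            json.dump(self.results, f, indent=2)
        os.replace(self.path + ".new", self.path)
        return True
```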
This was a giant copy-and-paste, was disabled four months ago, and the
infrastructure for it has ceased to exist.
If this comes back, the AutoPackageTest class should be generalized to also
issue phone boot tests (exposed as new architectures, which should then be
called "platforms"), to avoid all this duplicated code.
Generate https://autopkgtest.ubuntu.com/retry.cgi links for re-running tests
that regressed.
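A minimal sketch of building such a link. The query parameter names (release,
arch, package, trigger) are assumptions of this sketch; only the retry.cgi URL
itself comes from the description above:

```python
from urllib.parse import urlencode

RETRY_URL = "https://autopkgtest.ubuntu.com/retry.cgi"


def retry_link(release, arch, package, trigger):
    """Build a retry URL for one regressed test (parameter names assumed)."""
    query = urlencode([
        ("release", release),
        ("arch", arch),
        ("package", package),
        ("trigger", trigger),  # e.g. "srcpkg/version"; "/" gets escaped
    ])
    return "%s?%s" % (RETRY_URL, query)
```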
Change Excuse.html() back to usual % string formatting to be consistent with
the rest of the code.
If we have a result, directly link to the log file on swift in excuses.html.
The architecture name still leads to the package history as before.
If result is still pending, link to the "running tests" page instead.
Don't clobber passed run IDs with newer failed results. This is potentially a
bit more expensive as we might re-fetch failed results at every run after a
PASS, but the IDs in our cache will be correct so that we can expose them in
the UI.
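The rule can be sketched as follows, assuming a cache that maps some key (for
example a trigger/package/architecture tuple) to the last recorded result.
The function name and the cache layout are hypothetical:

```python
def record_result(cache, key, passed, run_id):
    """Store a result unless it would replace a PASS with a failure."""
    old = cache.get(key)
    if old is not None and old["passed"] and not passed:
        # Keep the passed run ID; the failed result is not cached and may
        # thus be fetched again on the next run.
        return False
    cache[key] = {"passed": passed, "run_id": run_id}
    return True
```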
Traceback (most recent call last):
  File "/home/ubuntu-archive/proposed-migration/code/b2/britney.py", line 3380, in <module>
    Britney().main()
  File "/home/ubuntu-archive/proposed-migration/code/b2/britney.py", line 3329, in main
    self.write_excuses()
  File "/home/ubuntu-archive/proposed-migration/code/b2/britney.py", line 1992, in write_excuses
    upgrade_me.remove(e.name)
ValueError: list.remove(x): x not in list
Splitting up the processes of request(), submit(), and collect() makes our data
structures, housekeeping, and code unnecessarily complicated. Drop the latter
two and do all of it in just request(). This removes the need for a separate
requested_test map, avoids fetching test results twice, and gets rid of some
state keeping.
This could have led to (re-)fetching results more than once when we only got
the latest ID from the few triggers we were currently looking at, not for all
possible triggers of a package. Drop this kludge, and replace it with a proper
full iteration and caching.
- Invert the map to go from triggers to tested packages, instead of the other
  way around. This is the lookup and update mode that we usually want, which
  simplifies the code and speeds up lookups. The one exception is for fetching
  results (as that is per tested source package, not per trigger), but there
  is a FIXME to get rid of the "triggers" argument completely.
- Stop tracking tested package versions. We don't actually care about it
  anywhere, as the important piece of data is the trigger.
- Drop our home-grown pending.txt format and write pending.json instead.
ATTENTION: This changes the on-disk cache format for pending tests, so
pending.txt needs to be cleaned up manually and any pending tests at the time
of upgrading to this revision will be re-run.
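A sketch of what such a trigger-keyed pending map could look like. The exact
nesting (trigger, then tested source package, then a list of architectures)
and the helper names are assumptions; the changes above only say that the map
goes from triggers to tested packages and is written as JSON:

```python
import json


def add_pending(pending, trigger, src, arch):
    """Remember that a test of src on arch was requested for trigger."""
    arches = pending.setdefault(trigger, {}).setdefault(src, [])
    if arch not in arches:
        arches.append(arch)
        arches.sort()


def remove_pending(pending, trigger, src, arch):
    """Forget a pending test and prune map entries that became empty."""
    arches = pending.get(trigger, {}).get(src, [])
    if arch in arches:
        arches.remove(arch)
    if trigger in pending and src in pending[trigger] \
            and not pending[trigger][src]:
        del pending[trigger][src]
    if trigger in pending and not pending[trigger]:
        del pending[trigger]


def dump_pending(pending):
    """Serialize the map the way it might be written to pending.json."""
    return json.dumps(pending, indent=2, sort_keys=True)
```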
- Invert the map to go from triggers to tested versions, instead of from
  tested versions to triggers. This is the lookup and update mode that we
  usually want (except for determining "ever passed"), thus this simplifies
  the code and speeds up lookups.
- Drop "latest_stamp" in favor of tracking individual run IDs for every
  result. This allows us in the future to directly expose the run IDs on
  excuses.{yaml,html}, e. g. by providing direct links to result logs.
- Drop "ever_passed" flag as we can compute this from the individual results.
- Don't track multiple package versions for a package and a particular
  trigger. We are only interested in the latest (passing) version and don't
  otherwise use the tested version except for displaying.
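To illustrate the points above, here is a sketch of a trigger-keyed results
map that stores one result with its run ID per trigger, package and
architecture, and computes "ever passed" on demand. The nesting, field names
and function names are assumptions, not the actual results.cache layout:

```python
def add_result(results, trigger, src, arch, passed, version, run_id):
    """Store the single result we keep per trigger, package and arch."""
    results.setdefault(trigger, {}).setdefault(src, {})[arch] = {
        "passed": passed,
        "version": version,  # only kept for displaying
        "run_id": run_id,
    }


def ever_passed(results, src, arch):
    """Compute "ever passed" by looking at the results of all triggers."""
    for by_src in results.values():
        result = by_src.get(src, {}).get(arch)
        if result is not None and result["passed"]:
            return True
    return False
```

Determining "ever passed" is the one lookup that has to walk all triggers,
which matches the exception noted in the first item.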
This requires adjusting the test_dkms_results_per_kernel_old_results test, as
we now consistently ignore "ever passed" for kernel tests also for "RUNNING"
vs. "RUNNING-ALWAYSFAILED", not just for "PASS" vs. "ALWAYSFAIL".
Also fix a bug in results() when checking if a test for which we don't have
results yet is currently running: check for the correct trigger, not for the
current source package version. This most probably already fixes LP: #1494786.
Also upgrade the warning about "result is neither known nor pending" to a grave
error, to make any remaining problems in this area easier to spot and debug.
ATTENTION: This changes the on-disk format of results.cache, and thus this
needs to be dropped and rebuilt when rolling this out.
We don't want to accept a result for an older package version than what the
trigger says.
Also drop two repeated (and thus unnecessary) results in set_results().