Still trying to figure out this new functionality. We're getting close
with more logging, so I have added a little more, and stopped using
what was presumably a list of all binaries for i386. I am now using
what is hopefully just the list of binaries for the source package.
Since we can't seem to figure out where we're going wrong, I thought I'd
just add verbose logging here and hope it helps with debugging the new
feature, which aims to reduce the number of i386 tests queued.
The previous approach unfortunately didn't work, so I'm trying a
different one using the binaries_info variable.
I've also included the full traceback when the functionality doesn't
work, to enable easier debugging.
Recent investigations indicated that approximately 85% of the i386
tests run at autopkgtest.ubuntu.com are pointless, for the following
reason: for end users, the dependencies of an arch: all package on
non-arch: all packages are satisfied by the amd64 binaries, not the
i386 binaries.
This commit introduces a check in the `tests_for_source` function, which
is the function that generates a list of tests to be requested for a
src package on a specified architecture.
The check itself takes the src package name, gets the list of binaries
for that src package and checks to see if the architecture for all of
the binaries of said src package is "all". If all the binaries are
Architecture: "all", then the function returns an empty list and no
tests will be requested for that src package on i386.
Since it's quite hard to test britney code, the implementation is
wrapped in a try/except block so that an unexpected exception cannot
block britney runs.
The try/except block should be removed once the change is considered
stable.
autopkgtest-cloud will now serve:
autopkgtest.ubuntu.com/static/autopkgtest.db.sha256
Britney now calculates the sha256 of the newly downloaded db locally and
checks that it matches the sha256 file served by autopkgtest-cloud,
instead of checking that the content-length header matches the
size of the new downloaded database.
Since the most recent apache2 security update in focal [1], the
content-length header isn't served by default, and it seems that when
it is served it's not entirely accurate. This check has become
brittle, and so we have implemented this new mechanism.
[1] https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/2061816
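A minimal sketch of the checksum comparison described above, using only
hashlib. The function names are hypothetical, and the format of the
published .sha256 file is an assumption: the sketch accepts either a
bare hex digest or sha256sum-style output and compares the first token.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large db is never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def db_matches_checksum(db_path, published_sha256_text):
    """Compare the local digest with the text of the published checksum."""
    # Assumption: the first whitespace-separated token is the hex digest.
    parts = published_sha256_text.split()
    if not parts:
        return False
    return sha256_of_file(db_path) == parts[0].lower()
```

A mismatch would indicate a truncated or corrupted download, which is
what the old content-length comparison was meant to catch.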
We don't know why the autopkgtest webserver has stopped providing a
Content-Length header but the current code doesn't handle its absence, so
detect this rather than throwing an exception.
This should be reverted once the header is back; we don't want to be in
the dark about short reads of the db.
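One way to handle a missing header without raising is sketched below.
This is an illustration, not the code in this commit: the function name
is hypothetical and the headers are modelled as a plain dict.

```python
def check_download_size(headers, bytes_read):
    """Compare the advertised size with the number of bytes read.

    Returns True or False when Content-Length is present, and None when
    the server sent no such header, so the caller can tell "unverified"
    apart from "short read".
    """
    advertised = headers.get("Content-Length")
    if advertised is None:
        return None
    return int(advertised) == bytes_read
```

Returning None rather than True keeps the "we could not verify this"
case visible to the caller, for example so that it can be logged.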
As seen in
https://ubuntu-archive-team.ubuntu.com/proposed-migration/log/noble/2024-04-01/12:52:27.log
Having an entire britney run bail because of a connection reset is a bad
outcome!
Instead, catch this exception and avoid adding the test in question to the
list of queued tests (we can pick it up on the next run).
Possibly we should do more clever handling of a ConnectionResetError such as
reconnecting, but this is a minimum fix that will stop britney from aborting.
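The minimum fix described above amounts to something like the following
sketch. The names queue_tests and send_request are hypothetical and
stand in for whatever actually submits a request.

```python
import logging

logger = logging.getLogger(__name__)


def queue_tests(requests, send_request):
    """Send each test request and return the ones that were queued.

    send_request is a hypothetical callable that submits one request and
    may raise ConnectionResetError.
    """
    queued = []
    for request in requests:
        try:
            send_request(request)
        except ConnectionResetError as exc:
            # Do not record the test as queued; a later run retries it.
            logger.warning("Connection reset while queuing %s: %s", request, exc)
            continue
        queued.append(request)
    return queued
```

Because a failed request is never added to the queued list, the next
run sees it as not yet requested and tries again.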
This avoids endlessly requeuing the test if the test produces
an older result.
This will make tests "disappear" if the infrastructure returns
old results for newer triggers, but it avoids the current problem
where we end up queuing the same tests every run.
For rolling out britney on a new machine, we want to generate update_excuses
and update_output to confirm it's working correctly all the way through, so
we don't want to use the global --dry-run option; but we *do* want to
disable queuing tests and instead let the production instance of britney
queue the tests while we simply query the results. Add support for
ADT_ENABLE=dry-run in britney.conf, parallelling the behavior of other
policies.
britney currently spends a majority of its runtime querying for baseline
test results that it won't find, and that it doesn't need. Refactor to
eliminate many of these excess queries.
@canonical.com is now DKIM signed and SPF published, which means emails
sent directly from proposed-migration running on snakefruit would
likely be caught out. While we're here: the project is Ubuntu-related,
so switch to using an @ubuntu.com address instead.
When querying swift there is no way to fetch only results newer than a
specified point; you can only query for newer than or equal to. For
sqlite, however, we can use > instead of >= and avoid re-processing
results we've already seen.
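The difference can be sketched with the sqlite3 module as below. The
table name "result" and the columns "run_id" and "exitcode" are
assumptions for the example, not necessarily the real autopkgtest.db
schema.

```python
import sqlite3


def results_newer_than(conn, last_seen_run_id):
    """Return rows strictly newer than the last result already processed."""
    cursor = conn.execute(
        # '>' rather than '>=' so the last seen row is not returned again.
        "SELECT run_id, exitcode FROM result WHERE run_id > ? ORDER BY run_id",
        (last_seen_run_id,),
    )
    return cursor.fetchall()
```

With >= the row for last_seen_run_id itself would come back on every
query and be processed again.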
Logging all force-reset-test hints for every package produced about
850 MB of the 880 MB of logs in the last run. Only log the ones
matching the package instead, as we do for force-badtest.
In Ubuntu, we only fetch results on demand, so we might not
have seen the results yet.
Debian always fetches results at the beginning, so it has all the
data ready.
Due to the number of hints in standing use in Ubuntu, hints.search() is an
expensive operation, and we call it once for *every single test* referenced
from -proposed. Since force-reset-test hints are a small proportion of the
hints in use, searching once for all the hints of this type and then
searching only this subset for each autopkgtest improves performance. With
23000 autopkgtests referenced in -proposed, this saves roughly 1 minute of
runtime, or 11% on a 9-minute britney run. The number of packages in
-proposed is typically much higher at other points in the release cycle,
so the absolute improvement in performance is expected to be greater then.
The force-reset-test hints are an Ubuntu delta so this is not expected to be
upstreamed; and it could eventually be dropped if and when baseline
retesting is implemented in Ubuntu and the number of hints required drops.
This could be implemented with a more generic, elegant solution in
HintsCollection, but again, the scalability problem of hints is hopefully
short-lived so I didn't consider it worth the investment here.
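The idea of filtering once and then searching only the subset can be
sketched as follows. The Hint record and both function names are
hypothetical simplifications; britney's real hint objects and
hints.search() are not reproduced here.

```python
from collections import namedtuple

# Hypothetical, simplified hint record for illustration only.
Hint = namedtuple("Hint", ["type", "package", "version"])


def select_hints_of_type(all_hints, hint_type):
    """Scan the full hint list once and keep only hints of one type."""
    return [hint for hint in all_hints if hint.type == hint_type]


def hints_for_package(subset, package):
    """Search only the pre-filtered subset for a given package."""
    return [hint for hint in subset if hint.package == package]
```

The first function would be called once per run and the second once per
autopkgtest, so the per-test cost depends on the number of
force-reset-test hints rather than on the total number of hints.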