We just had the autopkgtest queues DoSed because britney was crashing
after requesting each reverse dependency for a perl upload, but before
it had written pending.json out so it knew what not to request again.
This was 25,000 requests per arch...
Let's write pending.json straight after sending each request, so that
the next run - even after a crash - won't request the same tests again.
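A minimal sketch of the idea, not the actual britney code: the function
names, the src -> arch -> [triggers] layout of the pending map and the
send callback are assumptions made for illustration. The state is written
to a temporary file and renamed into place, so a crash mid-write cannot
leave a truncated pending.json behind.

```python
import json
import os
import tempfile


def save_pending(pending, path):
    """Atomically write the pending-requests map to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(pending, f, indent=2, sort_keys=True)
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def request_tests(requests, pending, path, send):
    """Send every (src, arch, trigger) request that is not already pending.

    `send` is a hypothetical callable that submits one request. The state
    file is saved right after each send, so a later crash loses nothing.
    """
    for src, arch, trigger in requests:
        triggers = pending.setdefault(src, {}).setdefault(arch, [])
        if trigger in triggers:
            continue  # already requested by an earlier (possibly crashed) run
        send(src, arch, trigger)
        triggers.append(trigger)
        save_pending(pending, path)
```

With this shape, a crash can cause at most one duplicate request (the one
sent but not yet recorded), rather than one per reverse dependency.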
We should only run autopkgtests for testsuite triggers if the source
package has any binaries on the relevant architecture, as the test would
otherwise be expected to fail.
In the previous iteration, if we were ever down/frozen/disabled long enough
to miss sending two mails in a row, we would see unintended "catch-up"
behavior where each subsequent run of britney would send a mail until the
right total number of mails had been sent. Don't do this; instead, catch us
up in one go to the most recent mail that should have been sent, avoiding
bunching of notifications.
This also changes one of the tests to match.
We were checkpointing after each email was sent to ensure that an aborted
p-m run didn't result in double emails; however, because the new cache only
contains records for packages we've seen so far during this run (to avoid
the cache growing without bounds over time), that means an aborted p-m run
*still* throws away records for all packages still waiting to be processed.
To fix this, we:
- only checkpoint records of writing emails during this in-progress run to
a temp file
- check for this temp file on britney startup, and if present, merge the
results into the current state
- move this temp file into the final cache location only at the end of the
britney run
Along the way, fix up a bug introduced in the previous commit that would have
us only saving state for those packages for which we sent email during the
current run, which would have quite bad effects.
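The three steps above can be sketched roughly as follows. This is an
illustration under assumptions, not the code from this commit: the
function names, the JSON format and the flat package -> record mapping
are all hypothetical.

```python
import json
import os


def load_state(cache_path, tmp_path):
    """Load the cache and fold in any checkpoint left by an aborted run."""
    state = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            state = json.load(f)
    if os.path.exists(tmp_path):
        with open(tmp_path) as f:
            # Checkpointed records are newer than the cached ones.
            state.update(json.load(f))
    return state


def checkpoint(sent_this_run, tmp_path):
    """Record the emails sent so far in this run, in the temp file only."""
    with open(tmp_path, "w") as f:
        json.dump(sent_this_run, f)


def finish_run(new_cache, tmp_path, cache_path):
    """Write the state for every package seen, then move it into place."""
    with open(tmp_path, "w") as f:
        json.dump(new_cache, f)
    os.replace(tmp_path, cache_path)
```

Because the final cache location is only touched in finish_run(), an
aborted run leaves the old cache intact and the temp file carries just
the records that must not be lost.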
The previous code had some issues with respect to how we decided whether to
send an email. The age used for calculating when the next mail should be
sent was saved as a float rather than an integer; since p-m runs never
happen exactly an integer number of days after upload, this results in a
cumulative error in the timing of the emails, which is further exacerbated if
a particular run is significantly delayed or if p-m infrastructure is down
for a period of time.
So instead, we now calculate the age at which the most recent email /should
have been sent/, and store that in our cache instead of the precise age.
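As a rough sketch of that calculation: the schedule used here (first mail
at day 1, with the interval doubling up to a 30-day cap) and the function
names are assumptions for illustration only, not the values used by the
policy.

```python
def last_due_age(age_days, first=1, max_interval=30):
    """Return the integer age at which the latest reminder was due.

    Returns None if no reminder is due yet. The schedule is hypothetical:
    first mail at `first` days, then gaps that double up to `max_interval`.
    """
    if first < 1:
        raise ValueError("first must be at least 1 day")
    due = None
    next_due = first
    interval = first
    while next_due <= age_days:
        due = next_due
        interval = min(interval * 2, max_interval)
        next_due += interval
    return due


def should_send(age_days, last_sent_age, first=1, max_interval=30):
    """Return the age to store in the cache if a mail is due, else None."""
    due = last_due_age(age_days, first, max_interval)
    if due is None:
        return None
    if last_sent_age is not None and due <= last_sent_age:
        return None  # the latest due mail has already gone out
    return due
```

Storing the scheduled age rather than the float age of the run means a
late run does not shift later mails, and a long outage produces a single
mail for the latest slot rather than one per missed slot.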
There is still a bit of surprising behavior here due to the fact that we use
two different 'max_age' values for valid vs. invalid candidate packages: a
single package can, over the course of its stay in -proposed, move from
being an invalid candidate to being a valid candidate /and back again/
without ever migrating. Such a package will switch back and forth between
two sets of calculations based on different starting offsets, causing the
ages at which the emails are sent to vary in a non-obvious fashion.
However, this will still obey the general principle of "email reminders of
decreasing frequency", so I think this is acceptable given that it is still
an overall improvement in predictability.
LP: #1671468
I want to fix two bugs in interactions between other parts of britney
and the email policy. It's not currently easy to do so because we just
run the policy itself manually by creating some fake excuses.
Steal part of the machinery from the autopkgtest tests, and run a few
tests through britney completely. Use a fake SMTP server to record which
emails we sent.
(The port is hardcoded - that might not be so smart.)
So we can turn it off for the "notest" run and for the non-dev series.
This is a tristate:
- 'yes': send email as normal
- 'dry-run': log what it would do, but send no email [nor update the
cache, so each run is effectively a fresh run]
- 'no': disable completely
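One way to model the three values, as a sketch only: the EmailMode enum,
the notify() helper and its callbacks are hypothetical names, not the
ones used in britney.

```python
from enum import Enum


class EmailMode(Enum):
    YES = "yes"
    DRY_RUN = "dry-run"
    NO = "no"


def parse_email_mode(value):
    """Map the config string to a mode; unknown values raise ValueError."""
    return EmailMode(value.strip().lower())


def notify(mode, package, age, cache, send, log):
    """Apply the tristate: send, only log, or do nothing at all."""
    if mode is EmailMode.NO:
        return
    if mode is EmailMode.DRY_RUN:
        log("would send email for %s" % package)
        return  # cache untouched, so the next run starts fresh
    send(package)
    cache[package] = age
```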
Currently we re-trigger all reverse binary dependencies of a package,
including binary packages built from the same source. We already
explicitly trigger the source's own tests if they still exist in unstable,
so don't also consider the source when looking at reverse dependencies.
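A sketch of the filtering, assuming three plain dict lookups
(binaries_of, rdeps_of, source_of) that stand in for britney's real
package tables; those names are hypothetical.

```python
def reverse_dep_sources(src, binaries_of, rdeps_of, source_of):
    """Return sources whose binaries depend on a binary built by `src`.

    Binaries built from `src` itself are skipped: the source's own tests
    are triggered separately.
    """
    tests = set()
    for binary in binaries_of[src]:
        for rdep in rdeps_of.get(binary, ()):
            rdep_src = source_of[rdep]
            if rdep_src != src:
                tests.add(rdep_src)
    return sorted(tests)
```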
Add new autopkgtest policy: it determines the autopkgtests for a
source package (its own, direct reverse binary dependencies, and
Testsuite-Triggers), requests tests via AMQP, fetches results from swift, and
keeps track of pending tests between runs. This also caches the downloaded
results from swift, as re-downloading them all is very expensive.
This introduces two new hints:
* force-badtest pkg/ver[/arch]: Failing results for that package will be
ignored. This is useful to deal with broken tests that get imported from
Debian or are from under-maintained packages, or broke due to some
infrastructure changes. These hints are usually long-lived.
* force-skiptest pkg/ver: Test results *triggered by* that package (i.e.
reverse dependencies) will be ignored. This is mostly useful for landing
packages that trigger a huge number of tests (glibc, perl) where some tests
are just too flaky to get them all passing, and one just wants to land it
after the remaining failures have been checked. This should be used rarely
and the hints should be removed again immediately.
Add integration tests that call britney in various scenarios on constructed
fake archives, with mocked AMQP and Swift results.
We don't use os.makedirs(dir, exist_ok=True) as that is too strict: it fails
(on Python before 3.4.1) if the directory already exists with different
permissions (e.g. with 775). Thus introduce a helper function ensuredir().
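A helper along these lines would do; this is a sketch of one possible
implementation, not necessarily the one added by this commit.

```python
import os


def ensuredir(path):
    """Create `path` and its parents if missing.

    An existing directory is accepted whatever its mode; an existing
    non-directory at `path` is still an error.
    """
    try:
        os.makedirs(path)
    except FileExistsError:
        if not os.path.isdir(path):
            raise
```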
Strip off Multi-Arch qualifiers ":any" and ":native" when building the
dependency fields, as they are not part of the package name.
This will fix cases like
    Package: ipython3
    Depends: python3:any (>= 3)
and include ipython3 in python3's reverse dependencies.
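The stripping can be sketched like this. dep_names() is a hypothetical
stand-alone helper for illustration, not britney's actual dependency
parser, and it deliberately leaves other qualifiers (such as an explicit
architecture) untouched.

```python
import re

# Only the two Multi-Arch qualifiers named above are removed.
_QUALIFIER = re.compile(r":(any|native)$")


def dep_names(field):
    """Return the bare package names from a Depends-style field."""
    names = []
    for clause in field.split(","):
        for alt in clause.split("|"):
            alt = alt.strip()
            if not alt:
                continue
            # Drop the version constraint and any architecture restriction.
            name = alt.split()[0].split("(")[0]
            names.append(_QUALIFIER.sub("", name))
    return names
```

For the example above, dep_names("python3:any (>= 3)") yields
["python3"], so ipython3 is recorded against python3.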
Closes: #794194