It works like this. We wait until all tests have finished running. and
then grab their results. If there are any regressions, we mail each bug
with a link to pending-sru.html. There's a state file which records the
mails we've sent out, so that we don't mail the same bug multiple times.
The apt version comparison sorts 'blacklisted' greater than most version
numbers, which means that we accidentally apply force hints for version
'blacklisted' to all uploads. Since this is the only case of a hacked
version number, let's special case it so that 'blacklisted' hints only
match packages with 'blacklisted' version.
We're supposed to synthesise an "unknown" version for these, but a bug
in the worker meant we didn't do that in some cases and these leaked
into swift. Let's repair it client-side.
This means 'all tests were skipped', or 'no tests in this package'. The
former case is new after our recent autopkgtest rebase.
Upstream britney (624b185ba6) handles this
as "neutral", which doesn't influence the result and is a more explicit
(better) way of expressing this situation. Once we rebase, we should
take that approach.
We were just seeing an interesting crash in britney. It was trying to
look up the gpg_keys of ~python-modules-team, which is a suspended
account - in LP API terms that's a HTTP error 410.
https://api.launchpad.net/devel/~python-modules-team/gpg_keys
The direct error was fixed in commit 272f41c, but we actually should
*not* have been trying to look up this team's email address in the first
place. This upload was an auto-sync and so should not cause email to be
sent. The problem is that it was synced into universe and then promoted
into main. We were looking at the SPPH for after the promotion, which
has different values in the various signer/creator/sponsor/... fields,
and that made us think that it was a regular upload to email about.
Fix this by always looking at the oldest SPPH which should correspond to
the initial upload and not whatever happened to it afterwards.
We just had the autopkgtest queues DoSed because britney was crashing
after requesting each reverse dependency for a perl upload, but before
it had written pending.json out so it knew what not to request again.
This was 25,000 requests per arch...
Let's write pending.json straight after sending each request, so that
the next run - even after a crash - won't re-request the same things
again.
We should only run autopkgtests for testsuite triggers if the source
package has any binaries on the relevant architecture, as otherwise it
should be expected to fail.
In the previous iteration, if we were ever down/frozen/disabled long enough
to miss sending two mails in a row, we would see unintended "catch-up"
behavior where each subsequent run of britney would send a mail until the
right total number of mails had been sent. Don't do this; instead, catch us
up in one go to the most recent mail that should have been sent, avoiding
bunching of notifications.
This changes one of the tests also to match.
We were checkpointing after each email was sent to ensure that an aborted
p-m run didn't result in double emails; however, because the new cache only
contains records for packages we've seen so far during this run (to avoid
the cache growing without bounds over time), that means an aborted p-m run
*still* throws away records for all packages still waiting to be processed.
To fix this, we:
- only checkpoint records of writing emails during this in-progress run to
a temp file
- check for this temp file on britney startup, and if present, merge the
results into the current state
- move this temp file into the final cache location only at the end of the
britney run
Along the way, fix up a bug introduced in the previous commit that would have
us only saving state for those packages for which we sent email during the
current run, which would have quite bad effects.
The previous code had some issues with respect to how we decided whether to
send an email. The age used for calculating when the next mail should be
sent was saved as a float rather than an integer; since p-m runs never
happen exactly an integer number of days after upload, this results in a
cumulative error in the timing of the emails, that is further exacerbated if
a particular run is significantly delayed or if p-m infrastructure is down
for a period of time.
So instead, we now calculate the age at which the most recent email /should
have been sent/, and store that in our cache instead of the precise age.
There is still a bit of surprising behavior here due to the fact that we use
two different 'max_age' values for valid vs. invalid candidate packages: a
single package can, over the course of its stay in -proposed, move from
being an invalid candidate to being a valid candidate /and back again/
without ever migrating. Such a package will switch back and forth between
two sets of calculations based on different starting offsets, causing the
ages at which the emails are sent to vary in a non-obvious fashion.
However, this will still obey the general principle of "email reminders of
decreasing frequency", so I think this is acceptable given that it is still
an overall improvement in predictability.
LP: #1671468
I want to fix two bugs in interactions between other parts of britney
and the email policy. It's not currently easy to do so because we just
run the policy itself manually by creating some fake excuses.
Steal part of the machinery from the autopkgtest tests, and run a few
tests through britney completely. Use a fake SMTP server to record which
emails we sent.
(The port is hardcoded - that might not be so smart.)
So we can turn it off for the "notest" run and for the non-dev series.
This is a tristate
- 'yes': send email as normal
- 'dry-run': log what it would do, but send no email [nor update the
cache, so each run is effectively a fresh run]
- 'no': disable completely
Currently we re-trigger all reverse binary dependencies of a package,
including binary packages built from the same source. We already
explicity trigger the source's own tests if they still exist in unstable
- don't also consider the source when looking at reverse dependencies.
Add new autopkgtest policy: it determines the autopkgtests for a
source package (its own, direct reverse binary dependencies, and
Testsuite-Triggers), requests tests via AMQP, fetches results from swift, and
keeps track of pending tests between run. This also caches the downloaded
results from swift, as re-dowloading them all is very expensive.
This introduces two new hints:
* force-badtest pkg/ver[/arch]: Failing results for that package will be
ignored. This is useful to deal with broken tests that get imported from
Debian or are from under-maintained packages, or broke due to some
infrastructure changes. These are long-lived usually.
* force-skiptest pkg/ver: Test results *triggered by* that package (i. e.
reverse dependencies) will be ignored. This is mostly useful for landing
packages that trigger a huge amount of tests (glibc, perl) where some tests
are just too flaky to get them all passing, and one just wants to land it
after the remaining failures have been checked. This should be used rarely
and the hints should be removed immediately again.
Add integration tests that call britney in various scenarios on constructed
fake archives, with mocked AMQP and Swift results.