ubuntutools/misc: swap iter_content for raw stream

This is a partial revert of 1e20363.

When downloading a .diff.gz source package file, we do expect it to be
written to disk still compressed. If we were to uncompress it, then we
would get a size mismatch and even if we were to ignore that, we'd get a
hash mismatch.
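
The mismatch can be reproduced offline. The sketch below is illustrative only: the payload and the names `plain`, `compressed`, `expected_size` and `expected_sha256` are hypothetical stand-ins, not code from ubuntutools.

```python
import gzip
import hashlib

# Hypothetical payload standing in for the contents of a .diff.gz file.
plain = b"--- a/debian/changelog\n+++ b/debian/changelog\n" * 100
compressed = gzip.compress(plain)

# The expected size and hash describe the compressed file as published.
expected_size = len(compressed)
expected_sha256 = hashlib.sha256(compressed).hexdigest()

# A client that transparently decompresses ends up with different bytes.
decoded = gzip.decompress(compressed)
size_matches = len(decoded) == expected_size
hash_matches = hashlib.sha256(decoded).hexdigest() == expected_sha256

print("size matches:", size_matches)
print("hash matches:", hash_matches)
```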

On the other hand, when downloading a changes file, we need to make sure
that it is written to disk uncompressed.

To make this work in both cases we can ask the HTTP server for no
special content encoding using "Accept-Encoding: identity". This is what
wget requests, for example. Then we can write the output to the file
without performing any decoding at our end by using the raw response
object again.

This fixes both cases.
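
As a rough sketch of the copy loop described above, assuming only that the source is a file-like object with a `read()` method: here `io.BytesIO` stands in for the raw stream of a response fetched with `stream=True` and "Accept-Encoding: identity", and `copy_raw` and `payload` are hypothetical names used for illustration.

```python
import gzip
import io


def copy_raw(raw, fdst, blocksize=8192):
    """Copy bytes from raw to fdst in blocks, without decoding them."""
    downloaded = 0
    while True:
        block = raw.read(blocksize)
        if not block:
            # An empty read means the stream is exhausted.
            break
        fdst.write(block)
        downloaded += len(block)
    return downloaded


# Hypothetical gzip payload, standing in for a .diff.gz sent by the server.
payload = gzip.compress(b"example diff contents\n" * 50)
raw = io.BytesIO(payload)
dst = io.BytesIO()

copied = copy_raw(raw, dst, blocksize=16)
print(copied == len(payload), dst.getvalue() == payload)
```

Because the loop never inspects or decodes the blocks, whatever bytes the server sends are the bytes that reach the destination.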

LP: #2025748
Robie Basak 2023-07-05 15:32:12 +01:00
parent ff1c95e2c0
commit 232a73de31

@@ -348,7 +348,11 @@ def download(src, dst, size=0, *, blocksize=DOWNLOAD_BLOCKSIZE_DEFAULT):
     with tempfile.TemporaryDirectory() as tmpdir:
         tmpdst = Path(tmpdir) / "dst"
         try:
-            with requests.get(src, stream=True, timeout=60, auth=auth) as fsrc:
+            # We must use "Accept-Encoding: identity" so that Launchpad doesn't
+            # compress changes files. See LP: #2025748.
+            with requests.get(
+                src, stream=True, timeout=60, auth=auth, headers={"accept-encoding": "identity"}
+            ) as fsrc:
                 with tmpdst.open("wb") as fdst:
                     fsrc.raise_for_status()
                     _download(fsrc, fdst, size, blocksize=blocksize)
@@ -433,7 +437,16 @@ def _download(fsrc, fdst, size, *, blocksize):
     downloaded = 0
     try:
-        for block in fsrc.iter_content(blocksize):
+        while True:
+            # We use fsrc.raw so that compressed files stay compressed as we
+            # write them to disk. For example, if this is a .diff.gz, then it
+            # needs to remain compressed and unmodified to remain valid as part
+            # of a source package later, even though Launchpad sends
+            # "Content-Encoding: gzip" and the requests library therefore would
+            # want to decompress it. See LP: #2025748.
+            block = fsrc.raw.read(blocksize)
+            if not block:
+                break
             fdst.write(block)
             downloaded += len(block)
             progress_bar.update(downloaded, size)