Streaming Responses and Range Requests
Serve large files with StreamingResponse and support HTTP range requests for resumable downloads.
Why Stream Responses?
If an endpoint reads a whole file and returns it as a single body, FastAPI holds the entire payload in memory before sending it. For a 2 GB video that is a disaster: memory spikes, the first byte arrives late, and the server can crash under concurrent requests.
Streaming solves this by sending the body in small chunks as they become available. The server holds only one chunk at a time, and the client starts receiving data almost immediately.
- StreamingResponse: wraps a generator or iterator that yields bytes.
- FileResponse: a convenience for serving a file from disk efficiently.
- Range requests: let clients fetch only part of a file (seeking, resuming).
This lesson builds all three, ending with resumable downloads.
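To make the range-request idea concrete, here is a minimal sketch of how a server might interpret a Range header. The helper name parse_byte_range is hypothetical and not part of FastAPI. The sketch assumes a single byte range and treats anything else, including multi-range headers, as invalid. It covers the three standard forms: bytes=start-end, bytes=start- (to the end of the file), and bytes=-N (the last N bytes).

```python
import re

# Matches a single byte range such as "bytes=0-499", "bytes=500-" or "bytes=-200".
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def parse_byte_range(header, file_size):
    """Return an inclusive (start, end) pair, or None if the range is unusable.

    Hypothetical helper for illustration: it handles one range only.
    """
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None:
        return None
    first, last = match.groups()
    if not first and not last:
        return None  # "bytes=-" carries no positions at all
    if not first:
        # Suffix form: the last N bytes of the file.
        length = int(last)
        if length == 0 or file_size == 0:
            return None
        return max(file_size - length, 0), file_size - 1
    start = int(first)
    if start >= file_size:
        return None  # starts past the end of the file
    end = int(last) if last else file_size - 1
    if end < start:
        return None
    # An end position beyond the file is clamped to the last byte.
    return start, min(end, file_size - 1)


if __name__ == "__main__":
    print(parse_byte_range("bytes=0-499", 1000))   # (0, 499)
    print(parse_byte_range("bytes=500-", 1000))    # (500, 999)
    print(parse_byte_range("bytes=-200", 1000))    # (800, 999)
    print(parse_byte_range("bytes=1000-", 1000))   # None
```

A None result would typically map to a 416 Range Not Satisfiable response or to ignoring the header and sending the whole file, depending on the case. That policy is left out of this sketch.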
A Generator That Yields Bytes
Streaming starts with an iterable of bytes. The cleanest source is a Python generator that reads a file in fixed-size chunks instead of all at once.
Here is the core idea, isolated from any framework. The generator yields 1 MB at a time, so peak memory stays tiny no matter how large the file is.
def file_chunks(path, chunk_size=1024 * 1024):
    """Yield the file at `path` in pieces of at most `chunk_size` bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk


if __name__ == "__main__":
    import os

    # Write a test file just over 3 MB so the final chunk is a partial one.
    with open("sample.bin", "wb") as f:
        f.write(b"x" * (3 * 1024 * 1024 + 17))

    total = 0
    pieces = 0
    for chunk in file_chunks("sample.bin"):
        total += len(chunk)
        pieces += 1

    print("bytes:", total)    # bytes: 3145745
    print("chunks:", pieces)  # chunks: 4
    os.remove("sample.bin")
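The same chunked-read pattern extends naturally to partial content. The sketch below is an illustration rather than the lesson's final implementation: the name file_range_chunks is hypothetical, and it assumes the caller has already validated an inclusive start and end byte offset. It seeks to the start offset and stops after exactly end - start + 1 bytes, so memory use still stays at about one chunk.

```python
import os
import tempfile


def file_range_chunks(path, start, end, chunk_size=1024 * 1024):
    """Yield bytes start..end (inclusive) of the file at `path` in chunks.

    Hypothetical helper: assumes 0 <= start <= end has been checked already.
    """
    remaining = end - start + 1
    with open(path, "rb") as f:
        f.seek(start)
        while remaining > 0:
            # Never read past the requested end offset.
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # the file ended before the requested range did
            remaining -= len(chunk)
            yield chunk


if __name__ == "__main__":
    data = bytes(range(256)) * 10  # 2560 bytes of sample data
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(data)

    body = b"".join(file_range_chunks(path, 100, 1099, chunk_size=256))
    print(len(body))               # 1000
    print(body == data[100:1100])  # True
    os.remove(path)
```

In an HTTP response, a range like this one would normally be sent with status 206 Partial Content and a header such as Content-Range: bytes 100-1099/2560, where the number after the slash is the total file size.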