Python Type Checker Benchmarks
Reproducible, versioned wall-clock benchmarks for the major Python type checkers across a fixed set of public Python codebases.
What this is
Every benchmark cycle measures cold-cache and warm-cache wall-clock check times for ty, pyrefly, mypy, Pyright, basedpyright, and Zuban against requests, rich, FastAPI, django, and pandas. Cycles run biweekly. Per-cell median and IQR are reported on this page; full per-run manifests live in R2 and are linked from each cell.
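As an illustration of the per-cell summary, the sketch below reduces one cell's wall-clock timings to a median and an interquartile range. It is a minimal example, not the project's actual pipeline: the function name `summarize_cell`, the sample timings, and the choice of the "inclusive" quartile method are all assumptions made here. The quartile convention actually used is defined by the methodology document.

```python
import statistics


def summarize_cell(times_s):
    """Return (median, iqr) for one cell's wall-clock times, in seconds.

    Assumption: quartiles use the "inclusive" interpolation method; the
    benchmark's real quartile convention may differ.
    """
    if len(times_s) < 2:
        raise ValueError("need at least two runs to compute an IQR")
    q1, _, q3 = statistics.quantiles(times_s, n=4, method="inclusive")
    return statistics.median(times_s), q3 - q1


# Hypothetical timings for a single (tool, codebase, cache state) cell.
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
median_s, iqr_s = summarize_cell(runs)
print(f"median={median_s:.2f}s IQR={iqr_s:.2f}s")
```

Median and IQR are less sensitive than mean and standard deviation to a single slow run, which makes them a reasonable fit for noisy wall-clock measurements.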
What this is not
A marketing comparison. The benchmarks do not measure false-positive rate, ergonomics, IDE responsiveness, or error-message quality. They are one input into tool selection.
Methodology
The methodology is versioned and frozen for six months at a time. Read it in full at /methodology. Tool maintainers are invited to comment before each freeze; their responses are recorded publicly.