Python Type Checker Benchmarks

Reproducible, versioned wall-clock benchmarks for the major Python type checkers across a fixed set of public Python codebases.

Coming soon. The methodology is open for comment from tool maintainers ahead of its first freeze. The first benchmark cycle ships once the methodology is frozen. Read the methodology for what will be measured and why.

What this is

Each benchmark cycle measures cold-cache and warm-cache wall-clock check times for ty, pyrefly, mypy, Pyright, basedpyright, and Zuban against requests, rich, FastAPI, Django, and pandas. Cycles run biweekly. The per-cell median and interquartile range (IQR) are reported on this page; full per-run manifests live in R2 and are linked from each cell.
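As a rough illustration of how one cell's numbers could be produced, the sketch below times a command with a wall clock and reduces repeated timings to a median and IQR. It is not the benchmark harness: the function names `time_command` and `summarize` are hypothetical, the sample timings are made up, and the inclusive quartile method is an assumption (the methodology may define quartiles differently). Cold-cache versus warm-cache handling is omitted because clearing caches is tool-specific.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, cwd=None):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # Type checkers exit non-zero when they report errors, so the return
    # code is deliberately not treated as a failure here.
    subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start


def summarize(seconds):
    """Return (median, iqr) for a list of wall-clock timings in seconds."""
    if len(seconds) < 2:
        raise ValueError("need at least two runs to compute an IQR")
    # Assumption: inclusive quartiles; other conventions give other IQRs.
    q1, _, q3 = statistics.quantiles(seconds, n=4, method="inclusive")
    return statistics.median(seconds), q3 - q1


# Stand-in command so the sketch runs anywhere; a real run would invoke
# a type checker against a checked-out codebase instead.
elapsed = time_command([sys.executable, "-c", "pass"])

# Hypothetical timings (seconds) for one checker on one codebase.
median, iqr = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"median={median:.2f}s iqr={iqr:.2f}s")  # median=3.00s iqr=2.00s
```

The median and IQR are used here rather than the mean and standard deviation because wall-clock timings tend to have occasional slow outliers, which the median and IQR are less sensitive to.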

What this is not

A marketing comparison. The benchmarks do not measure false-positive rate, ergonomics, IDE responsiveness, or error-message quality. They are one input into tool selection.

Methodology

The methodology is versioned and frozen six months at a time. Read it in full at /methodology. Tool maintainers are invited to comment before each freeze; their responses are recorded publicly.