Case studies
Six milestones where the framework fundamentally changed — plus supporting studies and engineering deep-dives. Read top to bottom for the chronological story, or jump to any milestone.
How we measured
Framing · read before the numbers below
Every case study below cites numbers — wall time, complexity units, cache-hit rates, throughput multipliers. This block explains where those numbers come from, how the measurement approach evolved across framework versions, and what the numbers can and can't prove.
- The assumption — a natural experiment
- Six sequential UX-alignment refactors ran across six FitMe screens. Identical scope (same phase list, same compliance checklist, same design-system target) meant any velocity difference between refactor 1 and refactor 6 could only come from screen complexity, practitioner learning, or framework evolution. Normalize for complexity and treat learning as roughly constant after the first run, and what's left is framework evolution — a controlled natural experiment with N=6 initial datapoints, now 17.
- How measurement evolved
- Three generations. v2.0–v5.2 — estimated: wall time from commit timestamps (±15–30 min), cache hit rates inferred from narrative. v6.0 — instrumented: per-phase timestamps, L1/L2/L3 cache counters, tokenizer-based overhead measurement, mandatory eval-coverage gate. v7.0 onwards — continuous factors: view-count tiers replaced binary "has UI", architectural novelty replaced binary "new model".
- The normalization model
- Every feature converts to a single number — Complexity Units (CU) — via
CU = Tasks × Work_Type_Weight × (1 + Σ Complexity_Factors). The primary metric is min/CU: wall time divided by CU; lower is better. This is what makes a 6.5-hour onboarding refactor comparable to a 54-minute 4-feature parallel run. Full formula →
- How we analyzed results
- Three comparison axes: framework-era averages (v2.0 → v7.0), work-type segmentation (refactor vs feature vs enhancement), and execution-mode (serial vs parallel). Trend fitted as a power law — R² = 0.87 under v2 factors. Rolling baselines replaced the single anchor to detect plateaus. Regressions documented honestly: two real ones (Training v4.0, Readiness v4.2), both attributed to measurable learning taxes from new framework capabilities. Full retrospective →
- How we compared across features
- Every case study ends with a Normalized velocity block that cites the same CU formula, making cross-comparison honest. A framework refactor and a new feature land on the same axis. A serial v5.1 run and a parallel v5.1 stress test land on the same axis. The full dataset was submitted for independent review — arithmetic verified, structure sound, weaknesses surfaced and mostly fixed in v6.0. External validation →
- What this can't prove
- Single practitioner. N=17 is small for robust regression. Of the 185 full-audit findings, only 11.4% are externally automated (confirmed by build, test, or independent measurement); 78.9% are framework-only (an AI assertion from code reading). All claims should be read as directional signals, not statistical certainties. The honest reporting of regressions and limitations is what makes the rest of the dataset trustworthy.
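The CU normalization described above can be sketched in a few lines. The work-type weights, task counts, and complexity factors below are illustrative placeholders, not the framework's actual calibration.

```python
# Sketch of the Complexity Units (CU) normalization:
#   CU = Tasks × Work_Type_Weight × (1 + Σ Complexity_Factors)
# All weights and factor values here are hypothetical, chosen only
# to show the shape of the calculation.

WORK_TYPE_WEIGHTS = {  # hypothetical per-work-type weights
    "refactor": 0.8,
    "feature": 1.0,
    "enhancement": 0.6,
}

def complexity_units(tasks: int, work_type: str, factors: list[float]) -> float:
    """CU = Tasks x Work_Type_Weight x (1 + sum of complexity factors)."""
    return tasks * WORK_TYPE_WEIGHTS[work_type] * (1 + sum(factors))

def min_per_cu(wall_minutes: float, cu: float) -> float:
    """Primary velocity metric: wall time divided by CU. Lower is better."""
    return wall_minutes / cu

# Example with made-up inputs: a 10-task feature with two complexity
# factors, measured at 6.5h (390 min) of wall time.
pilot_cu = complexity_units(tasks=10, work_type="feature", factors=[0.3, 0.2])
print(pilot_cu, min_per_cu(390, pilot_cu))  # → 15.0 26.0
```

Because every run reduces to the same min/CU axis, a long serial refactor and a short parallel stress test become directly comparable, which is the point of the normalization.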
Milestones
Six inflection points across the framework's evolution — each one the study where something that had been a hypothesis became a result.
- Milestone 1 · v2.0 · Baseline pilot · 5 min read
The Pilot — Running the Full PM Lifecycle on Onboarding
The pilot run. Full 9-phase PM lifecycle on one feature, end-to-end. Every number that follows in this timeline is relative to this one — including the 3 rework cycles and 5 latent bugs.
Impact: Baseline · 6.5h
- Milestone 2 · v4.1 · Compounding proven · 9 min read
How 6 Screen Refactors Proved a 6.5× Speedup
Six identical-scope refactors across four framework versions. A controlled natural experiment that isolated framework improvement from practitioner learning. Wall time dropped from 6.5h to 1h, defect escape rate from 5 to 0.
Impact: 6.5× faster · defects → 0
- Milestone 3 · v4.4 · Quality gate · 5 min read
Can You Test AI Output Quality the Same Way You Test Code?
Golden I/O tests and heuristic checks across four AI subsystems, all green on first run. Added a quality phase to the lifecycle without adding measurable overhead.
Impact: 29 / 29 green
- Milestone 4 · v5.2 · Parallel dispatch · 7 min read
Shipping 4 Features in 54 Minutes — The Parallel Stress Test
Four independent features dispatched concurrently — 54 minutes from first prompt to four merged PRs. Zero merge conflicts, zero regressions. The stress test that proved the framework could parallelize.
Impact: 4 features / 54 min
- Milestone 5 · v6.0 · Measurement · 7 min read
When We Stopped Estimating and Started Measuring
Deterministic phase timing, skill activation, and cache hits — all measured per feature, not estimated. The version where the framework stopped claiming numbers and started capturing them.
Impact: Instrumentation, not estimation
- Milestone 6 · v7.0 · Hardware-aware · 6 min read
V7.0 HADF — Teaching the Framework to Detect Chip Architecture
The framework learned to detect the machine it runs on. 17 chip profiles, 7 cloud signatures, dispatch routing that adapts to hardware — little-core for mechanical work, big-core for reasoning, cloud only when locally infeasible.
Impact: 17 chip profiles
More case studies
Supporting studies and methodology notes — the work that validates, extends, or explains the milestones above.
- v4.4 · The Most Complex Feature Completed at Refactor Speed · 5 min
- v5.0 · What If You Designed Software Like a Chip? · 5 min
- v5.1 · The Fastest Feature — 86% Velocity Improvement on Auth Flow · 7 min
- v5.1 · First Feature Under the New Architecture — AI Engine Adaptation · 7 min
- v5.2 · What Breaks When You Run 4 Features at Once — And How to Fix It · 6 min
- v5.2 · From "Zero Conflicts by Luck" to "Zero Conflicts by Design" · 5 min
- v6.1 · 185 Findings, 12 Critical — What a Full-System Audit Revealed · 8 min
- v6.1 · Building the Site That Tells the Story — A Two-Hour Meta-Build · 27 min
- v6.1 · The Dual-Sync Race — Two Backends, One Last-Writer-Wins Silence · 6 min
- v6.1 · The Stacked-PR Misfire — When "Merged" Didn't Mean "On Main" · 6 min
- v6.1 · The XCTWaiter Abort — Learning to Stop, Rollback, and Retry · 6 min
- v7.6 · Mechanical Enforcement — How v7.6 Closed the Class B Gap from Gemini's Audit · 8 min
- v7.7 · Validity Closure — How v7.7 Closed the Last Closable Class B Gap · 9 min
- supporting · External Validation — Did Our Numbers Hold Up? · 6 min
- supporting · What If We Had Measurement From Day One? — A Retrospective ROI Analysis · 7 min
- supporting · How We Normalized Complexity Across 16 Different Features · 7 min
- supporting · The operations layer in practice · 3 short studies
Developer deep-dives
Engineering write-ups for readers who want the code-level story — SSR, animation plumbing, component design. Not required reading for the framework narrative.