Cross-corpus comparison
Every case study at a glance
46 shipped case studies, sortable and filterable. Click any row to open the full study. The table reads frontmatter directly — when a study lands or its frontmatter updates, this page reflects it on the next deploy.
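As a rough illustration of how a table like this could be derived from frontmatter, the sketch below parses a flat `key: value` block, maps it to a row, then filters and sorts. The field names (`title`, `date`, `version`, `tier`) and the helpers `parseFrontmatter`, `toRow`, and `buildRows` are assumptions made for this example, not the site's actual schema or code. A real build would more likely use a full YAML parser.

```typescript
// Hypothetical row shape; the real frontmatter schema is not shown on this page.
interface StudyRow {
  title: string;
  date: string; // ISO date, e.g. "2026-04-16"
  version: string; // e.g. "6.0", or "" when unversioned
  tier: string; // e.g. "flagship", "light", "appendix"
}

// Parse a flat `key: value` frontmatter block delimited by `---` lines.
// Nested YAML is deliberately out of scope for this sketch.
function parseFrontmatter(source: string): Record<string, string> {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  const fields: Record<string, string> = {};
  if (!match) return fields;
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(idx + 1).trim().replace(/^["']|["']$/g, "");
    if (key) fields[key] = value;
  }
  return fields;
}

// Map one study's source text to a table row, defaulting missing fields to "".
function toRow(source: string): StudyRow {
  const fm = parseFrontmatter(source);
  return {
    title: fm.title ?? "",
    date: fm.date ?? "",
    version: fm.version ?? "",
    tier: fm.tier ?? "",
  };
}

// Build rows, optionally filtered by tier, sorted newest first.
function buildRows(sources: string[], tier?: string): StudyRow[] {
  return sources
    .map(toRow)
    .filter((row) => tier === undefined || row.tier === tier)
    .sort((a, b) => b.date.localeCompare(a.date));
}

// Sample inputs: titles, dates, and tiers are copied from rows in the table.
const sample: string[] = [
  '---\ntitle: "What If You Designed Software Like a Chip?"\ndate: 2026-04-12\nversion: 5.0\ntier: flagship\n---\nBody text.',
  "---\ntitle: When We Stopped Estimating and Started Measuring\ndate: 2026-04-16\nversion: 6.0\ntier: flagship\n---\nBody text.",
  "---\ntitle: How We Normalized Complexity Across 16 Different Features\ndate: 2026-04-16\ntier: appendix\n---\nBody text.",
];

const flagship = buildRows(sample, "flagship");
```

Because the rows are recomputed from the source files each time, a new or edited study would show up once the build runs again, which matches the "next deploy" behavior described above.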
| Version | Title | Date | | Tier | TL;DR | Headline | Open |
|---|---|---|---|---|---|---|---|
| — | The Blank-Main Bug — Catching a Production SSR Regression After Promoting | 2026-04-20 | — | appendix | Production site shipped 251 bytes of HTML per page across 48 routes. Suspense + useSearchParams in the root layout susp… | 251 — Bytes per page across 48 routes | |
| — | Watching the Framework Build the Site That's Replaying It — The DispatchReplay Component | 2026-04-21 | — | appendix | Static blueprint diagrams tell you what a system is, not whether it's running. DispatchReplay plays a recorded trace of… | 2 (Sprint I + fitme-story meta) — Live traces | |
| — | The Lego Metaphor — Designing the PM-Flow Ecosystem Page | 2026-04-22 | — | appendix | How to render an 11-skill, 10-phase, 15-data-file framework on a single page without it looking like an org chart. The … | 11 · 10 · 15 — Skills × phases × shared files | |
| — | External Validation — Did Our Numbers Hold Up? | 2026-04-16 | — | appendix | Independent review of the normalization model, velocity claims, and measurement methodology — confirming what is solid,… | 5 (normalization, velocity, cache, complexity, baselines) — Methodology dimensions reviewed | |
| — | What If We Had Measurement From Day One? — A Retrospective ROI Analysis | 2026-04-16 | — | appendix | Counterfactual experiment: retroactively applying deterministic measurement infrastructure to all 24 features, then com… | 24 — Features reanalysed under counterfactual | |
| — | How We Normalized Complexity Across 16 Different Features | 2026-04-16 | — | appendix | Raw metrics like wall time and file count are meaningless without normalization. The Complexity Unit (CU) model — addit… | 16 — Features normalised under CU model | |
| 7.8.1 | Framework v7.8.1 — Branch Isolation + Feature-Closure Completeness from One Full PM Cycle | 2026-05-07 | — | light | Two cooperating pre-commit gates shipped as one feature in advisory mode on 2026-05-07. BRANCH_ISOLATION_VIOLATION prev… | FT2 #244 + #245 + #246; fitme-story #53 — PRs merged | |
| 7.8.1 | UCC Passkey Auth — Replacing basic-auth on the operator dashboard with WebAuthn | 2026-05-07 | — | light | Two-PR cross-repo ship that replaces shared HTTP basic-auth on /control-room/* with WebAuthn passkeys. Per-operator ide… | fitme-story #55 + FT2 #248 — PRs merged | |
| 7.8 | Bridge to v7.9 — How v7.8 Closed the v7.7 Silent-Pass | 2026-05-03 | — | light | v7.7 shipped a gate (CACHE_HITS_EMPTY_POST_V6) that ran on every commit but exercised data on 0 of 46 features — a text… | 0/46 (silent-pass) — CACHE_HITS_EMPTY_POST_V6 effective coverage at v7.7 ship | |
| 7.8 | Smart Reminders Behavioral Learning — PR-1 Shipped Across iOS + Backend | 2026-05-04 | — | light | Sub-feature of Smart Reminders. PR-1 shipped fully on 2026-05-04 in two halves — FT2 PR #190 (iOS data layer + Settings… | 15 / 15 — PR-1 tasks: planned / complete | |
| 7.8 | Unified Control Center — Migrating an Operator Dashboard Across Two Repos in 11 Days | 2026-05-06 | — | light | Retired the legacy Astro operator dashboard, migrated it inside the public showcase as a basic-auth-gated /control-room… | 42 / 44 — Tasks done | |
| 7.8 | Import Training Plan — Resume from Audit-Flagged Partial Ship to Full Phase 1 Ship in 14 Hours | 2026-05-06 | — | light | Resumed an audit-flagged partial-ship feature: rolled back to research mid-flight after discovering the original PRD cl… | 4 — PRs landed | |
| 7.8 | Push Notifications v2 — From v1 Partial-Ship to Platform-Layer Rebuild in a Single Day | 2026-05-07 | — | light | Reopened v1 push-notifications after audit UI-016 caught a substrate-built-but-never-wired partial-ship. Single-session… | FT2 #239 — PR landed | |
| 7.7 | Case-Study Presentation Refactor — Locking Alt-A Across 25 Studies | 2026-04-28 | — | light | An 18-hour serial sprint locked a uniform presentation pattern across 25 case studies — every one now leads with a Summ… | 25 of 25 — Case studies backfilled | |
| 7.7 | Validity Closure — How v7.7 Closed the Last Closable Class B Gap | 2026-04-28 | — | light | v7.7 closes A1–A5 + B1–B2 + C1 from the post-v7.6 gap inventory: 5 new gates (4 write-time pre-commit hooks + 1 cycle-t… | 2 of 9 — Post-v6 fully-adopted features | |
| 7.7 | stats-v2 — Resume, Reconcile, and Ship Despite a Three-Layer Bug Stack | 2026-04-30 | — | light | stats-v2 was the first paused feature picked up after v7.7 shipped. A "small remaining tasks" job turned into a three-l… | ~2.5 hours — Wall time (resume → ship) | |
| 7.7 | When the Gate Read One Field and the Data Lived in Another | 2026-05-01 | — | light | The v7.7 case study claimed cache_hits was "100% gated on next write." A 2026-04-30 sweep found 43 of 46 state.json fil… | 43 — state.json files migrated | |
| 7.7 | HADF Phase 2 — Cloud Fingerprinting Measurement | 2026-05-01 | — | light | Pre-registered measurement experiment to test whether cloud inference endpoints cluster naturally by hardware class via… | 0.5566 — Silhouette score (max across k=2..6) | |
| 7.6 | Mechanical Enforcement — How v7.6 Closed the Class B Gap from Gemini's Audit | 2026-04-25 | — | light | v7.6 promoted four agent-attention checks into pre-commit failures and added two recurring CI defenses, closing the rem… | 7 of 7 — Class B → A promotions | |
| 7.6 | auth-polish-v2 — Three Workstreams Bundled, 18/18 Tasks Shipped | 2026-05-01 | — | light | auth-polish-v2 bundled three auth workstreams — forgot-password, biometric activation refinement, Google Sign-In SDK ac… | 18 / 18 — PRD tasks: planned / shipped | |
| 7.1 | UI-Audit Baseline Burndown — P0 27 → 0 and a Hard Gate | 2026-04-24 | — | light | The make ui-audit scanner had been running advisory-only since shipping — 27 P0 + 103 P1 across 44 files. This burndown… | 27 → 0 — P0 findings | |
| 6.1 | V7.0 HADF — Teaching the Framework to Detect Chip Architecture | 2026-04-17 | — | flagship | V7.0 HADF asks whether passive hardware fingerprinting can improve dispatch routing without requiring provider cooperat… | 17 — Static chip profiles | |
| 6.1 | 185 Findings, 12 Critical — What a Full-System Audit Revealed | 2026-04-18 | — | light | The same AI that built the framework audited its own work across 22 features and 6 framework versions, using a 4-layer … | 185 — Total findings surfaced | |
| 6.1 | Building the Site That Tells the Story — A Two-Hour Meta-Build | 2026-04-20 | — | standard | PM framework built the website that hosts its own case studies. 37 commits over 2 hours of wall clock. 8 routes, 36 pre… | 2h 2m — First commit → preview deploy | |
| 6.1 | The Dual-Sync Race — Two Backends, One Last-Writer-Wins Silence | 2026-04-18 | — | light | v7.0 audit's top backend finding — two sync paths (CloudKit + Supabase) both pull on login with no merge coordination. … | 13 / 3 — Sync findings / critical | |
| 6.1 | The Stacked-PR Misfire — When "Merged" Didn't Mean "On Main" | 2026-04-19 | — | light | 3 stacked PRs (M-2a → M-2c → M-2b). All marked "Merged" by GitHub. Main only got M-2a — downstream PRs merged into thei… | 3 / 1 — PRs claimed merged / actually on main | |
| 6.1 | The XCTWaiter Abort — Learning to Stop, Rollback, and Retry | 2026-04-20 | — | light | First M-series attempt-1 failure. XCTWaiter.wait(for: [a, b, c]) is wait-for-ALL, not wait-for-ANY — but the test code … | 2 (attempt 1 aborted, attempt 2 passed) — Attempts to ship | |
| 6.0 | When We Stopped Estimating and Started Measuring | 2026-04-16 | — | flagship | Through 16 features, every velocity claim rested on ±15–30 min wall-time estimates and narrative-inferred cache hit rat… | 7 of 9 — Measurement DVs now deterministic | |
| 5.2 | What Breaks When You Run 4 Features at Once — And How to Fix It | 2026-04-15 | — | flagship | A 4-feature parallel stress test exposed two bottlenecks (52% permission-routing denial, 23× agent variance) — context … | 48% — Tool-use reduction with dispatch intelligence | |
| 5.2 | From "Zero Conflicts by Luck" to "Zero Conflicts by Design" | 2026-04-15 | — | light | v5.1 stress test had 0 file conflicts across 15 same-file edits — by luck, not by design. v5.2 Parallel Write Safety (s… | 15 across 3 files — Same-file edits without conflict | |
| 5.1 | The Fastest Feature — 86% Velocity Improvement on Auth Flow | 2026-04-13 | — | flagship | Auth embedded into onboarding mid-flow (vs separate hub). Full PM lifecycle plus 3 design iterations in a single 100-mi… | 86% (2.1 min/CU) — Velocity vs baseline | |
| 5.1 | First Feature Under the New Architecture — AI Engine Adaptation | 2026-04-13 | — | flagship | First feature where 'how we build' and 'what we build' used the same architectural principles. Adapter protocol, valida… | 45% — Cache hit rate (framework → product transfer) | |
| 5.1 | Shipping 4 Features in 54 Minutes — The Parallel Stress Test | 2026-04-14 | — | flagship | 4 features advanced through 8 lifecycle phases concurrently in 54 minutes. 0 build failures, 0 test failures, 0 merge c… | 12.4× — Parallel throughput vs baseline | |
| 5.1 | Smart Reminders — Six Reminder Types Designed and Shipped Inside a 12-Hour Stress Test | 2026-04-20 | — | light | Six reminder types, a reusable guest-lock overlay, and a frequency-cap engine — all designed, specified, implemented, a… | ~12h — Wall-clock from init to complete | |
| 5.0 | What If You Designed Software Like a Chip? | 2026-04-12 | — | flagship | 7 hardware architecture principles applied to a PM framework — LoRA hot-swap, palettization, weight-stationary, UMA zer… | 121,714 → 45,125 — Framework overhead tokens | |
| 5.0 | SettingsView v2 — 1170 → 294 Lines via Phased Decomposition | 2026-04-19 | — | light | Closing audit finding UI-002: SettingsView.swift dropped 1170 → 294 lines (~75%) across 4 PRs (#122-#125) in a ~2-hour … | 1170 → 294 — SettingsView.swift line count | |
| 5.0 | Training Plan v2 — Biggest Surface in the App, Stress-Test for the v4.0 Cache | 2026-04-10 | — | light | The biggest surface in the app — 2,135 lines, 13 nested types, 32 audit findings — broken into 6 extracted views via th… | 2,135 lines (largest in app) — v1 file size | |
| 4.4 | Can You Test AI Output Quality the Same Way You Test Code? | 2026-04-09 | — | flagship | AI-output quality treated as a testable property. 2-layer eval design (golden I/O + heuristic XCTest evals + monitoring… | 29 of 29 — Eval cases green on first run | |
| 4.4 | The Most Complex Feature Completed at Refactor Speed | 2026-04-10 | — | light | First greenfield feature under v4.4 — new tab, new data model, 9 eval definitions, 5 views. Shipped in 2 hours end-to-e… | ~2 hours — Wall time (research → Figma screen) | |
| 4.3 | How 6 Screen Refactors Proved a 6.5x Speedup | 2026-04-10 | — | flagship | Six identical-scope screen refactors across four framework versions isolated framework improvement from practitioner le… | 6.5× — Speedup across 6 refactors | |
| 2.0 | The Pilot — Running the Full PM Lifecycle on Onboarding | 2026-04-05 | — | flagship | First feature run through the full 9-phase PM lifecycle. Retroactive UX/design-system alignment on a "finished" feature… | 24 — Audit findings (Onboarding) | |
| pre-v5.0 | Home Today Screen v2 — The V2 Rule Pilot and Birth of Screen-Prefixed Analytics | 2026-04-09 | — | light | The second screen to go through a full UX Foundations alignment pass, and the first under the now-codified V2 Rule. Hom… | 1029 → 703 lines (~32% reduction) — v1 → v2 line count | |
| pre-v5.0 | Backlog Roundup — Two Pre-Rule Features That Stay Roundup-Only | 2026-04-20 | — | light | Roundup of two features that pre-date the 2026-04-13 "every feature gets a case study" rule and have source material to… | 2 (development-dashboard, ai-cohort-intelligence) — Features in this housekeeping roundup | |
| pre-v5.0 | Android Design System — A Documentation Deliverable That Skipped Six Phases on Purpose | 2026-04-04 | — | light | A 92-token iOS → Material Design 3 mapping that ships zero lines of Android code on purpose. Research + PRD execute nor… | 92/92 — Tokens mapped (iOS → MD3) | |
| pre-v5.0 | GDPR Compliance — Two Hours End-to-End on a Legal-Blocker Feature | 2026-04-04 | — | light | The first feature in the project where kill criteria read "Legal requirement — cannot be killed." Shipped 8 files, +711… | 2 hours — Wall time (init → complete) | |
| pre-v5.0 | Google Analytics — The Substrate Every Downstream Feature Depends On | 2026-04-04 | — | light | The feature that went from "11 shipped features, 40 defined metrics, zero analytics instrumentation" to a working GA4 p… | 22 files / +1970 −39 — Files / lines (merge ac85c73) | |
- v— · 2026-04-20
Production site shipped 251 bytes of HTML per page across 48 routes. Suspense + useSearchParams in the root layout suspended the entire chi…
251 — Bytes per page across 48 routes
- v— · 2026-04-21
Static blueprint diagrams tell you what a system is, not whether it's running. DispatchReplay plays a recorded trace of a real feature flow…
2 (Sprint I + fitme-story meta) — Live traces
- v— · 2026-04-22
How to render an 11-skill, 10-phase, 15-data-file framework on a single page without it looking like an org chart. The answer: a Lego wall …
11 · 10 · 15 — Skills × phases × shared files
- v— · 2026-04-16
Independent review of the normalization model, velocity claims, and measurement methodology — confirming what is solid, flagging what is we…
5 (normalization, velocity, cache, complexity, baselines) — Methodology dimensions reviewed
- v— · 2026-04-16
Counterfactual experiment: retroactively applying deterministic measurement infrastructure to all 24 features, then computing the cost, the…
24 — Features reanalysed under counterfactual
- v— · 2026-04-16
Raw metrics like wall time and file count are meaningless without normalization. The Complexity Unit (CU) model — additive factors for task…
16 — Features normalised under CU model
- v7.8.1 · 2026-05-07
Two cooperating pre-commit gates shipped as one feature in advisory mode on 2026-05-07. BRANCH_ISOLATION_VIOLATION prevents agents from mut…
FT2 #244 + #245 + #246; fitme-story #53 — PRs merged
- v7.8.1 · 2026-05-07
Two-PR cross-repo ship that replaces shared HTTP basic-auth on /control-room/* with WebAuthn passkeys. Per-operator identity, per-device cr…
fitme-story #55 + FT2 #248 — PRs merged
- v7.8 · 2026-05-03
v7.7 shipped a gate (CACHE_HITS_EMPTY_POST_V6) that ran on every commit but exercised data on 0 of 46 features — a textbook silent-pass. Th…
0/46 (silent-pass) — CACHE_HITS_EMPTY_POST_V6 effective coverage at v7.7 ship
- v7.8 · 2026-05-04
Sub-feature of Smart Reminders. PR-1 shipped fully on 2026-05-04 in two halves — FT2 PR #190 (iOS data layer + Settings toggle-off, squash …
15 / 15 — PR-1 tasks: planned / complete
- v7.8 · 2026-05-06
Retired the legacy Astro operator dashboard, migrated it inside the public showcase as a basic-auth-gated /control-room route, instrumented…
42 / 44 — Tasks done
- v7.8 · 2026-05-06
Resumed an audit-flagged partial-ship feature: rolled back to research mid-flight after discovering the original PRD claimed an impossible …
4 — PRs landed
- v7.8 · 2026-05-07
Reopened v1 push-notifications after audit UI-016 caught a substrate-built-but-never-wired partial-ship. Single-session full PM cycle (Phas…
FT2 #239 — PR landed
- v7.7 · 2026-04-28
An 18-hour serial sprint locked a uniform presentation pattern across 25 case studies — every one now leads with a SummaryCard, a "how to r…
25 of 25 — Case studies backfilled
- v7.7 · 2026-04-28
v7.7 closes A1–A5 + B1–B2 + C1 from the post-v7.6 gap inventory: 5 new gates (4 write-time pre-commit hooks + 1 cycle-time check + 1 adviso…
2 of 9 — Post-v6 fully-adopted features
- v7.7 · 2026-04-30
stats-v2 was the first paused feature picked up after v7.7 shipped. A "small remaining tasks" job turned into a three-layer reveal: the sta…
~2.5 hours — Wall time (resume → ship)
- v7.7 · 2026-05-01
The v7.7 case study claimed cache_hits was "100% gated on next write." A 2026-04-30 sweep found 43 of 46 state.json files used the legacy `…
43 — state.json files migrated
- v7.7 · 2026-05-01
Pre-registered measurement experiment to test whether cloud inference endpoints cluster naturally by hardware class via TTFT/TPS alone. Pre…
0.5566 — Silhouette score (max across k=2..6)
- v7.6 · 2026-04-25
v7.6 promoted four agent-attention checks into pre-commit failures and added two recurring CI defenses, closing the remaining Class B → Cla…
7 of 7 — Class B → A promotions
- v7.6 · 2026-05-01
auth-polish-v2 bundled three auth workstreams — forgot-password, biometric activation refinement, Google Sign-In SDK activation — into one …
18 / 18 — PRD tasks: planned / shipped
- v7.1 · 2026-04-24
The make ui-audit scanner had been running advisory-only since shipping — 27 P0 + 103 P1 across 44 files. This burndown migrated 12 view fi…
27 → 0 — P0 findings
- v6.1 · 2026-04-17
V7.0 HADF asks whether passive hardware fingerprinting can improve dispatch routing without requiring provider cooperation. 5-layer archite…
17 — Static chip profiles
- v6.1 · 2026-04-18
The same AI that built the framework audited its own work across 22 features and 6 framework versions, using a 4-layer methodology (paralle…
185 — Total findings surfaced
- v6.1 · 2026-04-20
PM framework built the website that hosts its own case studies. 37 commits over 2 hours of wall clock. 8 routes, 36 pre-rendered pages, 12 …
2h 2m — First commit → preview deploy
- v6.1 · 2026-04-18
v7.0 audit's top backend finding — two sync paths (CloudKit + Supabase) both pull on login with no merge coordination. Last writer wins. No…
13 / 3 — Sync findings / critical
- v6.1 · 2026-04-19
3 stacked PRs (M-2a → M-2c → M-2b). All marked "Merged" by GitHub. Main only got M-2a — downstream PRs merged into their stacked parents, n…
3 / 1 — PRs claimed merged / actually on main
- v6.1 · 2026-04-20
First M-series attempt-1 failure. XCTWaiter.wait(for: [a, b, c]) is wait-for-ALL, not wait-for-ANY — but the test code reads like 'any'. Ap…
2 (attempt 1 aborted, attempt 2 passed) — Attempts to ship
- v6.0 · 2026-04-16
Through 16 features, every velocity claim rested on ±15–30 min wall-time estimates and narrative-inferred cache hit rates. v6.0 instrumente…
7 of 9 — Measurement DVs now deterministic
- v5.2 · 2026-04-15
A 4-feature parallel stress test exposed two bottlenecks (52% permission-routing denial, 23× agent variance) — context window pressure was …
48% — Tool-use reduction with dispatch intelligence
- v5.2 · 2026-04-15
v5.1 stress test had 0 file conflicts across 15 same-file edits — by luck, not by design. v5.2 Parallel Write Safety (snapshot/rollback + r…
15 across 3 files — Same-file edits without conflict
- v5.1 · 2026-04-13
Auth embedded into onboarding mid-flow (vs separate hub). Full PM lifecycle plus 3 design iterations in a single 100-min session — first fe…
86% (2.1 min/CU) — Velocity vs baseline
- v5.1 · 2026-04-13
First feature where 'how we build' and 'what we build' used the same architectural principles. Adapter protocol, validation gate, analytics…
45% — Cache hit rate (framework → product transfer)
- v5.1 · 2026-04-14
4 features advanced through 8 lifecycle phases concurrently in 54 minutes. 0 build failures, 0 test failures, 0 merge conflicts across 31 s…
12.4× — Parallel throughput vs baseline
- v5.1 · 2026-04-20
Six reminder types, a reusable guest-lock overlay, and a frequency-cap engine — all designed, specified, implemented, and test-covered duri…
~12h — Wall-clock from init to complete
- v5.0 · 2026-04-12
7 hardware architecture principles applied to a PM framework — LoRA hot-swap, palettization, weight-stationary, UMA zero-copy, mixed precis…
121,714 → 45,125 — Framework overhead tokens
- v5.0 · 2026-04-19
Closing audit finding UI-002: SettingsView.swift dropped 1170 → 294 lines (~75%) across 4 PRs (#122-#125) in a ~2-hour single-session decom…
1170 → 294 — SettingsView.swift line count
- v5.0 · 2026-04-10
The biggest surface in the app — 2,135 lines, 13 nested types, 32 audit findings — broken into 6 extracted views via the V2 Rule. First v2 …
2,135 lines (largest in app) — v1 file size
- v4.4 · 2026-04-09
AI-output quality treated as a testable property. 2-layer eval design (golden I/O + heuristic XCTest evals + monitoring schema) and a lifec…
29 of 29 — Eval cases green on first run
- v4.4 · 2026-04-10
First greenfield feature under v4.4 — new tab, new data model, 9 eval definitions, 5 views. Shipped in 2 hours end-to-end. Stress-tested wh…
~2 hours — Wall time (research → Figma screen)
- v4.3 · 2026-04-10
Six identical-scope screen refactors across four framework versions isolated framework improvement from practitioner learning. Wall time fe…
6.5× — Speedup across 6 refactors
- v2.0 · 2026-04-05
First feature run through the full 9-phase PM lifecycle. Retroactive UX/design-system alignment on a "finished" feature surfaced 24 finding…
24 — Audit findings (Onboarding)
- pre-v5.0 · 2026-04-09
The second screen to go through a full UX Foundations alignment pass, and the first under the now-codified V2 Rule. Home v2 produced two pr…
1029 → 703 lines (~32% reduction) — v1 → v2 line count
- pre-v5.0 · 2026-04-20
Roundup of two features that pre-date the 2026-04-13 "every feature gets a case study" rule and have source material too thin to support a …
2 (development-dashboard, ai-cohort-intelligence) — Features in this housekeeping roundup
- pre-v5.0 · 2026-04-04
A 92-token iOS → Material Design 3 mapping that ships zero lines of Android code on purpose. Research + PRD execute normally; Tasks / UX / …
92/92 — Tokens mapped (iOS → MD3)
- pre-v5.0 · 2026-04-04
The first feature in the project where kill criteria read "Legal requirement — cannot be killed." Shipped 8 files, +711 lines, full 10-phas…
2 hours — Wall time (init → complete)
- pre-v5.0 · 2026-04-04
The feature that went from "11 shipped features, 40 defined metrics, zero analytics instrumentation" to a working GA4 pipeline with protoco…
22 files / +1970 −39 — Files / lines (merge ac85c73)