How this site stays honest
Every case study and line of code on this site is periodically reviewed by an independent external AI model — a neutral second pass that checks accuracy, honesty, and methodology. No curator-written spin; an outside model reads the same artifacts you do.
What the audit checks for
- Factual accuracy of every number in the case studies — commits, tests, metrics, timing claims
- Honest representation of failures alongside successes — nothing quietly dropped
- No cherry-picked data in comparisons
- No silent edits to historical claims (git history is preserved and publicly inspectable)
- No secrets or private data inadvertently published
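The secrets check above can be approximated with a simple pattern scan. This is an illustrative sketch only — the audit describes the goal (no published secrets), not its tooling, and the patterns below are common examples, not an exhaustive or official list:

```python
import re

# Hypothetical patterns for common credential formats (assumptions, not
# the audit's actual rule set).
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                   # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),  # PEM private key header
]

def find_secrets(text: str):
    """Return every substring of text that matches a known secret pattern."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

A scan like this is cheap enough to run over every published artifact on each build; anything it flags warrants a human look before publishing.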
Which models audit the site
Audits are performed by models from different vendors than the one that writes the site. The goal is independence — a model that had no hand in producing the artifacts it reviews.
- Google Gemini 2.5 Pro — inaugural external auditor, first pass on 2026-04-21.
Additional auditors (from other vendors and model families) will be added as they run. Each audit is archived verbatim with the date it was performed and the model that performed it. Nothing is silently re-audited or retracted.
Audit cadence
- On-demand after each framework version bump or major case-study batch.
- Minimum quarterly even if no version bump has occurred, to catch drift.
- Immediately if an internal structural check (the 72h integrity cycle) flags a regression that is not resolvable in-repo.
Latest audit results
2026-04-21 · Google Gemini 2.5 Pro · follow-up progress 2026-04-24
Mixed: methodologically strong, empirically weak on pre-v6.0 quantitative claims.
- Scope
- 24 showcase + 41 main-repo case studies + 3 internal meta-analyses
- Remediation progress (2026-04-24)
- 7 of 9 Tier 1/2/3 items fully shipped, 2 partial/pilot, 1 external-blocked. Remaining: real-provider auth verification (manual 7-step playbook), Tier 2.2 process adoption, Tier 3.2 trend after 3 integrity cycles, and Tier 1.1 adoption gap (see “New measurement gap” below).
- Well-supported
- Process documentation; internal arithmetic; post-v6.0 instrumentation; honest failure reporting
- Weak / uncertain
- Pre-v6.0 quantitative claims; causality of speedup; power-law predictive power; runtime correctness
- Corrections (same-day)
- Initial “3 broken PR citations” finding was a false positive propagated from the input meta-analysis — all three were GitHub issues, not PRs. Correction appended to the audit; issue #138 closed with full explanation.
- Honest downgrade (2026-04-23)
- Tier 1.1 (automated time/event metrics) was initially marked “done” on 2026-04-21; the 2026-04-23 status pass downgraded it to “partial” because system-wide adoption is still incomplete. We published the downgrade rather than leaving the overstatement in place.
- Tier advancement (2026-04-24)
- Tier 1.2 (integrate with sources of truth) promoted from partial to fully shipped — a pre-commit hook now verifies that the phases.merge.pr_number field resolves on GitHub at write time, not just at audit time. Tier 2.2 (contemporaneous logging) expanded from pilot to 5 live logs, with fresh scaffolds for app-store-assets, import-training-plan, and push-notifications.
- New measurement gap (2026-04-24)
- A fresh integrity run found cache_hits at 0 of 40 across the feature corpus. The v6.0 measurement protocol defined the field, but no feature session actually writes to it — distinct from Tier 1.1's adoption gap. Filed as issue #140 rather than silently fixed. Measurable via make measurement-adoption.
- Framework evolution (2026-04-24)
- Framework v7.1 → v7.5 (“Data Integrity Framework”) shipped 2026-04-24 as a direct extension of the Gemini audit remediation — the same-direction response to “our own measurement was the weakest link.” Details in the audit archive.
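The Tier 1.2 write-time citation check can be sketched as a small pre-commit script. This is a minimal sketch under stated assumptions: case studies are JSON documents carrying a phases.merge.pr_number field, and the repo slug and function names here are placeholders, not the site's actual hook:

```python
import json
import sys
import urllib.error
import urllib.request

def extract_pr_number(case_study: dict):
    """Pull the phases.merge.pr_number field out of a case-study document."""
    return case_study.get("phases", {}).get("merge", {}).get("pr_number")

def pr_resolves(repo: str, pr_number: int) -> bool:
    """True if the GitHub REST API returns HTTP 200 for this pull request."""
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

# Pre-commit entry point: fail the commit if the cited PR does not resolve.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        doc = json.load(f)
    pr = extract_pr_number(doc)
    if pr is not None and not pr_resolves("owner/repo", pr):  # placeholder slug
        sys.exit(f"phases.merge.pr_number {pr} does not resolve on GitHub")
```

Running the check at write time rather than audit time is the point: a citation that never resolves can never enter the repository in the first place.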
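The cache_hits gap above is mechanically checkable. A sketch of what a make measurement-adoption style scan might do, assuming each feature session is a JSON file in one directory — the layout, file glob, and treatment of missing values are assumptions for illustration:

```python
import json
from pathlib import Path

def adoption_rate(corpus_dir: str, field: str = "cache_hits"):
    """Count how many session files actually wrote a nonzero value for
    the given metric field, versus the corpus total (e.g., 0 of 40)."""
    written = total = 0
    for path in Path(corpus_dir).glob("*.json"):
        doc = json.loads(path.read_text())
        total += 1
        if doc.get(field):  # missing, None, and 0 all count as "not written"
            written += 1
    return written, total
```

Treating a defined-but-never-written field as a failure (rather than a vacuous pass) is what distinguishes this check from simply validating the schema.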
Framework advancement — what the audit could and couldn't verify
The chart below plots each significant feature against the framework version it shipped under. Solid dots are T1 instrumented data — the post-v6.0 era Gemini endorsed as reliable. Dashed outlines are pre-v6.0 T3 narrative estimates, which the audit flagged as unreliable and which are shown here for context rather than as trend evidence. The trend line only connects T1 points.
How we act on audit findings
Findings are not silently fixed. When an audit surfaces a broken citation, a methodological flaw, or a data inconsistency, we:
- Publish the audit verbatim — even before corrections are made, so the raw finding is visible.
- File a tracked issue on the underlying code repository (e.g., FitTracker2) with the remediation plan.
- Append corrections, don't overwrite. The audit archive and its git history show what was found, what was done, and when.
- Harden the process so the same class of finding can't recur (e.g., the 72h integrity cycle and the “Auditor Agent” recommended by Gemini).
Everything on this site and its underlying code is open for inspection at github.com/Regevba/fitme-story and github.com/Regevba/fitme-showcase.