iOS UI Audit P1 Burndown — 2-PR Token Substitution + Audit Hardening

Context

The parent feature ui-audit-baseline-burndown shipped 2026-04-24 with the make ui-audit scanner promoted to a hard verify-local gate. At that moment the codebase sat at 0 P0 + 103 P1 across 42 files. The P0=0 gate ensures no NEW P0 ever lands; the 103 P1 findings stayed under "fix-as-you-touch" — addressed only when a PR happened to touch the file.

By 2026-05-11 the P1 count had drifted to 108 (+5 from baseline) despite the fix-as-you-touch policy. This enhancement shipped a proactive burndown instead of waiting for organic touches.

The strategy decision

The initial plan (drafted 2026-05-10 at session pause) was 5 area-bucketed PRs, on the assumption that the work was "lightweight token substitution." A frequency analysis of the 62 DS-MAGIC-FRAME findings, run the morning work resumed, showed otherwise:

| Value | Occurrences | Pattern |
|---|---|---|
| 44 | 36 | iOS HIG tap target |
| 28 | 16 | icon container (distinct from `iconBadge=26`) |
| 80 | 13 | inline numeric field width |
| 36 | 7 | compact tap target |
| 23 other | 1–4 each | unique design-specific (chart 180, modal 260, divider 50, etc.) |

Trying to tokenize every magic number forces the invention of single-use tokens, which inflates the design system. The honest move: add 4 tokens for genuinely shared patterns, mass-substitute, and accept the long tail as documented design-specific P1s.
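The frequency analysis behind that decision can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tooling: the regex, the function names, and the `min_occurrences` threshold are all assumptions, and the real DS-MAGIC-FRAME rule may match more frame shapes than the two-argument form handled here.

```python
import re
from collections import Counter

# Hypothetical matcher for calls such as `.frame(width: 44, height: 44)`.
FRAME_LITERAL = re.compile(r"\.frame\(\s*width:\s*(\d+)\s*,\s*height:\s*(\d+)\s*\)")

def frame_value_counts(sources):
    """Count how often each integer literal appears in .frame(width:height:) calls."""
    counts = Counter()
    for text in sources:
        for width, height in FRAME_LITERAL.findall(text):
            counts[int(width)] += 1
            if height != width:
                # Square frames count once; non-square frames count both values.
                counts[int(height)] += 1
    return counts

def token_candidates(counts, min_occurrences):
    """Values used often enough to justify a shared token, most frequent first."""
    return [value for value, n in counts.most_common() if n >= min_occurrences]

sample = [
    'Image(systemName: "plus").frame(width: 44, height: 44)',
    "Circle().frame(width: 44, height: 44)",
    "Chart().frame(width: 300, height: 180)",
]
counts = frame_value_counts(sample)
print(token_candidates(counts, min_occurrences=2))  # [44]
```

In this sample only 44 clears the threshold. The single-use 300 and 180 stay as literals, which mirrors the choice to leave the long tail untokenized.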

Decision matrix locked via /AskUserQuestion 2026-05-11:

Strategy: Option B (Tokens for high-frequency only, ~78% target reduction)
PR structure: 2 cross-area PRs (tokens-first rather than area-first)
Risk tolerance: Strict (operator iOS simulator spot-check per PR before merge)

This replaced the original 5-PR plan with 2 cross-cutting PRs. Compared with the area-bucketed alternative, that meant less review burden, a cleaner git history, and faster wall time (~3h vs an estimated ~10h).

What shipped

PR #292 — PR-1: Frame tokens + mass substitution (squash `953908b`)

4 new AppSize tokens in FitTracker/Services/AppTheme.swift:
- AppSize.tapTarget = 44 (iOS HIG minimum)
- AppSize.tapTargetCompact = 36
- AppSize.iconContainer = 28 (distinct from existing iconBadge = 26)
- AppSize.fieldWidthCompact = 80
scripts/ui-audit.py APP_SIZE_VALUES allowlist extended to {28, 36, 44, 80}
50 mass-substitutions of .frame(width: N, height: N) literals across 21 SwiftUI views
HISTORICAL v1 files automatically excluded (script-level skip)
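
A mass substitution of this kind could be scripted roughly as follows. The token names and values are the four listed above; everything else (the regexes, the `substitute_frame_tokens` helper, and the choice to apply `fieldWidthCompact` to widths only) is a hypothetical sketch, not the approach PR #292 is known to have used.

```python
import re

# Token names and values come from the AppSize list above.
FRAME_TOKENS = {
    44: "AppSize.tapTarget",
    36: "AppSize.tapTargetCompact",
    28: "AppSize.iconContainer",
    80: "AppSize.fieldWidthCompact",
}
# Assumption: a field-width token should never be written into a height argument.
WIDTH_ONLY = {80}

FRAME_CALL = re.compile(r"\.frame\(([^()]*)\)")
# Integer literals only; the lookahead rejects values such as 44.5.
FRAME_ARG = re.compile(r"\b(width|height):\s*(\d+)(?![\d.])")

def substitute_frame_tokens(swift_source):
    """Replace tokenizable integer literals inside .frame(...) calls."""
    def replace_arg(match):
        label, value = match.group(1), int(match.group(2))
        if label == "height" and value in WIDTH_ONLY:
            return match.group(0)
        token = FRAME_TOKENS.get(value)
        return f"{label}: {token}" if token else match.group(0)

    def replace_call(match):
        return ".frame(" + FRAME_ARG.sub(replace_arg, match.group(1)) + ")"

    return FRAME_CALL.sub(replace_call, swift_source)

before = 'Image(systemName: "plus").frame(width: 44, height: 44)'
print(substitute_frame_tokens(before))
```

Values without a token, such as a chart height of 180, pass through unchanged, so a script like this would leave the design-specific long tail alone.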

Verification (T1): xcodebuild build PASSED; make ui-audit P0=0 maintained; P1 baseline 103 → 72 (-31).

PR-2 added 3 new AppText tokens in AppTheme.swift:
- AppText.subheadingStrong (subheadline / rounded / semibold)
- AppText.captionMicro (caption2 / rounded)
- AppText.captionMicroMedium (caption2 / rounded / medium)
23 font shorthand substitutions across 8 SwiftUI views (Auth × 2, Nutrition Tabs × 3, Onboarding × 1, Shared × 2)
4 explicit .accessibilityLabel(...) modifiers added (AIFeedbackView × 2 thumbs, AIIntelligenceSheet dismiss, SignInView error dismiss)
DS-A11Y-BUTTON audit window widened 20 → 60 lines in scripts/ui-audit.py

The window was widened because closing out DS-A11Y-BUTTON surfaced two findings flagged in already-correct code: Stats/v2/StatsView.metricChip (a11y label at line 271, 40 lines past Button) and Nutrition/Components/SupplementItemRow.supplementToggle (comprehensive a11y chain 58 lines past Button). The 20-line heuristic produced false positives for both. A 60-line window covers complex multi-line label blocks without significant cross-button false-negative risk.
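
The effect of the window size can be illustrated with a simplified look-ahead check. This is an assumed model of how a line-window heuristic behaves, not the code in scripts/ui-audit.py; `unlabeled_buttons` and the synthetic source are hypothetical.

```python
def unlabeled_buttons(lines, window):
    """Return 1-based line numbers of `Button(` openings that have no
    `.accessibilityLabel(` on the same line or within the next `window` lines."""
    findings = []
    for i, line in enumerate(lines):
        if "Button(" not in line:
            continue
        lookahead = lines[i : i + window + 1]
        if not any(".accessibilityLabel(" in candidate for candidate in lookahead):
            findings.append(i + 1)
    return findings

# Synthetic view whose label sits 40 lines past the Button opening,
# similar in shape to the metricChip case described above.
source = (
    ["Button(action: toggle) {"]
    + ["    // label content"] * 39
    + ['}.accessibilityLabel("Toggle metric")']
)

print(unlabeled_buttons(source, window=20))  # [1]  (flagged: label is out of range)
print(unlabeled_buttons(source, window=60))  # []   (label found inside the window)
```

The trade-off is visible in the same model: a wider window can also reach a label that belongs to a later button, which is the cross-button false-negative risk mentioned above.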

Verification (T1): xcodebuild build PASSED; combined PR-1+PR-2 P1 baseline 103 → 44 (-59, 57% reduction).

Combined outcome

| Metric | Baseline (2026-04-24) | Pre-burndown drift (2026-05-11) | Post-burndown | Reduction |
|---|---|---|---|---|
| P0 ui-audit findings | 0 | 0 | 0 | maintained |
| P1 ui-audit findings | 103 | 108 | 44 | -59 (57%) |
| Files with findings | 42 | n/a | 26 | -16 |
| DS-MAGIC-FRAME | 62 | n/a | 40 | -22 |
| DS-RAW-FONT-SHORTHAND | 26 | n/a | 0 | -26 (100%) |
| DS-A11Y-BUTTON | 6 | n/a | 0 | -6 (100%) |
| DS-MAGIC-PADDING | 6 | n/a | 4 | -2 |

All numbers T1 (instrumented via make ui-audit-baseline).

The 44 long-tail P1s — by design, not omission

The remaining 40 DS-MAGIC-FRAME + 4 DS-MAGIC-PADDING findings are unique design-specific values that don't warrant single-use tokens. Examples: chart container height 180 (chart-specific; another chart legitimately uses 240), modal sheet height 260 (sheet height is a design decision, not a reusable value), divider visual 50 (single-use accent geometry).

These stay as honest P1s under CLAUDE.md fix-as-you-touch. Future PRs that touch these files SHOULD clear them as part of the change — but the audit doesn't block on them, and the design system doesn't grow to accommodate them.

Risk handling

Kill criterion (locked at task approval): Net visual regression on a critical user-path screen → revert the PR; investigate token; consider whether the audit rule's recommended token is actually wrong for that context.

Resolution (post-spot-check): not_triggered_no_visual_regression. All token substitutions are visually identical at the pixel level (same SwiftUI value, just named via AppSize.* / AppText.*). Operator iOS simulator spot-check on each PR before merge confirmed zero drift.

Audit-script hardening (incidental discovery)

PR-2's DS-A11Y-BUTTON closure surfaced two legitimate-but-flagged findings in already-correct code (StatsView.metricChip + SupplementItemRow.supplementToggle). Widening the 20-line audit window to 60 cleared both without significant cross-button false-negative risk. Honest framing: widening the window does not make the audit weaker. The 20-line heuristic produced false positives that would have pushed authors to add redundant inline labels (genuinely worse a11y), and the wider window improves accuracy on complex multi-line label blocks.

Cross-references

Source case study (FT2): docs/case-studies/ios-ui-audit-p1-burndown-case-study.md
Parent feature: UI-Audit Baseline Burndown — shipped the P0=0 hard gate; this enhancement does the proactive P1 burndown
Protocol: Framework v7.8.1 (Branch Isolation + Closure Completeness). Mechanism C session attribution, an isolated worktree from Phase 1, Tier 2.2 logging, and Mechanism A coverage telemetry verification were all active throughout this ship.

Visual aid · key numbers at a glance