Summary card · 60-second read

V7.0 HADF — Teaching the Framework to Detect Chip Architecture

Version v6.1 · Date 2026-04-17 · Tier flagship

V7.0 HADF asks whether passive hardware fingerprinting can improve dispatch routing without requiring provider cooperation. 5-layer architecture (device → 17 static profiles → 7-signature cloud fingerprinting via Mahalanobis distance → dynamic adaptation → evolutionary learning). Confidence-gated to be ignored below 0.4 — zero-regression shipping.

Honest disclosures
  • Cloud fingerprinting uses 7 published-benchmark signatures, not direct provider API queries — provider cooperation does not exist today (Option B was rejected on this basis).
  • Below 0.4 confidence, V7.0 HADF is ignored entirely; the framework reverts to v5.2 dispatch. Between 0.4 and 0.7, suggestions are advisory only.
  • Evolutionary learning is per-session EMA over a chip affinity map — cross-user generalisation requires the framework deployed independently elsewhere (still Tier 3.3 backlog).
How to read this case study
T1/T2/T3 · ledger · kill criterion

T1 · Instrumented
Numbers come from a machine-generated ledger or commit. Reproducible. Highest reader trust.
T2 · Declared
Numbers stated by a structured declaration (PRD, plan, frontmatter) but not directly measured.
T3 · Narrative
Estimates and observations from session memory. Useful for context; not citable as evidence.
Ledger
Where to verify the claim — a file path, GitHub issue, or backlog entry. Anything labelled ledger: is the audit trail.
Kill criterion
The pre-registered threshold under which this work would have been killed mid-flight. Not fired = work shipped without hitting the threshold.
Deferred
Items intentionally not closed in this version. Each cites the ledger that tracks remaining work.

V7.0 HADF (Hardware-Aware Dispatch Framework) — Teaching the Framework to Detect Chip Architecture

Can a software framework passively detect whether it is running on an M4 Pro, a Snapdragon 8 Gen 3, or a cloud TPU -- and does knowing that change how it allocates work?

Hardware matrix
Cloud endpoints: AWS c7g.large · AWS m7i.xlarge · GCP c3-standard-4 · GCP n2-standard-4 · Azure Dls4 v5 · Vercel Edge runtime · Fly.io shared-cpu-1x
Chip profiles (17): Apple M1 / M1 Pro / M1 Max · M2 / M2 Pro / M2 Max · M3 / M3 Pro / M3 Max · M4 / M4 Pro / M4 Max · M5 / M5 Pro / M5 Max · Intel Xeon Gen4 · AMD EPYC 9004

Context

The PM framework dispatches tasks to different AI models based on complexity (lightweight tasks to smaller models, heavyweight reasoning to larger ones). But it treated all hardware as identical -- an M4 Pro with massive unified memory got the same dispatch profile as a mobile chip with aggressive thermal throttling. This feature asked whether injecting hardware awareness into the dispatch layer would improve routing quality, and whether the infrastructure could ship without regressing existing behavior.


Three Approaches, Two Rejected

Option A -- Static Lookup (rejected: too simple). Map device model to a tier (high/mid/low). Simple, zero overhead. Rejected because it collapses continuous hardware capability into three buckets, discarding information that matters. A "high tier" flag cannot distinguish between a chip that sustains 80W indefinitely and one that throttles aggressively after 90 seconds.

Option B -- Active Negotiation (rejected: requires provider adoption). Query cloud providers for their hardware configuration via a structured API. Precise, real-time, extensible. Rejected because no major inference provider publishes a hardware capability API today. Building toward an API that does not exist creates a blocking dependency.

Option C -- Adaptive Fingerprinting (selected). Passive inference from observable signals: static chip profiles from published specs, behavioral fingerprinting of cloud endpoints via latency and throughput measurement, dynamic adaptation from session-level performance, and evolutionary learning across sessions. No provider cooperation required. Ships entirely from the client side.


The 5-Layer Architecture

Layer 0: Device Detection      -- Read device model, map to chip profile
Layer 1: Static Chip Profiles  -- 17 profiles with capability vectors and thermal envelopes
Layer 2: Cloud Fingerprinting  -- Latency/throughput signatures classified via Mahalanobis distance
Layer 3: Dynamic Adaptation    -- Thermal state, session performance, context-window pressure
Layer 4: Evolutionary Learning -- Exponential moving average updates to a chip affinity map
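
Layer 4 is the only layer that learns over time. As a concrete illustration, here is a minimal TypeScript sketch of an EMA update over the chip affinity map; the map shape, key format, and alpha value are assumptions for illustration, not the framework's actual schema.

// Illustrative Layer 4 sketch: EMA update over a chip affinity map.
// Keyed by (chipId, taskClass); scores normalized to 0..1.
type AffinityMap = Map<string, number>;

function updateAffinity(affinity: AffinityMap, chipId: string, taskClass: string,
                        observed: number, alpha = 0.2): void {
  const key = `${chipId}:${taskClass}`;
  const prev = affinity.get(key) ?? observed;  // seed with the first observation
  // Exponential moving average; a fast-to-stable-to-locked schedule would
  // shrink alpha as observations for this key accumulate.
  affinity.set(key, alpha * observed + (1 - alpha) * prev);
}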

Each dispatch decision uses a composite hardware score weighted by context type:

Context             Compute weight  Memory weight  Thermal weight  Latency weight
User-facing         0.30            0.25           0.20            0.25
Background          0.35            0.30           0.25            0.10
Critical reasoning  0.40            0.35           0.15            0.10
High frequency      0.20            0.20           0.30            0.30
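
A minimal TypeScript sketch of the composite score using the weights above; the type and function names are hypothetical, not the framework's actual API.

// Illustrative composite hardware score, weighted by context type.
type CapabilityVector = { compute: number; memory: number; thermal: number; latency: number };
type ContextType = "userFacing" | "background" | "criticalReasoning" | "highFrequency";

// Weights from the table above.
const WEIGHTS: Record<ContextType, CapabilityVector> = {
  userFacing:        { compute: 0.30, memory: 0.25, thermal: 0.20, latency: 0.25 },
  background:        { compute: 0.35, memory: 0.30, thermal: 0.25, latency: 0.10 },
  criticalReasoning: { compute: 0.40, memory: 0.35, thermal: 0.15, latency: 0.10 },
  highFrequency:     { compute: 0.20, memory: 0.20, thermal: 0.30, latency: 0.30 },
};

// caps holds the chip's normalized 0..1 scores from Layers 1-3.
function hardwareScore(caps: CapabilityVector, ctx: ContextType): number {
  const w = WEIGHTS[ctx];
  return w.compute * caps.compute + w.memory * caps.memory
       + w.thermal * caps.thermal + w.latency * caps.latency;
}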

Cloud Fingerprinting via Mahalanobis Distance

The key technical insight: cloud providers leave measurable fingerprints in their response latency and throughput patterns, even without an API. By measuring time-to-first-token (TTFT) and tokens-per-second (TPS) across sessions, the framework classifies the backend infrastructure using Mahalanobis distance over the (TTFT, TPS) feature space.

Seven provider signatures were built from published benchmarks:

Provider category        TTFT range (ms)  TPS range
GPU cluster (high-end)   95-180           75-110
GPU cluster (standard)   180-320          45-65
TPU (next-gen)           140-250          55-80
TPU (current-gen)        220-380          35-55
Custom silicon           160-280          50-70
Custom accelerator       200-350          40-60
Generic GPU              250-450          30-50

Nearest-centroid assignment with a ceiling on the best-match distance gates unknown hardware to a fallback: anything too far from every centroid is treated as unknown. The confidence gate ensures this is safe: below 0.4 confidence, V7.0 HADF is ignored entirely. Between 0.4 and 0.7, suggestions are advisory. Above 0.7, hardware scores influence routing weights.
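
A minimal TypeScript sketch of this classification step, under simplifying assumptions: diagonal covariance (so the Mahalanobis distance reduces to per-axis normalized Euclidean distance), centroids at the midpoints of the published ranges, and spreads of roughly a quarter of each range. All names and the distance ceiling are illustrative.

// Illustrative nearest-centroid classification over (TTFT, TPS).
type Signature = { name: string; ttft: [number, number]; tps: [number, number] };

// Two of the seven signatures from the table above; the rest follow the same shape.
const SIGNATURES: Signature[] = [
  { name: "gpu-cluster-high-end", ttft: [95, 180],  tps: [75, 110] },
  { name: "gpu-cluster-standard", ttft: [180, 320], tps: [45, 65]  },
];

const mid = ([lo, hi]: [number, number]) => (lo + hi) / 2;
const std = ([lo, hi]: [number, number]) => (hi - lo) / 4;  // assumed spread

// Mahalanobis distance with an assumed diagonal covariance matrix.
function distance(ttft: number, tps: number, s: Signature): number {
  const dT = (ttft - mid(s.ttft)) / std(s.ttft);
  const dP = (tps - mid(s.tps)) / std(s.tps);
  return Math.sqrt(dT * dT + dP * dP);
}

function classify(ttft: number, tps: number, maxDist = 3.0) {
  let best = SIGNATURES[0];
  let bestD = Infinity;
  for (const s of SIGNATURES) {
    const d = distance(ttft, tps, s);
    if (d < bestD) { bestD = d; best = s; }
  }
  // Too far from every centroid: gate to the unknown-hardware fallback.
  if (bestD > maxDist) return { category: "unknown", confidence: 0.0 };
  // Map distance to a 0..1 confidence; the 0.4 / 0.7 gates apply downstream.
  return { category: best.name, confidence: Math.max(0, 1 - bestD / maxDist) };
}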


Zero-Regression Shipping

The infrastructure shipped with enabled: false as the default. The confidence gate means the cost of being wrong about initial accuracy is zero. With V7.0 HADF disabled, existing dispatch behavior is bit-for-bit identical to the prior version.
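
A hypothetical sketch of the resulting gate configuration; the field names are illustrative, since the write-up does not show the actual schema.

// Illustrative feature gate; the shipped default keeps V7.0 HADF fully off.
const hadfGate = {
  enabled: false,        // default: existing dispatch is bit-for-bit unchanged
  confidenceFloor: 0.4,  // below this, HADF is ignored (v5.2 dispatch)
  advisoryCeiling: 0.7,  // between floor and ceiling, suggestions are advisory only
};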

Validation results:

  • 17 of 17 targeted chip profiles present, each with capability vector, thermal envelope, and recommended context window (a hypothetical shape is sketched after this list)
  • All 7 JSON config files passed schema validation
  • Token overhead: 733 tokens (0.9% of framework budget, under the 1.0% ceiling with 7 tokens of headroom)
  • Disk footprint: 24.2 KB total
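
A hypothetical shape for one such profile entry, inferred from the fields named in the first bullet; every field name and value here is an illustrative assumption.

// Illustrative chip profile entry (all names and numbers are assumptions).
const exampleProfile = {
  chip: "Apple M4 Pro",
  capability: { compute: 0.9, memory: 0.9, thermal: 0.85, latency: 0.9 },  // normalized 0..1
  thermalEnvelope: { sustainedWatts: 40, throttleOnsetSec: null },
  recommendedContextWindow: 131072,  // tokens
};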

Shipped as PR #82.


Performance

Metric                          Value
Wall time                       ~120 min
Commits                         8 (clean linear history)
Files created                   7
Files modified                  4
CU                              1.4 (first-of-kind +0.2, architectural novelty +0.2)
Parallel task dispatch savings  ~40% implementation time compression

Parallel dispatch on independent task clusters (chip profiles, affinity maps, and signature tables could be created simultaneously) cut the implementation phase from ~45 min to ~30 min.


Open Questions

  1. Cloud fingerprinting accuracy in production. Published benchmark ranges are sufficient for v1, but real production variance (load balancing, geographic routing) may widen distributions enough to degrade classification below the 70% confidence threshold.
  2. Evolutionary learning convergence. The EMA decay schedule (fast to stable to locked) was chosen from general theory, not calibrated against dispatch-specific session variance.
  3. Unknown hardware degradation. New devices that don't match any profile fall back to confidence 0.0 (V7.0 HADF disabled). Safe but means zero value until a profile is added.

Key Takeaways

  • Passive inference from observable signals can solve problems that seem to require active APIs. No provider cooperation was needed. Published benchmarks and behavioral measurement were sufficient for v1 cloud fingerprinting.
  • Novel infrastructure should always ship with a kill switch that requires no code change to activate. The confidence gate means V7.0 HADF can be fully disabled, advisory-only, or fully active based on a single threshold value.
  • Brainstorming three named approaches with explicit rejection reasons produces better designs. Each rejection articulated a specific failure mode that the next approach had to solve. "Too simple" and "requires provider adoption" are falsifiable criteria, not preferences.
  • The tightest constraint was not technical but budgetary. 733 tokens with a 1% ceiling leaves 7 tokens of headroom. Any expansion pushes the framework over budget.