Measuring Agent Ecosystem Health

The four-layer Agent Ecosystem Health Dashboard — what to measure, how to instrument it, and where to start when you have nothing yet.

Why traditional DevRel metrics are blind

The standard DevRel dashboard tracks human participation: active community members, forum posts per month, event attendance, ambassador program NPS, documentation traffic, support ticket volume. These metrics measure the human community layer accurately.

They tell you nothing about the agent ecosystem layer.

A platform with declining forum activity may have a thriving agent ecosystem: developers who integrate successfully with AI tools have no reason to post questions. A platform with excellent community metrics may have a catastrophically degraded agent ecosystem: developers using AI tools hit systematic failures, then quietly abandon the platform rather than debug AI-generated code.

The two layers can diverge, and in practice they do. That divergence is currently invisible to most DevRel programs.

FAISR — the primary metric

First-Attempt Integration Success Rate measures what percentage of integrations generated by AI coding tools — using the platform's published materials as context — produce working code on the first attempt without manual correction.

Three instrumentation paths, one combined metric. Start with the first and layer the rest in.

Approach 1: Controlled test suite (most direct)

Build a suite of 10–15 canonical integration tasks covering the most common integration paths. For each task, write a standard "median" prompt — not the cleverest prompt, the one a reasonably competent developer would actually type. Run each prompt through each AI tool in scope. Evaluate output against a rubric:

Does it compile? (automated)
Does it use current API endpoints? (automated — parse and diff against API spec)
Does it authenticate correctly? (automated — run against test API instance)
Does it handle the primary error state? (human review, scored 0–2)
Is it architecturally sound, or over-engineered? (human review, scored 0–3)

FAISR = percentage of outputs that, on the first attempt, pass every automated (binary) check and score above your chosen threshold on each human-reviewed check. Track as a time series. Run weekly.
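The rubric and the FAISR calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `RubricResult` record, the function names, and the two minimum scores are assumptions, since the rubric above defines the scoring scales but not the cut-offs.

```python
from dataclasses import dataclass

# Assumed minimum passing scores. The rubric gives the scales (0-2 and 0-3)
# but not the cut-offs, so adjust these to your own standard.
MIN_ERROR_HANDLING = 1
MIN_ARCHITECTURE = 2


@dataclass
class RubricResult:
    task: str                # canonical integration task (hypothetical name)
    tool: str                # AI tool that generated the output
    compiles: bool           # automated check
    current_endpoints: bool  # automated check against the API spec
    authenticates: bool      # automated check against a test API instance
    error_handling: int      # human review, 0-2
    architecture: int        # human review, 0-3


def passes_first_attempt(r: RubricResult) -> bool:
    """True when every binary check passes and both scores meet the minimum."""
    binary_ok = r.compiles and r.current_endpoints and r.authenticates
    scored_ok = (r.error_handling >= MIN_ERROR_HANDLING
                 and r.architecture >= MIN_ARCHITECTURE)
    return binary_ok and scored_ok


def faisr(results: list[RubricResult]) -> float:
    """FAISR as a percentage of first-attempt outputs that pass the rubric."""
    if not results:
        return 0.0
    return 100.0 * sum(passes_first_attempt(r) for r in results) / len(results)


def faisr_by_tool(results: list[RubricResult]) -> dict[str, float]:
    """Break the same metric down per AI tool."""
    grouped: dict[str, list[RubricResult]] = {}
    for r in results:
        grouped.setdefault(r.tool, []).append(r)
    return {tool: faisr(rs) for tool, rs in grouped.items()}


# Illustrative data only; task and tool names are placeholders.
sample_results = [
    RubricResult("create-user", "tool-a", True, True, True, 2, 3),
    RubricResult("list-orders", "tool-a", True, True, False, 2, 3),
    RubricResult("create-user", "tool-b", True, True, True, 1, 2),
    RubricResult("list-orders", "tool-b", True, True, True, 2, 1),
]
print(faisr(sample_results), faisr_by_tool(sample_results))
```

Storing one record per task, tool, and run date makes it straightforward to plot the weekly time series and to see which tool or task is dragging the aggregate down.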

Approach 2: API log analysis (continuous signal)

AI-generated code has characteristic error patterns in API logs that distinguish it statistically from human-authored code:

Burst errors at session start (failed auth on first call, no successful calls before errors)
Hits on deprecated endpoints that follow known AI-training patterns
Burst-429 errors at regular intervals (fixed retry interval generated by AI rather than exponential backoff)

Add three queries to your weekly log review: deprecated endpoint hit rate, burst-401/403 rate at session start, burst-429 rate at regular intervals. Track as time series alongside controlled test results.
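As a rough sketch of what those three queries might compute, the Python below works on log data already grouped per client session. The function names, the four-hit minimum, and the 10% tolerance are assumptions chosen for illustration; real thresholds depend on your traffic.

```python
from statistics import mean, pstdev


def is_fixed_interval_retry(timestamps_429, min_hits=4, tolerance=0.1):
    """Flag a run of 429s whose gaps are nearly constant.

    Exponential backoff produces growing gaps; a fixed retry interval
    produces gaps with almost no spread. `tolerance` is the maximum
    allowed ratio of standard deviation to mean gap (an assumed cut-off).
    """
    if len(timestamps_429) < min_hits:
        return False
    ts = sorted(timestamps_429)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    avg_gap = mean(gaps)
    if avg_gap <= 0:
        return False
    return pstdev(gaps) / avg_gap <= tolerance


def starts_with_auth_failure(session_statuses):
    """True when the first call in a session is a 401 or 403,
    meaning no successful call preceded the auth error."""
    return bool(session_statuses) and session_statuses[0] in (401, 403)


def deprecated_hit_rate(request_paths, deprecated_paths):
    """Share of requests that hit an endpoint listed as deprecated."""
    if not request_paths:
        return 0.0
    hits = sum(path in deprecated_paths for path in request_paths)
    return hits / len(request_paths)


# Illustrative inputs: timestamps in seconds, paths are placeholders.
print(is_fixed_interval_retry([0, 5, 10, 15, 20]))   # evenly spaced retries
print(is_fixed_interval_retry([0, 1, 3, 7, 15]))     # doubling gaps (backoff)
print(starts_with_auth_failure([401, 401, 200]))
print(deprecated_hit_rate(["/v1/users", "/v2/users"], {"/v1/users"}))
```

Each function returns a per-session or per-window value; aggregating the share of flagged sessions per week gives the three time series to track alongside the controlled test results.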

Approach 3: Developer survey (self-reported baseline)

Add two questions to your existing survey:

"When you use AI coding tools to integrate with this platform, approximately what percentage of the time does the generated code work without significant manual correction?"
"When AI-generated code for this platform doesn't work, what is the most common reason?" (Options: deprecated endpoints, incorrect auth, missing error handling, over-engineered output, unclear documentation, other)

The four measurement layers

Layer 01: Integration quality signals

FAISR, measured three ways: controlled test suite (weekly), API log analysis (continuous), developer survey (quarterly). This is the layer closest to the developer's actual experience.

Layer 02: Documentation effectiveness for agents

Semantic coverage: % of common agent queries answered directly and unambiguously.
Drift rate: lag between API changes and doc updates, in days. Target < 7.
Failure correlation: which pages, when retrieved as context, produce the most errors.
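Drift rate is the easiest of these three to compute. A minimal sketch, assuming you can pair each API change date with the date its documentation was updated (or `None` if it has not been updated yet); the function names and the sample dates are hypothetical.

```python
from datetime import date

DRIFT_TARGET_DAYS = 7  # target from the metric definition: under 7 days


def drift_days(api_changed, doc_updated, today):
    """Days between an API change and the matching doc update.

    If the docs have not been updated yet, the lag is measured up to
    `today`. Docs updated ahead of the change count as zero drift.
    """
    end = doc_updated if doc_updated is not None else today
    return max((end - api_changed).days, 0)


def changes_over_target(changes, today):
    """Return (endpoint, lag) pairs whose drift meets or exceeds the target."""
    flagged = []
    for endpoint, api_changed, doc_updated in changes:
        lag = drift_days(api_changed, doc_updated, today)
        if lag >= DRIFT_TARGET_DAYS:
            flagged.append((endpoint, lag))
    return flagged


# Illustrative records only: (endpoint, API change date, doc update date).
today = date(2024, 1, 20)
changes = [
    ("/v2/users", date(2024, 1, 1), date(2024, 1, 5)),
    ("/v2/orders", date(2024, 1, 1), None),
]
print(changes_over_target(changes, today))
```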

Layer 03: Recipe ecosystem health

Coverage: % of common tasks with a validated recipe.
Freshness: days since last update, vs. API change dates.
Adoption: downloads, links followed, community sharing.
MCP tier: not accessible / agent-accessible / agent-native.
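Coverage and freshness reduce to simple set and date comparisons. The sketch below assumes you keep a list of common tasks, a list of tasks with validated recipes, and the dates of relevant API changes; all names and sample values are hypothetical.

```python
from datetime import date


def recipe_coverage(common_tasks, validated_recipe_tasks):
    """Percentage of common tasks that have a validated recipe."""
    tasks = set(common_tasks)
    if not tasks:
        return 0.0
    return 100.0 * len(tasks & set(validated_recipe_tasks)) / len(tasks)


def is_stale(recipe_updated, api_change_dates):
    """A recipe is stale if any relevant API change landed after its last update."""
    return any(change > recipe_updated for change in api_change_dates)


def days_since_update(recipe_updated, today):
    """Raw freshness figure: days since the recipe was last touched."""
    return (today - recipe_updated).days


# Illustrative values only.
common = ["create-user", "list-orders", "refund-payment", "webhook-setup"]
validated = ["create-user", "list-orders", "webhook-setup"]
print(recipe_coverage(common, validated))
print(is_stale(date(2024, 1, 10), [date(2024, 1, 1), date(2024, 2, 1)]))
print(days_since_update(date(2024, 1, 10), date(2024, 2, 9)))
```

Comparing against API change dates rather than a fixed age limit avoids flagging old recipes that are still correct.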

Layer 04: Competitive positioning

Competitive FAISR benchmark: your score vs. competitors on equivalent tasks, quarterly.
Developer sentiment: "How does this platform's AI tool compatibility compare?" Five-point, quarterly.

Starting sequence — no existing instrumentation

The temptation is to wait until full instrumentation is in place. Don't. The manual controlled test suite from Week 1 already produces more signal about agent ecosystem health than anything in the traditional DevRel dashboard.