I Sprinkled data-* Everywhere for AI Agents. They Were Reading Something Else.

I started tagging every button and card with data-* so Claude and Playwright agents could navigate my UI. Turns out they read the accessibility tree — and semantic HTML was the right answer the whole time.

[Illustration: a developer and an AI agent sitting side by side, reading the same HTML button through its semantic roles and labels]

A few months ago I started doing something that felt smart at the time. Every interactive thing in the products I was shipping got a data-* attribute. data-action="share" on buttons. data-target="cart-item" on cards. data-agent-hook="primary-cta" on anything important. The reasoning seemed clean: agents were going to drive these UIs — Claude in the browser, Playwright MCP, internal tools — so I should give them hooks to grab onto.

Then I actually watched one work.

Claude for Chrome clicked the right button on the first try, on a page where I'd added exactly zero of those attributes. Playwright MCP did the same. Neither of them cared about the hooks I'd been scattering. They were reading something else entirely — the same structure a screen reader sees. Roles, names, labels. The accessibility tree.

That sent me back through my own codebases with a sharper question: do data-* attributes still earn their keep? And if so, for what exactly? Here's what I landed on.

What data-* Was Always Good At (And Where It Went Wrong)

The spec is clear: data attributes exist to store custom data private to the page or application. That's it. Not a styling hook. Not a state machine. Not a sneaky place to stash props you didn't want to prop-drill.

When I used them well, it looked like this:

html
<article data-post-id="abc123">
  <button data-action="share">Share</button>
</article>

Domain-level identifiers. Meaningful. Traceable from HTML → JS → analytics.
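
To make that HTML → JS → analytics trace concrete, here is a minimal TypeScript sketch. `ElementLike` is a hypothetical stand-in for a DOM element so the example runs anywhere; in a browser you would read the real `element.dataset` and use `closest()` for the parent walk. `buildAnalyticsEvent` and the event shape are assumptions for illustration, not part of any library.

```typescript
// Hypothetical stand-in for a DOM element: just its data-* values and a parent link.
// In a browser, `dataset` is HTMLElement.dataset and the parent walk below is
// roughly what element.closest("[data-post-id]") does for you.
interface ElementLike {
  dataset: Record<string, string | undefined>;
  parent?: ElementLike;
}

// Assumed analytics payload shape, for illustration only.
interface AnalyticsEvent {
  action: string;
  postId: string;
}

// Walk up from the starting element until something carries the given dataset key.
function findData(start: ElementLike, key: string): string | undefined {
  for (let el: ElementLike | undefined = start; el; el = el.parent) {
    const value = el.dataset[key];
    if (value !== undefined) return value;
  }
  return undefined;
}

// Turn a click target into an analytics event, or null if the markup
// does not carry both identifiers.
function buildAnalyticsEvent(target: ElementLike): AnalyticsEvent | null {
  const action = findData(target, "action"); // data-action
  const postId = findData(target, "postId"); // data-post-id (dataset camel-cases it)
  return action && postId ? { action, postId } : null;
}

// Mirrors the <article data-post-id="abc123"> / <button data-action="share"> markup.
const article: ElementLike = { dataset: { postId: "abc123" } };
const shareButton: ElementLike = { dataset: { action: "share" }, parent: article };

console.log(buildAnalyticsEvent(shareButton));
```

The button resolves to the action "share" and the post ID "abc123", and nothing here depends on how the element looks or where it sits on the page. An element with neither attribute yields null, which is the signal that the markup is missing domain data.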

When I used them badly, it looked like this:

html
<div data-is-button="true" data-clickable="true" data-variant="primary"
     onClick={handleSave}>
  Save
</div>

That's not custom data. That's a <button> in disguise. Overuse is a well-known anti-pattern: if you end up with ten-plus data attributes on one element, the structure is probably wrong. MDN makes a related point, warning against putting content that should be visible and accessible into data attributes, because assistive technology may not read it. Same goes for stuffing them in "just in case."

The other hard-won rule: never anything sensitive. They ship to the client. They're in View Source. That data-user-role="admin" you thought was harmless is a public API.
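
One way to hold that line is a small check in a lint step or unit test. The TypeScript sketch below is only an illustration: `SENSITIVE_HINTS` is a made-up starter list and `findSensitiveDataAttributes` is a hypothetical helper, not an existing tool.

```typescript
// Hypothetical substrings that suggest a data-* attribute is leaking something
// it shouldn't. This list is an assumption; tune it for your own domain.
const SENSITIVE_HINTS = ["role", "token", "secret", "email", "password"];

// Given an element's attributes as name/value pairs, return the names of the
// data-* attributes whose name contains one of the hints.
function findSensitiveDataAttributes(attributes: Record<string, string>): string[] {
  return Object.keys(attributes).filter(
    (name) =>
      name.startsWith("data-") &&
      SENSITIVE_HINTS.some((hint) => name.toLowerCase().includes(hint)),
  );
}

const flagged = findSensitiveDataAttributes({
  "data-post-id": "abc123",
  "data-user-role": "admin",
  "aria-label": "Open settings",
});

console.log(flagged);
```

Here only data-user-role is flagged. A name-based check like this is crude and will miss sensitive values hiding behind innocent names, so treat it as a tripwire rather than a guarantee.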

What Agents Actually Read

So if not my carefully placed data-* attributes — what were they reading?

Claude for Chrome and Playwright MCP don't parse the DOM the way a devtools inspector does. They read the accessibility tree — the same structure a screen reader sees. Roles, accessible names, states, labels. Playwright MCP calls it a "snapshot"; Claude's read_page filters it for interactive elements by role.

That means the element most discoverable to an agent isn't the one decorated with the most data-* attributes. It's the one that's already accessible:

html
<!-- The agent reads this cleanly: role=button, name="Save draft" -->
<button type="submit" aria-label="Save draft">
  <SaveIcon />
</button>

<!-- The agent sees a generic container: no meaningful role, no accessible name -->
<div data-action="save" data-variant="primary" onClick={handleSave}>
  <SaveIcon />
</div>
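
To see why those two elements come out so differently, here is a deliberately simplified TypeScript sketch of how a role and an accessible name could be derived. It is not the real accessible-name computation and not what Claude or Playwright run internally. `NodeLike`, `IMPLICIT_ROLES`, and `describeForAgent` are hypothetical names, and only a handful of elements are covered.

```typescript
// Hypothetical, minimal description of an element.
interface NodeLike {
  tag: string;
  attributes?: Record<string, string>;
  text?: string;
}

// A few implicit roles that semantic elements get for free.
// Real browsers know many more; this table is intentionally tiny.
const IMPLICIT_ROLES: Record<string, string> = {
  button: "button",
  nav: "navigation",
  article: "article",
  dialog: "dialog",
  main: "main",
};

// Roles allowed to take their name from text content (simplified assumption).
const NAME_FROM_CONTENT = new Set(["button", "link", "heading"]);

function describeForAgent(node: NodeLike): { role: string; name: string } {
  const attrs = node.attributes ?? {};
  // An explicit role attribute wins, then the element's implicit role;
  // anything else (for example a bare <div>) falls back to "generic".
  const role = attrs["role"] ?? IMPLICIT_ROLES[node.tag] ?? "generic";
  // aria-label first, then text content for roles that allow it.
  // Note that data-* attributes never feed into the role or the name.
  const label = attrs["aria-label"]?.trim();
  const fromContent = NAME_FROM_CONTENT.has(role) ? (node.text ?? "").trim() : "";
  return { role, name: label || fromContent };
}

const realButton = describeForAgent({
  tag: "button",
  attributes: { type: "submit", "aria-label": "Save draft" },
});

const disguisedDiv = describeForAgent({
  tag: "div",
  attributes: { "data-action": "save", "data-variant": "primary" },
});

console.log(realButton, disguisedDiv);
```

The real button resolves to the role "button" with the name "Save draft". The div resolves to "generic" with an empty name, no matter how many data-* attributes it carries.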

The irony is a little delicious: the move that made your app friendlier to blind users in 2015 is the same move that makes it friendlier to AI agents in 2026. Kent C. Dodds argued for years that data-testid should be an escape hatch — only when you can't select by role or text. That advice aged quietly into gospel, and now browser agents are enforcing it.

The Rules I Follow Now

A short hierarchy, in order:

  1. Semantic HTML first. <button>, <nav>, <article>, <dialog>, <main>. Real elements come with real roles, for free.
  2. ARIA when semantics don't reach. aria-label, aria-describedby, aria-current, aria-expanded. Agents treat these as first-class hints, not decorative polish.
  3. data-* for genuine domain data. data-post-id, data-order-id, data-session-ref — identifiers the app logic actually needs that don't belong in visible text.
  4. data-testid as a last resort. If nothing above can uniquely target an element, add it, and only then. Playwright even lets you configure the attribute name (its testIdAttribute option) so you can standardize on one.
  5. Never for styling, never for secrets, never for "we might need it later."
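
The same hierarchy can be read as a targeting order for tests and agents. Below is a minimal TypeScript sketch, assuming a hypothetical `Target` description of an element. `pickLocator` is an illustrative helper that returns the Playwright locator call as a string; `getByRole` and `getByTestId` are real Playwright methods, and everything else is made up for the example.

```typescript
// Hypothetical description of what we know about an element we want to target.
interface Target {
  role?: string;   // from a semantic element or an explicit role attribute
  name?: string;   // accessible name: visible text or aria-label
  testId?: string; // data-testid value, if one exists
}

// Pick a locator following the hierarchy: role + name first, test id last.
// Returns the Playwright call as a string so the sketch runs without a browser.
function pickLocator(target: Target): string | null {
  // Rules 1 and 2: semantic HTML and ARIA give us a role and an accessible name.
  if (target.role && target.name) {
    return `getByRole(${JSON.stringify(target.role)}, { name: ${JSON.stringify(target.name)} })`;
  }
  // Rule 4: the escape hatch, only when role and name cannot identify the element.
  if (target.testId) {
    return `getByTestId(${JSON.stringify(target.testId)})`;
  }
  // Nothing to grab onto: the fix belongs in the markup, not in the selector.
  return null;
}

const saveLocator = pickLocator({ role: "button", name: "Save draft", testId: "save-btn" });
const fallbackLocator = pickLocator({ testId: "chart-canvas" });
const noLocator = pickLocator({});

console.log(saveLocator, fallbackLocator, noLocator);
```

Even when a test id is present, the role-based locator wins, so the escape hatch only gets used when nothing else can identify the element. Domain data such as data-post-id stays out of this function on purpose: it is for app logic, not for finding things.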

If you catch yourself reaching for a data-* attribute, try asking: could an aria-* attribute or a semantic element carry this same meaning? Nine times out of ten, yes — and you'll get screen reader support and agent discoverability as a bonus.

A Reflection

I used to think the AI era would push us toward new frontend primitives — "agent hooks," "intent attributes," some new thing we'd have to learn. It hasn't. The tools we already had — semantic HTML, thoughtful ARIA, good accessibility hygiene — are what the agents pick up on. The future of data attributes isn't more of them. It's fewer, more meaningful, and only when nothing else fits.

In The Future of Frontend Is Quietly Changing, I wrote that UI is becoming an expression of intent. Data attributes fit that frame, but only when they describe intent that roles and labels can't already carry. Most of the time, roles and labels can.

The HTML that served a blind user well in 2015 is the HTML that serves an AI agent well in 2026. That's not a coincidence. It's the same problem solved twice.