Amit Kothari, CEO of Tallyfy · Workflow AI Expert

How we built a B2B SaaS site for agentic browsing

In brief

Chrome put an Agentic Browsing category into Lighthouse in 2026. We registered for the WebMCP origin trial, shipped four in-browser tools an AI agent can call on tallyfy.com, and drove mobile layout shift to zero. Here are the real Lighthouse numbers, the parts that scored well, and the ones that still do not.

Summary

  • Chrome’s Lighthouse now scores an Agentic Browsing category - it checks registered WebMCP tools, schema validity, an llms.txt file, the accessibility tree, and layout stability. We built tallyfy.com to pass it and measured an average agentic-browsing score of 90 across five key pages.
  • The WebMCP origin trial is live on production - four in-browser tools (search_templates, explain_template, schedule_demo, explain_pricing) register on Chrome 149, and explain_pricing returned a correct answer when we drove it from a script on June 15, 2026. The trial token runs through November 16, 2026.
  • Mobile layout shift came in at zero - Cumulative Layout Shift measured 0.00 on every page we tested, though mobile Performance sat at 64, which is the number we are least proud of.
  • Discovery and functionality are two different jobs - Google says you do not need AI-specific files to get found; Chrome scores whether an agent can finish a task once it arrives.

Over May and June 2026 we rebuilt how tallyfy.com behaves when an AI agent shows up instead of a person. As far as we can tell, it is among the first B2B SaaS marketing sites to register for Chrome’s WebMCP origin trial and ship tools an agent can actually call.

We have not found another one.

That is not a trophy. It is just where the bar happens to sit right now, and the bar is low because almost nobody has noticed it moved.

Here is the short version, because you should get the result before the story. The origin-trial token is live on production. Four tools register in the browser. One of them answered a pricing question correctly when we drove it from a headless Chrome 149 on June 15, 2026. Mobile layout shift is zero on the pages we measured. And our mobile Performance score is mediocre, which I will get to, because pretending otherwise would make the rest of this worthless.

This is part of the bigger shift in AI and work we keep writing about. An agent that can browse your site is a new kind of visitor with new expectations, and almost nobody has met them yet.

A website now has two jobs that used to be one. The first is getting found, the old SEO problem: can Google, an AI Overview, or a chat assistant discover the page in the first place. The second is finishing the task once a visitor lands, and that visitor is increasingly a script, not a human with a mouse. Those are separate problems with separate scorecards, and most teams only think about the first one.

Google has been blunt about the dividing line. Its official guide to generative AI features says plainly: “You don’t need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search.” John Mueller has said the same about llms.txt, that it is not a search ranking signal. Discovery still runs on the boring fundamentals: crawlable pages, a clean sitemap, useful content, fast loads.

You can see that split in our own scores. The SEO number Lighthouse gives our pages averages 98, and not because we did anything clever for AI. It is high because we never stopped doing the dull discovery work: titles that match what people actually search, a sitemap that stays accurate, pages that parse cleanly. That surface has not changed. What changed is that a second scorecard showed up next to it, grading something the first one never looked at.

So why did we bother with WebMCP and an accessibility audit at all? Because that second surface is real, and it has its own referee.

Figure: a website splits into a discovery surface (crawl, sitemap, schema for getting found) and a functionality surface (WebMCP tools, accessibility tree, zero layout shift for an agent to finish a task).

We treated these as two audits with two tools. Discovery: Google Search Console, the same checks we have run for years. Functionality: Chrome’s new Lighthouse category, which did not exist eighteen months ago. Keeping them apart stopped us from doing the thing Google warns against, bolting on “AI files” that do nothing for ranking and calling it a strategy.

What does Chrome’s agentic browsing audit check?

Chrome added an “Agentic browsing” category to Lighthouse, sitting right next to Performance, Accessibility, Best Practices, and SEO. You can read the list of audits in the Chrome docs. They split into a few groups, and each one maps to a concrete thing a script needs in order to use your page.

There are checks for “Registered WebMCP tools” and “WebMCP schema validity”, which ask whether your page exposes callable tools and whether their schemas parse. There is a discoverability check that looks for an llms.txt file. There is an “Accessibility for agents” group, because the accessibility tree is the data model a non-visual client reads instead of pixels. And there is “Layout stability”, which is Cumulative Layout Shift wearing a different hat: if your page jumps around while it loads, an agent clicking by coordinates clicks the wrong thing.

That last point reframes a familiar metric. We used to chase low CLS so a human did not fat-finger the wrong button on a phone. Now a low CLS is also what keeps an automated client from acting on a layout that has already moved. Same number, two reasons to care.
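For readers who want the metric pinned down, here is a minimal sketch of how CLS is commonly computed from layout-shift entries: shifts are grouped into session windows (less than one second between consecutive shifts, each window capped at five seconds), and the largest window sum is the score. The names `ShiftEntry` and `cumulativeLayoutShift` are illustrative, not taken from any codebase. In a real page the entries would come from a `PerformanceObserver` watching `layout-shift`; here they are passed in as plain data so the logic can run anywhere.

```typescript
// Illustrative shape of a layout-shift entry (times in milliseconds).
interface ShiftEntry {
  startTime: number;
  value: number;
  hadRecentInput: boolean;
}

// Returns the largest session-window sum of layout-shift values.
function cumulativeLayoutShift(entries: ShiftEntry[]): number {
  const sorted = [...entries].sort((a, b) => a.startTime - b.startTime);
  let maxWindow = 0;
  let windowSum = 0;
  let windowStart = 0;
  let lastShift = 0;
  let windowOpen = false;

  for (const entry of sorted) {
    // Shifts right after user input do not count toward CLS.
    if (entry.hadRecentInput) continue;

    const continuesWindow =
      windowOpen &&
      entry.startTime - lastShift < 1000 &&
      entry.startTime - windowStart < 5000;

    if (continuesWindow) {
      windowSum += entry.value;
    } else {
      // Start a new session window at this shift.
      windowSum = entry.value;
      windowStart = entry.startTime;
      windowOpen = true;
    }
    lastShift = entry.startTime;
    maxWindow = Math.max(maxWindow, windowSum);
  }
  return maxWindow;
}
```

A page with no unexpected shifts produces an empty entry list and a score of 0, which is the result an agent clicking by coordinates needs.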

The two WebMCP audits are the ones almost no marketing site can pass today, because they make you ship something. Writing copy will not do it. “Registered WebMCP tools” wants at least one callable tool present when the page loads. “WebMCP schema validity” wants each tool to declare an input schema that actually parses, so a client knows what arguments to pass. A site with zero tools scores zero on both and there is no content trick that fixes it. You either built the tools or you did not. That is the unusual thing about this category: most of it cannot be faked with words, which is probably why so few sites have touched it.
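One way to catch a schema that will not parse before Lighthouse does is a small pre-flight lint over your tool descriptors. The sketch below is an assumption-heavy illustration, not a reproduction of what the Lighthouse audit runs: the `ToolLike` shape, the `lintTool` name, and the specific rules (top-level type of "object", every required key declared under properties) are choices made for this example.

```typescript
// Assumed descriptor shape; the article only says each tool declares an input schema.
interface ToolLike {
  name: string;
  inputSchema?: unknown;
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns human-readable problems; an empty list means the tool passed this lint.
function lintTool(tool: ToolLike): string[] {
  const schema = tool.inputSchema;
  if (!isPlainObject(schema)) {
    return [`${tool.name}: inputSchema is missing or not an object`];
  }

  const problems: string[] = [];
  if (schema.type !== "object") {
    problems.push(`${tool.name}: top-level type should be "object"`);
  }

  const props = schema.properties;
  if (props !== undefined && !isPlainObject(props)) {
    problems.push(`${tool.name}: properties must be an object`);
  }

  const required = schema.required;
  if (required !== undefined) {
    if (!Array.isArray(required)) {
      problems.push(`${tool.name}: required must be an array`);
    } else if (isPlainObject(props)) {
      // Every required argument should be declared under properties.
      for (const key of required) {
        if (typeof key !== "string" || !(key in props)) {
          problems.push(`${tool.name}: required field "${String(key)}" is not declared`);
        }
      }
    }
  }
  return problems;
}
```

Running this over every tool in a build step would fail the build on a broken schema instead of leaving it for an audit to find.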

Notice what is not here. No “write for AI” tone, no keyword density, no magic file that lifts your ranking. The category is about whether the page works for a machine that arrived with a job to do.

What we actually shipped

The headline piece is the WebMCP origin trial. WebMCP lets a page register tools that an AI agent in the browser can call, the same idea as a Model Context Protocol server but running client-side, with no auth, for anonymous visitors. We registered four read-only tools: search_templates and explain_template (which read our public workflow templates), schedule_demo (which hands back our booking link), and explain_pricing (which describes our per-seat model without quoting a number, so it can never go stale).

One implementation detail matters if you copy this, because the API moved under our feet. The tools register through document.modelContext.registerTool(), with each tool carrying an execute callback. But current stable Chrome 149 still exposes the older navigator.modelContext namespace, and Chrome 150 beta exposes both. So our registration code tries document.modelContext first and falls back to navigator.modelContext. Skip that fallback and a Chrome 149 reviewer opening the console sees nothing register, which is the worst possible first impression for the one feature you are pitching. We verified it the annoying way: a headless probe against production, in real Chrome 149, that confirmed all four tools register and watched explain_pricing return a correct answer.
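A minimal sketch of that fallback, written so it can run outside a browser. It is illustrative rather than the production code: `PageEnv`, `resolveModelContext`, and `registerTools` are names invented for this example, and apart from the `execute` callback the descriptor fields (`name`, `description`, `inputSchema`) are assumptions about the tool shape.

```typescript
// Assumed tool descriptor; only the execute callback is confirmed above.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute: (args: Record<string, unknown>) => unknown;
}

interface ModelContext {
  registerTool(tool: ToolDescriptor): void;
}

// Minimal view of the page globals, so the logic is testable without a browser.
interface PageEnv {
  document?: { modelContext?: ModelContext };
  navigator?: { modelContext?: ModelContext };
}

// Prefer document.modelContext, fall back to the older navigator.modelContext.
function resolveModelContext(env: PageEnv): ModelContext | undefined {
  return env.document?.modelContext ?? env.navigator?.modelContext;
}

// Registers each tool and returns the names that registered.
// Returns an empty list when neither namespace exists.
function registerTools(env: PageEnv, tools: ToolDescriptor[]): string[] {
  const ctx = resolveModelContext(env);
  if (!ctx) return [];

  const registered: string[] = [];
  for (const tool of tools) {
    try {
      ctx.registerTool(tool);
      registered.push(tool.name);
    } catch {
      // One failed registration should not block the remaining tools.
    }
  }
  return registered;
}
```

In a page you would pass the real `document` and `navigator`, likely with a type cast, since standard DOM typings may not declare `modelContext`. Logging the returned list gives a reviewer something to check in the console on either Chrome version.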

Here is what that buys an agent. Picture a shopping assistant landing on our pricing page with a user question like “does this plan include single sign-on”. Instead of scraping the page and guessing, it sees a registered tool named explain_pricing, calls it, and gets back a plain sentence: full members create and run workflows, light members only complete assigned tasks, SSO and migration help are included, check the page for current rates. No dollar figure to misread, no stale number cached in a model’s memory. The tool answers, the pricing page stays the one source of truth, and the assistant moves on without inventing anything.
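To make the "no number to go stale" idea concrete, here is a hedged sketch of what such a tool could look like. The answer text follows the sentence described above; everything else is an assumption for illustration: the descriptor fields, the `explainPricing` and `containsPrice` helpers, and the placeholder URL, which stands in for the real pricing page.

```typescript
// Builds the plain-language answer. No price appears anywhere in the text.
function explainPricing(pricingUrl: string): string {
  return [
    "Full members create and run workflows.",
    "Light members only complete assigned tasks.",
    "SSO and migration help are included.",
    `Check ${pricingUrl} for current rates.`,
  ].join(" ");
}

// Guard for tests or CI: true if a currency figure sneaks into the answer.
function containsPrice(text: string): boolean {
  return /[$€£]\s?\d/.test(text);
}

// Hypothetical tool descriptor; the URL below is a placeholder, not the real page.
const explainPricingTool = {
  name: "explain_pricing",
  description: "Describes the per-seat pricing model without quoting prices.",
  inputSchema: { type: "object", properties: {}, additionalProperties: false },
  execute: (): string => explainPricing("https://example.com/pricing"),
};

console.log(explainPricingTool.execute());
```

The guard is the design choice worth copying: a check like `containsPrice` in a test suite keeps a well-meaning edit from putting a dollar figure into the tool's answer later.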

The rest is less flashy and matters just as much. We treated the accessibility tree as the site’s real information architecture instead of a compliance afterthought. An icon-only button that used to read as “button” to a screen reader, and to an agent, now reads as “search”. A pricing card announces the plan it represents instead of a blank role. That is the whole trick: the tree an assistive client reads is the same tree an automated client reads, so fixing it for a blind user fixes it for a script at the same time. We lean on native HTML elements like details and dialog instead of div-and-JavaScript reconstructions, so the roles arrive for free and we are not hand-maintaining ARIA we will forget to update.

We also deployed schema types most marketing sites skip: Speakable, Service across thirty-odd pages, knowsAbout, isPartOf, and a sitewide SearchAction. Google says schema feeds rich results rather than AI ranking, and we believe them, so we did not pretend it was a growth hack. We shipped it because it is cheap structured truth about the pages, and structured truth is what a machine reader wants whether it credits us for it or not.
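As an example of how small this markup is, here is a sketch that builds the JSON-LD for a sitewide SearchAction using the standard schema.org vocabulary. The function name and the example.com URLs are placeholders for illustration; the real search URL template is not shown in this article.

```typescript
// Builds WebSite + SearchAction JSON-LD as a string.
// Throws if the URL template lacks the placeholder the action refers to.
function buildSearchActionJsonLd(siteUrl: string, searchUrlTemplate: string): string {
  if (!searchUrlTemplate.includes("{search_term_string}")) {
    throw new Error("URL template must contain {search_term_string}");
  }
  const data = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    url: siteUrl,
    potentialAction: {
      "@type": "SearchAction",
      target: {
        "@type": "EntryPoint",
        urlTemplate: searchUrlTemplate,
      },
      "query-input": "required name=search_term_string",
    },
  };
  return JSON.stringify(data, null, 2);
}

// Placeholder URLs; swap in your own site and search route.
const searchJsonLd = buildSearchActionJsonLd(
  "https://example.com/",
  "https://example.com/search?q={search_term_string}",
);

// In a page, this string goes inside <script type="application/ld+json">.
console.log(searchJsonLd);
```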

There is a second layer an agent can read before it even renders a page. We publish a /.well-known/mcp.json file that advertises both surfaces: the anonymous in-browser tools, and an authenticated Tallyfy MCP server at mcp.tallyfy.com running OAuth 2.1 with dynamic client registration and a dozen scoped permissions. An agent that wants to read a public template uses the in-browser tools with no login. An agent acting on someone’s real account goes through the authenticated server, with consent and a scope attached. Same company, two front doors, each sized to the trust level of what is being asked. That file is not part of the WebMCP spec yet, it is an emerging convention, but it costs nothing and it tells a capable client exactly where to knock.
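Because the file is a convention rather than a spec, there is no canonical shape to show. The sketch below is purely illustrative: every field name in `McpManifest` is hypothetical, the `https://` scheme on the server host is an assumption, and `chooseDoor` is an invented helper. It shows how a client might pick between the two front doors based on whether the task touches a real account.

```typescript
// Hypothetical manifest shape. None of these field names come from a spec.
interface McpManifest {
  browserTools: { auth: "none"; tools: string[] };
  server: { url: string; auth: "oauth2.1"; dynamicClientRegistration: boolean };
}

// Example values: tool names are from the article, field layout is invented.
const exampleManifest: McpManifest = {
  browserTools: {
    auth: "none",
    tools: ["search_templates", "explain_template", "schedule_demo", "explain_pricing"],
  },
  server: {
    url: "https://mcp.tallyfy.com", // scheme assumed; only the host is stated above
    auth: "oauth2.1",
    dynamicClientRegistration: true,
  },
};

type Door =
  | { kind: "browser"; tools: string[] }
  | { kind: "server"; url: string };

// Anonymous, read-only work uses the in-browser tools.
// Anything acting on a real account goes through the authenticated server.
function chooseDoor(manifest: McpManifest, needsAccountAccess: boolean): Door {
  return needsAccountAccess
    ? { kind: "server", url: manifest.server.url }
    : { kind: "browser", tools: manifest.browserTools.tools };
}
```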

And we kept publishing, because content is still how you get found in the first place. Twenty-five migration guides, eighteen vendor reviews written without a sales voice, eight how-tos on getting an MCP server listed in places like the Anthropic and OpenAI directories, and a run of industry AI playbooks, which carried the blog past 575 posts.

None of that is glamorous. It is the workflow-infrastructure-for-AI bet made concrete: an agent that can call a tool still needs a defined process telling it which tool, in what order, with a person in the loop where it counts. That part is the product. The site is just the first place we are eating our own cooking.

The numbers, including the ones that sting

I ran Lighthouse 13.3.0 on the mobile profile, in real Chrome 149, against production tallyfy.com on June 15, 2026. No staging, no cherry-picked run. Here is what came back across five pages, scores out of 100, CLS as the raw value:

Page            Performance  Accessibility  Best Practices  SEO  Agentic Browsing  CLS
Homepage                 63             97              77  100               100  0.00
Solution page            56             95              77  100               100  0.00
Pricing                  66             82              77  100                75  0.00
Blog index               65             91              77  100                75  0.00
Templates                69             95              77   92               100  0.00
Average                  64             92              77   98                90  0.00

The good news is real. Cumulative Layout Shift was a flat zero on all five pages, so the layout-stability audit is satisfied and nothing jumps while the page settles. Agentic Browsing averaged 90, with three pages at a clean 100 and two held back to 75. Accessibility averaged 92 and SEO 98. For the surface most sites have not even started on, that is a solid place to stand.
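If you want to check the averages yourself, the arithmetic is short. This sketch recomputes them from the five per-page rows, rounding to the nearest whole point; the `Row` type and variable names are just for this example.

```typescript
type Row = {
  page: string;
  perf: number;
  a11y: number;
  bestPractices: number;
  seo: number;
  agentic: number;
  cls: number;
};

// Per-page scores from the June 15, 2026 run.
const rows: Row[] = [
  { page: "Homepage", perf: 63, a11y: 97, bestPractices: 77, seo: 100, agentic: 100, cls: 0 },
  { page: "Solution page", perf: 56, a11y: 95, bestPractices: 77, seo: 100, agentic: 100, cls: 0 },
  { page: "Pricing", perf: 66, a11y: 82, bestPractices: 77, seo: 100, agentic: 75, cls: 0 },
  { page: "Blog index", perf: 65, a11y: 91, bestPractices: 77, seo: 100, agentic: 75, cls: 0 },
  { page: "Templates", perf: 69, a11y: 95, bestPractices: 77, seo: 92, agentic: 100, cls: 0 },
];

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const metrics = ["perf", "a11y", "bestPractices", "seo", "agentic"] as const;

// Rounded average per metric.
const averages: Record<string, number> = {};
for (const metric of metrics) {
  averages[metric] = Math.round(mean(rows.map((r) => r[metric])));
}

// Worst-case layout shift across all pages.
const worstCls = Math.max(...rows.map((r) => r.cls));

console.log(averages, { worstCls });
// perf 64, a11y 92, bestPractices 77, seo 98, agentic 90, worstCls 0
```

Performance is 319 / 5 = 63.8 and SEO is 492 / 5 = 98.4 before rounding; the other three divide evenly.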

Now the part that stings.

Mobile Performance averaged 64, and Largest Contentful Paint ran between roughly 5.8 and 8.4 seconds on the throttled mobile test, which is slow. Best Practices sat at 77 on every page, a flat line that says there is one systemic issue we have not chased down yet. We did not capture a clean Lighthouse baseline before the program, so I am not going to invent a tidy before-and-after delta. I would rather show you the current numbers, warts on, than dress up a comparison I cannot back.

Why publish the weak scores? Because a chain of steps that quietly fails is exactly the kind of thing you only catch by measuring it straight, and a case study that hides its bad numbers teaches nobody anything.

The playbook, and what it cost

If you run a B2B SaaS marketing site and want to do this, the order that worked for us is boring and cheap. Read Google’s AI optimization guide first, so you stop wasting time on “AI files” that do nothing for discovery. Audit your accessibility tree as if it were the IA, because for an agent it is. Register for the WebMCP origin trial and ship two or three real read-only tools rather than a fake checkout. Add the schema types you have been skipping. Drive mobile CLS toward zero. Then run the Lighthouse Agentic Browsing category and fix what it flags.

The thing I would skip if I did it again is the part we did first. Early on we built a dedicated /for-ai/ page, a tidy machine-readable summary aimed at crawlers and assistants, on the theory that AI systems would want a special door. Then Google’s guide landed and said the quiet part out loud: you do not need AI-targeted pages, and a page that exists only for machines can read as scaled-content abuse. So we deleted it and pointed the URL at our normal about page. The lesson stuck. Discovery is won with pages that serve humans well; the agent surface is about the page working once someone, or something, arrives. We had briefly mixed the two, which is exactly the mistake the guide warns against.

What did it cost? Mostly engineering attention rather than money. The origin trial is free. The schema and accessibility work is measured in hours and needs no new license. The tool registration is about a hundred and fifty lines of code, and the ugliest cost was time spent on the namespace mismatch between Chrome 149 and 150, which no doc warned us about. The content programs were the real spend, and those would have happened anyway. Add it up and the agentic-browsing-specific work was a handful of engineering days, most of it on the accessibility tree, which we should have tightened years ago regardless.

Would I tell every team to do this tomorrow? No. If your buyers are not anywhere near agentic browsing yet, the discovery fundamentals earn more per hour. This was a fit for us because “give AI a process to follow” is literally what we sell, so being early on the surface where agents meet websites is on message, not a vanity sprint.

What is next is the part I am least sure about, which feels right for something this new. We will watch whether agents actually call the tools, fix that Performance score, and chase the Best Practices line down to its root cause. If agentic browsing turns out to be a fad, we are out a few engineering days and a better accessibility tree, which is not a bad consolation prize. If it does not, we already live there.

Want to see how a defined process keeps an AI step inside its lane, on a real workflow rather than a slide? Book a 30-minute walkthrough and we will show you the same setup running live.

About the author

Amit is the CEO of Tallyfy. He has 25+ years of practical experience in technology, entrepreneurship, and operational efficiency. He has been hands-on with AI-first engineering, and with moving Tallyfy to AI-native workflow automation, since Claude Code was first released. He is also an Entrepreneur in Residence at WashU's Skandalaris Center, created the OneDay (Woolf) AI curriculum for their accredited MBA, and consults with clients who need help with AI via Blue Sheen. He graduated with a Computer Science degree from the University of Bath. He's originally British and lives in St. Louis, MO.
