NIST asked how to secure AI agents - do not wait for the answer

Summary

NIST wants to know how anyone secures an AI agent - the Center for AI Standards and Innovation published a Request for Information in the Federal Register on January 8, 2026, and took public comment through March 9. The submitter who flagged it on Hacker News counted 43 questions.
Three risk classes anchor the document - models fed adversarial data such as indirect prompt injection, models carrying intentionally placed backdoors, and uncompromised models that pursue misaligned objectives anyway.
What happens now that the window is closed? Responses feed voluntary guidelines, and NIST launched an AI Agent Standards Initiative on February 17, 2026. Voluntary NIST guidance tends to become the enterprise security questionnaire your buyers send you.
Every defense NIST gestures at is process design - scoped context, bounded tools, monitored runs. See how Tallyfy builds those gates

Solution Compliance & Finance

Compliance Management Software

Compliance Management Made Easy

Save Time On Compliance

Track & Delegate

Audit trails

Explore this solution

NIST asked the software industry a question this winter that most AI vendors can’t answer cleanly: how do you secure software that decides for itself what to do next?

The asking was formal. On January 8, 2026, the Center for AI Standards and Innovation - CAISI, the Commerce Department office that now owns this beat at NIST - published a Request for Information on security considerations for AI agents in the Federal Register. It defined its subject as systems “capable of planning and taking autonomous actions that impact real-world systems or environments,” built from “at least one generative AI model and scaffolding software that equips the model with tools.” It flagged, in its own dry words, that such systems “can be deployed with little to no human oversight.” Then it asked for help: dozens of pointed questions about threats and mitigations - the Hacker News submitter who surfaced the RFI counted 43 of them. Comments closed March 9, 2026, on Regulations.gov docket NIST-2025-0035.

So the window is shut, and you might think the story is over until the guidance ships. Wrong way around. The questions themselves are the useful part, because they sketch the security review your enterprise buyers will be running on you in a year or two, and the defenses the document keeps gesturing at - constrained environments and bounded, monitored tools - are process design rather than exotic security tech. This sits squarely inside the security questions that follow AI into operations: once an agent can change real records, somebody official eventually asks who let it.

You don’t need to wait for the framework to act on the questions.

What NIST actually asked

Strip the docket formatting and the RFI is refreshingly blunt about what scares it. Not chatbots. The document explicitly excludes “AI chatbots or retrieval-augmented generation systems that are not orchestrated to act autonomously” and scopes itself to agents whose actions cause “persistent changes outside of the AI agent system itself.” Translation: if your AI only talks, this docket isn’t about you. The moment it acts, it is.

Within that scope, the document names three classes of novel risk. First, models interacting with adversarial data - it cites indirect prompt injection and data poisoning by name, the attack where hostile instructions arrive disguised as ordinary content. Second, models with “intentionally placed backdoors,” meaning the model itself shipped compromised. Third, and most interesting to me, the risk that uncompromised models “may nonetheless pose a threat to confidentiality, availability, or integrity,” for example by exhibiting “specification gaming” or pursuing “misaligned objectives.”

Read that third one twice. A model nobody attacked and nobody tampered with still makes the risk list, just for being an optimizer aimed at a goal you specified imperfectly. The RFI also notes that early mitigations borrow from familiar territory: “the principle of least privilege” and “zero trust architecture,” alongside newer ideas like instruction hierarchy. Research by CAISI’s own technical staff, the document adds, has demonstrated risks of agent hijacking.

The thread that surfaced the RFI read it through a practitioner lens. The submitter, ascarola, highlighted the question about agent registration and tracking as “analogous to drone registration” - an idea that tells you where regulators’ heads are at.

Commenters pushed on measurement. One, posting as digitr33, described running AI models against deliberately vulnerable targets and watching every model break into almost every OWASP Top 10 challenge in their lab, with wildly different efficiency: “One model solved a JWT forgery in 16 seconds and 5K tokens. Another took 170 seconds and 210K tokens.” The same commenter noted the inverse surprise - a lab a junior pentester would have caught in ten minutes stumped the best models. Their conclusion is the line worth keeping: “If we’re serious about measuring agent risk, we need to stop theorizing about what they can do and start actually benchmarking it.”

Why does the closed window still matter?

Because of what the responses become. NIST says the input “will inform future work on voluntary guidelines and best practices related to AI agent security.” If “voluntary” makes you relax, look at the track record. NIST’s cybersecurity framework started voluntary too, and it now shows up, in mutated form, in procurement checklists, insurance underwriting, and contract boilerplate across industries that never read the original. Guidance like this doesn’t need the force of law to reach you. It arrives by way of your largest customer’s security team.

The momentum is visible already. On February 17, 2026, NIST announced an AI Agent Standards Initiative - a standing program rather than a one-off consultation - aimed at making sure agents can “function securely on behalf of [their] users” and interoperate across the digital ecosystem. It named three pillars: industry-led standards development, community-led open source protocol work, and research into agent security and identity. Concept papers were due April 2. Listening sessions on sector-specific barriers were scheduled to begin in April. That’s a pipeline, and pipelines produce documents, and documents produce questionnaires.

What nobody warned us about, going from demo to production on the agent-facing side, is how fast those questionnaires arrive once real customer data sits anywhere near the system. The first serious security review doesn’t wait for a regulation. It waits for your deal to get big enough to route through procurement, which tends to arrive in months rather than years.

Turns out the benchmarking gap digitr33 described cuts the same direction. If model behavior is this variable - 16 seconds on one run, 210,000 tokens of flailing on another - then “is the model safe” is a question without a stable answer, and reviewers know it. So the questions migrate to the things that hold still: what can this system reach, what stops it, who checked. Controls, not capabilities.

Which is precisely the territory NIST’s fourth section stakes out. Its questions ask how “the access to or extent of an AI agent system’s deployment environment” could be constrained, and about “undoes, rollbacks, or negations for unwanted actions or trajectories” - rollback for a sequence of agent actions, asked about as a maturity question. Anyone who has designed a decent business process has answers to both sitting in a drawer.

Another commenter in the thread, 7777777phil, pointed at a gap the formal documents tiptoe around: the RFI landed “right as the agent stack is splitting into layers with completely different threat models,” where a model-layer vulnerability looks nothing like a tool-use one, and asked who owns the audit trail when an agent chain spans six vendors. That question has no good answer in an architecture diagram. It has an obvious one in a process: the workflow that the chain serves owns the trail, because every step lands in it regardless of which vendor’s component did the work.

Map the three risks to workflow defenses

Here’s the part the RFI doesn’t say out loud: each of its three risk classes has a structural answer, and the structure is a defined workflow. Not a smarter model. A narrower job.

NIST agent risk classes mapped to workflow defenses: scoped context, gated tools, bounded runs with escalation

Take adversarial inputs first. Indirect prompt injection works because the agent reads broadly and treats whatever it finds as instruction-adjacent. An agent executing one workflow step doesn’t read broadly. It reads what the step hands it, and nothing else. The poisoned page sitting elsewhere in the company wiki never enters its context, because the step never passes it over. The inbound version of this threat, agents reaching for your tools with hostile prompts behind them, is its own story; the defense is the same shape pointed the other way. Scope what comes in, and most injection paths just never connect.

Backdoored models get the same treatment from the tool side. NIST’s worry is a model that behaves until it doesn’t. You can’t audit your way to certainty about model weights, so assume the worst case and bound it: at any step, the agent holds only that step’s tools, and anything consequential - a payment, say, or a record change - waits behind an approval step a human actually completes. A compromised model inside a scoped step is a contained problem. The same logic underpins per-tool authorization on MCP servers, where authenticating at the door was never the hard part.

Misaligned objectives are the subtle one, and the place where process thinking earns the most. An agent given a goal and left to decompose it on its own will optimize something, and you find out what after the fact. A workflow inverts the relationship: the process owns the goal, broken into named steps with defined outputs, and the model supplies judgment inside one step at a time. Drift has nowhere to accumulate. When a step’s output misses its definition, an automation rule routes it to a person - the escalation path NIST asks about, expressed as an if-this-then-that rule any operations manager can read. The broader case for binding agents to workflows instead of letting them roam stands on its own; the NIST lens just gives it a federal citation format.

Run all three defenses through one real workflow and the abstraction disappears. Say an agent helps with new-vendor setup inside your procurement process. Its step hands it the vendor’s tax form, the banking details, and the requester’s justification - that’s the whole context, so a hostile instruction buried in a vendor email or a shared doc never reaches it. Its tools are “check the documents for gaps” and “draft the vendor record,” so even a model that woke up compromised today couldn’t approve the vendor it just drafted, and the banking-detail change that fraudsters love stays behind a human gate. The goal it serves isn’t a prompt that says “onboard vendors efficiently” - it’s a step definition with a named output, and any output that misses the definition lands on the procurement lead’s desk through an escalation rule. Three NIST risk classes, one boring process diagram. Nothing about the model changed.

A fair objection: this assumes the workflow itself is well designed. It does. That’s the point. Securing an agent forces you to define the process around it, and a defined process was worth having before AI showed up.

What should an operations leader do now?

Not commission a threat model with a six-figure consulting line. The cheaper move is to borrow NIST’s own questions and answer them for every place AI touches your operation. Three translate directly.

First: what each AI system can actually reach. Not in theory - in configuration. List every workflow where a model reads or writes anything, and for each one, the data it sees and the actions it can fire. If the answer is “whatever the integration allows,” that’s the finding. An agent connected through a Model Context Protocol server scoped to workflow steps gives you this answer for free, because reach is defined per step, with 100+ tools mediated through one auditable surface rather than a pile of one-off connections.

Second: who approves the irreversible actions? NIST asked about rollbacks for a reason. Some actions don’t roll back: money moves, or a customer reads an email that already went out. Each of those needs a named person at a gate in front of it. If your answer is “the prompt tells the agent to be careful,” you’ve discovered the gap a reviewer will find later, for what it’s worth.

Third: where you’d look afterward. When something odd happens, monitoring is the difference between an incident report and a shrug. Every agent action should land in a tracked process run - who triggered it, what it did, which step, what happened next. That record accumulates on its own when the work runs inside a workflow. Bolted on afterward, it’s a project that never quite ships.

One thing that surprised us about running an agent-facing server in production is how little of the security work turned out to be model work. The hard questions were process questions - the same three above. Which model sits on the other side of the connection changes with whatever client shows up. The process answers don’t move.

Could your team answer all three today?

If yes, you’ve already written your half of the future framework. If no, the gap is process design, which is messy but fixable in weeks - and far cheaper to close before a buyer asks than after.

Where this is heading

The Standards Initiative’s three pillars say where NIST thinks this lands: formal standards, shared protocols, and identity research, on a multi-year clock. Somewhere in that pipeline, today’s RFI responses get distilled into guidance with section numbers, and the section numbers get pasted into vendor review templates by people who will never read the docket.

If you want one thing to watch, watch the listening sessions. The February announcement said CAISI would hold them on sector-specific barriers to AI adoption beginning in April - and which sectors show up loudest is a decent early signal for whose procurement templates change first.

Meanwhile a commenter in that February thread, ildar, framed the gap better than the formal documents did: registration-style oversight “tells you an agent exists, not what it’s doing,” and the deeper mismatch is that we keep “treating agents like software to be certified, but agents are more like employees to be supervised.” That’s the operational insight underneath the whole exercise. You don’t certify an employee once and walk away. You scope their job and review their work, with approvals where it counts. Fair enough - and that supervision structure is exactly what everyday workflow automation already builds: a scoped job with named approvals and a reviewable record.

An agent is only as safe as the structure it runs inside. NIST is assembling the official version of that sentence, with footnotes, on a federal timeline.

The framework is still being written.

The questions it will ask you are already public.

NIST asked how to secure AI agents - do not wait for the answer

NIST asked how to secure AI agents - do not wait for the answer

Summary

What NIST actually asked

Why does the closed window still matter?

Map the three risks to workflow defenses

What should an operations leader do now?

Where this is heading

About the author

Automate your workflows with Tallyfy