
OpenAI agent capabilities

Using OpenAI agent capabilities with Tallyfy

OpenAI offers agent capabilities through three main tools: the Responses API, Agents SDK, and Computer Use (CUA) model. These connect with Tallyfy to automate web interactions, document processing, and data extraction.

The CUA model now runs on o3 reasoning instead of the original GPT-4o. The original CUA model scored 38.1% on OSWorld and 58.1% on WebArena. These are modest results, and they show why you shouldn’t rely on agents for anything complex yet. These capabilities are available through “ChatGPT agent” mode.

Important guidance for AI agent tasks

Your step-by-step instructions for the AI agent go into the Tallyfy task description. Start with short, easy tasks that are mundane and tedious. Don’t ask an AI agent to handle large, decision-heavy jobs: agents are prone to unpredictable behavior and hallucination, and costs can spiral fast.

How OpenAI agents work with Tallyfy

Tallyfy triggers tasks, provides structured inputs through form fields, and captures outputs. Webhooks or API calls connect Tallyfy processes with OpenAI’s agent tools.

Diagram

What to notice:

  • Multiple tools - agents pick between computer use, web search, and file search based on what the task needs
  • Structured data flow - Tallyfy sends inputs via form fields and captures agent outputs for downstream tasks
  • Error handling - failed tasks route to human review through Tallyfy’s conditional logic
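As a rough illustration of the structured data flow described above, the sketch below turns a webhook body into a prompt for an agent. The payload shape (`task_id`, `description`, `form_fields`) and the sample values are assumptions made for this example, not Tallyfy’s documented webhook format.

```python
import json

# Hypothetical webhook body; real Tallyfy field names may differ.
SAMPLE_PAYLOAD = json.dumps({
    "task_id": "task-123",
    "description": "Make reservation on OpenTable for the restaurant and date specified in form fields",
    "form_fields": {"restaurant": "Chez Example", "party_size": 4},
})


def build_agent_prompt(raw_body: str) -> str:
    """Combine the task description and form-field values into one prompt."""
    payload = json.loads(raw_body)
    lines = [payload["description"], "", "Inputs:"]
    for name, value in payload.get("form_fields", {}).items():
        lines.append(f"- {name}: {value}")
    return "\n".join(lines)


print(build_agent_prompt(SAMPLE_PAYLOAD))
```

Keeping the form-field values separate from the free-text description makes it easier to see exactly which inputs the agent received when a task later needs human review.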

OpenAI agent capabilities

Responses API - combines chat with multi-tool support, so agents can use web search, file search, and computer use in a single API call.

Agents SDK - open-source toolkit for single-agent and multi-agent workflows. Works with OpenAI models and with models from other providers such as Anthropic and Google.

Computer Use (CUA) - browser automation using o3 reasoning. Takes screenshots, finds UI elements, and simulates mouse/keyboard actions. Available through ChatGPT agent mode.

Web search - uses GPT-4o search with cited sources.

File search - retrieves documents from large sets with metadata filtering.
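To make the multi-tool idea more concrete, here is a minimal sketch that builds the arguments for a single Responses API request with web search and, optionally, file search enabled. The model name, the tool type strings, and the vector store ID are assumptions based on OpenAI’s published examples and may have changed; check the current API reference before relying on them.

```python
from typing import Optional


def build_responses_request(instructions: str,
                            vector_store_id: Optional[str] = None) -> dict:
    """Build keyword arguments for one Responses API call with several tools."""
    # Tool type names are assumptions; verify them against the current docs.
    tools = [{"type": "web_search_preview"}]
    if vector_store_id:
        tools.append({"type": "file_search",
                      "vector_store_ids": [vector_store_id]})
    return {"model": "gpt-4o", "input": instructions, "tools": tools}


request = build_responses_request(
    "Find the current support phone number for Acme Corp and cite the source."
)
print(request)
```

With the official `openai` package installed and an API key configured, a dictionary like this would typically be passed as `OpenAI().responses.create(**request)`. The sketch stops short of making the call so it can run without network access.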

Connecting with Tallyfy workflows

Trackable execution - every agent action runs inside a Tallyfy process with a full audit trail.

Error recovery - when agents hit issues, Tallyfy’s conditional logic routes tasks to humans.

Start small - begin with simple tasks like form filling or data extraction. Watch the results. Expand gradually.

MCP support - with Model Context Protocol servers, OpenAI agents can interact with Tallyfy using natural language commands like “complete the next task in customer onboarding.”

CUA model details

Visual perception - captures screenshots and identifies buttons, text fields, and links.

Action execution - simulates mouse clicks and keyboard input on websites.

Safety controls - pauses for user approval before sensitive actions like payments or credential entry. Includes fine-tuning to resist prompt injection.
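If you drive the agent from your own middleware, you may want a similar approval gate on your side as well. The sketch below is a hypothetical application-level check, not OpenAI’s built-in mechanism; the keyword list and function names are illustrative only.

```python
from typing import Callable

# Illustrative keywords; tune these to your own definition of "sensitive".
SENSITIVE_KEYWORDS = ("payment", "password", "credential", "checkout")


def needs_approval(action_description: str) -> bool:
    """Return True when the proposed action mentions a sensitive keyword."""
    text = action_description.lower()
    return any(keyword in text for keyword in SENSITIVE_KEYWORDS)


def run_action(action_description: str, approve: Callable[[str], bool]) -> str:
    """Ask the approver before sensitive actions; skip the action if refused."""
    if needs_approval(action_description) and not approve(action_description):
        return "skipped"
    return "executed"


print(run_action("Enter payment details", approve=lambda action: False))
print(run_action("Click the search button", approve=lambda action: False))
```

In a Tallyfy process, the `approve` callback could be backed by a human approval task, so a refusal leaves a record in the audit trail.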

Performance benchmarks (original CUA model):

  • OSWorld: 38.1% (real-world computer tasks)
  • WebArena: 58.1% (web navigation)

Works best for: form filling, restaurant reservations, simple information gathering, and pulling documents from web portals.

Where it struggles: complex interfaces, multi-page workflows needing sustained context, and sites with anti-bot measures.

Current availability

ChatGPT agent mode - available to Plus, Pro, and Enterprise subscribers. Select “agent mode” from the ChatGPT composer dropdown.

API access - available through the Responses API and Agents SDK.

Pricing and availability change frequently. See OpenAI’s pricing page for current details.

Integration approach

  1. Find suitable tasks - look for Tallyfy tasks involving web interactions, such as online ordering, booking appointments, or data extraction from public websites.

  2. Write clear instructions - put specific natural language instructions in task descriptions. Include exact URLs, button names, and expected outputs.

  3. Pick an integration method:

    • API integration using Responses API or Agents SDK
    • Webhook triggers from Tallyfy to your custom middleware
    • MCP server for natural language control

  4. Test incrementally - start with simple, low-risk tasks. Monitor success rates. Adjust instructions before expanding scope.

  5. Set up error handling - use Tallyfy’s conditional logic to route failed or partially completed tasks to human review.
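For the webhook-to-middleware option, a receiver can be very small. The sketch below is a minimal WSGI app using only the Python standard library; it accepts a JSON POST and puts it on an in-memory list that stands in for a real queue. The payload contents and the `PENDING_TASKS` store are assumptions for illustration.

```python
import io
import json

PENDING_TASKS = []  # stand-in for a real queue or database


def webhook_app(environ, start_response):
    """Minimal WSGI app: accept a JSON POST and queue it for the agent worker."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST only"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid JSON"]
    PENDING_TASKS.append(payload)
    start_response("202 Accepted", [("Content-Type", "text/plain")])
    return [b"queued"]


def fake_post(body: bytes):
    """Call the app directly, without a network socket, for local testing."""
    statuses = []
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = webhook_app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks)


print(fake_post(b'{"task_id": "t1"}'))
```

To serve it locally, `wsgiref.simple_server.make_server("", 8000, webhook_app).serve_forever()` is enough for experiments. A production deployment would also need authentication and a durable queue.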

Example: restaurant reservation

Scenario - automate restaurant reservations for client meetings.

Tallyfy task setup:

  • Task description: “Make reservation on OpenTable for the restaurant and date specified in form fields”
  • Form fields collect: restaurant name, party size, date/time, special requests
  • Webhook fires when task is assigned

What the agent does:

  1. Receives structured data from Tallyfy webhook
  2. Navigates to the booking site
  3. Fills the reservation form
  4. Captures the confirmation number
  5. Returns the result to Tallyfy

Result handling:

  • Success: confirmation number stored in a form field, next task triggered
  • Failure: task reassigned to a human with error details
  • Partial completion: flagged for manual review
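The three outcomes above can be expressed as a small routing function in the middleware. This is a sketch under stated assumptions: the `AgentResult` fields and the outcome labels are hypothetical, and "partial" is defined here as "some steps completed but no confirmation number".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResult:
    steps_completed: int
    confirmation_number: Optional[str] = None
    error: Optional[str] = None


def route_result(result: AgentResult) -> dict:
    """Map an agent result to one of: success, partial, failure."""
    if result.confirmation_number:
        # Success: store the confirmation number so the next task can trigger.
        return {"outcome": "success",
                "confirmation_number": result.confirmation_number}
    if result.steps_completed > 0:
        # Partial completion: the agent started but did not finish.
        return {"outcome": "partial", "action": "flag_for_manual_review",
                "error": result.error}
    # Failure: nothing was completed, so hand the task to a human.
    return {"outcome": "failure", "action": "reassign_to_human",
            "error": result.error or "unknown error"}


print(route_result(AgentResult(steps_completed=4, confirmation_number="ABC123")))
print(route_result(AgentResult(steps_completed=2, error="form timed out")))
```

The returned dictionary would then be written back to Tallyfy, where conditional logic can branch on the outcome value.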

Integration options

Responses API - send task data directly to OpenAI’s API. The agent picks the right tools (web search, computer use, file search) to finish the job.

Agents SDK - build custom orchestration with the open-source SDK. Supports multi-agent workflows and mixed model usage.

MCP server - expose Tallyfy operations through a Model Context Protocol server. Agents interact with Tallyfy using natural language.

Webhook architecture - use message queues between Tallyfy and OpenAI for high-volume automation with retry logic.
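The retry idea behind the webhook architecture can be sketched with an in-memory queue and exponential backoff. A real deployment would use a message broker; the function name, the defaults, and the dead-letter list here are assumptions for illustration.

```python
import time
from collections import deque


def process_queue(jobs, handler, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run each job through handler, retrying failures with exponential backoff.

    Returns (results, dead_letters); dead_letters holds jobs that failed
    max_attempts times and should go to human review.
    """
    queue = deque((job, 1) for job in jobs)
    results, dead_letters = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            results.append(handler(job))
        except Exception:
            if attempt >= max_attempts:
                dead_letters.append(job)
            else:
                # Delay doubles on each retry: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** (attempt - 1))
                queue.append((job, attempt + 1))
    return results, dead_letters
```

Passing `sleep` as a parameter keeps the function testable without real delays. Jobs that land in the dead-letter list map naturally onto the "route to human review" path in Tallyfy.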

Use case examples

Invoice data extraction:

  • Agent goes to supplier portal, downloads invoices
  • Extracts invoice number, amount, and date
  • Returns structured data to Tallyfy
  • Human reviews before accounting entry
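Before the human review step, the middleware can do a first-pass check on the extracted fields. The sketch below pulls the three fields named above out of plain invoice text with regular expressions; the patterns and the sample invoice text are made up for illustration, and real invoices will need patterns tuned to each supplier.

```python
import re
from datetime import datetime
from decimal import Decimal

# Illustrative patterns; real supplier invoices will vary.
PATTERNS = {
    "invoice_number": r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)",
    "amount": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",
    "date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
}


def parse_invoice(text: str) -> dict:
    """Extract invoice fields and list the ones that could not be found."""
    fields, missing = {}, []
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
        else:
            missing.append(name)
    if "amount" in fields:
        fields["amount"] = Decimal(fields["amount"].replace(",", ""))
    if "date" in fields:
        # Raises ValueError for impossible dates such as 2024-13-45.
        fields["date"] = datetime.strptime(fields["date"], "%Y-%m-%d").date().isoformat()
    return {"fields": fields, "missing": missing}


sample = "Invoice No: INV-1042\nDate: 2024-03-15\nTotal: $1,250.00"
print(parse_invoice(sample))
```

A non-empty `missing` list is a useful signal to flag the task for closer review rather than passing incomplete data to accounting.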

Customer onboarding:

  • Agent sends workspace invites and sets up shared folders
  • Schedules kickoff meetings
  • Generates welcome emails
  • Human handles contract review

Competitive research:

  • Visits competitor websites and captures pricing info
  • Compares with previous data
  • Generates a summary report for team review
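The "compares with previous data" step is ordinary data processing once the agent has returned the prices. A minimal sketch, assuming prices arrive as a simple product-to-price mapping (the product names and numbers below are invented):

```python
def price_changes(previous: dict, current: dict) -> dict:
    """Return products whose price is new, changed, or removed since last run."""
    changes = {}
    for product, new_price in current.items():
        old_price = previous.get(product)
        if old_price is None:
            changes[product] = ("new", None, new_price)
        elif old_price != new_price:
            changes[product] = ("changed", old_price, new_price)
    for product in previous.keys() - current.keys():
        changes[product] = ("removed", previous[product], None)
    return changes


last_week = {"basic": 10, "pro": 30, "legacy": 5}
this_week = {"basic": 10, "pro": 35, "team": 50}
print(price_changes(last_week, this_week))
```

Doing the comparison in plain code rather than asking the agent to do it keeps the summary report reproducible.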

Writing good task instructions

Be specific. Include exact URLs, button names, field labels, and expected outputs.

Good: “Go to acme.com/invoices, download PDFs from last 30 days, extract invoice numbers and amounts”

Poor: “Get the invoices”
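If many task descriptions follow the same pattern, a small template helper can keep them consistently specific. This is a hypothetical helper, not a Tallyfy feature; it simply forces every instruction to name a URL, ordered actions, and an expected output.

```python
def build_instruction(url: str, actions: list, expected_output: str) -> str:
    """Assemble a numbered instruction with a URL, steps, and expected output."""
    lines = [f"1. Go to {url}"]
    lines += [f"{n}. {action}" for n, action in enumerate(actions, start=2)]
    lines.append(f"Expected output: {expected_output}")
    return "\n".join(lines)


print(build_instruction(
    "acme.com/invoices",
    ["Download PDFs from the last 30 days",
     "Extract invoice numbers and amounts"],
    "a list of invoice numbers with amounts",
))
```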

Security - never store passwords in task descriptions. Use secure credential management and enable user approval for sensitive actions.
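One way to back up this rule is to scan task descriptions for likely secrets before they are sent to an agent, and to read real credentials from the environment instead. The regular expression and the environment variable name below are illustrative assumptions; a pattern like this catches only obvious cases.

```python
import os
import re

# Matches obvious cases such as "password: hunter2" or "api_key=abc123".
SECRET_PATTERN = re.compile(
    r"(password|passwd|api[_-]?key|token)\s*[:=]\s*\S+", re.IGNORECASE
)


def find_secrets(description: str) -> list:
    """Return substrings of a task description that look like credentials."""
    return [match.group(0) for match in SECRET_PATTERN.finditer(description)]


def get_credential(name: str) -> str:
    """Read a credential from the environment instead of the task text."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"credential {name!r} is not configured")
    return value


print(find_secrets("Log in with password: hunter2 then download the report"))
print(find_secrets("Go to acme.com/invoices and download the PDFs"))
```

A dedicated secrets manager is a better home for credentials than environment variables in most production setups; the environment lookup here only keeps the sketch self-contained.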

Monitoring - track success rates by task type and adjust instructions based on failure patterns.
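Tracking success rates by task type needs little more than a counter. A minimal sketch, assuming each agent run is logged as a `(task_type, succeeded)` pair (the task type names are examples):

```python
from collections import defaultdict


def success_rates(runs) -> dict:
    """Compute the fraction of successful runs for each task type."""
    totals = defaultdict(lambda: [0, 0])  # task_type -> [successes, attempts]
    for task_type, succeeded in runs:
        totals[task_type][1] += 1
        if succeeded:
            totals[task_type][0] += 1
    return {task_type: ok / n for task_type, (ok, n) in totals.items()}


runs = [
    ("reservation", True),
    ("reservation", False),
    ("invoice", True),
]
print(success_rates(runs))
```

Task types whose rate stays low after several instruction revisions are good candidates for moving back to a human.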

Troubleshooting

Task fails repeatedly:

  • Instructions too vague - add button names, URLs, exact text
  • Site has anti-bot measures - consider manual fallback
  • Dynamic content loading - agent may need explicit wait instructions

Integration not triggering:

  • Verify webhook URL is accessible
  • Check authentication tokens
  • Confirm the Tallyfy process is published and active

Partial completion:

  • Route partially completed tasks to humans via conditional logic
  • Add verification steps after agent tasks
  • Track patterns to refine instructions

Limitations

  • Performance on complex tasks is low (38.1% on OSWorld and 58.1% on WebArena for the original CUA model)
  • Processing takes several minutes per task
  • Geographic availability is limited
  • Experimental technology - expect changes and occasional failures
  • Not suitable for tasks requiring judgment calls

Tips for success: start with simple, high-volume tasks. Always keep a human fallback for critical processes. Test before production. Monitor success rates and keep instructions short and specific.
