
OpenAI agent capabilities

Using OpenAI agent capabilities with Tallyfy

OpenAI provides agent capabilities through several tools: the Responses API, Agents SDK, and Computer Use (CUA) model. These work with Tallyfy to automate workflow tasks that involve web interactions, document processing, and data extraction.

OpenAI’s Operator (launched January 2025, deprecated August 2025) demonstrated browser automation using the Computer-Using Agent (CUA) model. CUA was originally built on GPT-4o and now uses o3 reasoning capabilities. The original model scored 38.1% on the OSWorld benchmark and 58.1% on WebArena, which shows clear limitations for complex automation. These capabilities are now available through “ChatGPT agent” mode in ChatGPT.

More recently, OpenAI released the Responses API and open-source Agents SDK, which provide developers with tools for building AI agents that combine multiple capabilities including web search, file processing, and computer use.

Important guidance for AI agent tasks

Your step-by-step instructions for the AI agent go into the Tallyfy task description. Start with short, simple tasks that are mundane and tedious. Do not ask an AI agent to handle large, complex, goal-driven jobs that require many decisions: agents are prone to non-deterministic behavior and hallucination on such jobs, and costs can escalate quickly.

How OpenAI agents work with Tallyfy

Tallyfy orchestrates OpenAI agent capabilities by triggering tasks, providing structured inputs, and capturing outputs. The integration uses webhooks or API calls to connect Tallyfy processes with OpenAI’s agent tools.
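A minimal sketch of the middleware side, assuming a Tallyfy webhook delivers a JSON body that contains the task title, description, and form-field values. The key names used here (`task`, `title`, `description`, `form_fields`) are hypothetical placeholders, so check the payload your Tallyfy webhook actually sends before relying on them. The function turns that body into one instruction string for the agent.

```python
import json


def build_agent_input(raw_body: str) -> str:
    """Turn a (hypothetical) Tallyfy webhook body into one instruction string for an agent."""
    payload = json.loads(raw_body)
    task = payload["task"]  # hypothetical key; adjust to the real payload shape
    lines = [
        f"Task: {task['title']}",
        "Instructions:",
        task["description"].strip(),
        "Inputs:",
    ]
    # Sort form fields so the prompt text is stable across repeated deliveries.
    for name, value in sorted(task.get("form_fields", {}).items()):
        lines.append(f"- {name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    body = json.dumps({
        "task": {
            "title": "Book client dinner",
            "description": "Make a reservation on OpenTable.",
            "form_fields": {"party_size": 4, "restaurant": "Example Bistro"},
        }
    })
    print(build_agent_input(body))
```

Keeping the task description and the form-field values in separate sections makes it easier to see, in logs, whether a failure came from the instructions or from the inputs.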

Diagram

What to notice:

  • Multiple tools - OpenAI agents use computer use, web search, and file search based on task requirements
  • Structured data flow - Tallyfy provides inputs through form fields and captures agent outputs for downstream tasks
  • Error handling - Failed tasks can route to human review through Tallyfy’s conditional logic

OpenAI agent capabilities

OpenAI’s agent platform includes several components that work together:

Responses API: Combines chat capabilities with multi-tool support, allowing agents to use web search, file search, and computer use within a single API call.
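As a rough illustration of multi-tool support, the sketch below assembles the keyword arguments for one Responses API request that enables web search and file search together. The model name and the vector store ID are placeholder assumptions, and the tool type names follow OpenAI’s Responses API documentation at the time of writing; they may change. Building a plain dict keeps the example runnable without network access.

```python
def build_responses_request(instructions: str, vector_store_id: str) -> dict:
    """Assemble keyword arguments for a Responses API call with two built-in tools."""
    return {
        "model": "gpt-4o",  # assumption: any model that supports these tools
        "input": instructions,
        "tools": [
            {"type": "web_search_preview"},
            {"type": "file_search", "vector_store_ids": [vector_store_id]},
        ],
    }


if __name__ == "__main__":
    request = build_responses_request(
        "Find the supplier's current price list and compare it with our contract.",
        "vs_example123",  # hypothetical vector store ID
    )
    print(request)
    # With the official `openai` package installed and OPENAI_API_KEY set:
    #   from openai import OpenAI
    #   response = OpenAI().responses.create(**request)
    #   print(response.output_text)
```

The model decides which of the enabled tools to call based on the instructions, so the task description in Tallyfy still needs to say what to look up and what to return.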

Agents SDK: Open-source toolkit for building single-agent and multi-agent workflows. It is provider-agnostic, so it works with OpenAI models and with models from other providers such as Anthropic and Google.

Computer Use (CUA): Browser automation powered by the CUA model with o3 reasoning. Captures screenshots, identifies UI elements, and simulates mouse/keyboard actions. Available through “ChatGPT agent” mode in ChatGPT.

Web Search: Built on GPT-4o search, which scored 90% on the SimpleQA factual-accuracy benchmark. Results include citations to original sources.

File Search: Document retrieval from large document sets with metadata filtering support.

Integration with Tallyfy workflows

Tallyfy provides structure for OpenAI agent automation:

Trackable execution: Every agent action happens within a Tallyfy process with full audit trail.

Error recovery: When agents encounter issues, Tallyfy’s conditional logic can route tasks to humans for completion.

Incremental approach: Start with simple tasks like form filling or data extraction. Monitor results. Expand gradually to more complex workflows.

MCP support: With Model Context Protocol servers, OpenAI agents can interact with Tallyfy using natural language commands like “complete the next task in customer onboarding.”

Technical details of OpenAI agent capabilities

The Computer-Using Agent (CUA) model now uses o3 reasoning capabilities:

Visual perception: Captures screenshots of web pages and identifies UI elements like buttons, text fields, and links.

Action execution: Simulates mouse clicks and keyboard input to interact with websites.
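When computer use is called through the API, the model returns actions (such as a click at coordinates or text to type) and your own code must carry them out in a browser, then send back a fresh screenshot. The sketch below shows only that dispatch step and handles a subset of action types. `FakeBrowser` is a hypothetical stand-in for a real automation driver such as Playwright, and the action field names (`x`, `y`, `button`, `text`, `keys`, `scroll_x`, `scroll_y`) reflect our reading of OpenAI’s computer-use documentation, so verify them against the current reference.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real driver (e.g. Playwright); it only records calls."""
    log: list = field(default_factory=list)

    def click(self, x: int, y: int, button: str) -> None:
        self.log.append(("click", x, y, button))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

    def press(self, keys: list) -> None:
        self.log.append(("keypress", tuple(keys)))

    def scroll(self, x: int, y: int, dx: int, dy: int) -> None:
        self.log.append(("scroll", x, y, dx, dy))


def dispatch_action(action: dict, browser: FakeBrowser) -> None:
    """Execute one computer-use action dict against the browser."""
    kind = action["type"]
    if kind == "click":
        browser.click(action["x"], action["y"], action.get("button", "left"))
    elif kind == "type":
        browser.type_text(action["text"])
    elif kind == "keypress":
        browser.press(action["keys"])
    elif kind == "scroll":
        browser.scroll(action["x"], action["y"], action["scroll_x"], action["scroll_y"])
    elif kind == "wait":
        pass  # a real loop would pause briefly before taking the next screenshot
    else:
        # This sketch handles only a subset; surface anything else as an error
        # so Tallyfy's conditional logic can route the task to a person.
        raise ValueError(f"unsupported action: {kind}")
```

Raising on unknown actions, rather than silently skipping them, keeps a half-finished browser session from being reported as a success.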

Safety controls: Pauses for user approval before sensitive actions like payments or credential entry. Includes fine-tuning to resist prompt injection attacks.
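In the API, the approval pause surfaces as safety checks attached to a computer action; as we understand OpenAI’s documentation, they appear as `pending_safety_checks` on a `computer_call` item and must be echoed back as `acknowledged_safety_checks` before the agent continues. The sketch below assumes that shape. The `approve` callback is a hypothetical hook (for example, a Tallyfy approval task), and returning `None` means the work should be handed to a person instead.

```python
from typing import Callable, Optional


def build_call_output(computer_call: dict, screenshot_b64: str,
                      approve: Callable[[dict], bool]) -> Optional[dict]:
    """Return the next input item for a computer_call, or None if a human must take over."""
    checks = computer_call.get("pending_safety_checks", [])
    for check in checks:
        if not approve(check):
            return None  # escalate: route the Tallyfy task to a person
    return {
        "type": "computer_call_output",
        "call_id": computer_call["call_id"],
        "acknowledged_safety_checks": checks,  # only checks that were explicitly approved
        "output": {
            "type": "computer_screenshot",
            "image_url": f"data:image/png;base64,{screenshot_b64}",
        },
    }
```

Putting the approval decision behind a callback keeps the policy (who may approve payments or credential entry) outside the agent loop, where it can be audited in Tallyfy.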

Performance benchmarks (original CUA model):

  • OSWorld: 38.1% (real-world computer tasks)
  • WebArena: 58.1% (web navigation scenarios)
  • Performance may improve with o3-based updates

Best suited for:

  • Form filling and data entry
  • Restaurant reservations and bookings
  • Simple information gathering
  • Document retrieval from web portals

Limitations: Struggles with complex interfaces, multi-page workflows requiring sustained context, and websites with anti-bot measures.

Current availability

ChatGPT agent mode: Available to ChatGPT Plus, Pro, and Enterprise subscribers. Access by selecting “agent mode” from the dropdown in the ChatGPT composer.

API access: Available through Responses API and Agents SDK.

Note: Features, pricing, and availability change frequently. See OpenAI’s pricing page for current rates and subscription details.

Integration approach

  1. Identify suitable tasks: Look for Tallyfy tasks that involve web interactions, such as online ordering, booking appointments, or extracting data from public websites.

  2. Write clear instructions: Create specific natural language instructions in task descriptions. Include exact URLs, button names, and expected outputs.

  3. Choose an integration method:

    • API integration using the Responses API or Agents SDK
    • Webhook triggers from Tallyfy to your custom middleware
    • MCP server for natural language control

  4. Test incrementally: Start with simple, low-risk tasks. Monitor success rates. Adjust instructions based on results before expanding to more complex workflows.

  5. Implement error handling: Use Tallyfy’s conditional logic to route failed or partially completed tasks to human review.

Example workflow: restaurant reservation

Scenario: Automate restaurant reservations for client meetings.

Tallyfy task configuration:

  • Task description contains clear instructions: “Make a reservation on OpenTable for the restaurant and date specified in the form fields”
  • Form fields collect: restaurant name, party size, date/time, special requests
  • Webhook triggers when task is assigned

Agent execution:

  1. Receives structured data from Tallyfy webhook
  2. Navigates to restaurant booking site
  3. Fills reservation form
  4. Captures confirmation number
  5. Returns result to Tallyfy

Result handling:

  • Success: Confirmation number stored in form field, next task triggered
  • Failure: Task reassigned to human with error details
  • Partial completion: Flagged for manual review
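The three outcomes above can be sketched as a small routing function in your middleware. `AgentResult` and the action names (`complete_task`, `flag_for_review`, `reassign_to_human`) are hypothetical; they stand for whatever your middleware sends back to Tallyfy, not for actual Tallyfy API calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResult:
    """Hypothetical summary of one agent run."""
    completed_steps: int
    total_steps: int
    confirmation_number: Optional[str] = None
    error: Optional[str] = None


def route_result(result: AgentResult) -> dict:
    """Map an agent run to success, partial completion, or failure."""
    if result.confirmation_number:
        # Success: store the confirmation number so the next task can use it.
        return {"action": "complete_task",
                "form_fields": {"confirmation_number": result.confirmation_number}}
    if result.completed_steps > 0:
        # Partial completion: some steps ran, but nothing was confirmed.
        reason = result.error or "no confirmation captured"
        return {"action": "flag_for_review",
                "note": f"Stopped after {result.completed_steps}/{result.total_steps} steps: {reason}"}
    # Failure: hand the task to a person with the error details.
    return {"action": "reassign_to_human", "note": result.error or "agent made no progress"}
```

Treating "no confirmation number" as not successful, even when every step ran, is a deliberately conservative choice: a reservation without proof should be checked by a person.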

Integration options

Responses API: Send task instructions and data directly to OpenAI’s API. Agent uses available tools (web search, computer use, file search) to complete the task.

Agents SDK: Build custom workflow orchestration using the open-source SDK. Supports multi-agent workflows and mixed model usage.

MCP server: Create a Model Context Protocol server that exposes Tallyfy operations. OpenAI agents can then interact with Tallyfy using natural language.
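In practice you would build such a server with an MCP SDK. The sketch below only shows the shape of a tool descriptor as the MCP specification defines it for a `tools/list` response (`name`, `description`, `inputSchema`), plus a check for required arguments. The `complete_task` tool and its parameters are hypothetical and do not describe an actual Tallyfy MCP server.

```python
import json

# Hypothetical tool exposing one Tallyfy operation; name and parameters are illustrative.
COMPLETE_TASK_TOOL = {
    "name": "complete_task",
    "description": "Mark a Tallyfy task as complete and optionally set form-field values.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string", "description": "ID of the Tallyfy task"},
            "form_fields": {"type": "object", "description": "Field name to value"},
        },
        "required": ["task_id"],
    },
}


def tools_list_result() -> dict:
    """Body of the `result` member a server returns for a tools/list request."""
    return {"tools": [COMPLETE_TASK_TOOL]}


def missing_required(arguments: dict, tool: dict = COMPLETE_TASK_TOOL) -> list:
    """List required arguments that are absent from a tools/call request."""
    return [name for name in tool["inputSchema"]["required"] if name not in arguments]


if __name__ == "__main__":
    print(json.dumps(tools_list_result(), indent=2))
```

The tool description is what the agent reads when it maps a request like “complete the next task in customer onboarding” to a call, so it is worth writing as carefully as a Tallyfy task description.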

Webhook architecture: Use message queues between Tallyfy and OpenAI to handle high-volume automation with retry logic and parallel processing.
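A minimal sketch of retry logic with exponential backoff and parallel processing, using Python’s in-process `queue.Queue` as a stand-in for a real message broker (an assumption for illustration). Each job is a callable that would, in a real setup, send one Tallyfy task to the agent and return its result.

```python
import queue
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def with_retries(fn: Callable[[], dict], attempts: int = 3, base_delay: float = 0.5) -> dict:
    """Call fn, retrying with exponential backoff plus jitter; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up; the caller should route the task to a person
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


def drain(jobs: "queue.Queue", workers: int = 4) -> list:
    """Run every queued job in parallel, each wrapped in retry logic; results keep queue order."""
    pending = []
    while not jobs.empty():
        pending.append(jobs.get())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(with_retries, pending))
```

Jitter spreads retries out so that many tasks failing at once (for example, during a rate limit) do not all retry at the same moment.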

Use case examples

Invoice data extraction:

  • Agent navigates to supplier portal
  • Downloads invoices
  • Extracts key data (invoice number, amount, date)
  • Returns structured data to Tallyfy
  • Human reviews before accounting entry
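Before extracted data reaches the human reviewer, a validation step can reject records the agent clearly got wrong. This sketch assumes the agent returns each invoice as a dict with hypothetical keys `invoice_number`, `amount`, and `date` (ISO format); any `ValueError` it raises is a signal to route the task for manual handling.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass
class Invoice:
    number: str
    amount: Decimal
    issued: date


def parse_invoice(raw: dict) -> Invoice:
    """Validate one agent-extracted record; raise ValueError so the task can go to a human."""
    number = str(raw.get("invoice_number", "")).strip()
    if not number:
        raise ValueError("missing invoice number")
    try:
        # Decimal avoids float rounding on money; strip thousands separators first.
        amount = Decimal(str(raw["amount"]).replace(",", ""))
    except (KeyError, InvalidOperation):
        raise ValueError(f"bad amount for {number}") from None
    if not amount.is_finite() or amount <= 0:
        raise ValueError(f"non-positive or non-finite amount for {number}")
    try:
        issued = date.fromisoformat(raw["date"])
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"bad date for {number}") from None
    return Invoice(number, amount, issued)
```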

Customer onboarding tasks:

  • Creating collaboration workspace invites
  • Setting up shared folders
  • Scheduling kickoff meetings
  • Generating welcome communications
  • Human handles contract review and customization

Competitive research:

  • Visits competitor websites
  • Captures pricing and feature information
  • Compares with previous data
  • Generates summary report
  • Routes to team for analysis
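The “compares with previous data” step can be as simple as diffing two snapshots. This sketch assumes the agent returns prices as a hypothetical `{plan_name: monthly_price}` mapping; it reports added, removed, and changed plans for the summary report.

```python
def diff_prices(previous: dict, current: dict) -> dict:
    """Compare two {plan_name: monthly_price} snapshots captured by the agent."""
    changed = {
        plan: (previous[plan], current[plan])
        for plan in previous.keys() & current.keys()
        if previous[plan] != current[plan]
    }
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": dict(sorted(changed.items())),  # plan -> (old, new)
    }
```

For example, `diff_prices({"Basic": 10, "Pro": 30}, {"Basic": 12, "Pro": 30})` reports only `Basic` as changed, so reviewers see what moved instead of re-reading every page.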

Configuration considerations

Task instructions: Write specific, clear instructions. Include exact URLs, button names, field labels, and expected outputs.

Good: “Navigate to acme.com/invoices, download PDFs from the last 30 days, extract invoice numbers and amounts”

Poor: “Get the invoices”

Security: Never store passwords in task descriptions. Use secure credential management. Enable user approval for sensitive actions.
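One way to enforce the “no passwords in task descriptions” rule is a guard in the middleware that scans descriptions before they are sent to the agent, while credentials come from the environment. The regex patterns below are illustrative heuristics only, and `os.environ` stands in for a real secrets manager (both assumptions).

```python
import os
import re

# Heuristic patterns only; a real deployment should rely on a secrets manager, not regexes.
SECRET_PATTERNS = [
    re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),  # the "sk-" prefix used by OpenAI API keys
]


def find_secrets(description: str) -> list:
    """Return snippets of a task description that look like embedded credentials."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(description)]


def get_credential(name: str) -> str:
    """Read a credential from the environment (a stand-in for a real secrets manager)."""
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"credential {name!r} is not configured")
    return value
```

If `find_secrets` returns anything, the safest response is to stop the run and route the task to a person, rather than send the description to the agent.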

Error handling: Configure Tallyfy conditional logic to route failed tasks to humans. Log agent actions for debugging.

Monitoring: Track success rates by task type. Adjust instructions based on failure patterns.
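Tracking success rates by task type needs little more than a counter. This sketch assumes you log one `(task_type, succeeded)` pair per agent run; the task-type names are hypothetical.

```python
from collections import Counter


def success_rates(runs: list) -> dict:
    """Compute the success rate per task type from (task_type, succeeded) pairs."""
    totals = Counter(task_type for task_type, _ in runs)
    wins = Counter(task_type for task_type, ok in runs if ok)
    return {t: wins[t] / totals[t] for t in sorted(totals)}
```

Task types whose rate stays low are the first candidates for rewritten instructions or a permanent human fallback.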

API pricing

OpenAI API pricing varies by model (GPT-4o, o1, o3, o4-mini) and changes frequently. See OpenAI’s pricing page for current rates.

Troubleshooting

Task fails repeatedly:

  • Instructions may be too vague - add specific details (button names, URLs, exact text)
  • Website may have anti-bot measures - consider manual fallback
  • Dynamic content loading - agent may need explicit wait instructions
  • Login credentials missing or incorrect

Integration not triggering:

  • Verify webhook URL is accessible
  • Check authentication tokens
  • Confirm Tallyfy process is published and active
  • Review webhook logs for error responses

Partial completion:

  • Use Tallyfy conditional logic to route partially completed tasks to humans
  • Add verification steps after agent tasks
  • Track patterns in partial completions to refine instructions

Limitations

Current constraints:

  • Performance on complex tasks is limited (38.1% on OSWorld and 58.1% on WebArena for the original CUA model)
  • Processing time varies, typically several minutes per task
  • Geographic availability limited to select regions
  • Experimental technology - expect changes and occasional failures
  • Requires careful credential management for security
  • Not suitable for tasks requiring nuanced decision-making

Best practices:

  • Start with simple, high-volume tasks
  • Always have human fallback for critical processes
  • Test thoroughly before production deployment
  • Monitor success rates and adjust instructions
  • Use Tallyfy’s conditional logic for error handling
  • Keep instructions simple and specific
