
OpenAI agent capabilities

Using OpenAI agent capabilities with Tallyfy

OpenAI provides agent capabilities through several tools: the Responses API, Agents SDK, and Computer Use (CUA) model. These work with Tallyfy to automate workflow tasks that involve web interactions, document processing, and data extraction.

OpenAI’s Operator (launched January 23, 2025) demonstrated browser automation using the Computer-Using Agent (CUA) model built on GPT-4o. The CUA model scored 38.1% on the OSWorld benchmark and 58.1% on WebArena, which means it still fails a large share of complex automation tasks.

More recently, OpenAI released the Responses API and open-source Agents SDK, which provide developers with tools for building AI agents that combine multiple capabilities including web search, file processing, and computer use.

Important guidance for AI agent tasks

Put your step-by-step instructions for the AI agent in the Tallyfy task description. Start with short, bite-size tasks that are mundane and tedious. Do not ask an AI agent to take on large, complex, goal-driven jobs that depend on many decisions: such jobs are prone to nondeterministic behavior and hallucination, and they can get expensive quickly.

How OpenAI agents work with Tallyfy

Tallyfy orchestrates OpenAI agent capabilities by triggering tasks, providing structured inputs, and capturing outputs. The integration uses webhooks or API calls to connect Tallyfy processes with OpenAI’s agent tools.

Diagram

What to notice:

  • Multiple tools - OpenAI agents use computer use, web search, and file search based on task requirements
  • Structured data flow - Tallyfy provides inputs through form fields and captures agent outputs for downstream tasks
  • Error handling - Failed tasks can route to human review through Tallyfy’s conditional logic
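To make the structured data flow concrete, here is a minimal sketch of how a task description and form-field values might be combined into a request body for the Responses API. The function name, the form-field names, and the example task are hypothetical. The `web_search_preview` tool type reflects OpenAI's documentation at the time of writing and may change, so check the current API reference before relying on it. The sketch only builds the JSON body and does not send it.

```python
import json

# Endpoint of the Responses API; sending the request is out of scope for this sketch.
RESPONSES_URL = "https://api.openai.com/v1/responses"


def build_agent_request(instructions: str, form_fields: dict, model: str = "gpt-4o") -> dict:
    """Combine a Tallyfy task description and form-field values into one request body."""
    # Sort the fields so the same inputs always produce the same prompt text.
    field_lines = "\n".join(
        f"- {name}: {value}" for name, value in sorted(form_fields.items())
    )
    return {
        "model": model,
        "instructions": instructions,  # the Tallyfy task description
        "input": f"Task inputs from Tallyfy form fields:\n{field_lines}",
        "tools": [{"type": "web_search_preview"}],  # assumed tool type, verify in the docs
    }


# Hypothetical task and form fields, for illustration only.
request_body = build_agent_request(
    "Find the supplier's public support email and return it as plain text.",
    {"supplier_name": "Acme Corp", "supplier_site": "https://acme.example"},
)
print(json.dumps(request_body, indent=2))
```

Keeping the request builder separate from the HTTP call makes it easy to log exactly what the agent was asked to do, which helps when a task later needs human review.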

OpenAI agent capabilities

OpenAI’s agent platform includes several components that work together:

Responses API: Combines chat capabilities with multi-tool support, allowing agents to use web search, file search, and computer use within a single API call.

Agents SDK: Open-source toolkit for building single-agent and multi-agent workflows. Works with OpenAI models and competitor models from Anthropic and Google.

Computer Use (CUA): Browser automation powered by the CUA model. Captures screenshots, identifies UI elements, and simulates mouse/keyboard actions. The model currently scores 38.1% on the OSWorld benchmark.

Web Search: Integration with GPT-4o search, which reaches 90% factual accuracy on the SimpleQA benchmark. Includes citations to original sources.

File Search: Document retrieval from large document sets with metadata filtering support.

Integration with Tallyfy workflows

Tallyfy provides structure for OpenAI agent automation:

Trackable execution: Every agent action happens within a Tallyfy process with full audit trail.

Error recovery: When agents encounter issues, Tallyfy’s conditional logic can route tasks to humans for completion.

Incremental approach: Start with simple tasks like form filling or data extraction. Monitor results. Expand gradually to more complex workflows.

MCP support: With Model Context Protocol servers, OpenAI agents can interact with Tallyfy using natural language commands like “complete the next task in customer onboarding.”

Technical details of OpenAI Operator

Operator uses the Computer-Using Agent (CUA) model built on GPT-4o architecture:

Visual perception: Captures screenshots of web pages and identifies UI elements like buttons, text fields, and links.

Action execution: Simulates mouse clicks and keyboard input to interact with websites.

Safety controls: Pauses for user approval before sensitive actions like payments or credential entry. Includes fine-tuning to resist prompt injection attacks.

Performance benchmarks:

  • OSWorld: 38.1% (real-world computer tasks)
  • WebArena: 58.1% (web navigation scenarios)

Best suited for:

  • Form filling and data entry
  • Restaurant reservations and bookings
  • Simple information gathering
  • Document retrieval from web portals

Limitations: Struggles with complex interfaces, multi-page workflows requiring sustained context, and websites with anti-bot measures.

Current availability

ChatGPT subscription plans:

  • Plus: $20/month
  • Pro: $200/month
  • Enterprise: Custom pricing

API access: Available through Responses API and Agents SDK

Geographic availability: United States, Canada, UK, Australia

Note: Features and availability change frequently. Check OpenAI’s official documentation for current status.

Integration approach

  1. Identify suitable tasks: Look for Tallyfy tasks that involve web interactions - online ordering, booking appointments, data extraction from public websites.

  2. Write clear instructions: Create specific natural language instructions in task descriptions. Include exact URLs, button names, and expected outputs.

  3. Choose integration method:

    • API integration using Responses API or Agents SDK
    • Webhook triggers from Tallyfy to your custom middleware
    • MCP server for natural language control

  4. Test incrementally: Start with simple, low-risk tasks. Monitor success rates. Adjust instructions based on results before expanding to more complex workflows.

  5. Implement error handling: Use Tallyfy’s conditional logic to route failed or partially completed tasks to human review.
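For the webhook route, custom middleware first has to turn the incoming request body into something an agent can act on. The sketch below shows one way to do that. The payload keys (`task_id`, `description`, `form_fields`) are hypothetical and are not taken from Tallyfy's webhook schema, so adjust them to the payload you actually receive.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AgentJob:
    task_id: str
    instructions: str
    inputs: dict = field(default_factory=dict)


# Hypothetical keys; replace with the fields in the real webhook payload.
REQUIRED_KEYS = ("task_id", "description")


def parse_webhook(body: bytes) -> AgentJob:
    """Validate a webhook body and convert it into a job for the agent."""
    payload = json.loads(body)
    missing = [key for key in REQUIRED_KEYS if not payload.get(key)]
    if missing:
        # Reject early so a malformed webhook never reaches the agent.
        raise ValueError(f"webhook payload missing: {', '.join(missing)}")
    return AgentJob(
        task_id=str(payload["task_id"]),
        instructions=payload["description"].strip(),
        inputs=dict(payload.get("form_fields", {})),
    )


sample = json.dumps({
    "task_id": 1042,
    "description": " Download last month's invoices from the supplier portal. ",
    "form_fields": {"portal_url": "https://portal.example/invoices"},
}).encode()
job = parse_webhook(sample)
print(job)
```

Rejecting incomplete payloads before the agent runs avoids paying for a run that cannot succeed.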

Example workflow: Restaurant reservation

Scenario: Automate restaurant reservations for client meetings.

Tallyfy task configuration:

  • Task description contains clear instructions: “Make reservation on OpenTable for the restaurant and date specified in form fields”
  • Form fields collect: restaurant name, party size, date/time, special requests
  • Webhook triggers when task is assigned

Agent execution:

  1. Receives structured data from Tallyfy webhook
  2. Navigates to restaurant booking site
  3. Fills reservation form
  4. Captures confirmation number
  5. Returns result to Tallyfy

Result handling:

  • Success: Confirmation number stored in form field, next task triggered
  • Failure: Task reassigned to human with error details
  • Partial completion: Flagged for manual review
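The three result-handling branches above can be sketched as a small routing function. The result keys (`confirmation_number`, `error`) and the action names are hypothetical. In practice the returned action would be mapped onto Tallyfy form fields and conditional logic.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


def classify_result(result: dict) -> Outcome:
    """Map an agent result onto the three outcomes in the reservation example."""
    if result.get("error"):
        return Outcome.FAILURE
    if result.get("confirmation_number"):
        return Outcome.SUCCESS
    # No error but no confirmation either: the agent stopped partway.
    return Outcome.PARTIAL


def next_action(result: dict) -> dict:
    """Describe what the middleware should do with the Tallyfy task."""
    outcome = classify_result(result)
    if outcome is Outcome.SUCCESS:
        return {"action": "store_and_continue",
                "confirmation_number": result["confirmation_number"]}
    if outcome is Outcome.FAILURE:
        return {"action": "reassign_to_human", "details": result["error"]}
    return {"action": "flag_for_review"}


print(next_action({"confirmation_number": "R-58213"}))
print(next_action({"error": "No tables available for the requested time"}))
print(next_action({}))
```

Treating a missing confirmation number as partial completion, instead of assuming success, keeps unconfirmed bookings in front of a human.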

Integration options

Responses API: Send task instructions and data directly to OpenAI’s API. Agent uses available tools (web search, computer use, file search) to complete the task.

Agents SDK: Build custom workflow orchestration using the open-source SDK. Supports multi-agent workflows and mixed model usage.

MCP server: Create a Model Context Protocol server that exposes Tallyfy operations. OpenAI agents can then interact with Tallyfy using natural language.

Webhook architecture: Use message queues between Tallyfy and OpenAI to handle high-volume automation with retry logic and parallel processing.
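As a rough illustration of the retry logic in the webhook architecture, the sketch below drains an in-memory queue and retries each job with exponential backoff. It is an assumption-laden stand-in: a production setup would use a real message broker and several workers for parallel processing, and `run_agent` is a hypothetical callable that wraps whatever OpenAI integration you choose.

```python
import queue
import time
from typing import Callable, List, Tuple


def drain_with_retries(
    jobs: "queue.Queue[dict]",
    run_agent: Callable[[dict], dict],  # hypothetical wrapper around the agent call
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[dict], List[dict]]:
    """Process every queued job, retrying failures with exponential backoff."""
    completed, dead_letter = [], []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        for attempt in range(1, max_attempts + 1):
            try:
                completed.append(run_agent(job))
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Out of retries: park the job for human review.
                    dead_letter.append({"job": job, "error": str(exc)})
                else:
                    sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
    return completed, dead_letter
```

Jobs that land in `dead_letter` are the ones to hand back to Tallyfy for human completion. The `sleep` parameter is injectable so the backoff can be tested without real delays.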

Use case examples

Invoice data extraction:

  • Agent navigates to supplier portal
  • Downloads invoices
  • Extracts key data (invoice number, amount, date)
  • Returns structured data to Tallyfy
  • Human reviews before accounting entry
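Before extracted invoice data goes back to Tallyfy, it helps to validate it so the human reviewer sees clean values or an explicit error. The following is a minimal sketch. The raw keys (`invoice_number`, `amount`, `date`) and the ISO date format are assumptions about what the agent returns.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Invoice:
    number: str
    amount: Decimal
    issued: date


def parse_invoice(raw: dict) -> Invoice:
    """Validate agent-extracted invoice fields; raise ValueError on bad data."""
    number = str(raw.get("invoice_number", "")).strip()
    if not number:
        raise ValueError("invoice_number is empty")
    try:
        # Tolerate a leading currency sign and thousands separators.
        amount = Decimal(str(raw["amount"]).replace(",", "").lstrip("$"))
    except (KeyError, InvalidOperation) as exc:
        raise ValueError("amount is missing or not numeric") from exc
    if not amount.is_finite():
        raise ValueError("amount is not a finite number")
    # date.fromisoformat raises ValueError for anything other than YYYY-MM-DD.
    issued = date.fromisoformat(str(raw.get("date", "")))
    return Invoice(number, amount, issued)


invoice = parse_invoice(
    {"invoice_number": " INV-42 ", "amount": "$1,250.00", "date": "2025-03-01"}
)
print(invoice)
```

`Decimal` is used instead of `float` so that monetary amounts are not subject to binary rounding before they reach accounting.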

Customer onboarding tasks:

  • Creating collaboration workspace invites
  • Setting up shared folders
  • Scheduling kickoff meetings
  • Generating welcome communications
  • Human handles contract review and customization

Competitive research:

  • Visits competitor websites
  • Captures pricing and feature information
  • Compares with previous data
  • Generates summary report
  • Routes to team for analysis

Configuration considerations

Task instructions: Write specific, clear instructions. Include exact URLs, button names, field labels, and expected outputs.

Good: “Navigate to acme.com/invoices, download PDFs from last 30 days, extract invoice numbers and amounts”

Poor: “Get the invoices”

Security: Never store passwords in task descriptions. Use secure credential management. Enable user approval for sensitive actions.

Error handling: Configure Tallyfy conditional logic to route failed tasks to humans. Log agent actions for debugging.

Monitoring: Track success rates by task type. Adjust instructions based on failure patterns.
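Tracking success rates by task type needs very little machinery. The sketch below keeps the counts in memory. The task-type names, the 80% threshold, and the five-run minimum are illustrative assumptions, and a real deployment would persist the counts somewhere durable.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class SuccessTracker:
    """Counts agent task outcomes per task type."""

    def __init__(self) -> None:
        self._runs: Dict[str, int] = defaultdict(int)
        self._successes: Dict[str, int] = defaultdict(int)

    def record(self, task_type: str, succeeded: bool) -> None:
        self._runs[task_type] += 1
        if succeeded:
            self._successes[task_type] += 1

    def rate(self, task_type: str) -> Optional[float]:
        """Return the success rate, or None if the task type has never run."""
        runs = self._runs.get(task_type, 0)
        if runs == 0:
            return None
        return self._successes.get(task_type, 0) / runs

    def needs_attention(self, threshold: float = 0.8, min_runs: int = 5) -> List[str]:
        """List task types with enough runs whose success rate is below the threshold."""
        return sorted(
            task_type
            for task_type, runs in self._runs.items()
            if runs >= min_runs and self._successes.get(task_type, 0) / runs < threshold
        )


tracker = SuccessTracker()
for ok in (True, True, False, True, False):
    tracker.record("invoice_download", ok)
for ok in (True, True, True, True, True):
    tracker.record("form_fill", ok)
print(tracker.rate("invoice_download"), tracker.needs_attention())
```

The `min_runs` guard avoids flagging a task type after one unlucky failure. Task types returned by `needs_attention` are the ones whose instructions are worth revisiting first.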

API pricing

Current OpenAI API pricing (subject to change):

  • GPT-4o: $2.50 per 1M input tokens, $10.00 per 1M output tokens
  • GPT-5: $1.25 per 1M input tokens, $10.00 per 1M output tokens
  • GPT-5 mini: $0.25 per 1M input tokens, $2.00 per 1M output tokens

Check OpenAI’s official pricing page for current rates and subscription plan details.
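To see how these rates translate into per-task cost, here is a small calculation using the prices listed above. The token counts are made-up assumptions for a single agent run, not measurements, and the dictionary keys are display names, not API model identifiers.

```python
# (input, output) price in USD per 1M tokens, copied from the list above.
PRICES_PER_MILLION = {
    "GPT-4o": (2.50, 10.00),
    "GPT-5": (1.25, 10.00),
    "GPT-5 mini": (0.25, 2.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one run from its token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed usage for one task: 200,000 input tokens and 20,000 output tokens.
for model in PRICES_PER_MILLION:
    print(f"{model}: ${estimate_cost(model, 200_000, 20_000):.2f}")
```

Under these assumed token counts, the same task costs roughly $0.70 on GPT-4o, $0.45 on GPT-5, and $0.09 on GPT-5 mini, so model choice matters once a workflow runs at volume.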

Troubleshooting

Task fails repeatedly:

  • Instructions may be too vague - add specific details (button names, URLs, exact text)
  • Website may have anti-bot measures - consider manual fallback
  • Dynamic content loading - agent may need explicit wait instructions
  • Login credentials missing or incorrect

Integration not triggering:

  • Verify webhook URL is accessible
  • Check authentication tokens
  • Confirm Tallyfy process is published and active
  • Review webhook logs for error responses

Partial completion:

  • Use Tallyfy conditional logic to route partially completed tasks to humans
  • Add verification steps after agent tasks
  • Track patterns in partial completions to refine instructions

Limitations

Current constraints:

  • Performance on complex tasks is limited (the CUA model scores 38.1% on OSWorld and 58.1% on WebArena)
  • Processing time varies, typically several minutes per task
  • Geographic availability limited to select regions
  • Experimental technology - expect changes and occasional failures
  • Requires careful credential management for security
  • Not suitable for tasks requiring nuanced decision-making

Best practices:

  • Start with simple, high-volume tasks
  • Always have human fallback for critical processes
  • Test thoroughly before production deployment
  • Monitor success rates and adjust instructions
  • Use Tallyfy’s conditional logic for error handling
  • Keep instructions simple and specific
