OpenAI agent capabilities
OpenAI provides agent capabilities through several tools: the Responses API, Agents SDK, and Computer Use (CUA) model. These work with Tallyfy to automate workflow tasks that involve web interactions, document processing, and data extraction.
OpenAI’s Operator (launched January 23, 2025) demonstrated browser automation using the Computer-Using Agent (CUA) model built on GPT-4o. The CUA model scored 38.1% on the OSWorld benchmark and 58.1% on WebArena, which shows that complex automation is still unreliable.
More recently, OpenAI released the Responses API and open-source Agents SDK, which provide developers with tools for building AI agents that combine multiple capabilities including web search, file processing, and computer use.
Important guidance for AI agent tasks
Put your step-by-step instructions for the AI agent in the Tallyfy task description. Start with short, bite-size tasks that are mundane and tedious. Do not ask an AI agent to take on large, goal-driven jobs that require complex decisions: such jobs are prone to nondeterministic behavior and hallucination, and they can get expensive quickly.
Tallyfy orchestrates OpenAI agent capabilities by triggering tasks, providing structured inputs, and capturing outputs. The integration uses webhooks or API calls to connect Tallyfy processes with OpenAI’s agent tools.
What to notice:
- Multiple tools - OpenAI agents use computer use, web search, and file search based on task requirements
 - Structured data flow - Tallyfy provides inputs through form fields and captures agent outputs for downstream tasks
 - Error handling - Failed tasks can route to human review through Tallyfy’s conditional logic
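A minimal sketch of the structured data flow described above: middleware receives a webhook payload and turns the task description plus form fields into one instruction for the agent. The payload shape (`task.description`, `form_fields`) is an assumption made for illustration, not Tallyfy's documented webhook format, and `build_agent_prompt` is a hypothetical helper.

```python
from typing import Any, Dict


def build_agent_prompt(payload: Dict[str, Any]) -> str:
    """Turn a (hypothetical) webhook payload into a single agent instruction."""
    # Assumed shape: {"task": {"description": str}, "form_fields": {label: value}}
    description = payload["task"]["description"].strip()
    fields = payload.get("form_fields", {})
    # Render form fields as a sorted, labelled list so the agent never has to guess.
    lines = [f"- {label}: {value}" for label, value in sorted(fields.items())]
    if not lines:
        return description
    return description + "\n\nInputs:\n" + "\n".join(lines)


example = {
    "task": {"description": "Make a reservation on OpenTable."},
    "form_fields": {"party_size": 4, "restaurant": "Chez Example"},
}
print(build_agent_prompt(example))
```

Keeping the inputs in a fixed, labelled block makes failed runs easier to compare, because the only thing that changes between runs is the field values.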
 
OpenAI’s agent platform includes several components that work together:
Responses API: Combines chat capabilities with multi-tool support, allowing agents to use web search, file search, and computer use within a single API call.
Agents SDK: Open-source toolkit for building single-agent and multi-agent workflows. Works with OpenAI models and competitor models from Anthropic and Google.
Computer Use (CUA): Browser automation powered by the CUA model. Captures screenshots, identifies UI elements, and simulates mouse/keyboard actions. Current performance is 38.1% on OSWorld benchmark.
Web Search: Built on GPT-4o search, which scores 90% accuracy on the SimpleQA benchmark. Responses include citations to original sources.
File Search: Document retrieval from large document sets with metadata filtering support.
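As a rough sketch of how several tools can be requested in one Responses API call, the helper below only assembles the request arguments and sends nothing. The tool type strings (`web_search_preview`, `file_search`) and the `vector_store_ids` key are written as recalled from OpenAI's documentation and should be checked against the current API reference; `build_responses_request` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def build_responses_request(
    instructions: str, vector_store_id: Optional[str] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Responses API call (nothing is sent here)."""
    # Tool names are assumptions to verify against OpenAI's current docs.
    tools = [{"type": "web_search_preview"}]
    if vector_store_id:
        # File search needs a vector store that already holds the documents.
        tools.append({"type": "file_search", "vector_store_ids": [vector_store_id]})
    return {"model": "gpt-4o", "input": instructions, "tools": tools}


request = build_responses_request("Find the current list price of Acme Widget Pro.")
# With the official `openai` package installed and an API key configured,
# the call would look roughly like: OpenAI().responses.create(**request)
print(request)
```

Separating request construction from the network call keeps the middleware easy to unit test without an API key.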
Tallyfy provides structure for OpenAI agent automation:
Trackable execution: Every agent action happens within a Tallyfy process with full audit trail.
Error recovery: When agents encounter issues, Tallyfy’s conditional logic can route tasks to humans for completion.
Incremental approach: Start with simple tasks like form filling or data extraction. Monitor results. Expand gradually to more complex workflows.
MCP support: With Model Context Protocol servers, OpenAI agents can interact with Tallyfy using natural language commands like “complete the next task in customer onboarding.”
Operator uses the Computer-Using Agent (CUA) model built on GPT-4o architecture:
Visual perception: Captures screenshots of web pages and identifies UI elements like buttons, text fields, and links.
Action execution: Simulates mouse clicks and keyboard input to interact with websites.
Safety controls: Pauses for user approval before sensitive actions like payments or credential entry. Includes fine-tuning to resist prompt injection attacks.
Performance benchmarks:
- OSWorld: 38.1% (real-world computer tasks)
 - WebArena: 58.1% (web navigation scenarios)
 
Best suited for:
- Form filling and data entry
 - Restaurant reservations and bookings
 - Simple information gathering
 - Document retrieval from web portals
 
Limitations: Struggles with complex interfaces, multi-page workflows requiring sustained context, and websites with anti-bot measures.
ChatGPT subscription plans:
- Plus: $20/month
 - Pro: $200/month
 - Enterprise: Custom pricing
 
API access: Available through Responses API and Agents SDK
Geographic availability: United States, Canada, UK, Australia
Note: Features and availability change frequently. Check OpenAI’s official documentation for current status.
- Identify suitable tasks: Look for Tallyfy tasks that involve web interactions - online ordering, booking appointments, data extraction from public websites.
- Write clear instructions: Create specific natural language instructions in task descriptions. Include exact URLs, button names, and expected outputs.
- Choose integration method:
  - API integration using Responses API or Agents SDK
  - Webhook triggers from Tallyfy to your custom middleware
  - MCP server for natural language control
- Test incrementally: Start with simple, low-risk tasks. Monitor success rates. Adjust instructions based on results before expanding to more complex workflows.
- Implement error handling: Use Tallyfy’s conditional logic to route failed or partially completed tasks to human review.
 
Scenario: Automate restaurant reservations for client meetings.
Tallyfy task configuration:
- Task description contains clear instructions: “Make reservation on OpenTable for the restaurant and date specified in form fields”
 - Form fields collect: restaurant name, party size, date/time, special requests
 - Webhook triggers when task is assigned
 
Agent execution:
- Receives structured data from Tallyfy webhook
 - Navigates to restaurant booking site
 - Fills reservation form
 - Captures confirmation number
 - Returns result to Tallyfy
 
Result handling:
- Success: Confirmation number stored in form field, next task triggered
 - Failure: Task reassigned to human with error details
 - Partial completion: Flagged for manual review
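The three result-handling branches above could be sketched in middleware roughly as follows. `AgentResult`, its fields, and the action labels returned by `next_action` are hypothetical names chosen for illustration; how each action is applied in Tallyfy (API call, form field update, conditional rule) is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


@dataclass
class AgentResult:
    confirmation_number: Optional[str] = None
    error: Optional[str] = None
    steps_completed: int = 0


def classify(result: AgentResult) -> Outcome:
    """Map an agent result onto success, partial completion, or failure."""
    if result.confirmation_number and not result.error:
        return Outcome.SUCCESS
    # Some progress but no confirmation: treat as partial and flag for review.
    if result.steps_completed > 0:
        return Outcome.PARTIAL
    return Outcome.FAILURE


def next_action(outcome: Outcome) -> str:
    """Return a hypothetical action label for the middleware to apply."""
    return {
        Outcome.SUCCESS: "store_confirmation_and_complete_task",
        Outcome.PARTIAL: "flag_for_manual_review",
        Outcome.FAILURE: "reassign_to_human",
    }[outcome]


print(next_action(classify(AgentResult(error="timeout", steps_completed=2))))
```

Treating "no confirmation number" as not successful, even when the agent reports no error, avoids marking a booking as done when nothing was actually captured.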
 
Responses API: Send task instructions and data directly to OpenAI’s API. Agent uses available tools (web search, computer use, file search) to complete the task.
Agents SDK: Build custom workflow orchestration using the open-source SDK. Supports multi-agent workflows and mixed model usage.
MCP server: Create a Model Context Protocol server that exposes Tallyfy operations. OpenAI agents can then interact with Tallyfy using natural language.
Webhook architecture: Use message queues between Tallyfy and OpenAI to handle high-volume automation with retry logic and parallel processing.
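To illustrate the retry logic mentioned for the webhook architecture, here is a single-worker sketch using Python's in-process `queue.Queue` with exponential backoff. A production setup would more likely use a durable message broker and several workers; `process_queue`, the job dictionary shape (`task_id`, `attempt`), and the `run_agent` callable are assumptions for illustration.

```python
import queue
import time
from typing import Callable, List, Tuple


def process_queue(
    jobs: "queue.Queue",
    run_agent: Callable[[dict], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[tuple], List[tuple]]:
    """Drain the queue, retrying failed jobs with exponential backoff."""
    done, dead = [], []
    while not jobs.empty():
        job = jobs.get()
        attempt = job.get("attempt", 1)
        try:
            done.append((job["task_id"], run_agent(job)))
        except Exception as exc:  # broad on purpose: any agent error triggers a retry
            if attempt >= max_attempts:
                dead.append((job["task_id"], str(exc)))  # hand over to a human
            else:
                sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
                jobs.put({**job, "attempt": attempt + 1})
    return done, dead
```

Jobs that exhaust their attempts end up in `dead`, which is the natural place to trigger the human-review route instead of retrying indefinitely.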
Invoice data extraction:
- Agent navigates to supplier portal
 - Downloads invoices
 - Extracts key data (invoice number, amount, date)
 - Returns structured data to Tallyfy
 - Human reviews before accounting entry
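Before extracted invoice data goes back to Tallyfy, it helps to validate it so that malformed output is routed to a human rather than stored. The sketch below assumes the agent returns a dictionary with `invoice_number`, `amount`, and an ISO-formatted `date`; these key names and the `parse_invoice` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Invoice:
    number: str
    amount: Decimal
    issued: date


def parse_invoice(raw: dict) -> Invoice:
    """Validate agent-extracted fields; raise ValueError so the task can go to a human."""
    number = str(raw.get("invoice_number", "")).strip()
    if not number:
        raise ValueError("missing invoice number")
    try:
        # Accept amounts such as "$1,234.50" by stripping separators and the symbol.
        amount = Decimal(str(raw["amount"]).replace(",", "").lstrip("$"))
        issued = date.fromisoformat(raw["date"])
    except (KeyError, InvalidOperation, ValueError, TypeError) as exc:
        raise ValueError(f"unparseable invoice field: {exc}") from exc
    if amount <= 0:
        raise ValueError("amount must be positive")
    return Invoice(number, amount, issued)


print(parse_invoice({"invoice_number": "INV-001", "amount": "$1,234.50", "date": "2025-03-01"}))
```

Using `Decimal` rather than `float` keeps currency amounts exact, which matters once the data reaches accounting.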
 
Customer onboarding tasks:
- Creating collaboration workspace invites
 - Setting up shared folders
 - Scheduling kickoff meetings
 - Generating welcome communications
 - Human handles contract review and customization
 
Competitive research:
- Visits competitor websites
 - Captures pricing and feature information
 - Compares with previous data
 - Generates summary report
 - Routes to team for analysis
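The "compares with previous data" step can be as simple as diffing two snapshots. This sketch assumes the agent returns pricing as a flat `{plan_name: price}` dictionary, which is an illustrative simplification; `diff_pricing` is a hypothetical helper.

```python
def diff_pricing(previous: dict, current: dict) -> dict:
    """Compare two {plan_name: price} snapshots captured by the agent."""
    changed = {
        plan: (previous[plan], current[plan])
        for plan in previous.keys() & current.keys()
        if previous[plan] != current[plan]
    }
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": changed,  # plan -> (old_price, new_price)
    }


last_month = {"Basic": 10, "Pro": 30, "Legacy": 5}
this_month = {"Basic": 12, "Pro": 30, "Team": 50}
print(diff_pricing(last_month, this_month))
```

Only the diff needs to reach the summary report, which keeps the human analysis step focused on what actually changed.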
 
Task instructions: Write specific, clear instructions. Include exact URLs, button names, field labels, and expected outputs.
Good: “Navigate to acme.com/invoices, download PDFs from last 30 days, extract invoice numbers and amounts”
Poor: “Get the invoices”
Security: Never store passwords in task descriptions. Use secure credential management. Enable user approval for sensitive actions.
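One way to back up the rule about passwords is a simple check that flags task descriptions containing something that looks like an inline credential before they are sent to an agent. The pattern below is a heuristic sketch with assumed keywords; it will miss some secrets and is not a substitute for a proper credential manager.

```python
import re
from typing import List

# Heuristic: a credential-like keyword followed by ":" or "=" and a value.
_SECRET_HINT = re.compile(
    r"(?i)\b(password|passwd|pwd|api[_-]?key|secret|token)\b\s*[:=]\s*\S+"
)


def find_secret_hints(description: str) -> List[str]:
    """Return the keywords that look like inline credentials in a task description."""
    return [match.group(1).lower() for match in _SECRET_HINT.finditer(description)]


print(find_secret_hints("Log in with password: hunter2 then download the report"))
print(find_secret_hints("Use the credential named ACME_PORTAL from the vault"))
```

A non-empty result can block the webhook from firing and send the task back to its author for correction.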
Error handling: Configure Tallyfy conditional logic to route failed tasks to humans. Log agent actions for debugging.
Monitoring: Track success rates by task type. Adjust instructions based on failure patterns.
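Tracking success rates by task type needs little more than two counters per type. The sketch below is an in-memory illustration; `SuccessTracker`, the 80% threshold, and the minimum of five runs are assumptions, and a real deployment would persist the counts.

```python
from collections import defaultdict
from typing import List


class SuccessTracker:
    """Count agent successes per task type and flag types that underperform."""

    def __init__(self) -> None:
        self._counts = defaultdict(lambda: [0, 0])  # task_type -> [successes, total]

    def record(self, task_type: str, success: bool) -> None:
        counts = self._counts[task_type]
        counts[0] += int(success)
        counts[1] += 1

    def rate(self, task_type: str) -> float:
        successes, total = self._counts.get(task_type, (0, 0))
        return successes / total if total else 0.0

    def needs_attention(self, threshold: float = 0.8, min_runs: int = 5) -> List[str]:
        # Ignore task types with too few runs to say anything meaningful.
        return sorted(
            task_type
            for task_type, (successes, total) in self._counts.items()
            if total >= min_runs and successes / total < threshold
        )


tracker = SuccessTracker()
for ok in (True, True, True, True, False):
    tracker.record("form_fill", ok)
for ok in (True, False, False, True, False):
    tracker.record("booking", ok)
print(tracker.needs_attention())
```

Task types returned by `needs_attention` are the first candidates for rewritten instructions or a permanent human fallback.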
Current OpenAI API pricing (subject to change):
- GPT-4o: $2.50 per 1M input tokens, $10.00 per 1M output tokens
 - GPT-5: $1.25 per 1M input tokens, $10.00 per 1M output tokens
 - GPT-5 mini: $0.25 per 1M input tokens, $2.00 per 1M output tokens
 
Check OpenAI’s official pricing page for current rates and subscription plan details.
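A quick way to reason about cost is to multiply token counts by the per-million rates listed above. The rates in this sketch are copied from that list and will go stale; the dictionary keys are local labels rather than guaranteed API model identifiers, and the token counts in the example are made-up figures for illustration.

```python
# USD per 1M tokens as (input, output), copied from the list above; rates change.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one agent run from its token counts."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical run: 200,000 input tokens and 20,000 output tokens.
for model in PRICES:
    print(model, round(estimate_cost(model, 200_000, 20_000), 4))
```

For the hypothetical run above, the arithmetic for GPT-4o is (200,000 × 2.50 + 20,000 × 10.00) / 1,000,000 = $0.70. Multiplying the per-run estimate by expected task volume gives a rough monthly budget to compare against the cost of doing the task by hand.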
Task fails repeatedly:
- Instructions may be too vague - add specific details (button names, URLs, exact text)
 - Website may have anti-bot measures - consider manual fallback
 - Dynamic content loading - agent may need explicit wait instructions
 - Login credentials missing or incorrect
 
Integration not triggering:
- Verify webhook URL is accessible
 - Check authentication tokens
 - Confirm Tallyfy process is published and active
 - Review webhook logs for error responses
 
Partial completion:
- Use Tallyfy conditional logic to route partially completed tasks to humans
 - Add verification steps after agent tasks
 - Track patterns in partial completions to refine instructions
 
Current constraints:
- Performance on complex tasks is limited (the CUA model scores 38.1% on OSWorld and 58.1% on WebArena)
 - Processing time varies, typically several minutes per task
 - Geographic availability limited to select regions
 - Experimental technology - expect changes and occasional failures
 - Requires careful credential management for security
 - Not suitable for tasks requiring nuanced decision-making
 
Best practices:
- Start with simple, high-volume tasks
 - Always have human fallback for critical processes
 - Test thoroughly before production deployment
 - Monitor success rates and adjust instructions
 - Use Tallyfy’s conditional logic for error handling
 - Keep instructions simple and specific
 