OpenAI agent capabilities
OpenAI offers agent capabilities through three main tools: the Responses API, Agents SDK, and Computer Use (CUA) model. These connect with Tallyfy to automate web interactions, document processing, and data extraction.
The CUA model now runs on o3 reasoning instead of the original GPT-4o. The original CUA model scored 38.1% on OSWorld and 58.1% on WebArena - modest results that show why you shouldn’t rely on agents for anything complex yet. These capabilities are available through “ChatGPT agent” mode.
Important guidance for AI agent tasks
Your step-by-step instructions for the AI agent go into the Tallyfy task description. Start with short, easy tasks that are mundane and tedious. Don’t ask an AI agent to handle large, decision-heavy jobs - agents are prone to unpredictable behavior and hallucination, and costs can spiral fast.
Tallyfy triggers tasks, provides structured inputs through form fields, and captures outputs. Webhooks or API calls connect Tallyfy processes with OpenAI’s agent tools.
What to notice:
- Multiple tools - agents pick between computer use, web search, and file search based on what the task needs
- Structured data flow - Tallyfy sends inputs via form fields and captures agent outputs for downstream tasks
- Error handling - failed tasks route to human review through Tallyfy’s conditional logic
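The structured data flow above can be sketched as a small helper that merges a task description with its form-field values into one prompt for the agent. This is a minimal sketch: `TallyfyTask` and `build_agent_prompt` are hypothetical names, and the task shape is a simplification, not Tallyfy's actual API schema.

```python
from dataclasses import dataclass, field


@dataclass
class TallyfyTask:
    """Hypothetical, simplified view of a task (not the real Tallyfy schema)."""
    description: str
    form_fields: dict = field(default_factory=dict)


def build_agent_prompt(task: TallyfyTask) -> str:
    """Combine the task description and form-field values into one prompt."""
    lines = [task.description.strip(), "", "Inputs:"]
    for name, value in task.form_fields.items():
        # One line per form field keeps the inputs easy for the agent to find.
        lines.append(f"- {name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    task = TallyfyTask(
        description="Make a reservation for the restaurant and date given below.",
        form_fields={"restaurant": "Example Bistro", "party_size": "4"},
    )
    print(build_agent_prompt(task))
```

Keeping the inputs in a fixed, labeled layout makes it easier to compare agent runs when you later adjust the instructions.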
Responses API - combines chat with multi-tool support, so agents can use web search, file search, and computer use in a single API call.
Agents SDK - open-source toolkit for single-agent and multi-agent workflows. Works with OpenAI models and competitors like Anthropic and Google.
Computer Use (CUA) - browser automation using o3 reasoning. Takes screenshots, finds UI elements, and simulates mouse/keyboard actions. Available through ChatGPT agent mode.
Web search - uses GPT-4o search with cited sources.
File search - retrieves documents from large sets with metadata filtering.
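As a rough illustration of combining tools in one Responses API call, the sketch below only assembles a request body and does not send it. The model name and the tool type strings (`web_search_preview`, `file_search`) are assumptions that may have changed, so check OpenAI's current API reference before relying on them. `build_responses_payload` is a hypothetical helper.

```python
import json
from typing import Optional


def build_responses_payload(instructions: str,
                            vector_store_id: Optional[str] = None) -> dict:
    """Assemble a request body for the Responses API (nothing is sent here)."""
    # Assumed tool type name for web search; verify against current OpenAI docs.
    tools = [{"type": "web_search_preview"}]
    if vector_store_id is not None:
        # Assumed shape for file search over an existing vector store.
        tools.append({"type": "file_search",
                      "vector_store_ids": [vector_store_id]})
    return {"model": "gpt-4o", "input": instructions, "tools": tools}


if __name__ == "__main__":
    payload = build_responses_payload(
        "Summarize current pricing listed on the supplier's public site.",
        vector_store_id="vs_example",  # placeholder ID, not a real store
    )
    print(json.dumps(payload, indent=2))
```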
Trackable execution - every agent action runs inside a Tallyfy process with a full audit trail.
Error recovery - when agents hit issues, Tallyfy’s conditional logic routes tasks to humans.
Start small - begin with simple tasks like form filling or data extraction. Watch the results. Expand gradually.
MCP support - with Model Context Protocol servers, OpenAI agents can interact with Tallyfy using natural language commands like “complete the next task in customer onboarding.”
Visual perception - captures screenshots and identifies buttons, text fields, and links.
Action execution - simulates mouse clicks and keyboard input on websites.
Safety controls - pauses for user approval before sensitive actions like payments or credential entry. Includes fine-tuning to resist prompt injection.
Performance benchmarks (original CUA model):
- OSWorld: 38.1% (real-world computer tasks)
- WebArena: 58.1% (web navigation)
Works best for: form filling, restaurant reservations, simple information gathering, and pulling documents from web portals.
Where it struggles: complex interfaces, multi-page workflows needing sustained context, and sites with anti-bot measures.
ChatGPT agent mode - available to Plus, Pro, and Enterprise subscribers. Select “agent mode” from the ChatGPT composer dropdown.
API access - available through the Responses API and Agents SDK.
Pricing and availability change frequently. See OpenAI’s pricing page ↗ for current details.
- Find suitable tasks - look for Tallyfy tasks involving web interactions, such as online ordering, booking appointments, and data extraction from public websites.
- Write clear instructions - put specific natural language instructions in task descriptions. Include exact URLs, button names, and expected outputs.
- Pick an integration method:
  - API integration using Responses API or Agents SDK
  - Webhook triggers from Tallyfy to your custom middleware
  - MCP server for natural language control
- Test incrementally - start with simple, low-risk tasks. Monitor success rates. Adjust instructions before expanding scope.
- Set up error handling - use Tallyfy’s conditional logic to route failed or partially completed tasks to human review.
Scenario - automate restaurant reservations for client meetings.
Tallyfy task setup:
- Task description: “Make reservation on OpenTable for the restaurant and date specified in form fields”
- Form fields collect: restaurant name, party size, date/time, special requests
- Webhook fires when task is assigned
What the agent does:
- Receives structured data from Tallyfy webhook
- Navigates to the booking site
- Fills the reservation form
- Captures the confirmation number
- Returns the result to Tallyfy
Result handling:
- Success: confirmation number stored in a form field, next task triggered
- Failure: task reassigned to a human with error details
- Partial completion: flagged for manual review
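The three result paths above could be expressed in middleware roughly as follows. This is a sketch under stated assumptions: the agent result keys (`confirmation_number`, `error`, `form_submitted`) and the returned action names are hypothetical, not fields defined by Tallyfy or OpenAI.

```python
def route_result(result: dict) -> dict:
    """Map a (hypothetical) agent result to a follow-up action for Tallyfy."""
    confirmation = result.get("confirmation_number")
    error = result.get("error")

    # Success: a confirmation number and no reported error.
    if confirmation and not error:
        return {"action": "complete_task",
                "form_field": {"confirmation_number": confirmation}}

    # Failure: an error occurred before the reservation form was submitted.
    if error and not result.get("form_submitted", False):
        return {"action": "reassign_to_human", "details": error}

    # Partial completion: anything ambiguous goes to manual review.
    return {"action": "flag_for_review",
            "details": error or "No confirmation number captured"}


if __name__ == "__main__":
    print(route_result({"confirmation_number": "ABC123"}))
    print(route_result({"error": "Booking site timed out"}))
    print(route_result({"form_submitted": True}))
```

Treating every ambiguous case as a review item is a deliberately conservative default, since a silently missed reservation is worse than an extra human check.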
Responses API - send task data directly to OpenAI’s API. The agent picks the right tools (web search, computer use, file search) to finish the job.
Agents SDK - build custom orchestration with the open-source SDK. Supports multi-agent workflows and mixed model usage.
MCP server - expose Tallyfy operations through a Model Context Protocol server. Agents interact with Tallyfy using natural language.
Webhook architecture - use message queues between Tallyfy and OpenAI for high-volume automation with retry logic.
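A queue with retry logic might look like the sketch below, which uses Python's in-process `queue.Queue` as a stand-in for a real message broker. `process_with_retry` and its parameters are hypothetical. The handler represents whatever function forwards a job to the agent, and jobs that exhaust their attempts are returned so they can be routed to a human.

```python
import queue
import time


def process_with_retry(jobs: queue.Queue, handler, max_attempts: int = 3,
                       base_delay: float = 1.0, sleep=time.sleep) -> list:
    """Drain the queue, retrying failed jobs with exponential backoff.

    Returns the jobs that still failed after max_attempts (a dead-letter list).
    """
    dead_letter = []
    while not jobs.empty():
        job = jobs.get()
        for attempt in range(1, max_attempts + 1):
            try:
                handler(job)
                break  # job succeeded, move to the next one
            except Exception:
                if attempt == max_attempts:
                    dead_letter.append(job)  # give up, hand over to a human
                else:
                    # Wait base_delay, then 2x, 4x, ... before retrying.
                    sleep(base_delay * 2 ** (attempt - 1))
    return dead_letter


if __name__ == "__main__":
    q = queue.Queue()
    q.put({"task_id": "example-1"})
    failed = process_with_retry(q, handler=print, base_delay=0.1)
    print("dead-lettered:", failed)
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting.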
Invoice data extraction:
- Agent goes to supplier portal, downloads invoices
- Extracts invoice number, amount, and date
- Returns structured data to Tallyfy
- Human reviews before accounting entry
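Before extracted invoice data reaches the human reviewer, it can help to validate it. The sketch below assumes the agent returns a dict with `invoice_number`, `amount`, and `date` (ISO format) keys. Those key names and the `parse_invoice` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass
class Invoice:
    number: str
    amount: Decimal
    issued: date


def parse_invoice(raw: dict) -> Invoice:
    """Validate agent output; raise ValueError so the task can go to review."""
    number = str(raw.get("invoice_number", "")).strip()
    if not number:
        raise ValueError("missing invoice number")
    try:
        # Strip thousands separators before parsing the amount.
        amount = Decimal(str(raw["amount"]).replace(",", ""))
        issued = date.fromisoformat(str(raw["date"]))
    except (KeyError, InvalidOperation, ValueError) as exc:
        raise ValueError(f"invalid invoice data: {exc}") from exc
    if amount <= 0:
        raise ValueError("amount must be positive")
    return Invoice(number, amount, issued)


if __name__ == "__main__":
    print(parse_invoice({"invoice_number": "INV-1",
                         "amount": "1,250.00",
                         "date": "2025-01-31"}))
```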
Customer onboarding:
- Sends workspace invites and sets up shared folders
- Schedules kickoff meetings
- Generates welcome emails
- Human handles contract review
Competitive research:
- Visits competitor websites and captures pricing info
- Compares with previous data
- Generates a summary report for team review
Be specific. Include exact URLs, button names, field labels, and expected outputs.
Good: “Go to acme.com/invoices, download PDFs from last 30 days, extract invoice numbers and amounts”
Poor: “Get the invoices”
Security - never store passwords in task descriptions. Use secure credential management and enable user approval for sensitive actions.
Monitoring - track success rates by task type and adjust instructions based on failure patterns.
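Tracking success rates by task type needs little more than a pair of counters. The sketch below is one possible shape. `SuccessTracker` and the task type labels are hypothetical, and in practice the counts would be persisted rather than held in memory.

```python
from collections import defaultdict


class SuccessTracker:
    """Count agent successes and attempts per task type."""

    def __init__(self):
        # task_type -> [successes, total attempts]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, succeeded: bool) -> None:
        counts = self._counts[task_type]
        counts[0] += int(succeeded)
        counts[1] += 1

    def rate(self, task_type: str) -> float:
        """Success rate in [0, 1]; 0.0 if the task type was never recorded."""
        successes, total = self._counts.get(task_type, (0, 0))
        return successes / total if total else 0.0

    def below(self, threshold: float) -> list:
        """Task types whose success rate is under the threshold."""
        return sorted(t for t in self._counts if self.rate(t) < threshold)


if __name__ == "__main__":
    tracker = SuccessTracker()
    tracker.record("form_filling", True)
    tracker.record("invoice_download", False)
    print(tracker.below(0.5))
```

Task types reported by `below` are candidates for rewriting instructions or adding a manual fallback.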
Task fails repeatedly:
- Instructions too vague - add button names, URLs, exact text
- Site has anti-bot measures - consider manual fallback
- Dynamic content loading - agent may need explicit wait instructions
Integration not triggering:
- Verify webhook URL is accessible
- Check authentication tokens
- Confirm the Tallyfy process is published and active
Partial completion:
- Route partially completed tasks to humans via conditional logic
- Add verification steps after agent tasks
- Track patterns to refine instructions
- Performance on complex tasks is low (38.1% on OSWorld and 58.1% on WebArena for the original CUA model)
- Processing takes several minutes per task
- Geographic availability is limited
- Experimental technology - expect changes and occasional failures
- Not suitable for tasks requiring judgment calls
Tips for success: start with simple, high-volume tasks. Always keep a human fallback for critical processes. Test before production. Monitor success rates and keep instructions short and specific.
© 2025 Tallyfy, Inc.