Skyvern AI agents

Browser automation with Skyvern

Skyvern automates browser workflows using LLMs and computer vision. It is open source (AGPL-3.0 license) and scored 85.8% on the WebVoyager benchmark, a result reported as best-in-class for AI browser agents at the time of writing. Unlike traditional RPA scripts that break when websites change, Skyvern adapts in real time using visual understanding.

Important guidance for AI agent tasks

Your step-by-step instructions for the AI agent go into the Tallyfy task description. Start with short, bite-size tasks that are mundane and tedious. Do not ask an AI agent to take on large, open-ended jobs that depend on complex decisions: these are prone to non-deterministic behavior and hallucination, and they can get expensive quickly.

Integration possibilities with Tallyfy

Skyvern can be integrated with Tallyfy through webhooks or middleware platforms (Zapier, Make, n8n) to automate browser-based tasks. The integration follows this pattern: Tallyfy triggers the automation request, Skyvern executes the browser workflow, and the structured result is sent back to Tallyfy.

What you get:

  • Three-agent architecture - Planner decides goals, Actor executes actions, Validator confirms success
  • Self-correcting behavior - Failed tasks trigger automatic retries with different approaches
  • Structured output - Returns JSON or CSV data that maps to Tallyfy form fields
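The trigger-execute-return pattern above can be sketched in a few lines of Python. This is a minimal illustration, not Tallyfy's or Skyvern's actual payload format: the webhook keys (start_url, task_description), the result keys, and the form-field names in FIELD_MAP are all hypothetical and would need to match your own webhook and template.

```python
import json

# Hypothetical mapping from keys in the agent's JSON output
# to the names of form fields in a Tallyfy task.
FIELD_MAP = {"invoice_number": "Invoice number", "total": "Invoice total"}


def build_task_request(webhook_payload: dict) -> dict:
    """Turn an incoming webhook payload into a browser-task request."""
    return {
        "url": webhook_payload["start_url"],            # assumed key
        "prompt": webhook_payload["task_description"],  # assumed key
    }


def map_result_to_form_fields(result: dict, field_map: dict = FIELD_MAP) -> dict:
    """Keep only the output keys we know how to map; ignore the rest."""
    return {field_map[key]: value for key, value in result.items() if key in field_map}


if __name__ == "__main__":
    payload = {
        "start_url": "https://example.com/vendor-portal",
        "task_description": "Log in and read the latest invoice number and total.",
    }
    print(json.dumps(build_task_request(payload), indent=2))

    agent_output = {"invoice_number": "INV-001", "total": "120.00", "debug": "ignored"}
    print(json.dumps(map_result_to_form_fields(agent_output), indent=2))
```

Keeping the mapping in one dictionary makes it easy to see which extracted values end up in which form field, and unmapped keys are dropped rather than written to the wrong place.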

Key capabilities

Deployment options:

  • Open source - Self-host under AGPL-3.0 license with full source code access
  • Cloud - Managed service at app.skyvern.com with anti-bot measures, proxies, and CAPTCHA solving

Performance:

  • Scored 85.8% on the WebVoyager benchmark (643 tasks across 15 websites)
  • Outperformed Google Mariner and other commercial solutions
  • Skyvern 1.0 scored 45% on the same benchmark before the multi-agent architecture

Technical foundation:

  • Supports multiple LLM providers: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Ollama, OpenRouter, Gemini, Novita AI
  • Python 3.11-3.13 compatibility
  • Uses Playwright for browser automation
  • Real-time visual parsing with computer vision

Advanced features:

  • CAPTCHA solving and 2FA authentication (QR codes, email, SMS)
  • Enterprise proxy networks for geo-targeting
  • Livestream browser viewport for debugging
  • File downloads and uploads
  • Credit card form filling

Pricing (as of January 2025):

  • Cloud: $0.05 per step (reduced from $0.10)
  • Free tier available with $5 credit
  • Self-hosted: Free (you pay for infrastructure and LLM API costs)
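To get a feel for what per-step pricing means in practice, the figures above can be turned into a rough estimate. This sketch assumes the free credit offsets per-step charges one-to-one and that you can guess an average number of steps per task; both are assumptions, and actual billing may differ.

```python
from decimal import Decimal

PRICE_PER_STEP = Decimal("0.05")  # cloud price quoted above
FREE_CREDIT = Decimal("5.00")     # free-tier credit quoted above


def free_steps() -> int:
    """Number of steps the free credit covers at the quoted price."""
    return int(FREE_CREDIT / PRICE_PER_STEP)


def estimate_cloud_cost(steps_per_task: int, tasks: int) -> Decimal:
    """Estimated charge after the free credit has been used up."""
    gross = PRICE_PER_STEP * steps_per_task * tasks
    return max(gross - FREE_CREDIT, Decimal("0"))


if __name__ == "__main__":
    print("Steps covered by free credit:", free_steps())
    # Hypothetical workload: 50 tasks averaging 20 steps each.
    print("Estimated cost in USD:", estimate_cloud_cost(steps_per_task=20, tasks=50))
```

Because cost scales with the number of steps, short and narrowly scoped tasks are cheaper as well as more reliable.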

Multi-agent architecture

Skyvern 2.0’s performance improvement (from 45% to 85.8%) came from splitting work across specialized agents:

Planner Agent

  • Sets goals based on the overall objective
  • Maintains working memory of progress
  • Breaks complex tasks into sub-goals

Actor Agent

  • Executes specific actions for narrowly scoped goals
  • Reports completion status and issues
  • Handles browser interactions and element identification

Validator Agent

  • Checks if goals were achieved successfully
  • Provides feedback to Planner and Actor
  • Triggers retries when tasks fail

Specialized sub-agents:

  • Interactable Element Agent - Identifies buttons, forms, links in HTML
  • Navigation Agent - Plans action sequences to reach goals
  • Data Extraction Agent - Structures webpage data into JSON or CSV
  • Password Agent - Handles login forms with password manager integration
  • 2FA Agent - Manages authentication prompts during login
  • Auto-complete Agent - Handles complex form fields like address lookups
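The division of labor between Planner, Actor, and Validator can be illustrated with a toy control loop. This is a conceptual sketch only, not Skyvern's implementation: the function run_objective, the RunLog type, and the stub agents are hypothetical, and the real agents are LLM-driven rather than simple callables.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunLog:
    """Working memory: which sub-goals succeeded and which did not."""
    completed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def run_objective(
    plan: Callable[[str], list[str]],
    act: Callable[[str, int], str],
    validate: Callable[[str, str], bool],
    objective: str,
    max_attempts: int = 3,
) -> RunLog:
    log = RunLog()
    for goal in plan(objective):                  # Planner: split into sub-goals
        for attempt in range(1, max_attempts + 1):
            outcome = act(goal, attempt)          # Actor: attempt one narrow goal
            if validate(goal, outcome):           # Validator: confirm success
                log.completed.append(goal)
                break
        else:
            # All attempts failed; stop, since later goals depend on this one.
            log.failed.append(goal)
            break
    return log


if __name__ == "__main__":
    def plan(objective: str) -> list[str]:
        return ["open login page", "download invoice"]

    def act(goal: str, attempt: int) -> str:
        # Stub actor that only succeeds from the second attempt onward.
        return "ok" if attempt >= 2 else "error"

    def validate(goal: str, outcome: str) -> bool:
        return outcome == "ok"

    print(run_objective(plan, act, validate, "fetch this month's invoice"))
```

The point of the structure is that a failed validation feeds back into another attempt instead of silently ending the run.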

Getting started

  1. Choose deployment:

    • Skyvern Cloud - Visit app.skyvern.com for managed service ($0.05 per step, $5 free credit)
    • Self-hosted - Clone from github.com/Skyvern-AI/skyvern (requires Python 3.11-3.13 or Docker)
  2. Self-hosting setup (if chosen):

    • Local install: Run pip install skyvern, then skyvern init to configure
    • Docker: Clone repository, configure docker-compose.yml with LLM API keys, run docker compose up -d
    • Access UI at http://localhost:8080
  3. Configure LLM provider:

    • Add API keys for your chosen provider (OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Ollama, OpenRouter, Gemini, Novita AI)
  4. Define your first task:

    • Specify url (starting URL)
    • Write prompt in natural language describing the goal
    • Optional: Add data_schema for structured extraction (JSON/CSV format)
    • Optional: Define error_codes for when to stop
  5. Execute and monitor:

    • Run tasks via UI or API
    • Use livestream feature to watch the browser in real-time
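As a rough illustration of step 4, a task definition can be assembled as a plain dictionary and serialized to JSON. The field names url, prompt, data_schema, and error_codes are taken from the list above; the exact names and shapes the Skyvern API accepts should be checked against its API reference. The helper build_task, the example URL, the schema, and the error-code mapping are hypothetical.

```python
import json
from typing import Optional


def build_task(
    url: str,
    prompt: str,
    data_schema: Optional[dict] = None,
    error_codes: Optional[dict] = None,
) -> dict:
    """Assemble a task definition, rejecting obviously incomplete input."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")

    task = {"url": url, "prompt": prompt}
    if data_schema is not None:
        task["data_schema"] = data_schema   # optional structured-extraction schema
    if error_codes is not None:
        task["error_codes"] = error_codes   # optional stop conditions
    return task


if __name__ == "__main__":
    task = build_task(
        url="https://example.com/vendor-portal",
        prompt="Log in, open the billing page, and read the latest invoice total.",
        data_schema={
            "type": "object",
            "properties": {"invoice_total": {"type": "string"}},
        },
        # Assumed shape: code -> description of when the agent should stop.
        error_codes={"LOGIN_FAILED": "The login form rejects the credentials."},
    )
    print(json.dumps(task, indent=2))
```

Validating the definition before it is sent catches empty prompts and malformed URLs without spending any paid steps.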

Real-world use cases

According to Skyvern’s documentation, production deployments handle these scenarios:

Invoice management:

  • Automated login to multiple vendor portals
  • Download monthly statements and invoices
  • Rename, organize, and store files automatically

Job applications:

  • Apply to job postings across multiple platforms
  • Fill application forms with candidate information
  • Upload resumes and cover letters

Government compliance:

  • Submit forms to state and federal portals
  • Handle multi-step flows with 2FA
  • Upload documents and confirm receipt

E-commerce operations:

  • Purchase items from hundreds of different websites
  • Extract competitor pricing data
  • Post listings across multiple platforms

IT operations:

  • Employee onboarding and offboarding workflows
  • System access provisioning
  • Credential management across tools

Why Skyvern stands out

Resilient to website changes: Traditional RPA scripts break when websites redesign. Skyvern uses visual understanding to adapt automatically - no XPath selectors to maintain.

Open source flexibility: Self-host and customize without vendor lock-in. AGPL-3.0 license provides full source access.

Handles web complexity: CAPTCHA solving, 2FA authentication, proxy networks for geo-targeting, and credit card form filling are built in.

Proven performance: 85.8% accuracy on the WebVoyager benchmark, reported as best-in-class for AI browser agents at the time of writing.

Scalable infrastructure: API-driven design supports running thousands of parallel automation tasks through cloud or self-hosted deployments.
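On the client side, fanning out many tasks usually means capping how many run at once. The sketch below uses a standard-library thread pool for that; run_task is a stand-in for whatever API call you actually make, and the max_parallel value is an arbitrary assumption rather than a Skyvern limit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task: dict) -> dict:
    """Stand-in for a real API call; replace with your own client code."""
    return {"url": task["url"], "status": "completed"}


def run_all(tasks: list[dict], max_parallel: int = 5) -> list[dict]:
    """Run tasks concurrently, never more than max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the input tasks.
        return list(pool.map(run_task, tasks))


if __name__ == "__main__":
    tasks = [{"url": f"https://example.com/portal/{i}"} for i in range(8)]
    for result in run_all(tasks, max_parallel=3):
        print(result)
```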

Important considerations

Prompt quality matters: Vague instructions lead to failed tasks. Clear, specific prompts are essential for reliable automation.

Self-hosting requirements: Running browser automation with LLMs requires significant CPU and RAM. Budget for infrastructure costs beyond the free software.

AGPL-3.0 license implications: If you modify Skyvern and offer it as a public service, you must share your source code changes.

Website defenses exist: Even with Skyvern’s anti-bot measures, aggressive automation can trigger rate limits. The cloud version includes proxy networks to help.
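A common general-purpose way to stay within rate limits is to wait progressively longer between retries. The following is a generic exponential-backoff sketch, not a Skyvern feature; the function names, the default delays, and the use of status code 429 as the rate-limit signal are assumptions for illustration.

```python
import time
from typing import Callable, Iterable


def backoff_delays(attempts: int, base: float = 2.0, factor: float = 2.0,
                   cap: float = 60.0) -> list[float]:
    """Delays in seconds that grow by `factor` each retry, capped at `cap`."""
    return [min(base * factor ** i, cap) for i in range(attempts)]


def call_with_backoff(fn: Callable[[], int], delays: Iterable[float],
                      is_rate_limited: Callable[[int], bool],
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fn once, then retry after each delay while it is rate limited."""
    result = fn()
    for delay in delays:
        if not is_rate_limited(result):
            break
        sleep(delay)
        result = fn()
    return result


if __name__ == "__main__":
    responses = iter([429, 429, 200])  # simulated status codes
    status = call_with_backoff(
        fn=lambda: next(responses),
        delays=backoff_delays(attempts=4, base=0.1),
        is_rate_limited=lambda code: code == 429,
    )
    print("Final status:", status)
```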

Task complexity limits: Break down complex multi-step workflows carefully. Test incrementally to identify failure points early.
