
Skyvern AI agents

Browser automation with Skyvern

Skyvern automates browser workflows using LLMs and computer vision. It’s open source (AGPL-3.0 license) and performs well on the WebVoyager benchmark. Unlike traditional RPA scripts that break when websites change, Skyvern adapts in real time by visually interpreting page layouts.

Important guidance for AI agent tasks

Your step-by-step instructions for the AI agent go into the Tallyfy task description. Start with short, easy tasks that are mundane and tedious. Don’t ask an AI agent to handle huge, decision-driven jobs - agents are prone to unpredictable behavior and hallucination, and costs can spiral quickly.

Integration with Tallyfy

You can connect Skyvern to Tallyfy through webhooks or middleware platforms (Zapier, Make, n8n). The flow works like this: Tallyfy triggers the automation, Skyvern runs the browser workflow, and structured data comes back to Tallyfy.

What you get:

  • Three-agent setup - Planner decides goals, Actor executes actions, Validator confirms success
  • Self-correcting behavior - Failed tasks trigger automatic retries with different approaches
  • Structured output - Returns JSON or CSV data that maps to Tallyfy form fields
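
The last step of that flow, turning the agent's JSON output into values for Tallyfy form fields, can be sketched roughly as follows. This is a minimal illustration, not Tallyfy's or Skyvern's API: the keys in FIELD_MAP, the form field names, and the sample output are all hypothetical, and in practice the mapping would usually live in your webhook handler or middleware scenario.

```python
import json

# Hypothetical mapping from keys in the agent's JSON output to Tallyfy
# form field names. Both sides are invented for illustration.
FIELD_MAP = {
    "invoice_number": "Invoice number",
    "total_amount": "Invoice total",
    "due_date": "Payment due date",
}


def map_output_to_fields(raw_json: str, field_map: dict[str, str]) -> dict[str, str]:
    """Turn the agent's JSON output into {form field name: value} pairs.

    Keys that are missing or null are skipped rather than filled with
    guesses, so a person reviewing the task can see the gap.
    """
    extracted = json.loads(raw_json)
    fields = {}
    for source_key, form_field in field_map.items():
        value = extracted.get(source_key)
        if value is not None:
            fields[form_field] = str(value)
    return fields


# Hypothetical output from a finished browser run.
sample_output = '{"invoice_number": "INV-1042", "total_amount": 199.5, "due_date": null}'
print(map_output_to_fields(sample_output, FIELD_MAP))
```

Skipping null values instead of writing empty strings keeps a failed extraction visible in the form, which makes it easier to route the task to a person.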

Key capabilities

Deployment options:

  • Open source - Self-host under AGPL-3.0 with full source access
  • Cloud - Managed service at app.skyvern.com with anti-bot measures, proxies, and CAPTCHA solving

Technical foundation:

  • Multiple LLM providers: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Ollama, OpenRouter, Gemini, Novita AI
  • Python 3.11-3.13 compatibility
  • Playwright for browser automation
  • Real-time visual parsing

Advanced features:

  • CAPTCHA solving and 2FA (QR codes, email, SMS)
  • Proxy networks for geo-targeting
  • Livestream browser viewport for debugging
  • File downloads and uploads
  • Credit card form filling

Pricing:

  • Cloud: Pay-per-step model (check current rates at skyvern.com)
  • Free tier with starter credit
  • Self-hosted: Free (you cover infrastructure and LLM API costs)

Multi-agent architecture

Skyvern splits work across three core agents:

  • Planner - Sets goals, tracks progress, breaks tasks into sub-goals
  • Actor - Executes browser actions for specific goals and reports status
  • Validator - Checks if goals succeeded, triggers retries when they don’t

These are backed by specialized sub-agents:

  • Interactable Element Agent - Identifies buttons, forms, and links in HTML
  • Navigation Agent - Plans action sequences to reach goals
  • Data Extraction Agent - Structures webpage data into JSON or CSV
  • Password Agent - Handles logins with password manager integration
  • 2FA Agent - Manages authentication prompts
  • Auto-complete Agent - Handles form fields like address lookups
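
The division of labor between Planner, Actor, and Validator can be pictured as a simple control loop. The sketch below is a toy model of that idea only, not Skyvern's implementation: run_goals, RunLog, and the toy_* functions are hypothetical stand-ins, and the real agents call LLMs and a browser rather than returning fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunLog:
    completed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    attempts: dict[str, int] = field(default_factory=dict)


def run_goals(
    plan: Callable[[str], list[str]],
    act: Callable[[str, int], str],
    validate: Callable[[str, str], bool],
    task: str,
    max_attempts: int = 3,
) -> RunLog:
    """Plan sub-goals, act on each one, and retry until validation passes."""
    log = RunLog()
    for goal in plan(task):
        for attempt in range(1, max_attempts + 1):
            log.attempts[goal] = attempt
            result = act(goal, attempt)
            if validate(goal, result):
                log.completed.append(goal)
                break
        else:
            # Every attempt failed: record it and stop, since later
            # goals usually depend on this one.
            log.failed.append(goal)
            break
    return log


# Toy stand-ins for the three roles.
def toy_plan(task: str) -> list[str]:
    return ["open login page", "sign in", "download statement"]


def toy_act(goal: str, attempt: int) -> str:
    # Pretend the first sign-in attempt fails and the retry succeeds.
    if goal == "sign in" and attempt == 1:
        return "error"
    return "ok"


def toy_validate(goal: str, result: str) -> bool:
    return result == "ok"


log = run_goals(toy_plan, toy_act, toy_validate, "fetch the latest statement")
print(log.completed, log.attempts)
```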

Getting started

  1. Pick a deployment:

    • Skyvern Cloud - Visit app.skyvern.com for managed service with free starter credit
    • Self-hosted - Clone from github.com/Skyvern-AI/skyvern (needs Python 3.11+ or Docker)
  2. Self-hosting setup (if chosen):

    • Local install: Run pip install skyvern, then skyvern init to configure
    • Docker: Clone the repo, set LLM API keys in docker-compose.yml, run docker compose up -d
    • Access the UI at http://localhost:8080
  3. Configure your LLM provider:

    • Add API keys for your chosen provider (OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Ollama, OpenRouter, Gemini, Novita AI)
  4. Define your first task:

    • Set the url (starting page)
    • Write a prompt in plain language describing what you want done
    • Optionally add data_schema for structured extraction (JSON/CSV)
    • Optionally define error_codes for when to stop
  5. Run and monitor:

    • Launch tasks via UI or API
    • Use the livestream feature to watch the browser in real time

Real-world use cases

Skyvern’s documentation highlights these production scenarios:

Invoice management - Log into vendor portals, download statements, rename and organize files automatically.

Job applications - Apply across multiple platforms, fill forms with candidate info, upload resumes.

Government compliance - Submit forms to state and federal portals, handle multi-step 2FA flows, upload documents.

E-commerce - Purchase from hundreds of sites, extract competitor pricing, post listings across platforms.

IT operations - Employee onboarding/offboarding, system access provisioning, credential management.

What sets Skyvern apart

Resilient to website changes - Traditional RPA breaks when sites redesign. Skyvern uses visual understanding to adapt - no XPath selectors to maintain.

Open source - Self-host and customize without vendor lock-in under the AGPL-3.0 license.

Handles web complexity - CAPTCHA solving, 2FA, proxy networks, and credit card form filling are built-in features.

Scalable - The API-driven design supports thousands of parallel automation tasks.
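
When you launch many tasks from your own code, it usually helps to cap how many run at once. The sketch below shows one common way to do that with asyncio; run_bounded and fake_run are hypothetical names, and fake_run only sleeps where a real integration would call the automation API.

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    task_ids: list[str],
    run_one: Callable[[str], Awaitable[str]],
    limit: int,
) -> list[str]:
    """Run every task, but never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task_id: str) -> str:
        async with semaphore:
            return await run_one(task_id)

    # gather returns results in the same order as task_ids.
    return await asyncio.gather(*(guarded(t) for t in task_ids))


# Stand-in for a real API call; it tracks the peak concurrency it observed.
active = 0
peak = 0


async def fake_run(task_id: str) -> str:
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)
    active -= 1
    return f"{task_id}:done"


results = asyncio.run(run_bounded(["a", "b", "c", "d", "e"], fake_run, limit=2))
print(results, "peak concurrency:", peak)
```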

Important considerations

Prompt quality matters - Vague instructions lead to failed tasks. Write clear, specific prompts.

Self-hosting needs resources - Browser automation with LLMs eats CPU and RAM. Budget for infrastructure costs on top of the free software.

AGPL-3.0 license implications - If you modify Skyvern and let users interact with your modified version over a network, you must make the source code of your changes available to those users.

Website defenses - Even with anti-bot measures, aggressive automation can trigger rate limits. The cloud version includes proxy networks to help.
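
One way to stay clear of rate limits is to slow down after each rejected request. The following is a minimal sketch of capped exponential backoff, under the assumption that your integration can detect a rate-limit response; RateLimited, backoff_delays, and call_with_backoff are hypothetical names, not part of Skyvern.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical signal that the target site or API asked us to slow down."""


def backoff_delays(base_seconds: float, factor: float, max_seconds: float, retries: int) -> list[float]:
    """Return a growing wait time for each retry, capped at max_seconds."""
    delays = []
    delay = base_seconds
    for _ in range(retries):
        delays.append(min(delay, max_seconds))
        delay *= factor
    return delays


def call_with_backoff(
    submit: Callable[[], str],
    delays: list[float],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call submit(), waiting between retries when it raises RateLimited."""
    for delay in delays:
        try:
            return submit()
        except RateLimited:
            sleep(delay)
    # Final attempt: if this is rate limited too, the error propagates.
    return submit()


print(backoff_delays(2, 2, 30, 6))
```

Passing sleep as a parameter lets you test the retry logic without actually waiting.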

Task complexity - Break multi-step workflows into smaller pieces. Test incrementally to find failure points early.
