Integrations > Computer AI Agents
Skyvern AI Agents
Skyvern is an open-source platform (AGPL-3.0 license) built to automate browser-based workflows using Large Language Models (LLMs) and computer vision. With Skyvern 2.0’s groundbreaking 85.8% performance on the WebVoyager benchmark, it has become one of the most capable browser automation platforms available. Its ability to understand and interact with web pages based on natural language prompts makes it a versatile tool for automating tasks within Tallyfy processes that involve web interactions.
Important guidance for AI agent tasks
Your step-by-step instructions for the AI agent to perform work go into the Tallyfy task description. Start with short, bite-size and easy tasks that are just mundane and tedious. Do not try and ask an AI agent to do huge, complex decision-driven jobs that are goal-driven - they are prone to indeterministic behavior, hallucination, and it can get very expensive quickly.
Skyvern aims to replace brittle automation scripts by enabling AI agents to parse items in a browser’s viewport in real-time, create an interaction plan, and execute it. It’s designed to work even on websites it hasn’t encountered before and to be resilient to website layout changes.
Key aspects of Skyvern include:
- Open Source with Cloud Option: The core Skyvern logic is open-source (AGPL-3.0), allowing for self-hosting. They also offer Skyvern Cloud (
app.skyvern.com
) with additional features like anti-bot measures, proxy networks, and CAPTCHA solvers. - State-of-the-Art Performance: Skyvern 2.0 achieved 85.8% on the WebVoyager benchmark, surpassing Google Mariner (83.5%), AgentE (73.1%), and other leading platforms. This performance was validated using the new Web Bench dataset with 5,750 tasks across 452 websites.
- LLM and Computer Vision Powered: Skyvern supports multiple LLM providers including OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Ollama, OpenRouter, Gemini, and Novita AI. It uses Python 3.11-3.13 compatibility and browser automation libraries like Playwright for interaction.
- Advanced Features: Supports CAPTCHA solving, 2FA/TOTP (QR-based, email, SMS), proxy networks for geo-targeting, credit card processing for e-commerce automation, and offers explainable AI through run summaries and livestreaming of the browser viewport.
- API-Driven & Scalable: Enables running numerous automation tasks concurrently with enterprise-grade infrastructure supporting thousands of simultaneous executions.
- Transparent Pricing: Cloud version costs $0.10 per step with $5 free credit to start, making it accessible for testing and small-scale deployments.
Skyvern 2.0 employs an advanced multi-agent system that has dramatically improved performance over the original single-agent approach:
1. Planner Agent
- Decides on goals to accomplish on a website
- Maintains working memory of the overall objective and progress
- Breaks down complex tasks into manageable sub-goals
2. Actor Agent
- Given a narrowly scoped goal, executes specific actions on the website
- Reports back on completion status and any issues encountered
- Handles detailed browser interactions and element identification
3. Validator Agent
- Assesses whether goals were successfully achieved
- Provides feedback to both Actor and Planner agents
- Ensures quality control and error correction
This architecture addresses the limitations of Skyvern 1.0, which scored ~45% on WebVoyager due to insufficient memory and limited complex reasoning capabilities.
Additionally, Skyvern employs specialized sub-agents for specific functions:
- Interactable Element Agent: Parses HTML to identify and extract all interactive elements (buttons, forms, links, etc.).
- Navigation Agent: Plans the sequence of actions needed to navigate websites and progress toward goals.
- Data Extraction Agent: Extracts specific data from webpages and structures it into user-defined formats like JSON or CSV.
- Password Agent: Securely handles login forms with integration to password managers (Bitwarden, 1Password, LastPass).
- 2FA Agent: Manages two-factor authentication prompts during login processes.
- Dynamic Auto-complete Agent: Handles complex auto-complete form fields like address lookups or searchable dropdowns.
Skyvern offers both a managed cloud service and a self-hosted open-source option.
-
Choose Your Deployment Model:
- Skyvern Cloud: Navigate to
app.skyvern.com
, create an account. This is the quickest way to start and includes managed infrastructure for anti-bot measures, proxies, and CAPTCHA solving. Costs $0.10 per step with $5 free credit. - Self-Hosted (Local/Docker): If you prefer control over the environment or want to leverage the open-source aspect fully.
- Skyvern Cloud: Navigate to
-
Self-Hosting Skyvern (if chosen):
- Prerequisites: Make sure you have Python 3.11, 3.12, or 3.13. For Docker, ensure Docker Desktop is running.
- Local Install:
- Install Skyvern:
pip install skyvern
- Configure: Run
skyvern init
. This creates a.env
file for your LLM API keys and other settings. - Launch Server:
skyvern run server
- Launch UI:
skyvern run ui
(access athttp://localhost:8080
)
- Install Skyvern:
- Docker Compose Setup:
- Clone the Skyvern GitHub repository:
git clone https://github.com/Skyvern-AI/skyvern.git
- Navigate to the cloned directory.
- Configure
docker-compose.yml
with your LLM provider API key(s). - Run:
docker compose up -d
- Access UI:
http://localhost:8080
- Clone the Skyvern GitHub repository:
-
Configure LLM Providers:
- Skyvern supports LLMs from OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Ollama, OpenRouter, Gemini, Novita AI, and OpenAI-compatible endpoints. You’ll need to supply API keys for your chosen LLM provider(s).
-
Define Your First Task or Workflow:
- Tasks: A single request to Skyvern. You need to specify:
url
: The starting URL for the task.prompt
: A natural language instruction detailing the goal.data_schema
(optional): A JSONC formatted schema if you want Skyvern to extract data in a specific structure.error_codes
(optional): To define specific situations where Skyvern should stop.
- Workflows: For more complex operations, you can chain multiple tasks with features including navigation, actions, data extraction, loops, file parsing, sending emails, and text prompts.
- Tasks: A single request to Skyvern. You need to specify:
-
Execute and Monitor:
- Use the Skyvern UI or API to start tasks. You can often livestream the browser’s viewport to see the agent in action, which helps with debugging.
Integrating Tallyfy with Skyvern allows you to delegate browser automation steps from your Tallyfy processes.
Tallyfy Task: “Extract Competitor Pricing for Product X”
- Inputs from Tallyfy Form Fields:
Competitor Site URL
:https://competitor-store.com/products/comparable-to-x
Product Name on Competitor Site
: “MegaWidget Advanced”Data Schema (JSONC)
:{"product_name": "string", // Should match 'MegaWidget Advanced'"price": "string","stock_status": "string"}
- Integration Steps (Conceptual - API route):
- When the Tallyfy task starts, a webhook or middleware calls the Skyvern API’s endpoint for creating/running a task.
- The API request to Skyvern includes:
url
:https://competitor-store.com/products/comparable-to-x
(from Tallyfy)prompt
: “Navigate to the product page. Find the product named ‘MegaWidget Advanced’. Extract its current price and stock status. If the product is not found, indicate ‘Product not found’.”data_schema
: The JSONC schema defined above (from Tallyfy).
- Skyvern’s Planner Agent analyzes the task, the Actor Agent locates the product, and the Data Extraction Agent extracts the price and stock status according to the schema.
- The Tallyfy integration layer polls Skyvern for task completion.
- Once complete, Skyvern returns the structured JSON output (e.g.,
{"product_name": "MegaWidget Advanced", "price": "$99.99", "stock_status": "In Stock"}
). - This JSON data is parsed, and the relevant values are used to update designated Tallyfy form fields (e.g., ‘Competitor Price’, ‘Competitor Stock Status’). The Tallyfy process then moves to the next step.
- State-of-the-Art Performance: With 85.8% accuracy on WebVoyager benchmark, Skyvern 2.0 leads the industry in browser automation capabilities.
- Flexibility of Open Source: Option to self-host, customize, and avoid vendor lock-in for the core agent platform.
- Resilience to UI Changes: Designed to be more robust against website updates than traditional RPA due to its reliance on visual understanding and LLM-based reasoning.
- Handles Complex Web Elements: Features like CAPTCHA solving, 2FA support, and proxy networks extend automation reach to challenging websites.
- Structured Data Output: Directly extract data into usable JSON or CSV formats for easy consumption by Tallyfy and subsequent process steps.
- Scalable Automation: The API-driven nature allows for parallel execution of many browser tasks.
- Transparent Pricing: Clear $0.10 per step pricing model makes cost planning straightforward.
- Prompt Engineering: The effectiveness of Skyvern heavily relies on well-crafted prompts for tasks and workflows. Ambiguous or poorly defined goals can lead to task failures.
- Resource Requirements for Self-Hosting: Running LLMs and browser automation locally can be resource-intensive (CPU, RAM, GPU if using local vision models). Ensure your self-hosting environment is adequately provisioned.
- Learning Curve: While Skyvern 2.0’s architecture is more capable, it may require understanding of the Planner-Actor-Validator system for optimal results.
- AGPL-3.0 License: If you modify the open-source code and use it in a publicly accessible service, the AGPL-3.0 license has specific requirements regarding making your modifications available.
- Rate Limits and Anti-Bot Measures: Even with advanced agents, very frequent or rapid interactions can trigger anti-bot defenses on websites. The Skyvern Cloud offering aims to mitigate some of this with managed proxies and anti-bot measures.
- Task Complexity: While significantly improved in 2.0, very complex multi-domain tasks may still require careful task decomposition and monitoring.
By combining Tallyfy for process orchestration with Skyvern’s industry-leading browser automation capabilities, businesses can tackle automation challenges that were previously impossible with traditional RPA or API-based solutions.
Computer Ai Agents > AI Agent Vendors
- 2025 Tallyfy, Inc.
- Privacy Policy
- Terms of Use
- Report Issue
- Trademarks