Integrations > Computer AI Agents
Claude Computer Use and Tallyfy
Anthropic’s “Computer Use” capability (currently in public beta for Claude 3.5 Sonnet and Claude 3.7 Sonnet) allows developers to build applications where Claude can interact with computer desktop environments much like a human. This includes perceiving screen information, moving a cursor, clicking buttons, and typing text, opening new possibilities for automating UI-based tasks within Tallyfy processes.
Anthropic is enabling Claude with general computer interaction skills rather than creating specific integrations for countless applications. This is achieved through an API that allows Claude to perceive and act within a sandboxed computing environment.
Key aspects of Claude’s Computer Use include:
- Models Supported: Claude 3.5 Sonnet (new) and Claude 3.7 Sonnet (which offers enhanced tools and an optional “thinking” capability to expose reasoning steps).
- API-Driven Interaction: Developers use Anthropic’s Messages API, providing Claude with a set of Anthropic-defined computer use tools. Claude then requests to use these tools to achieve a user’s goal.
- The Agent Loop:
- Your application sends a user prompt and the list of available computer use tools to Claude.
- Claude decides if a tool can help and responds with a
tool_use
request, specifying the tool and its inputs. - Your application (the client) is responsible for executing this tool request in a secure, sandboxed computing environment and then sending the
tool_result
(e.g., screenshot, command output, success/failure) back to Claude in a new user message. - Claude analyzes the result and decides on the next action, which could be another tool call or a final text response. This loop continues until the task is completed.
- Sandboxed Computing Environment: Essential for safety, this environment (often a Docker container) typically includes:
- A virtual X11 display server (e.g., Xvfb) for rendering the desktop.
- A lightweight Linux desktop environment (e.g., Mutter window manager, Tint2 panel).
- Pre-installed applications (e.g., Firefox, LibreOffice, text editors, file managers).
- Your implementations of the Anthropic-defined tools that translate Claude’s requests into actual UI operations.
- Anthropic-Defined Tools (User Executed): Anthropic specifies the tools, but your application executes them. Core tools include:
computer
: For mouse/keyboard actions (key presses, typing, cursor movement, clicks, drags, scrolling) and taking screenshots. Requiresdisplay_width_px
anddisplay_height_px
.text_editor
(str_replace_editor
): To view, create, and edit files within the environment.bash
: To run shell commands in the sandboxed environment.- Each tool has versions for Claude 3.5 Sonnet (e.g.,
computer_20241022
) and Claude 3.7 Sonnet (e.g.,computer_20250124
).
- Natural Language Instructions: Claude translates user prompts (goals) into a sequence of tool uses.
- Broad Availability: The Claude API is accessible directly from Anthropic, on Amazon Bedrock, and Google Cloud’s Vertex AI.
Integrating Tallyfy with Claude’s Computer Use would typically involve an intermediary application or service that you build to bridge Tallyfy and the Anthropic API.
-
Set Up Anthropic API Access:
- Obtain an Anthropic API key and familiarize yourself with their API documentation, particularly the sections on “Tool Use” and “Computer Use (beta)”.
-
Build or Use a Reference Implementation:
- Anthropic provides a reference implementation (often a Docker setup) that includes a containerized environment, implementations of the computer use tools, an agent loop, and a web interface. This is the recommended starting point.
- This environment will be where Claude’s requested actions are actually performed.
-
Configure the Computing Environment:
- Ensure the sandboxed environment has the necessary applications (e.g., specific browser, office suite) that tasks automated via Tallyfy will need to interact with.
-
Develop the Intermediary Application/Service:
- This service will receive requests (e.g., via a webhook from Tallyfy or a Tallyfy-compatible middleware).
- It will then construct the appropriate prompt and tool list for the Claude API based on the Tallyfy task instructions and input data.
- It will manage the “agent loop,” sending Claude’s
tool_use
requests to your sandboxed environment for execution and returningtool_result
back to Claude. - Once Claude indicates task completion, this service will relay the final output back to Tallyfy.
-
Prompt Engineering:
- Craft clear and explicit prompts for Claude. Anthropic recommends:
- Specifying simple, well-defined tasks.
- Instructing Claude to verify outcomes with screenshots after each step.
- Suggesting keyboard shortcuts for difficult UI elements.
- Providing examples of successful interactions if possible.
- Using XML tags like
<robot_credentials>
for sensitive data, while being mindful of prompt injection risks.
- Craft clear and explicit prompts for Claude. Anthropic recommends:
Tallyfy Task: “Open Product_Feedback.xlsx
from desktop, filter for ‘Urgent’ issues, and count them.”
- Inputs from Tallyfy Form Fields:
File Name
:Product_Feedback.xlsx
Filter Column Name
:Priority
Filter Value
:Urgent
Output Field Name
:Urgent Issue Count
- Integration Steps (Conceptual):
- Tallyfy task starts, triggering your intermediary application via a webhook, passing the form field data.
- Your application constructs an initial prompt for Claude: “Open the Excel file named
Product_Feedback.xlsx
located on the desktop. This file contains columns includingPriority
. Filter the data to show only rows where thePriority
column is ‘Urgent’. Count the number of such rows and provide the total count.” - The application initiates a session with the Claude API, providing the
computer
,text_editor
(if needed for scripts), andbash
tools. - Agent Loop Begins:
- Claude might first request a
screenshot
to see the desktop. - Your app executes this in the sandbox, returns the image.
- Claude identifies the Excel file icon, requests a
double_click
at its coordinates. - Your app executes, returns a new screenshot showing Excel open.
- Claude identifies the filter controls (e.g., a ‘Data’ menu, then ‘Filter’), requests clicks to apply the filter based on
Priority
=Urgent
. - Your app executes, returns screenshots.
- Claude identifies the number of visible rows (or a status bar count), then formulates a final text response: “The number of urgent issues is 15.”
- Claude might first request a
- Your intermediary application receives this final text response, parses the count (15), and updates the
Urgent Issue Count
form field in the Tallyfy task via Tallyfy’s API.
- Automate UI Interactions for Diverse Applications: Potentially interact with a wide array of desktop and web applications without needing application-specific APIs.
- Leverage Claude’s Reasoning: Utilizes the advanced language understanding and reasoning of Claude models to interpret tasks and navigate UIs.
- Structured Task Management by Tallyfy: Tallyfy defines the “what” and “why” of the task, passes necessary inputs, and receives structured outputs, ensuring the AI’s actions are part of a documented, trackable process.
Anthropic highlights these for its beta Computer Use feature:
- Security Risks: Computer use poses unique risks. Always run in a dedicated virtual machine or container with minimal privileges. Avoid giving the model access to sensitive data directly. Limit internet access if possible. Require human confirmation for impactful decisions.
- Prompt Injection: Claude might follow instructions found in on-screen content (e.g., on a webpage or in an image) even if they conflict with your primary instructions. Anthropic has classifiers to detect and flag potential prompt injections in screenshots, steering the model to ask for user confirmation, but caution is advised.
- Latency: Human-AI interaction latency might be slower than direct human action for some tasks. Best for non-time-critical tasks or background processing in trusted environments initially.
- Accuracy and Reliability (Beta): Claude may still make mistakes in interpreting screens or selecting actions (e.g., coordinate precision, tool selection). The “thinking” capability in Claude 3.7 Sonnet can help debug. Scrolling and complex spreadsheet interactions have known limitations, though improving.
- Restricted Actions: Anthropic limits Claude’s ability to create accounts or generate content on social media and communication platforms to prevent impersonation.
- Cost: API calls, especially those involving image analysis (screenshots) and multiple tool interactions, will have associated costs. The use of Anthropic-defined tools adds to the token count.
Always carefully review and verify Claude’s computer use actions, especially in a production environment integrated with Tallyfy. Start with low-risk tasks and incorporate human review steps in Tallyfy where appropriate.
Vendors > Manus AI Agents and Tallyfy
Vendors > Twin.so AI Agents and Tallyfy
Vendors > OpenAI Operator and Tallyfy
- 2025 Tallyfy, Inc.
- Privacy Policy
- Terms of Use
- Report Issue
- Trademarks