Local computer use agents

Run AI automation on your own computers

A local computer use agent is AI that runs on your own computers instead of in the cloud. It looks at the screen, clicks, types, and reads, just like a person would, to do routine screen-based work. Because everything runs on your hardware, your data never leaves the building.

That’s the big draw for most businesses. No screenshots or customer data sent to an outside service. No per-use fees. It keeps working even when the internet is down. Tallyfy is building support for this, so you can hand a local agent a task and track what it does, the same way you track any other work.

This is an early-stage, advanced feature. Today it’s most useful for the small, boring jobs that eat up people’s time.

Start small with local AI agents

Put your step-by-step instructions for the agent in the Tallyfy task description. Start with short, bite-size jobs that are mundane and tedious, like filling in a form or copying data between screens.

Don’t ask an AI agent to handle big, complex, goal-driven work yet. Agents are still unpredictable on hard tasks: they make things up, and costs add up fast. A small AI model is plenty for routine jobs. You don’t need a giant model (or an expensive graphics card) to fill in a form or pull data off an invoice, and the small one is faster.

Why run agents locally

Privacy. Screen captures, business data, and your workflows never leave your premises.
No per-use costs. Once it’s set up, a local agent runs without paying per task or per token.
Works offline. No internet connection required.
Compliance. For rules like GDPR and HIPAA, keeping data in-house is often a must. Healthcare, finance, and government teams need automation that never sends data outside.
Speed. Running on your own hardware skips the network round-trip, so each step is quicker.

The trade-off: you need a reasonably capable computer. The good news is that small models handle most everyday business tasks well, and they run on ordinary hardware.

How it works

The agent runs a simple loop. It looks at the screen, decides what to do next, does it, and checks the result. Then it repeats until the job is done.

Each pass through the loop takes a few seconds, depending on your hardware and the model. The agent keeps going until it reaches the goal or hits a stopping point you set.
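
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Tallyfy’s implementation: the `Action` type, the `run_agent` function, and the three callables (`capture`, `decide`, `act`) are all hypothetical names, and the "screen" here is a simulated form rather than a real screenshot. The `max_steps` limit stands in for the stopping point you set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # hypothetical action kinds: "click", "type", or "done"
    detail: str = ""   # e.g. the text to type


def run_agent(
    capture: Callable[[], str],
    decide: Callable[[str, str], Action],
    act: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,
) -> list[Action]:
    """Look, decide, act, and re-check until done or the step limit is hit."""
    history: list[Action] = []
    for _ in range(max_steps):
        screen = capture()             # look at the screen
        action = decide(goal, screen)  # decide what to do next
        if action.kind == "done":      # the goal has been reached
            break
        act(action)                    # do it; the next pass checks the result
        history.append(action)
    return history


# Simulated "screen": a form with one empty field.
form = {"name": ""}


def capture() -> str:
    return f"name={form['name']}"


def decide(goal: str, screen: str) -> Action:
    # A real agent would ask a model here; this stub just checks the field.
    return Action("done") if screen == "name=Ada" else Action("type", "Ada")


def act(action: Action) -> None:
    form["name"] = action.detail


steps = run_agent(capture, decide, act, goal="fill in the name field")
```

In a real setup, `capture` would take a screenshot, `decide` would call the model, and `act` would send clicks and keystrokes. The step limit keeps a confused agent from looping forever.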

Small models are usually enough

Bigger isn’t better here. Small AI models are great at the structured, repetitive tasks that make up most office work, like form filling, data extraction, and routine data entry.

Tallyfy’s approach puts reliability first. A simple agent that finishes a mundane task every time beats a clever one that crashes halfway through your invoice run. Pick the smallest model that does the job:

A tiny model can sort tasks by type or read a form.
A small-to-mid model handles most everyday automation.
A larger model is only worth it for the occasional complex case.

Many capable open models are free to run locally, in a range of sizes. Your IT team can match the size to the task (see the technical notes below).
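
One way to make "pick the smallest model that does the job" concrete is a simple lookup from task type to model tier. The sketch below is only an illustration: the task type names, the tier labels, and the `pick_tier` function are assumptions made up for this example, not Tallyfy settings.

```python
# Hypothetical task types mapped to the smallest tier assumed to handle them.
TIER_BY_TASK = {
    "sort_by_type": "tiny",
    "read_form": "tiny",
    "fill_form": "small-to-mid",
    "extract_data": "small-to-mid",
    "data_entry": "small-to-mid",
    "complex_case": "large",
}


def pick_tier(task_type: str) -> str:
    """Return the tier for a task type.

    Unknown task types default to "small-to-mid", on the assumption that
    this tier handles most everyday automation.
    """
    return TIER_BY_TASK.get(task_type, "small-to-mid")


print(pick_tier("read_form"))
print(pick_tier("complex_case"))
```

Keeping the mapping in one place makes it easy to review which jobs are allowed to use the larger, slower model.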

Using local agents with Tallyfy

Tallyfy is the control layer. It hands the agent a clear task and the data to work with, the agent does the work on your machine, and the results come back into Tallyfy, fully tracked.

When a Tallyfy task needs computer work, the agent gets the step-by-step instructions from the task description and the input data from your form fields. It does the work, then returns the results into Tallyfy along with a log of what it did. You get real-time progress, an audit trail, and human approval checkpoints for anything important.

Here’s a simple example. A Tallyfy step says “get last month’s invoices from the supplier portal.” The agent logs in, filters to last month, reads the invoice number, amount, and due date off each one, and drops that data back into the right Tallyfy fields, with the PDFs attached to the process.
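
To picture the handoff in the invoice example, here is a rough sketch of the data going in and coming out. Every name in it (`AgentTask`, `Invoice`, `AgentResult`, `to_field_rows`) is hypothetical and chosen for illustration; it is not Tallyfy’s API, and the invoice values are placeholders.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class AgentTask:
    instructions: str        # step-by-step text from the task description
    inputs: dict[str, str]   # values from the form fields


@dataclass
class Invoice:
    number: str
    amount: str
    due_date: str


@dataclass
class AgentResult:
    invoices: list[Invoice] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)  # PDF file names
    log: list[str] = field(default_factory=list)          # what the agent did


def to_field_rows(result: AgentResult) -> list[dict[str, str]]:
    """Flatten invoices into one dict per invoice, ready to map onto fields."""
    return [asdict(inv) for inv in result.invoices]


task = AgentTask(
    instructions="Get last month's invoices from the supplier portal.",
    inputs={"portal": "supplier portal"},
)

# Placeholder result, as if the agent had read one invoice off the screen.
result = AgentResult(
    invoices=[Invoice(number="INV-001", amount="120.00", due_date="2024-02-15")],
    attachments=["INV-001.pdf"],
    log=["logged in", "filtered to last month", "read 1 invoice"],
)

rows = to_field_rows(result)
```

The log travels with the result, so the record of what the agent did stays attached to the data it produced.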

Keeping it safe

A local agent can do anything a person at the keyboard can, so guardrails matter:

Approval gates. Require a human “yes” before anything sensitive, like sending an email, deleting a file, or making a payment.
Sandboxing. Run the agent in an isolated environment with limited access.
Audit logging. Keep a full record of every action for compliance and debugging.
Emergency stop. Shut the agent down and roll back at any time.

Smaller models help here too. Their behavior is more predictable, which is exactly what you want for routine work.
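
The approval gate and audit log can be sketched together. This is a minimal example under stated assumptions: the action names in `SENSITIVE`, the `guarded` function, and the shape of each log entry are all hypothetical, and the `approve` callable stands in for however a human says yes or no.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical names for actions that need a human "yes" first.
SENSITIVE = {"send_email", "delete_file", "make_payment"}


def guarded(action: str, approve: Callable[[str], bool], audit_log: list[dict]) -> bool:
    """Return True if the action may run, and record the decision either way."""
    needs_approval = action in SENSITIVE
    allowed = approve(action) if needs_approval else True
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "needs_approval": needs_approval,
        "allowed": allowed,
    })
    return allowed


audit_log: list[dict] = []


def deny_all(action: str) -> bool:
    # Stand-in for a human reviewer who rejects everything.
    return False


routine_ok = guarded("click_next", deny_all, audit_log)
payment_ok = guarded("make_payment", deny_all, audit_log)
```

Here the routine click goes through without asking anyone, the payment is blocked, and both decisions land in the log.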

Is it worth it?

Over time, local agents usually cost less than cloud automation tools, which charge ongoing subscription or per-use fees. After you’ve covered the hardware, routine automation essentially runs for free. Tallyfy plans to charge a simple per-minute rate for active agent time, so you only pay when an agent is actually working.
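
A quick break-even calculation can help you sanity-check the cost argument. The sketch below is illustrative only: `breakeven_months` is a hypothetical helper, and the figures in the example are made up, not Tallyfy pricing or real hardware costs.

```python
import math
from typing import Optional


def breakeven_months(hardware_cost: float, cloud_monthly: float, local_monthly: float) -> Optional[int]:
    """Months until the hardware pays for itself, or None if it never does.

    local_monthly covers whatever you still pay each month when running
    locally (for example electricity or per-minute agent time).
    """
    saving = cloud_monthly - local_monthly
    if saving <= 0:
        return None
    return math.ceil(hardware_cost / saving)


# Made-up numbers: 1500 for hardware, 300 per month for a cloud tool,
# 50 per month in remaining local costs.
months = breakeven_months(1500, 300, 50)
print(months)  # 6
```

If the monthly saving is zero or negative, the function returns `None`, which is a sign that the task is not a good candidate for local automation on cost grounds alone.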

One honest caveat: research from MIT found that most AI pilots fail to deliver real savings. The ones that succeed share a pattern. They target small, well-defined back-office tasks, use focused models instead of general-purpose AI, and measure results from day one. So start with one repetitive task, prove the value, then expand.

For your IT team

(Skip this unless you’re setting up the technical side.)

How an agent is put together. Four pieces work together: a vision-language model (the “brain” that reads screenshots and decides actions), screen capture with OCR, an action engine that performs clicks and keystrokes, and an orchestration loop that runs the perceive-reason-act cycle, handles errors, and talks to Tallyfy.

Cross-platform. Local agents work on Windows, macOS, and Linux. The most reliable setups combine screenshot-based vision with each OS’s native automation layer (Windows UI Automation, the macOS Accessibility API, or Linux AT-SPI) and fall back to vision when needed.
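
The "native layer first, vision as fallback" pattern might look like the sketch below. The `locate` function and the two locator callables are hypothetical; in a real agent, `native` would wrap the OS automation layer and `vision` would search a screenshot. The stubs here only simulate those two sources.

```python
from typing import Callable, Optional

Point = tuple[int, int]
Locator = Callable[[str], Optional[Point]]


def locate(label: str, native: Locator, vision: Locator) -> tuple[Optional[Point], str]:
    """Find a control by label: try the native layer, then fall back to vision."""
    try:
        point = native(label)
    except Exception:
        point = None  # treat a native-layer failure as "not found"
    if point is not None:
        return point, "native"
    point = vision(label)
    return point, ("vision" if point is not None else "not found")


# Simulated locators: the native layer only knows about the "Save" button.
def native_stub(label: str) -> Optional[Point]:
    return {"Save": (100, 200)}.get(label)


def vision_stub(label: str) -> Optional[Point]:
    return {"Save": (101, 199), "Export": (300, 40)}.get(label)


save_hit = locate("Save", native_stub, vision_stub)
export_hit = locate("Export", native_stub, vision_stub)
missing = locate("Nope", native_stub, vision_stub)
```

Returning the source alongside the coordinates makes it easy to log how often the agent had to fall back to vision.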

Hardware, roughly. Small models (about 1B to 8B parameters) run comfortably on a normal laptop or a graphics card with 4 to 12 GB of memory. Larger models (32B and up) want 24 GB or more. Quantization (a compression step) cuts a model’s memory use significantly with little quality loss, so you can run bigger models on smaller hardware.
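
A rough rule of thumb shows why quantization matters: the memory needed for a model’s weights is about the parameter count times the bits per weight, divided by 8. The helper below is a hypothetical back-of-envelope calculator. It covers weights only and ignores the extra memory a running model needs for context and runtime overhead, so treat the results as a lower bound.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits each is about 1 GB, so the result is
    params_billions * bits_per_weight / 8.
    """
    return params_billions * bits_per_weight / 8


print(weight_memory_gb(8, 16))   # 16.0  (8B model, 16-bit weights)
print(weight_memory_gb(8, 4))    # 4.0   (same model quantized to 4 bits)
print(weight_memory_gb(32, 4))   # 16.0  (32B model quantized to 4 bits)
```

Going from 16-bit to 4-bit weights cuts the weight memory to a quarter, which is what lets a larger model fit on a smaller graphics card.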

Models worth a look. Many strong open models run locally, including small multimodal ones built for modest hardware (for example, Google’s Gemma family) and compact general models from the Llama, Qwen, and Phi families. Match the size to the task rather than reaching for the biggest one.

Getting started. A tool like Ollama lets you download and run these models locally with a single command. Start with one model and one repetitive task, integrate it with Tallyfy for coordination and tracking, measure the time saved, then scale up.
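
Once a model is running in Ollama, a script can talk to it over Ollama’s local HTTP API. The sketch below uses only the Python standard library and assumes Ollama is listening on its default address (`http://localhost:11434`). The helper names `build_request` and `generate` are made up for this example, and the model name in the commented-out usage line is just an example; substitute whichever model you have actually downloaded.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request; "stream": False asks for a single JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the local model and return its text reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example usage (needs a running Ollama server and a downloaded model):
# print(generate("gemma3", "List the fields on a typical invoice."))
```

Because the request goes to `localhost`, the prompt and the reply stay on your machine.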

Frameworks and research. Production-ready local agents build on open work like Microsoft’s UFO2 (Windows), ScreenAgent (cross-platform), and Hugging Face’s open computer agent, plus local inference engines such as Ollama, vLLM, and llama.cpp.

