Factlen Explainer · Agentic AI · Jun 17, 2026, 8:51 AM · 6 min read

From Chatbots to Agents: How Large Action Models Are Automating the Web

In 2026, artificial intelligence has evolved from generating text to executing multi-step tasks across apps and websites. Large Action Models (LAMs) are turning passive digital assistants into autonomous agents capable of booking flights, managing calendars, and operating software.

By Factlen Editorial Team

Productivity Optimizers 40% · Consumer Tech Giants 35% · Enterprise Security Teams 25%

Productivity Optimizers: Focuses on the massive time-saving potential of agents that can collapse fragmented software stacks and automate daily busywork.
Consumer Tech Giants: Views agentic AI as the new foundational layer of operating systems, integrating intelligence directly into the user interface.
Enterprise Security Teams: Warns that autonomous execution introduces severe risks, demanding strict human-in-the-loop constraints and solutions for prompt injection.

What's not represented

· Gig economy workers and administrative staff whose routine digital tasks are the primary targets for LAM automation.

Why this matters

The transition from AI that 'talks' to AI that 'acts' means users can offload hours of digital busywork—from booking travel to triaging emails—fundamentally changing how we interact with our devices and reclaiming significant personal time.

Key points

Large Action Models (LAMs) are shifting AI from text generation to autonomous task execution.
LAMs run a continuous Perceive, Plan, Execute, Adapt loop, which lets them recover from errors in real time.
Apple's Siri AI now features 'On-Screen Awareness' to orchestrate tasks across multiple apps.
Enterprise adoption is accelerating, saving knowledge workers 10 to 15 hours of busywork weekly.
Security vulnerabilities like prompt injection require 'human-in-the-loop' safeguards for high-stakes actions.

10–15 hours: Weekly busywork saved by top AI agents
8 billion: Parameters in Salesforce's xLAM model
61.4%: Claude Sonnet 4.5's score on the OSWorld benchmark

For years, artificial intelligence was essentially a conversational encyclopedia. Users typed prompts, and Large Language Models (LLMs) generated text, code, or images in return. But the AI remained trapped inside the chat window. By mid-2026, the paradigm has fundamentally shifted from generation to execution. The industry is rapidly adopting Large Action Models (LAMs)—systems designed not just to talk, but to do. Instead of asking an AI how to book a flight and receiving a list of instructions, users can now ask a LAM-powered agent to navigate to a travel website, input dates, compare prices, and complete the transaction autonomously.[2][5]

This transition from passive chatbots to proactive digital agents is arguably the most significant shift in consumer technology since the smartphone. While LLMs predict the next word in a sequence, LAMs predict and execute the next action in a digital environment: they click buttons, scroll through pages, fill out forms, and trigger APIs. That capability lets knowledge workers and everyday users offload hours of digital busywork. Industry surveys indicate that professionals using advanced AI agents are reclaiming 10 to 15 hours a week previously lost to scheduling, inbox triage, and routine data entry.[4][6]

The architecture behind this revolution is fundamentally different from standard text generation. LAMs operate on a continuous, dynamic loop: Perceive, Plan, Execute, Adapt, and Iterate. When given a high-level command like "organize my upcoming business trip to New York," the model first perceives its environment, which might involve reading the user's calendar, scanning their inbox for flight confirmations, and checking corporate travel policies. It then plans a multi-step sequence, breaking the abstract goal into concrete subtasks that can be tackled one by one.[2][3]
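The loop described above can be sketched in a few lines. This is a toy illustration, not how any particular LAM is implemented: the subtasks `book_flight` and `book_hotel`, the `SUBTASKS` table, and the dictionary standing in for the environment are all hypothetical, and a real agent would ask a model to produce the plan instead of reading it from a fixed table.

```python
from typing import Any, Callable

Environment = dict[str, Any]


def book_flight(env: Environment) -> bool:
    """Hypothetical subtask that always succeeds."""
    env["flight"] = "booked"
    return True


def book_hotel(env: Environment) -> bool:
    """Hypothetical subtask that fails once, simulating a site timing out."""
    env["hotel_attempts"] = env.get("hotel_attempts", 0) + 1
    if env["hotel_attempts"] < 2:
        return False
    env["hotel"] = "booked"
    return True


# Hypothetical fixed plan; a real planner would derive this from the goal.
SUBTASKS: dict[str, Callable[[Environment], bool]] = {
    "flight": book_flight,
    "hotel": book_hotel,
}


def perceive(env: Environment) -> Environment:
    """Take a snapshot of the current state."""
    return dict(env)


def plan(observation: Environment) -> list[str]:
    """List the subtasks whose result is not yet visible in the snapshot."""
    return [name for name in SUBTASKS if name not in observation]


def run_agent(env: Environment, max_iterations: int = 5) -> list[tuple[str, bool]]:
    """Repeat perceive, plan, execute until nothing is pending or the budget runs out."""
    trace = []
    for _ in range(max_iterations):
        observation = perceive(env)       # Perceive
        pending = plan(observation)       # Plan (recomputed on every pass)
        if not pending:
            break                         # Goal reached
        name = pending[0]
        succeeded = SUBTASKS[name](env)   # Execute
        trace.append((name, succeeded))   # Adapt: a failed step stays pending
    return trace
```

Because the plan is recomputed from a fresh observation on every pass, a failed step is simply picked up again on the next iteration, and `max_iterations` keeps the agent from looping forever.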

Unlike LLMs that generate text in one shot, LAMs operate on a continuous loop of perception and adaptation.

Execution is where LAMs truly distinguish themselves. Using neuro-symbolic AI—which combines the pattern recognition of neural networks with structured, logical reasoning—the model interacts directly with graphical user interfaces (GUIs) or software backends. If an error occurs, such as a website timing out or a password failing, the LAM doesn't simply crash or wait for human intervention. It adapts, reading the error message on the screen and attempting an alternative route, much like a human user would when navigating a broken webpage.[3][5]
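One simple way to picture this kind of adaptation is an ordered list of routes to the same goal, where the agent records each error and moves on to the next route. The sketch below is illustrative only: `submit_via_gui`, `submit_via_api`, and the form contents are hypothetical stand-ins, and a real agent would choose its next route by reading the error on screen rather than from a fixed list.

```python
from typing import Callable

Route = Callable[[dict], str]


def submit_via_gui(form: dict) -> str:
    """Hypothetical GUI route that always times out in this sketch."""
    raise TimeoutError("page timed out")


def submit_via_api(form: dict) -> str:
    """Hypothetical backend route that succeeds."""
    return f"submitted {form['name']} via api"


def execute_with_fallback(form: dict, routes: list[Route]) -> tuple[str, list[str]]:
    """Try each route in order; return the first result plus the errors seen."""
    errors: list[str] = []
    for route in routes:
        try:
            return route(form), errors
        except (TimeoutError, PermissionError) as exc:
            # Record what went wrong, then adapt by trying the next route.
            errors.append(f"{route.__name__}: {exc}")
    raise RuntimeError("all routes failed: " + "; ".join(errors))
```

Keeping the list of errors alongside the result means the agent can still report what went wrong on the way to a successful outcome.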

The consumer integration of this technology reached a major milestone with Apple's recent overhaul of its mobile operating system. Unveiled at the Worldwide Developers Conference, the newly rechristened "Siri AI" is deeply integrated with Apple Intelligence and powered by on-device models alongside cloud partnerships. Moving far beyond a simple voice assistant, Siri now has "On-Screen Awareness," which lets the agent see exactly what the user is looking at on their display and take complex, cross-app actions based on that visual context.[1][7]

For instance, a user can ask Siri to "take the photos I just edited, create a shared album for the family, and draft an email to Mom saying they are ready." The LAM orchestrates this entire workflow seamlessly across the Photos, iCloud, and Mail applications without the user needing to tap a single button. By making AI the primary interface layer across its ecosystem, Apple is normalizing agentic workflows for hundreds of millions of smartphone users, effectively turning the operating system itself into an autonomous assistant.[1][7]

Beyond consumer smartphones, enterprises and startups are deploying LAMs to tackle complex business operations and administrative bottlenecks. Startups like Lindy, Manus AI, and Dume.ai have built cross-app orchestrators that collapse the fragmented modern software stack into a single interface. A single agent can now monitor a customer relationship management (CRM) tool, draft personalized follow-up emails based on recent meeting transcripts, and update project management boards, all in the background and without manual data entry from employees.[4][7]

The underlying models powering these agents are becoming increasingly specialized and efficient. While massive, general-purpose LLMs require vast computational resources, LAMs can be smaller and highly optimized for specific environments. For example, Salesforce's open-source xLAM, an 8-billion parameter model, has been shown to outperform much larger models on specific function-calling benchmarks. This efficiency makes it cheaper and faster for developers to deploy autonomous agents across various industries, from logistics and warehousing to healthcare administration and customer service.[2][3]
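Function calling, the skill those benchmarks measure, generally means the model emits a structured request that the surrounding program validates and runs. The sketch below shows that dispatch step under stated assumptions: the JSON shape with `name` and `arguments` keys, the `create_event` tool, and the `TOOLS` registry are hypothetical and are not taken from xLAM or any specific model's format.

```python
import inspect
import json


def create_event(title: str, date: str) -> str:
    """Hypothetical tool the agent is allowed to call."""
    return f"Created '{title}' on {date}"


# Registry of callable tools; anything not listed here is rejected.
TOOLS = {"create_event": create_event}


def dispatch(model_output: str) -> str:
    """Parse a model's JSON tool call, validate it, and execute it."""
    call = json.loads(model_output)
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    func = TOOLS[name]
    args = call.get("arguments", {})
    try:
        # Check the arguments against the function signature before calling.
        inspect.signature(func).bind(**args)
    except TypeError as exc:
        raise ValueError(f"bad arguments for {name}: {exc}") from exc
    return func(**args)
```

Validating the call before running it matters because the model's output is only text: a misspelled tool name or a missing argument should fail cleanly rather than trigger an unintended action.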

The quantifiable impact and efficiency of modern agentic AI systems.

In software engineering, the impact of LAMs is particularly pronounced. AI coding agents no longer just suggest snippets of code; they independently write, test, and debug entire applications. On OSWorld, a benchmark that tests an AI's ability to complete real-world computer tasks, Anthropic's Claude Sonnet 4.5 scored 61.4%, crossing the 60% success threshold and suggesting that autonomous digital workers are becoming viable for complex, multi-step engineering challenges that previously required dedicated human oversight.[2][3]

However, the rise of autonomous digital agents introduces significant security and reliability challenges. When an AI can click buttons and send emails on a user's behalf, the cost of a hallucination or an error skyrockets. A chatbot generating a factually incorrect paragraph is a nuisance; an AI agent accidentally deleting a database or sending a sensitive document to the wrong client is a catastrophe. Consequently, enterprise adoption heavily relies on "human-in-the-loop" constraints, where the LAM pauses to ask for explicit confirmation before executing high-stakes actions.[2][7]
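A human-in-the-loop constraint can be as simple as a gate in front of the actions that matter. The following is a minimal sketch, assuming a hypothetical `HIGH_STAKES` set and a `confirm` callback that stands in for whatever approval prompt a real product would show the user.

```python
from typing import Callable

# Hypothetical list of actions that must never run without approval.
HIGH_STAKES = {"delete_data", "move_funds", "send_external_email"}


def run_action(
    name: str,
    execute: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run low-risk actions directly; pause for approval on high-stakes ones."""
    if name in HIGH_STAKES and not confirm(name):
        return f"{name}: blocked (no approval)"
    return execute()
```

For example, `run_action("delete_data", drop_table, ask_user)` would only call the hypothetical `drop_table` if `ask_user("delete_data")` returns `True`, while a read-only action such as checking a calendar runs without interruption.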

Enterprise security teams mandate 'human-in-the-loop' checkpoints to prevent autonomous agents from making high-stakes errors.

The most pressing unsolved vulnerability in the LAM ecosystem is prompt injection. Because these agents constantly read external data—such as incoming emails or web pages—they can be tricked by malicious hidden text. An attacker could theoretically send an email containing invisible text that commands the user's AI assistant to forward all password reset links to an external server. Cybersecurity experts warn that until prompt injection is definitively solved, fully autonomous agents cannot be safely deployed in highly sensitive or financially critical environments.[2][7]
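One commonly discussed mitigation is to track where each piece of text came from and to refuse sensitive actions whenever untrusted content is in play. The sketch below illustrates that idea only; it does not solve prompt injection, and the `Content` type, the `SENSITIVE_ACTIONS` set, and the action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    text: str
    trusted: bool  # True only for text that came directly from the user


# Hypothetical actions that could leak data or money if hijacked.
SENSITIVE_ACTIONS = {"forward_email", "send_payment"}


def allowed(action: str, context: list[Content]) -> bool:
    """Deny sensitive actions whenever any untrusted content is in context."""
    tainted = any(not item.trusted for item in context)
    return not (action in SENSITIVE_ACTIONS and tainted)
```

Under this policy, an agent that has just read an incoming email can still summarize it, but cannot forward anything until a human approves, regardless of what the email's hidden text says. The trade-off is lost autonomy: many legitimate tasks involve both external content and sensitive actions.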

Despite these hurdles, the trajectory of artificial intelligence is clear. The era of the passive chatbot is giving way to the era of the active digital teammate. As Large Action Models continue to improve their reasoning capabilities and interface integrations, the friction of interacting with computers will plummet. Users will increasingly define their goals and step back, allowing their personal AI agents to navigate the digital world on their behalf, fundamentally reshaping productivity, software design, and daily life in the years to come.[4][6]

How we got here

2023–2024
Large Language Models (LLMs) like ChatGPT popularize conversational AI, but remain confined to text generation.
Early 2024
The term 'Large Action Model' gains traction with the debut of early consumer hardware devices attempting to automate app usage.
Late 2025
Major AI labs release models capable of autonomous computer use and complex software engineering tasks.
June 2026
Apple unveils Siri AI with On-Screen Awareness, integrating LAM capabilities directly into the iOS ecosystem.

Viewpoints in depth

Productivity Optimizers

Focuses on the massive time-saving potential of agents that can collapse fragmented software stacks.

For startup founders, executives, and knowledge workers, the primary value of LAMs is time reclamation. This camp views the modern software stack—spread across email, Slack, CRMs, and calendars—as a source of immense friction. By deploying AI agents that can autonomously orchestrate tasks across these platforms, they argue that workers can save 10 to 15 hours a week. The focus here is on seamless integration, proactive scheduling, and eliminating the 'busywork' that prevents deep, creative thinking.

Consumer Tech Giants

Views agentic AI as the new foundational layer of operating systems.

Companies like Apple and Google see LAMs not as standalone apps, but as the new interface for computing itself. By embedding 'On-Screen Awareness' and cross-app orchestration directly into the operating system, they aim to make traditional app navigation obsolete. Their strategy relies heavily on on-device processing to maintain user privacy while executing highly contextual, multi-step commands that draw on personal data like photos, messages, and emails.

Enterprise Security Teams

Warns that autonomous execution introduces severe risks, demanding strict human-in-the-loop constraints.

Cybersecurity professionals and enterprise IT administrators approach LAMs with considerable caution. While they acknowledge the efficiency gains, they emphasize that an AI capable of taking action is an AI capable of doing damage. This camp points to unsolved vulnerabilities like prompt injection, where an agent could be manipulated by hidden text in an external document. They advocate for strict 'human-in-the-loop' architectures, ensuring that no agent can delete data, move funds, or send external communications without explicit human approval.

What we don't know

Whether the industry can definitively solve prompt injection, which currently prevents fully autonomous deployment in high-security environments.
How quickly third-party app developers will standardize their APIs to allow seamless cross-app orchestration by OS-level agents.
The long-term impact of LAMs on entry-level administrative and data-entry jobs as automation becomes cheaper and more reliable.

Key terms

Large Action Model (LAM): An AI system designed to autonomously navigate digital interfaces, plan sequences, and execute real-world tasks rather than just generating text.
Agentic AI: Artificial intelligence that can pursue complex goals independently, making decisions and using software tools without step-by-step human guidance.
Neuro-symbolic AI: A hybrid approach combining the pattern recognition of neural networks with structured, logical reasoning to help AI plan and execute tasks reliably.
Prompt Injection: A security vulnerability where malicious hidden text tricks an AI agent into executing unauthorized commands.

Frequently asked

What is the difference between an LLM and a LAM?

A Large Language Model (LLM) generates text and answers questions, while a Large Action Model (LAM) interacts with software to execute multi-step tasks like booking flights or sending emails.

How do AI agents handle errors?

LAMs use a continuous loop of perception and adaptation. If a website times out or a button moves, the agent reads the screen and attempts an alternative route to complete the goal.

Are personal AI agents safe to use?

While highly capable, they carry risks like prompt injection, where malicious hidden text can trick the agent. Experts recommend using 'human-in-the-loop' settings for sensitive tasks.

What is On-Screen Awareness?

It is a feature that allows an AI agent to 'see' and understand the content currently displayed on a user's screen, enabling it to take relevant actions across different apps without manual copying and pasting.

Sources

[1] Apple Newsroom (Consumer Tech Giants): Apple introduces Siri AI, a profoundly more capable and personal assistant
[2] IdeaToMVP (Enterprise Security Teams): Large Action Models (LAMs): The Complete Guide for Founders & Builders (2026)
[3] Gradient Flow (Enterprise Security Teams): Navigating the Large Action Model Landscape
[4] Mastra AI (Productivity Optimizers): The top AI personal assistants in 2026
[5] TechTarget (Consumer Tech Giants): What is a large action model (LAM)?
[6] Medium (Productivity Optimizers): LAMs: The Doers of AI
[7] Factlen Editorial Team (Enterprise Security Teams): Synthesis by Factlen editorial team
