Agentic AI · Explainer · Jun 12, 2026, 3:02 AM · 8 min read

How Microsoft and Xiaomi Are Turning Plain Text Into Trainable AI Parameters

New open-source frameworks like Microsoft’s SkillOpt and Xiaomi’s MiMo Code are replacing manual prompt engineering with mathematical optimization, allowing enterprise AI agents to self-improve without touching model weights.

By Factlen Editorial Team


Enterprise AI Architects 40% · Open-Source Developers 35% · AI Researchers 25%

Enterprise AI Architects: Focuses on the cost savings and portability of optimizing the skill layer rather than fine-tuning base models.
Open-Source Developers: Values the democratization of agentic AI, allowing small teams to build industrial-grade agents on cheap hardware.
AI Researchers: Analyzes the mathematical rigor of applying deep learning optimization techniques directly to natural language text spaces.

What's not represented

· Proprietary AI model providers whose business relies on fine-tuning revenue
· Non-technical enterprise end-users

Why this matters

Fine-tuning AI models is expensive and rigid. By mathematically optimizing the 'skill layer' (plain text Markdown files) instead, companies can boost agent accuracy by over 20 percentage points with no additional inference cost at deployment, making custom AI workflows cheaper, more reliable, and highly portable across different model providers.

Key points

Microsoft's SkillOpt treats plain text Markdown files as trainable parameters, using deep learning techniques to optimize natural language instructions.
The framework lifts average accuracy on GPT-5.5 by 23.5 percentage points without requiring any changes to the underlying model's weights.
Xiaomi's newly released MiMo Code translates text-based skills into deterministic JavaScript to prevent language models from making inconsistent procedural errors.
MiMo Code utilizes a background subagent to continuously summarize and store context, allowing agents to complete 200-step tasks without forgetting early instructions.

+23.5 points: GPT-5.5 accuracy lift with SkillOpt
52 of 52: Benchmark wins for SkillOpt
62%: MiMo Code score on SWE-Bench Pro
200+: Steps in the long-horizon tasks handled by MiMo Code

For the past two years, enterprise artificial intelligence has been bottlenecked by a surprisingly low-tech problem: the guessing game of prompt engineering. As companies deploy autonomous AI agents to handle complex workflows, from legal document review to multi-step software engineering, they quickly discover that frontier models lack procedural discipline. When an agent fails to follow a specific formatting rule or forgets a critical tool policy, developers typically respond by manually tweaking a massive system prompt, hoping the new phrasing fixes the error without breaking something else. This manual iteration is slow, unscientific, and notoriously fragile.

But in June 2026, the AI engineering landscape is undergoing a fundamental shift. Two major open-source releases, Microsoft’s SkillOpt and Xiaomi’s MiMo Code, are replacing manual prompt tweaking with systematic engineering. SkillOpt treats plain text files as trainable parameters, letting agents improve systematically and lifting accuracy by more than 20 percentage points without the cost of fine-tuning model weights, while MiMo Code tackles how those skills are executed over long tasks.[1][2]

To understand this breakthrough, one must first understand how modern AI agents store knowledge. The industry has rapidly coalesced around a standard known as "Agent Skills," pioneered by Anthropic and Vercel. Instead of stuffing a model’s context window with every conceivable instruction at the start of a session, developers package specific capabilities into self-contained folders. At the core of each folder is a `SKILL.md` file—a simple Markdown document containing YAML metadata (the skill's name and description) and natural-language instructions on how to execute a specific task. These folders can also bundle Python scripts, configuration templates, and reference materials. This modular approach allows teams to share domain expertise across different AI agents without rewriting core system prompts.[6][7]
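To make the file layout concrete, here is a minimal Python sketch of reading a `SKILL.md` document. It assumes the metadata sits in a front-matter block delimited by `---` lines and holds only flat `key: value` pairs; the `Skill` class, the `parse_skill` helper, and the sample skill are hypothetical names invented for illustration, and a real loader would use a full YAML parser.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    instructions: str


def parse_skill(text: str) -> Skill:
    """Split a SKILL.md document into its metadata and its Markdown body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected a front-matter block opening with '---'")
    # Find the closing '---' that ends the metadata block.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("front-matter block is never closed")
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)  # flat key: value pairs only
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


# Hypothetical example skill, not taken from any real repository.
EXAMPLE = """---
name: security-audit
description: Audit a codebase for common security flaws.
---
# Security Audit

1. List every file that handles user input.
2. Flag any query built by string concatenation.
"""

skill = parse_skill(EXAMPLE)
print(skill.name, "->", skill.description)
```

Keeping the metadata separate from the body is what lets an agent advertise a skill cheaply and defer reading the instructions themselves.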

The magic of the `SKILL.md` format lies in a concept called "progressive disclosure." When an enterprise agent boots up, it does not read the full text of every available skill. Instead, it only loads the metadata—just enough information to know what tools are available in its arsenal. If a user asks the agent to audit a codebase for security flaws, the agent scans its available skills, identifies the "Security Audit" metadata, and only then pulls the full Markdown instructions into its active memory. This on-demand loading prevents context bloat, saves massive amounts of inference tokens, and allows a single agent to theoretically possess thousands of specialized enterprise workflows without hallucinating or losing focus.[6]

The Agent Skill standard uses progressive disclosure to load instructions only when a specific task requires them.
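Progressive disclosure can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than any vendor's API: the `SkillRegistry` class, the `pick_skill` helper, and the word-overlap matching, which stands in for the model's own judgment about which skill fits a request.

```python
from typing import Dict, List, Optional, Tuple


class SkillRegistry:
    def __init__(self, documents: Dict[str, Tuple[str, str]]):
        # Maps skill name -> (short description, full instructions).
        self._documents = documents
        self.loaded: List[str] = []  # skills whose full text was pulled in

    def catalog(self) -> Dict[str, str]:
        """Metadata only: all the agent sees when it starts up."""
        return {name: desc for name, (desc, _) in self._documents.items()}

    def load(self, name: str) -> str:
        """Full instructions, fetched only once a task calls for them."""
        self.loaded.append(name)
        return self._documents[name][1]


def pick_skill(request: str, catalog: Dict[str, str]) -> Optional[str]:
    """Toy matcher: pick the description sharing the most words with the request."""
    words = set(request.lower().split())
    best, best_overlap = None, 0
    for name, description in catalog.items():
        overlap = len(words & set(description.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


registry = SkillRegistry({
    "security-audit": ("audit a codebase for security flaws", "Step 1: ..."),
    "data-cleaning": ("clean and normalize spreadsheet data", "Step 1: ..."),
})
chosen = pick_skill("Please audit this codebase for security flaws", registry.catalog())
instructions = registry.load(chosen) if chosen else ""
print(chosen, registry.loaded)
```

Only the chosen skill's body ever enters the working context; the other skill costs the agent nothing beyond its one-line description.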

However, while the architecture of Agent Skills is elegant, optimizing the actual text inside the `SKILL.md` file has remained a stubbornly manual process. If an agent consistently misinterprets step three of a data-cleaning skill, a human engineer must open the Markdown file, rewrite the instruction, and run tests to see if the model's behavior improves. Because large language models are highly sensitive to phrasing, a change that fixes one edge case often degrades performance on another. As Microsoft researcher Yifan Yang noted, an ungated, manual rewrite of a skill document once pushed a GPT-5.5 model's performance on the SpreadsheetBench evaluation down from 41.8 to 41.1. The industry needed a way to optimize these text files with the same rigor used to train neural networks.[1][8]

Enter Microsoft Research's SkillOpt, an open-source framework released under the MIT license that fundamentally changes how agent skills are refined. SkillOpt is the first "text-space optimizer" for AI agents. Instead of adjusting billions of floating-point numbers in a model's weight matrix—a process that requires massive GPU clusters and specialized data science teams—SkillOpt treats the plain Markdown skill document as the trainable "external state" of a frozen language model. It borrows the strict training discipline of deep learning, utilizing epochs, batch sizes, and learning rate schedulers, but applies them entirely to natural language.[1][3]

The SkillOpt training loop operates through a systematic, four-step process: rollout, reflection, editing, and validation. First is the "Rollout" phase, where the frozen target agent executes a batch of tasks using the current version of the skill document. The system records the entire trajectory of the execution, including tool calls, verifier feedback, and a final numeric score. Next comes the "Reflect" phase. A separate optimizer model analyzes the scored trajectories, identifying exactly where the agent deviated from the desired workflow or hallucinated an output. Crucially, it reflects on both failures and successes to understand what phrasing works best for the specific base model being used.[3]

Following reflection, the optimizer proposes bounded edits to the `SKILL.md` file: specific additions, deletions, or replacements to the natural language instructions. This is where SkillOpt's most critical feature, the validation gate, comes into play. Before any edit is permanently merged into the skill document, the modified skill is tested against a held-out validation dataset. If the new phrasing does not strictly improve the agent's overall performance score, the edit is rejected. This gating ensures that the skill document's validation score never decreases from one accepted edit to the next, guarding against the regressions that plague manual prompt engineering.[3][4]
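The loop described above can be sketched as a toy in Python. This is not SkillOpt's API: `optimize`, `toy_rollout`, and `make_toy_proposer` are hypothetical names, the "agent" is a keyword check rather than a language model, and the rollout returns only a score where the real system also records tool calls and verifier feedback.

```python
from typing import Callable, List, Tuple

Skill = List[str]  # toy stand-in: a skill document as a list of instruction lines


def optimize(
    skill: Skill,
    train_tasks: List[str],
    val_tasks: List[str],
    rollout: Callable[[Skill, List[str]], float],
    propose_edit: Callable[[Skill, float], Skill],
    epochs: int = 3,
) -> Tuple[Skill, float]:
    best = list(skill)
    best_val = rollout(best, val_tasks)
    for _ in range(epochs):
        # 1. Rollout: run the frozen agent on training tasks with the current skill.
        train_score = rollout(best, train_tasks)
        # 2-3. Reflect and edit: the optimizer proposes a bounded change.
        candidate = propose_edit(best, train_score)
        # 4. Validation gate: keep the edit only if held-out score strictly improves.
        candidate_val = rollout(candidate, val_tasks)
        if candidate_val > best_val:
            best, best_val = candidate, candidate_val
    return best, best_val


def toy_rollout(skill: Skill, tasks: List[str]) -> float:
    """Assumed scoring rule: a task passes if its keyword appears in the skill."""
    text = " ".join(skill).lower()
    return sum(task in text for task in tasks) / len(tasks)


def make_toy_proposer() -> Callable[[Skill, float], Skill]:
    edits = iter([
        lambda s: s + ["Normalize dates to ISO 8601."],  # helpful addition
        lambda s: s[1:],                                  # harmful deletion
        lambda s: s + ["Strip currency symbols."],       # helpful addition
    ])

    def propose(skill: Skill, train_score: float) -> Skill:
        return next(edits)(skill)

    return propose


final_skill, final_score = optimize(
    ["Keep the original headers."],
    train_tasks=["dates", "currency"],
    val_tasks=["dates", "currency", "headers"],
    rollout=toy_rollout,
    propose_edit=make_toy_proposer(),
)
print(final_skill, final_score)
```

The second proposed edit deletes the headers instruction, which lowers the held-out score, so the gate discards it and the two helpful additions are the only changes that survive.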


The performance gains delivered by this text-space optimization are staggering. Across six industry benchmarks and seven different target models, SkillOpt achieved 52 out of 52 wins against competing optimization methods. When applied to GPT-5.5, the optimized Markdown files lifted average accuracy by 23.5 percentage points in direct chat environments, and by 24.8 points when integrated into OpenAI's Codex harness. On complex multimodal extraction tasks, accuracy jumped from 0.73 to 0.93. The final artifact produced by this rigorous training loop is simply a highly refined `best_skill.md` file, typically ranging from 300 to 2,000 tokens in length.[4][8]

Microsoft's SkillOpt delivers massive accuracy improvements on GPT-5.5 with zero additional inference cost.
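Because the figures above mix scores on a 0-to-1 scale with percentage-point gains, a short calculation helps keep the two apart. The sketch below uses the multimodal extraction numbers quoted earlier (0.73 to 0.93); the `lift` helper is a hypothetical name introduced for illustration.

```python
from typing import Tuple


def lift(before: float, after: float) -> Tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = (after - before) * 100
    relative = (after - before) / before * 100
    return round(points, 1), round(relative, 1)


points, relative = lift(0.73, 0.93)
print(points, relative)  # 20.0 27.4
```

The same jump is a gain of 20 percentage points but a relative improvement of about 27 percent, which is why the two units should not be used interchangeably.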

For enterprise AI architects, the most appealing aspect of SkillOpt is its cost profile: zero additional inference compute at deployment. Because the optimization happens entirely during the training phase, the production agent simply reads a better-written text file. There are no extra API calls, no complex retrieval-augmented generation (RAG) overhead, and no latency penalties. Furthermore, the optimized skills exhibit remarkable portability. A skill document mathematically refined for GPT-5.5 can often be transferred directly to an Anthropic Claude Code environment and still deliver superior performance without requiring retraining, effectively abstracting the skill layer away from the underlying model provider.[4][7]

While Microsoft is optimizing the text of the skills themselves, Chinese technology giant Xiaomi is tackling the agent execution layer with its newly open-sourced MiMo Code V0.1.0. Released on June 11, 2026, MiMo Code is a terminal-native AI coding assistant built on the open-source OpenCode project. Like SkillOpt, MiMo Code is designed to handle complex, long-horizon enterprise tasks that require strict procedural discipline. However, Xiaomi identified a different vulnerability in the Agent Skills ecosystem: the inherent inconsistency of language models when interpreting natural language instructions, even optimized ones.[2][5]

To solve this, MiMo Code introduces a novel translation mechanism for `SKILL.md` files. When an agent loads a natural language skill document, MiMo Code automatically generates deterministic JavaScript code based on those instructions. By converting the procedural steps from probabilistic text into hard-coded logic, the framework ensures that the agent executes the workflow exactly the same way every single time. This hybrid approach—using natural language for discovery and flexibility, but compiling it into code for execution—helped MiMo Code achieve a 62% score on the SWE-Bench Pro evaluation, outperforming Anthropic's Claude Code by roughly five percentage points using the same base model.[9]
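The idea of compiling a skill once and then executing it identically every time can be illustrated with a small Python analogue. This is not MiMo Code's implementation, which emits JavaScript generated from the skill text; here a hypothetical `HANDLERS` lookup table stands in for that generated code, and `compile_skill` and `run` are invented names.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]

# Hypothetical handlers standing in for code a model would generate per step.
HANDLERS: Dict[str, Step] = {
    "trim whitespace": lambda state: {**state, "text": state["text"].strip()},
    "lowercase": lambda state: {**state, "text": state["text"].lower()},
    "count words": lambda state: {**state, "words": len(state["text"].split())},
}


def compile_skill(markdown: str) -> List[Step]:
    """Turn numbered Markdown steps into a fixed list of functions, once."""
    steps = []
    for line in markdown.splitlines():
        number, _, rest = line.strip().partition(". ")
        if number.isdigit():
            steps.append(HANDLERS[rest.strip().lower()])
    return steps


def run(steps: List[Step], state: dict) -> dict:
    """Execute the compiled steps in order; no model call is involved."""
    for step in steps:
        state = step(state)
    return state


SKILL_BODY = """
1. Trim whitespace
2. Lowercase
3. Count words
"""

pipeline = compile_skill(SKILL_BODY)
result = run(pipeline, {"text": "  Hello Agent World  "})
print(result)
```

Once the steps are compiled, running the pipeline twice on the same input gives the same result, which is the property that free-form interpretation of the instructions cannot guarantee.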

Xiaomi’s framework also addresses the second major hurdle of long-horizon agent tasks: context window amnesia. Conventional coding assistants rely entirely on the model's active context window; once a project exceeds that token limit, the agent begins forgetting early architectural decisions or tool policies. MiMo Code circumvents this by deploying a dedicated background subagent. As the primary agent works through a 200-step software build, the subagent continuously monitors the context window. When the token limit approaches, the subagent automatically condenses previous interactions into structured summaries, storing them in a persistent memory repository while allowing the main agent to continue uninterrupted.[5][10]

Xiaomi's MiMo Code uses a background subagent to continuously summarize context, preventing the AI from forgetting early instructions.
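A simplified version of this compaction strategy is sketched below in Python. The names (`CompactingContext`, `count_tokens`, `toy_summarize`) are hypothetical, tokens are approximated by whitespace-separated words, and the summarizer is a trivial stand-in for a model call. Compaction also runs inline here, whereas the design described above delegates it to a background subagent.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


class CompactingContext:
    def __init__(self, limit: int, summarize: Callable[[List[str]], str],
                 keep_recent: int = 2):
        self.limit = limit
        self.summarize = summarize
        self.keep_recent = keep_recent
        self.messages: List[str] = []  # the active context window
        self.memory: List[str] = []    # persistent store of summaries

    def total(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.total() > self.limit and len(self.messages) > self.keep_recent:
            old = self.messages[:-self.keep_recent]
            recent = self.messages[-self.keep_recent:]
            summary = self.summarize(old)
            self.memory.append(summary)          # keep a durable copy
            self.messages = [summary] + recent   # shrink the live context


def toy_summarize(messages: List[str]) -> str:
    """Keep only each message's leading label, folding in earlier summaries."""
    labels: List[str] = []
    for message in messages:
        if message.startswith("SUMMARY:"):
            labels.extend(message[len("SUMMARY:"):].split())
        else:
            labels.append(message.split()[0])
    return "SUMMARY: " + " ".join(labels)


context = CompactingContext(limit=12, summarize=toy_summarize)
for step in [
    "step1 choose the database schema",
    "step2 write the migration script",
    "step3 add the api routes",
    "step4 run tests",
]:
    context.add(step)
print(context.messages)
```

The most recent messages stay verbatim while older ones collapse into a summary, so the live context stays under its limit and the early steps remain recoverable from memory.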

To maintain the integrity of this persistent memory over weeks or months of enterprise development, Xiaomi introduced an automated maintenance feature called "/dream". Running automatically every seven days, this specialized maintenance agent reviews all stored session data. It removes duplicate memories, verifies that referenced file paths still exist, and compresses the stored information into an updated, highly efficient long-term memory repository. This ensures that an enterprise coding agent can maintain accurate context on a massive codebase indefinitely, without the memory bank degrading into a disorganized pile of outdated summaries.[10]
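Two of the maintenance steps described above, removing duplicates and checking that referenced paths still exist, are easy to sketch in Python. The `dream` function and the memory record layout are hypothetical, dropping a stale entry outright (rather than flagging it) is an assumption, and the compression step and seven-day schedule are omitted.

```python
import os
import tempfile
from typing import Dict, List


def dream(memories: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop duplicate notes and notes whose referenced file no longer exists."""
    seen = set()
    kept = []
    for entry in memories:
        key = entry["note"].strip().lower()
        if key in seen:
            continue  # duplicate of a note already kept
        path = entry.get("path")
        if path and not os.path.exists(path):
            continue  # assumption: stale references are removed, not flagged
        seen.add(key)
        kept.append(entry)
    return kept


with tempfile.TemporaryDirectory() as root:
    live = os.path.join(root, "api.py")
    open(live, "w").close()  # create a file that the first note can point to
    memories = [
        {"note": "API routes live in api.py", "path": live},
        {"note": "api routes live in api.py", "path": live},  # duplicate
        {"note": "Auth moved to auth.py", "path": os.path.join(root, "auth.py")},
        {"note": "Team prefers tabs", "path": ""},  # no file reference
    ]
    cleaned = dream(memories)

print([m["note"] for m in cleaned])
```

Of the four stored notes, only the first and the last survive: one is a duplicate and one points at a file that was never created.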

Together, Microsoft’s SkillOpt and Xiaomi’s MiMo Code represent a maturation of how businesses deploy artificial intelligence. The era of treating large language models as magical black boxes that simply need the right "prompt" is ending. Instead, the industry is recognizing that the "skill layer"—the external, version-controlled repository of procedural knowledge—is the highest-leverage optimization target in the AI stack. By applying the rigorous testing, validation, and compilation frameworks of traditional software engineering to natural language documents, companies can build AI systems that are both highly capable and strictly reliable.[1][3]

As these open-source tools gain traction, the barrier to entry for building industrial-grade AI agents is plummeting. A small development team can now download SkillOpt, run a few epochs of text-space optimization on a cheap virtual private server, and deploy a custom agent that outperforms a generalized frontier model. With the skill layer becoming portable, mathematically optimized, and deterministically executed, enterprise AI is finally moving out of the experimental sandbox and into the realm of robust, scalable infrastructure.[3][4]

How we got here

2024-2025: Anthropic and Vercel pioneer the open SKILL.md format, allowing AI agents to load instructions on demand.
May 2026: Microsoft Research releases SkillOpt, the first text-space optimizer for agent skills.
June 11, 2026: Xiaomi open-sources MiMo Code V0.1.0, introducing persistent memory and JavaScript translation for skills.
June 2026: Enterprise adoption accelerates as developers shift from fine-tuning models to optimizing the external skill layer.

Viewpoints in depth

Enterprise AI Architects

Focuses on the cost savings and portability of optimizing the skill layer rather than fine-tuning base models.

For enterprise architects, the primary appeal of text-space optimization is the decoupling of specialized workflows from proprietary model weights. Historically, if a company wanted an AI to strictly follow a proprietary legal review process, they had to pay a premium to fine-tune a model via an API, effectively locking themselves into that specific provider. By moving the optimization to the external 'skill layer,' architects argue they can now switch underlying LLM providers—from OpenAI to Anthropic to Google—without losing their custom, mathematically refined workflows. Furthermore, because the optimization happens entirely offline, the production agent incurs zero additional inference costs or latency penalties.

Open-Source Developers

Values the democratization of agentic AI, allowing small teams to build industrial-grade agents on cheap hardware.

The open-source community views the release of freely available tools like SkillOpt (MIT-licensed) and MiMo Code as a massive democratizing force. Previously, achieving state-of-the-art reliability on complex procedural tasks required massive compute budgets that only tech giants could afford. Now, a solo developer or small startup can download these frameworks, run a few epochs of text-space optimization on an affordable virtual private server, and deploy a custom agent that outperforms generalized frontier models. Developers particularly praise MiMo Code's translation of Markdown into deterministic JavaScript, noting that it addresses the inherent probabilistic unreliability of language models without requiring complex backend engineering.

Model Providers

Acknowledges the commoditization risk as the skill layer abstracts away the base model.

For companies that build and serve foundational models, the rise of the optimized skill layer presents a strategic challenge. As frameworks like SkillOpt prove that the highest-leverage performance gains come from refining external text documents rather than internal model weights, the base models themselves risk becoming interchangeable, commoditized utilities. If a highly optimized `SKILL.md` file can make a cheaper, open-source model perform just as well as a premium proprietary model on a specific enterprise task, the value in the AI stack shifts dramatically away from the model providers and toward the orchestration and skill-management platforms.

What we don't know

How proprietary model providers like OpenAI and Google will respond to an open-source ecosystem that commoditizes their base models by shifting value to the skill layer.
Whether text-space optimization will eventually hit a performance ceiling compared to traditional weight-based fine-tuning on highly specialized, non-procedural tasks.
How the automated '/dream' memory compression in MiMo Code will perform over multi-year enterprise projects with massive, evolving codebases.

Key terms

Agent Skill: A self-contained folder containing natural language instructions and resources that extend an AI agent's capabilities without altering its underlying model weights.
Progressive Disclosure: A context-management technique where an AI agent only loads the full instructions of a skill when a specific task requires it, saving memory and compute.
Text-Space Optimization: The process of mathematically refining plain text instructions (like a Markdown file) using deep learning techniques, rather than adjusting a model's numerical weights.
Context Window: The maximum amount of text (measured in tokens) that an AI model can process and "remember" at any given moment during a session.

Frequently asked

Do I need to fine-tune my AI model to use SkillOpt?

No. SkillOpt optimizes the text-based instruction files (skills) that the model reads, requiring zero changes to the model's actual weights.

Does optimizing skills increase the cost of running the AI?

No. Because the optimization happens beforehand, the production agent simply reads a better-written text file, adding zero inference cost or latency.

How does Xiaomi's MiMo Code prevent the AI from forgetting long tasks?

It uses a dedicated background subagent that continuously summarizes the conversation and stores it in a persistent memory repository, bypassing standard context window limits.

Can I use a skill optimized for OpenAI's models on Anthropic's Claude?

Often, yes. Because skills are written in natural language and stored externally, they are highly portable, and a skill refined for one frontier model can often be used on another without retraining.

Sources

[1] VentureBeat (Enterprise AI Architects): Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights
[2] VentureBeat (Enterprise AI Architects): Xiaomi's new open source, agentic AI coding harness MiMo Code beats Claude Code at ultra-long, 200+ step tasks
[3] Flowtivity (AI Researchers): Microsoft SkillOpt: How to Train AI Agent Skills Like Neural Networks
[4] ExplainX (AI Researchers): Microsoft SkillOpt: Self-Improving Agent Skills Guide 2026
[5] Gizmochina (Open-Source Developers): Xiaomi's MiMo Code is a free, open-source terminal AI coding agent
[6] AgentSkills (Enterprise AI Architects): What are Agent Skills?
[7] Spring (Enterprise AI Architects): Spring AI Agentic Patterns: Agent Skills - Modular, Reusable Capabilities
[8] ToKnow AI (AI Researchers): Microsoft's SkillOpt optimizes natural-language skill documents
[9] Gigazine (Open-Source Developers): Xiaomi releases AI agent tool 'MiMo Code'
[10] Open Source For U (Open-Source Developers): Xiaomi Open-Sources MiMo Code V0.1.0
