On-Device AI · Explainer · Jun 14, 2026, 3:57 PM · 3 min read

Apple Replaces Core ML with Core AI, Bringing Free On-Device LLMs to Developers

At WWDC 2026, Apple introduced Core AI, a new framework that allows developers to run Large Language Models directly on iPhones and Macs. The shift eliminates cloud API costs and guarantees user privacy by keeping data entirely on-device.

By Factlen Editorial Team


App Developers 40% · Open-Source AI Community 35% · Privacy & Security Advocates 25%

App Developers: Focused on eliminating cloud API costs and building offline-capable AI features.
Open-Source AI Community: Focused on model conversion tools, PyTorch integration, and hardware optimization.
Privacy & Security Advocates: Focused on the data sovereignty benefits of keeping AI inference strictly on-device.

What's not represented

· Cloud AI Providers losing API revenue
· Android ecosystem developers

Why this matters

For years, adding AI to an app meant paying a cloud provider for every user interaction, making scale prohibitively expensive. Core AI shifts that compute to the user's device, democratizing AI development by making it free, private, and capable of running entirely offline.

Key points

Apple introduced Core AI at WWDC 2026, replacing the nine-year-old Core ML framework.
The new framework allows developers to run Large Language Models directly on Apple Silicon.
Local execution eliminates cloud API costs, network latency, and server dependencies.
A new toolchain, including coreai-torch, lets developers easily convert open-source PyTorch models.
Apple's Foundation Models framework provides a hybrid fallback to Private Cloud Compute for heavy tasks.
The shift guarantees user privacy by ensuring sensitive data never leaves the physical device.

· 9 years: age of the outgoing Core ML framework
· 38 trillion: operations per second on the current Neural Engine
· 20 billion: parameters in Apple's sparse on-device model
· $0: per-token cost for local Core AI inference

The artificial intelligence industry has a fundamental scaling problem: every time a user prompts an app, a server somewhere burns expensive GPU compute, and the developer pays the bill. For independent creators and startups, building a highly engaging AI feature often means risking bankruptcy from API fees as usage grows.[3]

At its Worldwide Developers Conference (WWDC) in June 2026, Apple introduced a structural solution to this economic trap. The company unveiled Core AI, a ground-up rewrite of its machine learning stack designed to run large language models (LLMs) and generative AI entirely on-device.[1][2][3]

Core AI officially replaces Core ML, the framework Apple introduced in 2017. While Core ML was revolutionary for its time, it was built for traditional, deterministic prediction tasks like image classification and object detection, where one input produces one output in a single pass. It struggled to handle the autoregressive token generation, streaming responses, and multi-turn sessions required by modern generative AI.[4]
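To make the distinction concrete, here is a minimal Python sketch of autoregressive decoding, assuming a toy lookup table in place of a real model. A classifier maps one input to one output in a single pass; a generative model instead loops, feeding each new token back in as input, streaming tokens as they appear, and keeping history across turns. The names `toy_model`, `generate`, and `Session` are hypothetical and are not part of Core AI's API.

```python
from typing import Callable, Iterator, List

# Hypothetical stand-in for a language model: maps the last token of the
# context to the next token. A real LLM scores the entire context.
NEXT_TOKEN = {
    "<bos>": "core",
    "core": "ai",
    "ai": "runs",
    "runs": "locally",
    "locally": "<eos>",
}


def toy_model(context: List[str]) -> str:
    """Predict the next token (toy version: looks only at the last token)."""
    return NEXT_TOKEN.get(context[-1], "<eos>")


def generate(model: Callable[[List[str]], str], context: List[str],
             max_new_tokens: int = 16) -> Iterator[str]:
    """Autoregressive decoding: each generated token is appended to the
    context (mutating it in place) and fed back in on the next step."""
    for _ in range(max_new_tokens):
        token = model(context)
        if token == "<eos>":  # the model signals the end of the response
            break
        context.append(token)
        yield token  # stream each token as soon as it is produced


class Session:
    """Multi-turn session: the history persists across calls to respond()."""

    def __init__(self, model: Callable[[List[str]], str]) -> None:
        self.model = model
        self.history: List[str] = ["<bos>"]

    def respond(self, prompt_tokens: List[str]) -> List[str]:
        self.history.extend(prompt_tokens)
        return list(generate(self.model, self.history))


if __name__ == "__main__":
    # Streams, one token at a time: core ai runs locally
    for token in generate(toy_model, ["<bos>"]):
        print(token, end=" ")
    print()
```

Because `generate` is a Python generator, a caller can display each token while the rest of the response is still being produced, which is the streaming behavior a single-pass prediction API does not naturally provide.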

On-device AI shifts the compute burden from expensive cloud servers to the user's local hardware.

The new framework is purpose-built for Apple Silicon, orchestrating heavy AI workloads across the CPU, GPU, and the Neural Engine without requiring developers to write complex, device-specific code. By keeping inference strictly local, Core AI eliminates server dependencies, network latency, and per-token API costs.[1][2]

"Apple is making the cloud optional," as developer commentary on the announcement puts it, noting that local execution shifts the compute cost from the developer's cloud bill to the user's existing hardware. A text summarization or code generation request is now processed on the device's own Apple Silicon chip rather than making a round-trip to a remote server farm.[3][8]

Beyond economics, the shift fundamentally alters the privacy landscape. Because the model runs where the data lives, applications can process highly sensitive information, such as health records, personal messages, or financial receipts, without ever transmitting it over the internet, sidestepping many regulatory and security hurdles.[1][7]


To make this ecosystem accessible, Apple does not require developers to use only its proprietary models. Core AI includes a Python package called `coreai-torch`, which acts as a direct bridge to the open-source PyTorch ecosystem. Developers can convert existing open-source models into a new `.aimodel` format with just a few lines of code.[2][7]

Developers can convert open-source PyTorch models into Apple's native format using the new coreai-torch bridge.

Once converted, these models benefit from Apple's unified memory architecture. Core AI uses zero-copy data paths and Metal 4 kernels optimized specifically for transformer architectures, allowing multi-billion-parameter models to run efficiently even within the tight thermal and battery budgets of an iPhone or iPad.[2][8]
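Much of the constraint here is memory. A rough rule is that weight storage equals parameter count times bits per weight, divided by eight. The sketch below applies that arithmetic; the model sizes and the 8 GB budget are illustrative assumptions, and the estimate ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Ignores the KV cache, activations, and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_budget(num_params: float, bits_per_weight: int, budget_gb: float) -> bool:
    """True if the weights alone fit in a (hypothetical) memory budget."""
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    BUDGET_GB = 8.0  # assumed memory available to the model, for illustration only
    for params in (3e9, 8e9, 20e9):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            ok = fits_in_budget(params, bits, BUDGET_GB)
            print(f"{params / 1e9:.0f}B params @ {bits}-bit: {gb:.1f} GB (fits: {ok})")
```

Under this arithmetic an 8-billion-parameter model needs about 16 GB of weights at 16 bits but about 4 GB at 4 bits, and loading all 20 billion parameters at once would still need about 10 GB even at 4 bits.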

Apple also detailed its own on-device AI architecture, revealing a 20-billion-parameter sparse model. This model uses a lazy-loaded Mixture of Experts (MoE) design, where expert selection happens per-prompt rather than per-token, minimizing the heavy data movement from storage to active memory that usually bottlenecks mobile AI.[8]
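The difference between per-token and per-prompt expert selection can be illustrated with a toy router. This is a simplified sketch under stated assumptions, not Apple's routing algorithm: router scores are a small hand-written matrix, per-token routing takes each token's top-k experts, and per-prompt routing averages scores over the whole prompt and picks the top-k once. The number of distinct experts selected serves as a proxy for how many expert weights must be moved from storage into memory. All function names are hypothetical.

```python
from typing import List, Set


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def experts_per_token(router_scores: List[List[float]], k: int) -> Set[int]:
    """Per-token routing: each token picks its own top-k experts, so the
    union of all picks must be available in memory."""
    needed: Set[int] = set()
    for token_scores in router_scores:
        needed.update(top_k(token_scores, k))
    return needed


def experts_per_prompt(router_scores: List[List[float]], k: int) -> Set[int]:
    """Per-prompt routing: average scores over the whole prompt and pick the
    top-k once, so only k experts need to be loaded."""
    num_tokens = len(router_scores)
    num_experts = len(router_scores[0])
    mean = [sum(row[e] for row in router_scores) / num_tokens
            for e in range(num_experts)]
    return set(top_k(mean, k))


# Made-up router scores: 4 prompt tokens x 6 experts.
SCORES = [
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.2],
    [0.0, 0.8, 0.1, 0.0, 0.3, 0.0],
    [0.1, 0.0, 0.7, 0.6, 0.0, 0.0],
    [0.5, 0.4, 0.0, 0.0, 0.0, 0.9],
]

if __name__ == "__main__":
    print("per-token :", sorted(experts_per_token(SCORES, k=2)))   # [0, 1, 2, 3, 4, 5]
    print("per-prompt:", sorted(experts_per_prompt(SCORES, k=2)))  # [0, 1]
```

With these made-up scores, per-token routing touches all six experts while per-prompt routing needs only two. The trade-off is that an individual token may not be served by its individually best-scoring experts.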

For tasks that exceed local hardware limits, Apple introduced a seamless hybrid fallback. The updated Foundation Models framework provides a unified Swift API that can automatically route complex requests to Apple's Private Cloud Compute servers, or even to third-party providers like Anthropic and Google, using a standardized Language Model protocol.[1][5]
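The routing idea can be sketched in Python with a structural protocol, assuming a simple policy: try the on-device model first and fall back to a cloud model when the prompt exceeds a local limit. `LanguageModel`, `OnDeviceModel`, `CloudModel`, `route`, and the word-count limit are all hypothetical stand-ins; Apple's actual protocol is a Swift API whose details are not reproduced here.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class LanguageModel(Protocol):
    """Hypothetical common interface shared by local and cloud backends."""

    name: str

    def can_handle(self, prompt: str) -> bool: ...

    def respond(self, prompt: str) -> str: ...


@dataclass
class OnDeviceModel:
    name: str = "on-device"
    max_prompt_words: int = 2000  # hypothetical local limit (words as a proxy for tokens)

    def can_handle(self, prompt: str) -> bool:
        return len(prompt.split()) <= self.max_prompt_words

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] handled {len(prompt.split())} words"


@dataclass
class CloudModel:
    name: str = "private-cloud"

    def can_handle(self, prompt: str) -> bool:
        return True  # assumed to accept anything the device cannot

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] handled {len(prompt.split())} words"


def route(prompt: str, models: Sequence[LanguageModel]) -> str:
    """Try backends in order of preference; the first capable one answers."""
    for model in models:
        if model.can_handle(prompt):
            return model.respond(prompt)
    raise RuntimeError("no backend can handle this prompt")


if __name__ == "__main__":
    backends = [OnDeviceModel(max_prompt_words=5), CloudModel()]
    print(route("summarize this short note", backends))     # [on-device] handled 4 words
    print(route("one two three four five six", backends))  # [private-cloud] handled 6 words
```

Because callers depend only on the protocol, adding another provider in this sketch means appending one more object to the preference list, without changing the calling code.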

Local inference eliminates per-token API costs and network latency.

To further entice adoption, Apple announced that developers in its Small Business Program—those with fewer than two million App Store downloads—will receive free access to Private Cloud Compute inference. This move directly challenges the API pricing models of established cloud AI giants by commoditizing server-side generation for smaller apps.[5][9]

Ultimately, Core AI signals a major strategic pivot for the tech giant. Rather than trying to be the sole provider of AI intelligence, Apple is positioning its 2.5 billion active devices as the premier arena for edge computing, giving developers the tools to build fast, private, and economically sustainable AI applications.[4][8]

How we got here

June 2017
Apple introduces Core ML, bringing traditional machine learning capabilities to iOS devices.
June 2024
Apple announces Apple Intelligence, beginning the integration of generative AI into its operating systems.
June 2025
The Foundation Models framework is released, giving developers limited access to Apple's on-device AI.
June 2026
Apple unveils Core AI at WWDC, officially replacing Core ML and opening native on-device LLM execution to all developers.

Viewpoints in depth

App Developers

Independent creators and startups focused on the economic and offline benefits of local AI.

For developers, the primary appeal of Core AI is margin protection. Cloud-based AI features incur a cost every time a user interacts with them, effectively penalizing high engagement. By shifting inference to the user's device, developers can offer unlimited AI features without paying per-request API fees. Local execution also lets apps keep working offline, on airplanes or in remote locations, turning AI from a fragile cloud dependency into a robust native feature.
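The engagement penalty is simple arithmetic: per-token fees scale linearly with users, requests, and tokens. The sketch below uses hypothetical figures (10,000 users, 20 or 40 requests per user per day, 1,000 tokens per request, and an assumed price of $2 per million tokens); none of these numbers come from Apple or from any provider's price list.

```python
def monthly_inference_fees(users: int, requests_per_user_per_day: int,
                           tokens_per_request: int, usd_per_million_tokens: float,
                           days: int = 30) -> float:
    """Per-token API fees for a month; grows linearly with engagement."""
    tokens = users * requests_per_user_per_day * tokens_per_request * days
    return tokens / 1_000_000 * usd_per_million_tokens


CLOUD_USD_PER_MILLION_TOKENS = 2.0  # hypothetical price, for illustration only
LOCAL_USD_PER_MILLION_TOKENS = 0.0  # on-device inference: no per-token fee

if __name__ == "__main__":
    for requests in (20, 40):  # doubling engagement
        cloud = monthly_inference_fees(10_000, requests, 1_000, CLOUD_USD_PER_MILLION_TOKENS)
        local = monthly_inference_fees(10_000, requests, 1_000, LOCAL_USD_PER_MILLION_TOKENS)
        print(f"{requests} requests/user/day: cloud ${cloud:,.0f}, local ${local:,.0f}")
```

Under these assumptions the cloud bill goes from $12,000 to $24,000 a month when engagement doubles, while the local line stays at $0. The compute cost does not vanish; it moves to the user's device as battery and thermal load.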

Open-Source AI Community

Machine learning engineers focused on model conversion, hardware optimization, and ecosystem interoperability.

The open-source community views Core AI as a critical bridge between Apple's walled garden and the broader AI ecosystem. Tools like `coreai-torch` allow engineers to take models trained in PyTorch—the industry standard—and deploy them natively on Apple Silicon. However, this camp is closely watching the hardware constraints. While Apple's unified memory architecture provides massive bandwidth, running multi-billion parameter models on mobile devices still requires aggressive quantization and careful memory management to avoid draining the battery.
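As a rough illustration of what quantization does, the sketch below applies symmetric per-tensor int8 quantization: each weight is stored as an integer q in [-127, 127] with one shared scale, so w ≈ scale × q. This is a simplified stand-in chosen for clarity. Production toolchains commonly use lower bit widths and per-channel or per-group scales, and this is not a description of how `coreai-torch` quantizes models.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q,
    with q an integer in [-127, 127] and one scale shared by the tensor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to nearest (Python's round() breaks ties to even), then clamp.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the stored integers."""
    return [scale * v for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    print(q)  # [50, -127, 0, 100]
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"scale={scale:.4f}, max error={worst:.4f}")
```

Each weight shrinks from 4 bytes in float32 (or 2 in float16) to 1 byte, and because values are rounded to the nearest step, the error per weight is at most half the scale. Dropping to fewer bits saves more memory at the cost of more error, which is the trade-off this camp is watching.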

Privacy & Security Advocates

Security researchers and privacy-focused users who prioritize data sovereignty.

Privacy advocates celebrate the shift toward edge computing as the only sustainable way to integrate AI into deeply personal software. When an LLM summarizes a user's medical records, financial receipts, or private messages, sending that data to a third-party server introduces massive security risks. Core AI ensures that sensitive context never leaves the physical device. This architectural guarantee allows developers to build highly personalized AI assistants without running afoul of strict data protection regulations.

What we don't know

How severely third-party models will drain the battery on older, non-Pro iPhone models.
Whether the open-source community will embrace the .aimodel format as enthusiastically as existing formats like GGUF.
How cloud AI providers will adjust their pricing models as developers shift inference to the edge.

Key terms

Core AI: Apple's 2026 framework for running generative AI and large language models directly on Apple devices.
Core ML: Apple's legacy machine learning framework, introduced in 2017, primarily used for traditional prediction and classification tasks.
Small Language Model (SLM): A compact version of an AI language model designed to run efficiently on consumer hardware like phones and laptops.
Quantization: A technique that reduces the precision of an AI model's numbers, shrinking its file size and memory usage so it can run on mobile devices.
Unified Memory: Apple's hardware architecture in which the CPU, GPU, and Neural Engine share a single pool of memory, so data does not have to be copied between processors, which speeds up AI processing.
Autoregressive Generation: The process by which an AI model generates text one token (a word or word fragment) at a time, predicting each next token from the ones before it.

Frequently asked

What is the difference between Core ML and Core AI?

Core ML was designed in 2017 for traditional machine learning tasks like image classification. Core AI is a modern replacement built specifically for generative AI, large language models, and autoregressive token generation.

Do developers have to pay to use Core AI?

No, not for on-device inference. Core AI runs models locally on the user's device, so there are no cloud server costs or per-token API fees for the developer.

Can I run any AI model on my iPhone?

Developers can convert many open-source PyTorch models using the `coreai-torch` tool, but the model must be small enough to fit within the device's available unified memory and thermal constraints.

What happens if a task is too complex for the local device?

Core AI integrates with Apple's Foundation Models framework, which can automatically route heavy requests to Apple's Private Cloud Compute servers or third-party cloud models when local hardware is insufficient.

Sources

[1] Apple Developer: "What's new in AI & machine learning - WWDC26" (Privacy & Security Advocates)
[2] Appcircle: "WWDC26: Apple's Core AI Framework Explained" (App Developers)
[3] Code Coup: "Apple Just Made On-Device AI a Reality With Core AI" (App Developers)
[4] byteiota: "Apple Core AI Replaces Core ML in iOS 27: Act Now" (App Developers)
[5] Callstack: "On-device AI after WWDC 2026: What's new?" (Open-Source AI Community)
[6] AI CERTs News: "Apple Unveils AI Development Frameworks for On-Device Apps" (Privacy & Security Advocates)
[7] Simon Willison's Weblog: "Siri AI at WWDC 2026" (Privacy & Security Advocates)
[8] r/LocalLLaMA: "Apple announced new on device inference engine for Apple Silicon" (Open-Source AI Community)
[9] The Core TLDR: "Apple introduces new AI APIs and smart tools to boost app development" (Privacy & Security Advocates)
