AI Hardware · Industry Shift · Jun 25, 2026, 1:35 PM · 6 min read

OpenAI and Broadcom Unveil 'Jalapeño' Custom AI Chip to Slash Inference Costs

OpenAI has partnered with Broadcom to reveal its first custom silicon processor, an inference-focused chip designed to cut the cost of running AI models by 50%. Developed in a record nine months with the help of AI, the hardware signals a major shift toward vertical integration in the AI industry.

By Factlen Editorial Team

OpenAI and Hardware Partners 40% · Enterprise AI Consumers 35% · Semiconductor Industry Analysts 25%

OpenAI and Hardware Partners: Focuses on vertical integration and the necessity of custom silicon to scale AI sustainably.
Enterprise AI Consumers: Prioritizes the reduction of 'tokenomics' costs to enable large-scale AI deployment.
Semiconductor Industry Analysts: Analyzes the shift in market power away from a single GPU monopoly toward diversified ASIC ecosystems.

What's not represented

· Environmental Advocates
· Smaller AI Startups

Why this matters

As AI models become more complex, the cost of inference, the process of generating responses, has risen sharply. By cutting these costs in half, OpenAI's custom hardware could pave the way for cheaper AI tools for businesses and more capable autonomous agents for everyday users.

Key points

OpenAI and Broadcom have co-developed 'Jalapeño,' a custom AI inference chip designed to run large language models.
Early testing shows the new processor reduces inference costs by roughly 50 percent compared to standard AI GPUs.
The chip moved from initial design to manufacturing readiness in an unprecedented nine months, aided by OpenAI's own models.
Jalapeño is built specifically for inference—serving responses to users—rather than the computationally heavier training phase.
Initial deployment is scheduled for late 2026 in gigawatt-scale data centers operated by Microsoft and other partners.

50%: Estimated reduction in inference costs
9 months: Development time from design to tape-out
1.3 gigawatts: Broadcom's baseline 2027 AI chip deployment target
2026: Target year for initial data center deployment

On Wednesday, OpenAI and semiconductor giant Broadcom officially unveiled "Jalapeño," a custom-built artificial intelligence processor designed to fundamentally alter the economics of running large language models. The highly anticipated silicon represents OpenAI’s first major foray into proprietary hardware, marking a strategic pivot from relying entirely on off-the-shelf graphics processing units (GPUs) to designing its own vertical stack. Early engineering samples of the chip are already operating in OpenAI’s labs, running advanced machine learning workloads like the GPT-5.3-Codex-Spark model at target power and performance levels.[1][2]

The primary objective of the Jalapeño architecture is dramatic cost reduction. According to Broadcom CEO Hock Tan, early testing indicates that the new accelerator delivers cost savings of roughly 50 percent compared to the typical AI GPUs currently dominating the market. For a company that processes billions of queries daily across ChatGPT, its API, and enterprise deployments, halving the cost of compute could translate to billions of dollars in saved operational expenditures, while simultaneously allowing the company to lower prices for developers.[1][4]
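
To make the scale of that claim concrete, the arithmetic can be sketched as follows. Only the 50 percent reduction comes from the reporting; the query volume and per-query cost are hypothetical placeholders, not figures from OpenAI or Broadcom.

```python
# Illustrative arithmetic only. Every input except the 50% reduction
# is a hypothetical placeholder, not a reported figure.

def annual_inference_savings(queries_per_day: float,
                             cost_per_query_usd: float,
                             cost_reduction: float = 0.50) -> float:
    """Return yearly savings in USD for a fractional cut in per-query cost."""
    if not 0.0 <= cost_reduction <= 1.0:
        raise ValueError("cost_reduction must be between 0 and 1")
    baseline_per_year = queries_per_day * cost_per_query_usd * 365
    return baseline_per_year * cost_reduction


if __name__ == "__main__":
    # Hypothetical: 2 billion queries a day at half a cent each.
    savings = annual_inference_savings(2_000_000_000, 0.005)
    print(f"Estimated savings: ${savings / 1e9:.2f} billion per year")
```

Because the savings scale linearly with volume, the same percentage cut matters far more to a provider serving billions of queries than to a small deployment.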

To understand Jalapeño’s significance, it is crucial to distinguish between the two phases of artificial intelligence compute: training and inference. Training is the computationally massive, months-long process of teaching a model how to understand language and patterns, a domain still overwhelmingly ruled by Nvidia’s versatile GPUs. Inference, however, is the everyday execution phase—the split-second process of the finished model generating a response to a user’s prompt. Jalapeño is an Application-Specific Integrated Circuit (ASIC) purpose-built exclusively for this inference phase, trading the broad flexibility of a GPU for hyper-optimized efficiency in a single task.[1][2][5]

How purpose-built inference chips differ from traditional training hardware.

Richard Ho, the head of OpenAI’s hardware program, emphasized that Jalapeño was architected from the ground up to address the specific bottlenecks that plague large language model inference at scale. Traditional hardware often struggles with the costly movement of data between compute cores and memory banks. The new custom chip mitigates this by balancing compute, high-bandwidth memory (HBM), and networking resources to minimize data travel, thereby maximizing throughput while maintaining the low latency required for real-time conversational agents and autonomous AI workflows.[3][5]
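
The data-movement bottleneck Ho describes can be illustrated with a common back-of-envelope bound. When a model generates one token at a time, the chip has to stream the model's weights from memory on every step, so memory bandwidth caps throughput. The sketch below is a generic estimate, not a description of Jalapeño's design; the model size, precision, and bandwidth values are hypothetical.

```python
# Generic roofline-style estimate, not Jalapeño's actual design.
# All numeric inputs used with it are hypothetical.

def max_decode_tokens_per_second(num_params: float,
                                 bytes_per_param: float,
                                 memory_bandwidth_bytes_per_s: float,
                                 batch_size: int = 1) -> float:
    """Upper bound on decode throughput when weight reads dominate.

    Assumes each decode step streams every weight from memory once and
    produces one token per sequence in the batch. Ignores KV-cache
    traffic, compute limits, and networking overhead.
    """
    weight_bytes = num_params * bytes_per_param
    steps_per_second = memory_bandwidth_bytes_per_s / weight_bytes
    return steps_per_second * batch_size


if __name__ == "__main__":
    # Hypothetical: 70B parameters, 1 byte each, 3.5 TB/s of bandwidth.
    single = max_decode_tokens_per_second(70e9, 1, 3.5e12)
    batched = max_decode_tokens_per_second(70e9, 1, 3.5e12, batch_size=8)
    print(single, batched)
```

With these placeholder numbers, a single sequence tops out at 50 tokens per second no matter how much raw compute the chip has. Batching raises total throughput because one pass over the weights serves every sequence in the batch, which is one reason inference hardware is designed around the balance of compute, memory, and networking rather than compute alone.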

Perhaps the most remarkable aspect of the Jalapeño project is its blistering development timeline. In the semiconductor industry, the cycle from initial schematics to a manufacturing-ready "tape-out" typically spans several years. OpenAI and Broadcom managed to compress this entire process into a mere nine months. This unprecedented speed was achieved through a deep software-hardware co-development strategy, but it also relied on a novel catalyst: artificial intelligence itself.[2][6]

OpenAI engineers used their own prior-generation AI models to accelerate complex parts of the chip’s design and optimization process. By deploying AI to assist with routing the connections among billions of transistors and with simulating hardware performance, the team bypassed traditional engineering bottlenecks. This creates a powerful feedback loop: the very models served to users are now helping to design the physical infrastructure required to run the next generation of models even faster and more efficiently.[2][6]

The physical realization of the Jalapeño chip relies on a global coalition of hardware partners. The silicon itself is being fabricated by Taiwan Semiconductor Manufacturing Company (TSMC), the world’s leading foundry for advanced nodes. Once manufactured, Canadian electronics manufacturing services company Celestica will integrate the chips into specialized server boards and rack systems. Finally, the completed server racks are slated for initial deployment by the end of 2026 within massive data centers operated by Microsoft and other strategic infrastructure partners.[1][6]

For enterprise customers, the arrival of custom inference silicon cannot come soon enough. As businesses integrate generative AI deeper into their daily operations, they are encountering what industry analysts call "explosive cost switches" associated with tokenomics. Every token, a fragment of a word that a model processes or generates, consumes a fraction of a cent in compute. At enterprise scale, these token costs compound rapidly, creating a financial barrier to deploying autonomous AI agents that require thousands of background reasoning steps.[3]
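
A rough sketch shows why agent workloads strain budgets. The step count, tokens per step, and price per million tokens below are hypothetical values chosen for easy arithmetic, not published pricing.

```python
# Illustrative token-cost arithmetic. All inputs are hypothetical.

def agent_task_cost_usd(reasoning_steps: int,
                        tokens_per_step: int,
                        usd_per_million_tokens: float) -> float:
    """Return the cost in USD of one agent task, billed per token."""
    total_tokens = reasoning_steps * tokens_per_step
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical: 2,000 background steps of 500 tokens each.
    before = agent_task_cost_usd(2_000, 500, 10.0)
    after = agent_task_cost_usd(2_000, 500, 5.0)  # price halved
    print(f"per task: ${before:.2f} -> ${after:.2f}")
```

In this example a single task consumes one million tokens, so it costs $10 at a hypothetical $10 per million tokens and $5 if that price is halved. Multiplied across thousands of tasks a day, the per-token price becomes the dominant line item.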

Early testing indicates the new custom silicon reduces inference costs by roughly 50 percent.

By drastically reducing the cost of inference, Jalapeño fundamentally changes the economics of converting electrical watts into AI tokens. This efficiency gain provides OpenAI with significant margin expansion. More importantly, if the enterprise AI market descends into a price war—a likely scenario given the aggressive moves by rivals like Anthropic and Google—OpenAI will have the financial bandwidth to slash its API pricing without destroying its own path to profitability.[3]
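
The watts-to-tokens framing can be made concrete with a unit conversion. The power draw, throughput, and electricity price below are hypothetical; the article notes that Jalapeño's actual power figures have not been disclosed.

```python
# Unit-conversion sketch. The inputs are hypothetical; Jalapeño's real
# power and throughput figures have not been published.

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(power_watts: float,
                                   tokens_per_second: float,
                                   usd_per_kwh: float) -> float:
    """Return the electricity cost in USD of generating one million tokens."""
    joules_per_token = power_watts / tokens_per_second
    kwh_per_million_tokens = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million_tokens * usd_per_kwh


if __name__ == "__main__":
    # Hypothetical: a 3.6 kW server producing 1,000 tokens per second.
    cost = energy_cost_per_million_tokens(3_600, 1_000, 0.10)
    print(f"${cost:.4f} of electricity per million tokens")
```

Doubling tokens per second at the same power draw halves the energy cost per token. Electricity is only one component of inference cost, alongside hardware, cooling, and networking, so this sketch captures just the energy share.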

The scale of the planned deployment underscores the massive capital requirements of the generative AI era. Broadcom and OpenAI are targeting the rollout of gigawatt-scale data centers—facilities that consume as much electricity as a mid-sized city. Broadcom CEO Hock Tan recently noted that demand from OpenAI and other hyperscalers is so intense that his previous projection of deploying 1.3 gigawatts worth of custom AI chips next year may prove to be overly conservative.[1][4]

The custom chips are slated for deployment in gigawatt-scale data centers starting in late 2026.

Despite the launch of Jalapeño, OpenAI is not abandoning its relationship with Nvidia. The company recently raised a massive funding round that secured 10 gigawatts of computing systems, including Nvidia clusters that lean heavily on the next-generation Vera Rubin platform for future model training. However, by offloading the high-volume, lower-margin inference workloads to its own custom Broadcom silicon, OpenAI is diversifying its supply chain and reducing its vulnerability to the pricing power of a single dominant hardware vendor.[2][4]

Jalapeño represents merely the opening salvo in a long-term hardware strategy. OpenAI and Broadcom have explicitly framed this release as the first step in a multi-generation compute platform. The companies have already established a roadmap for future chip iterations, with the next major architectural leap planned for 2028, followed by an annual release cadence. As AI models evolve from text generators into autonomous digital workers, the underlying silicon will continue to co-evolve, ensuring that the physical infrastructure can support the software's expanding ambitions.[1][6]

Ultimately, the push for custom silicon is about democratizing access to frontier intelligence. By controlling the full stack—from the foundational model weights down to the physical routing of data on the silicon die—OpenAI aims to make advanced AI more abundant and reliable. Whether it manifests as a faster response in a consumer app, a cheaper API call for a startup, or a more capable reasoning agent for medical researchers, driving down the cost of compute is the prerequisite for integrating artificial intelligence into the fabric of the global economy.[6]

How we got here

October 2025
OpenAI and Broadcom publicly announce their partnership to develop custom AI accelerators.
February 2026
OpenAI secures 10 gigawatts of computing systems, including massive Nvidia GPU clusters, in a $110 billion funding round.
June 24, 2026
OpenAI and Broadcom officially unveil the Jalapeño inference chip, showcasing early engineering samples.
Late 2026
Targeted initial deployment of Jalapeño chips in data centers operated by Microsoft and other partners.
2028
Planned release window for the second generation of the OpenAI-Broadcom custom silicon platform.

Viewpoints in depth

OpenAI and Hardware Partners

Focuses on vertical integration and the necessity of custom silicon to scale AI sustainably.

For the developers of frontier models, custom silicon is a requirement for long-term survival. OpenAI and Broadcom argue that relying on general-purpose GPUs for inference is fundamentally inefficient, as those chips carry architectural overhead designed for training. By co-designing the hardware and software, they claim they can push the silicon closer to its theoretical physical limits, ensuring that exponential growth in AI usage does not produce equally exponential growth in operational costs.

Enterprise AI Consumers

Prioritizes the reduction of 'tokenomics' costs to enable large-scale AI deployment.

Corporate IT departments and software developers view inference costs as the primary bottleneck to AI adoption. As applications move from simple chatbots to complex, multi-step autonomous agents, the number of tokens consumed per task skyrockets. This camp welcomes the Jalapeño chip not for its engineering marvels, but for its potential to trigger an industry-wide price war. Lower inference costs mean businesses can deploy more sophisticated AI tools across their operations without breaking their cloud computing budgets.

Semiconductor Industry Analysts

Analyzes the shift in market power away from a single GPU monopoly toward diversified ASIC ecosystems.

Market analysts view the OpenAI-Broadcom partnership as a critical inflection point in the semiconductor landscape. While Nvidia remains the undisputed leader in AI training, the inference market is fragmenting. Analysts argue that Broadcom's ability to deliver a custom chip in just nine months shows that hyperscalers can bypass dominant GPU vendors for specific workloads. This camp closely watches the multi-year roadmap, predicting that custom ASICs will eventually capture the majority of the total addressable market for AI inference.

What we don't know

The exact power consumption metrics and thermal requirements of the Jalapeño chip under real-world, gigawatt-scale loads.
How Nvidia will respond to its largest customers increasingly designing their own custom silicon for inference workloads.
Whether the 50 percent cost savings will be fully passed on to developers and end-users, or primarily retained by OpenAI to improve profit margins.

Key terms

Inference: The phase of artificial intelligence where a trained model processes new data to generate a response, prediction, or decision.
ASIC (Application-Specific Integrated Circuit): A microchip designed for a very specific use case—such as running a specific AI model—rather than for general-purpose computing.
Tape-out: The final stage of the chip design process, in which the completed layout is sent to a manufacturing facility to be fabricated in silicon.
Tokenomics: The economic model of charging for AI usage based on 'tokens,' which are fragments of words processed by the AI model.
Gigawatt-scale data center: A massive computing facility that draws on the order of one gigawatt (one billion watts) of electrical power, roughly equivalent to the power draw of a mid-sized city.

Frequently asked

What is the difference between AI training and AI inference?

Training is the computationally heavy process of teaching an AI model to understand patterns, while inference is the process of the finished model generating a response to a user's prompt. Jalapeño is built specifically for inference.

How much money will the Jalapeño chip save?

Early testing indicates the custom chip will reduce the cost of running AI inference by roughly 50 percent compared to standard graphics processing units (GPUs).

Did OpenAI stop working with Nvidia?

No. OpenAI still relies heavily on Nvidia GPUs for the massive compute required to train its frontier models, but plans to use the custom chips co-developed with Broadcom to handle the day-to-day work of serving those models to users.

When will the new chips be deployed?

The initial deployment of Jalapeño chips in gigawatt-scale data centers operated by Microsoft and other partners is targeted for the end of 2026.

Sources

[1] Quartz (Semiconductor Industry Analysts): OpenAI and Broadcom unveil their first custom AI chip
[2] VentureBeat (Semiconductor Industry Analysts): OpenAI unveils first custom AI inference chip, Jalapeño, with Broadcom
[3] AI Business (Enterprise AI Consumers): OpenAI, Broadcom Unveil LLM-Optimized Inference Chip
[4] The Business Times (OpenAI and Hardware Partners): OpenAI, Broadcom unveil chip to run models faster, cheaper
[5] Tom's Hardware (Semiconductor Industry Analysts): OpenAI and Broadcom introduce Jalapeño custom AI processor
[6] OpenAI (OpenAI and Hardware Partners): OpenAI and Broadcom introduce Jalapeño
[7] ITP (Enterprise AI Consumers): OpenAI unveils custom AI inference chip, Jalapeño
[8] The Street (Semiconductor Industry Analysts): Broadcom and OpenAI unveil Jalapeño chip
