OpenAI and Broadcom unveil a custom chip built for AI inference at scale

Ars Technica · 2 h ago
Close-up of a computer chip on a circuit board. Photo: Jakub Pabis / Pexels

OpenAI and Broadcom have announced a custom chip designed specifically to run large language models at scale, according to Ars Technica, marking one of the most significant moves yet by an AI developer to build its own silicon. The collaboration pairs OpenAI's knowledge of how its models behave with Broadcom's experience designing high-end semiconductors and bringing them to production.

The chip is aimed squarely at inference, the term for the work a model does when it generates a response to a user, as opposed to training, the resource-intensive process of building the model in the first place. As AI services reach hundreds of millions of users, inference has become the dominant ongoing cost, and the economics of serving each query now shape the entire business.

That distinction explains the strategy. A chip optimised purely for inference can strip away features needed only for training and concentrate transistors on the specific operations that running a model requires. In principle, that focus can deliver more performance per watt and per dollar than a general-purpose accelerator, such as a GPU, handling the same task.

The move also speaks to the industry's dependence on Nvidia, whose graphics processors have powered the bulk of the AI boom. That reliance has left major AI companies exposed to supply constraints and high prices, and several have responded by designing their own accelerators to gain leverage and predictability over their most important input.

Broadcom's role is central. The company is not a household name in consumer technology, but it is a powerhouse in custom chip design, helping large customers turn their requirements into working silicon. Partnering with Broadcom lets OpenAI pursue bespoke hardware without building a semiconductor operation from scratch.

Designing a chip is only part of the challenge. Custom silicon needs a software stack that lets models actually run on it efficiently, and much of the industry's tooling has grown up around Nvidia's ecosystem. Making an in-house chip competitive means investing heavily in the compilers and libraries that translate AI workloads onto the new hardware.

The potential payoff is substantial. If OpenAI can run its models on chips tailored to its exact needs, it could lower the cost of every interaction, ease its dependence on a single supplier, and gain the freedom to optimise hardware and models together. Those advantages compound at the scale OpenAI now operates.

There are risks as well. Cutting-edge chip projects are expensive and slow, and the field moves quickly enough that a design can be overtaken before it ships in volume. Custom silicon also locks in particular assumptions about how models work, which could become a constraint if the underlying technology shifts.

The announcement fits a broader pattern across big technology firms, several of which have built their own AI accelerators rather than relying entirely on off-the-shelf parts. The trend reflects a recognition that, at sufficient scale, controlling the hardware becomes as strategically important as the models themselves.

For the wider market, the Ars Technica report suggests, the significance lies less in any single specification than in the direction of travel. As inference costs dominate the economics of AI, the companies that run the largest services are increasingly determined to design the chips those services depend on, reshaping the balance of power in the semiconductor industry.

This article is an AI-curated summary based on Ars Technica. The illustration is a stock photo by Jakub Pabis from Pexels.
