OpenAI has entered a $10 billion, multi-year compute partnership with Cerebras, strengthening its infrastructure to support real-time AI systems at global scale. The deal secures 750MW of specialised compute capacity through 2028, adding a high-performance, low-latency layer to OpenAI’s growing compute stack.
As AI use cases shift toward live conversations, real-time agents, and instant decision-making, inference speed has become critical. Cerebras’ architecture is purpose-built for low-latency inference, enabling faster responses and more natural human–AI interaction. That makes it a strong complement to OpenAI’s existing infrastructure, which already spans multiple vendors and system types.
The partnership also reflects OpenAI’s deliberate move toward a resilient, multi-vendor compute strategy. Rather than relying on a single provider, OpenAI is aligning specific workloads with the most suitable hardware, improving performance while reducing infrastructure risk.
Cerebras CEO Andrew Feldman compared the shift to the early days of broadband, noting that just as faster internet transformed digital experiences, real-time inference will redefine how AI is used and perceived. The focus is no longer only on model size, but on responsiveness and reliability.
According to Sachin Katti of OpenAI, Cerebras adds a dedicated inference layer that enables quicker responses, smoother interactions, and a stronger foundation for scaling real-time AI to more users worldwide.
Overall, the deal signals a broader industry transition. The next phase of AI growth will be driven by speed, resilience, and real-time capability, and OpenAI’s partnership with Cerebras positions it firmly at the centre of that shift.