Engineering Manager, Inference ML Runtime

Cerebras

Toronto, ON, Canada | Full-time | May 20, 2026

Opportunity Description

Company Overview

Cerebras Systems builds the world's largest AI chip, 56 times larger than the largest GPU. Our novel wafer‑scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry‑leading training and inference speeds and lets machine learning users run large‑scale ML applications without the hassle of managing hundreds of GPUs or TPUs.

About the Role

The Inference ML Engineering team at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world.

As an Engineering Manager, Inference ML Runtime, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state‑of‑the‑art AI models on Cerebras hardware. You will operate at the intersection of machine learning, distributed systems, and high‑performance runtime en...
