ML Efficiency Engineer

Role Overview

As an ML Efficiency Engineer, you’ll be the engine that accelerates how quickly we can push the boundaries of LLM capabilities. Your work will make cutting-edge research move faster, dramatically increasing the pace at which we can iterate, test hypotheses, and discover breakthroughs. You’ll empower both our Applied Scientists and our customers to explore more ideas in the same amount of time, unlocking progress that simply isn’t possible otherwise, such as super-long-horizon agents.

In this role, you’ll build and optimize the tooling, pipelines, and systems that make large-scale experimentation seamless. You’ll improve training and inference performance, streamline data and model workflows, and identify bottlenecks that slow iteration. Every efficiency gain - whether in cluster utilization, memory usage, parallelization, or algorithmic throughput - translates directly into more experiments, richer exploration, and faster discovery.

Ideal Profile

Deep Systems & Optimization Expertise. You have hands-on experience optimizing large-scale ML workloads across training and inference. You’re comfortable profiling GPU/TPU performance, identifying bottlenecks, and writing high-performance code.
Strong Foundations in ML Compute. You understand how transformers work at a systems level—attention mechanisms, parallelism strategies, quantization, distillation, and memory layouts. You can reason about trade-offs across precision, throughput, and accuracy.
Experience with Distributed Training. You’ve worked with frameworks like DeepSpeed, Megatron-LM, FSDP, XLA, or custom parallelism strategies. You know how to scale models across nodes efficiently and debug tricky distributed failures.
Bonus: Shipped LLM Fine-Tuning at Scale. You have experience managing KL divergence, reward hacking, and mode collapse, and you’ve optimized fine-tuning runs for GPU efficiency or multi-node setups.