Role Overview:
We are hiring a highly motivated Lead Machine Learning Engineer to build and scale production ML systems across text and image modalities. This is a hands-on individual contributor role for someone who can independently design and ship robust inference backends, automate training and deployment workflows, and improve model performance across both traditional ML and modern deep learning systems.
You will productionize models ranging from LLMs, transformers, embeddings, and retrieval systems to classical ML models (such as XGBoost). The role balances scaling inference backends with automating training and deployment. We are looking for someone who is comfortable operating with a high degree of autonomy, mentoring other engineers, and making strong technical decisions in a fast-moving environment.
Key Responsibilities:
Design, build, and scale the infrastructure and pipelines that serve machine learning models for both online and batch inference across modalities and workloads. Models include LLMs, vision models, embedding models, reranking models, and classical ML models, operating on datasets at terabyte and petabyte scale.
Build reliable, production-grade services and APIs that serve models to both internal and external products.
Automate training, evaluation, deployment, rollback, monitoring, and retraining workflows.
Improve latency, throughput, reliability, and cost efficiency of inference systems.
Profile and optimize model execution using batching, caching, parallelism, quantization, and architecture-aware improvements.
Raise engineering rigor and quality through testing, CI/CD, observability, reproducibility, and incident response.
Collaborate with product, platform, and software teams to turn ambiguous business problems into production ML systems.