Senior Staff Machine Learning Software Engineer
Location: San Diego, CA
Job Type: Full-Time
Salary Range: $202K – $215K
We are building a product where learned models and compute-heavy inference components have to run inside a tight local runtime budget. Research code is only the starting point. This role owns the path from a working prototype to production inference that is measured, packaged, tested, and ready for repeated use in the field.
You will work closely with the people developing the underlying algorithms, but your ownership is different: production readiness, performance, reliability, and the engineering boundary between exploratory model work and shipped execution. The strongest fit is someone who can explain the bottleneck they found, the number they moved, the tradeoff they accepted, and the test that kept the fix from regressing.
If your best work is making inference faster, smaller, more predictable, and easier to ship, this role is likely a good match.
What You'll Own
Turning research prototypes into production inference components with explicit latency, throughput, memory, and accuracy budgets
Optimizing the execution path: tensor layout, host/device transfers, batching strategy, kernel launch overhead, mixed precision, quantization, and memory reuse
Writing or tuning Rust, C++, and CUDA where framework-level optimization is not enough, then validating the improvement with profiler output and release-facing tests
Building inference-adjacent evaluation machinery: calibration checks, confidence behavior, regression detection, dataset slices, and failure-mode reporting tied to product metrics
Maintaining the deployment contract: model artifacts, runtime integration, versioning, reproducibility, and performance gates that block unsafe changes
Algorithm research and novel model design live on a separate track. You will collaborate with that team, translate prototypes into production constraints, and surface shipping risks early when a design needs to change.
Education and Experience
A PhD with 6+ years, an MS with 10+ years, or a BS/BA with 12+ years of experience in life sciences or technology.
Must have successfully demonstrated leadership or ownership in at least 2 of the 5 areas below:
Shipped constrained inference. You have personally moved a model or learned component from prototype to deployed runtime with a real latency, throughput, memory, or power budget. You can name the target, the bottleneck, and the change that closed the gap.
Rust/C++ at shipping depth. You have written production code in Rust or modern C++ where correctness, latency, memory layout, and ownership boundaries mattered. You can reason about the runtime behavior of the code you ship, not just its API surface.
CUDA and accelerator-aware execution. You are comfortable working below the Python layer: custom CUDA extensions or kernels, host/device memory movement, launch overhead, profiler traces, and the practical tradeoffs between framework convenience and a purpose-built implementation.
Performance-native judgment. You reason in wall-clock time, memory movement, launch overhead, bandwidth, numerical precision, and error budgets without needing those constraints added late in review.
Production engineering discipline. You define typed interfaces, deterministic behavior, reproducible artifacts, meaningful tests, and clean handoffs with upstream research code.
Strongly Preferred
Rust at shipping depth, especially FFI boundaries, pyo3 / maturin, async runtimes, or performance-sensitive service code
Inference on constrained local hardware, embedded systems, edge devices, or budget-bound accelerator deployments
Quantization, mixed precision, model compression, or kernel fusion that shipped beyond a benchmark notebook
Calibration or confidence estimation used on production outputs, with monitoring or regression checks attached
Public or shareable evidence of engineering quality: code, technical writing, postmortems, talks, or a concrete shipped system you can discuss
Comfort using AI-assisted development tools while still owning correctness, tests, and review quality
Nice to Have
Real-time or near-real-time signal-processing systems
Products that combine learned models with deterministic numerical code
Rust- or C++-based inference or numerical pipelines, including custom FFI to CUDA, cuDNN, TensorRT, or similar accelerator libraries
We are an equal opportunity employer. We thrive on diversity and collaboration.
foresite-labs-fl2024-006