As a Senior Machine Learning Engineer on the Perception team, you will play a pivotal role in building and maintaining the backbone of our L2+ ADAS stack. This senior role calls for an experienced engineer who can think critically, execute independently, and deliver results: building scalable deep learning infrastructure, optimizing massive data ingestion pipelines, and ensuring maximum efficiency across our compute clusters.
You will be responsible for the entire DL infrastructure lifecycle, from managing Azure storage and hybrid Kubernetes clusters to designing efficient data loaders for multimodal training. You will work at the intersection of infrastructure, data engineering, and deep learning, enabling feature teams to train complex models (single frame, temporal, and multimodal) with speed and reliability. Your ability to solve abstract infrastructure challenges and apply "T-shaped" expertise (going deep in areas such as infrastructure and multitask deep learning while maintaining breadth in software design) will be key to our success.
Responsibilities
Deep Learning Infrastructure & Compute:
● Manage and optimize the entire DL infrastructure, including Azure Blob Storage integration, VNET setups, and hybrid compute resources (Cloud and On-premise/Frankfurt clusters).
● Lead performance investigations and benchmarking for next-gen hardware (e.g., comparing H200 vs. H100, Azure native vs. deployment nodes) to ensure cost and speed efficiency.
● Maintain and scale Kubernetes clusters for training and inference workloads.
Data Pipelines & Efficient Loading:
● Architect and develop high-performance data loaders for complex multimodal datasets (camera, radar, temporal/non-temporal data).
● Modernize data processing pipelines using Ray and Kubernetes to parallelize data caching, shuffling, and oversampling.
● Leverage PyArrow and SQL to optimize data consumption and integration with data loops.
● Implement efficient dataset update strategies (handling deltas) and ensure seamless integration of new tasks into the multimodal multi-task network.
CI/CD, Monitoring & Quality:
● Design and maintain robust GitHub Workflows and CI pipelines for new and existing feature teams.
● Develop KPI dashboards using Grafana to monitor compute usage, GPU efficiency, unit test durations, and overall system health.
● Manage dependency updates (Torch upgrades, Ubuntu updates, Hydra maintenance, Dependabots) to ensure a secure and modern stack.
● Drive software design excellence by performing thorough code reviews (PRs) and enforcing high standards in software architecture.
Embedded & Evaluation:
● Establish scalable evaluation pipelines for embedded targets, specifically for QNN boards and other edge devices.
● Collaborate with feature teams to support model compression experiments and on-target performance verification.
Required Qualifications
● Education: Bachelor's degree in Computer Science, Electrical Engineering, or a related field. An advanced degree is an advantage.
● Experience: 5+ years of industry experience in MLOps, Data Engineering, or Software Infrastructure, with a focus on Deep Learning systems.
● Programming & Software Design: Expert-level proficiency in Python with a strong emphasis on clean software design, object-oriented programming, and architectural patterns.
● Infrastructure & Orchestration: Deep hands-on experience with Kubernetes, Docker, and Cloud platforms (specifically Azure ML, Azure Storage/Networking).
● Big Data & Optimization: Proficiency with high-performance data processing tools such as Ray, PyArrow, and SQL. Experience optimizing data loading bottlenecks for GPU training.
● DevOps & Monitoring: Experience setting up complex CI/CD pipelines (GitHub Actions) and observability stacks (Grafana, Prometheus).
● Soft Skills: Strong problem-solving abilities, proactiveness, and ownership of complex topics. Ability to adapt quickly to new technologies and work collaboratively in a supportive, high-performance team.