Get to Know the Team
You will join a team building production robotics and autonomy systems for urban environments across Southeast Asia. We develop perception, planning, and control capabilities step by step, using safety evidence to guide each release. The team focuses on robust systems, quality code, and technical depth maintained in-house. You'll work with a senior engineering group that values clean interfaces and reproducible development workflows.
Get to Know the Role
As a Lead Data Engineer, you'll report to the Head of Engineering and be based at Grab's One-North office in Singapore. You'll lead the data pipeline that transforms raw vehicle logs into training-ready datasets for autonomy and simulation engineers. You'll build systems to ingest, validate, transform, and version multimodal data from cameras, lidar, radar, and vehicle telemetry. Your work enables machine learning engineers to train models reproducibly and at scale.
The Critical Tasks You Will Perform
- You'll design and maintain ingestion pipelines that transfer vehicle log data from onboard storage to AWS, including coordination of physical SSD offload, upload tooling, and data integrity verification.
- You'll build automated workflows to validate, synchronize, and transform raw multimodal logs into structured training datasets, detecting corruption and ensuring schema consistency throughout the pipeline.
- You'll develop systems for dataset versioning and lineage tracking that allow ML engineers to reproduce training runs and trace model inputs back to specific vehicle logs.
- You'll collaborate with perception, planning, and simulation engineers to define data schemas, synchronization requirements, and quality gates that determine when data is ready for training.
- You'll maintain AWS-based storage and compute infrastructure for petabyte-scale autonomy datasets, implementing monitoring and automated recovery mechanisms for pipeline failures.
- You'll optimize data pipelines for throughput and cost efficiency, implementing storage tiering, compression strategies, and compute scaling policies to manage infrastructure expenses.