About the role
D-Wave is seeking a Senior DevOps Engineer to join our DevOps team in New Haven, reporting to the DevOps Engineering Manager. In this role, you will design, build, and operate hybrid infrastructure platforms spanning on-premises environments, Kubernetes clusters, cloud services, and CI/CD pipelines.
Your primary focus will be developing and operating on-prem Kubernetes platforms, while also supporting cloud environments such as AWS. You will also play a key role in advancing our observability capabilities, improving system visibility through logging, metrics, dashboards, and alerting.
Working closely with hardware, software, and DevOps teams, you will own systems end-to-end and drive improvements in how infrastructure is provisioned, automated, and maintained. This role is ideal for an engineer who enjoys working across the stack, solving complex operational challenges, and building reliable, scalable platforms.
What you'll do
- Design, build, and operate Kubernetes platforms, primarily on-prem, including cluster architecture, networking, storage, and lifecycle management
- Develop and maintain hybrid infrastructure across on-prem and cloud environments with a focus on reliability, scalability, and maintainability
- Design and optimize CI/CD pipelines (e.g., GitHub Actions) to enable automated, low-touch build and release processes
- Manage AWS infrastructure, including multi-account environments, IAM, networking, and shared services
- Implement and maintain infrastructure-as-code and automation solutions (e.g., Terraform, Ansible)
- Design and operate virtualized platforms and VM-based workloads
- Define and enforce standards for containerization, including image management, versioning, and security practices
- Build and support containerized applications running on Kubernetes (on-prem and cloud-based)
- Develop and mature observability platforms, including metrics, logging, dashboards, and alerting
- Design actionable monitoring and alerting systems to improve reliability and incident response
- Support infrastructure for hardware-integrated systems, including OS deployment and lifecycle management
- Automate manual processes to improve operational efficiency and system reliability
- Collaborate cross-functionally to ensure consistent practices across hybrid environments
- Participate in incident response and root-cause analysis, driving continuous improvement