Key Responsibilities
- Platform Architecture: Lead the design, architecture, and evolution of our centralized DevOps platform on AWS EKS, ensuring it is scalable, resilient, and cost-effective.
- CI/CD Strategy: Own and optimize our end-to-end continuous delivery strategy, advancing our use of Argo CD and GitLab CI to improve the developer experience.
- Advanced Observability: Architect and manage our comprehensive monitoring stack (Prometheus, Grafana, Loki, Thanos), setting the standards for logging, metrics, and alerting across the organization.
- Security Automation: Design and maintain the security posture of our HashiCorp Vault cluster and automate secure secret injection across all applications and EKS clusters.
- Infrastructure as Code (IaC) Excellence: Champion IaC principles, establishing best practices and reusable Terraform modules to ensure consistency and standardization across all development tribes.
- Mentorship & Expertise: Mentor other engineers on the team and act as the go-to subject matter expert on the platform, guiding development teams on complex integrations and best practices.
- Tooling Innovation: Evaluate, prototype, and implement new tools (e.g., Karpenter, KEDA, OpenCost) to enhance platform capabilities and efficiency.
Required Skills & Qualifications
- Experience: 5-12 years in DevOps/SRE/Platform Engineering, with a proven track record of designing and building large-scale cloud platforms.
- Cloud & Containers: Expert-level, hands-on experience with AWS, especially EKS. Deep expertise in Kubernetes architecture and operations.
- Infrastructure as Code (IaC): Mastery of Terraform for provisioning and managing complex, multi-environment cloud infrastructure.
- CI/CD & GitOps: Proven, deep expertise in building sophisticated CI/CD pipelines (GitLab CI) and implementing GitOps at scale with Argo CD.
- Monitoring & Observability: Extensive experience designing and managing the Prometheus/Grafana stack, including long-term storage solutions like Thanos.
- Security: Deep, practical experience with HashiCorp Vault in a production environment.
- Nice to Have: Strong experience with the following tools is expected at this level: Karpenter, KEDA, Thanos, Artifactory, SonarQube, OpenCost.