Part of the global technology organization Tieto, MentorMate creates durable technical solutions that deliver digital transformation at scale by blending strategic insight and thoughtful design with brilliant engineering. With mature, established practices in enterprise web and mobile development, quality engineering, technical architecture, human-centered design, cloud, DevOps, data, and analytics, the company gives its people the opportunity to work on impactful, global projects for recognizable brands.
We're looking to hire a Tech Lead AWS DevOps Engineer to help modernize and scale a global digital health SaaS platform used for data-driven and AI-powered healthcare solutions. In this role, you'll lead the evolution of the cloud, ML, and Kubernetes infrastructure, driving reliability, automation, cost efficiency, and operational excellence across mission-critical workloads. You'll work hands-on with a modern AWS stack (Fargate, EKS, Lambda, S3, RDS), ML platforms (SageMaker, Kubeflow, MLflow), and CI/CD tooling (GitHub Actions, ArgoCD), enabling engineering and data science teams to deliver faster, safer, and at scale. As part of your leadership responsibilities, you will collaborate closely with mid-level DevOps engineers, providing guidance, technical mentorship, and support. You'll help shape best practices, uplift engineering maturity, and ensure the team grows both technically and operationally.
Key Responsibilities
- Manage and enhance CI/CD pipelines supporting both application and ML deployment workflows
- Lead operations for multi-tenant SaaS workloads on AWS, ensuring scalability, high availability, and cost efficiency
- Build and maintain infrastructure automation using Infrastructure as Code (AWS CDK or Terraform)
- Design, implement, and maintain reliable infrastructure for production, data, and AI/ML workloads
- Own incident response, postmortems, and operational runbooks to improve system reliability and reduce MTTR
- Enable self-service capabilities for engineering and data science teams
- Monitor and optimize cloud usage across compute, GPU, and storage resources, implementing cost controls and forecasting
- Support and automate ML pipelines, including training, testing, and deployment using AWS SageMaker, Kubeflow, or MLflow
- Manage GPU and compute clusters (EKS, ECS, EC2) for model training and inference workloads
- Develop and maintain monitoring, alerting, observability, and security best practices
- Collaborate closely with Engineering, Data, AI/ML, and PlatformOps teams to ensure smooth cross-team delivery
Required Experience & Qualifications
- 7+ years of experience in DevOps/CloudOps/SRE
- Infrastructure as Code (IaC) using AWS CDK
- Advanced CI/CD pipeline expertise (GitHub Actions, ArgoCD), including build optimization, tuning, and troubleshooting
- Hands-on experience with Amazon EKS (Kubernetes operations, deployments, scaling, cluster configuration)
- Strong troubleshooting skills across infrastructure, pipelines, runtime environments, observability, and incident management (Splunk, Grafana, OpenTelemetry)
- Deep operational experience with AWS (Fargate, EKS, EC2, S3, RDS, Lambda, IAM, CloudWatch, CloudTrail)
- Familiarity with GPU workloads and performance/cost tuning for AI pipelines
- Solid understanding of FinOps principles, cost monitoring, and right-sizing in AWS
- Strong collaboration, mentoring, and communication skills, with the ability to thrive in a fast-paced, evolving environment
- Excellent spoken and written English language skills
A significant advantage would be
- Subject Matter Expert in one of the following: AWS, EKS, or IaC
- Working experience with AI/ML platforms (AWS SageMaker, Kubeflow, MLflow, or equivalent)
- Knowledge of MongoDB operations and performance optimization
- Familiarity with Projen for automating CDK project configuration and management
- Hands-on experience with Helm charts and Kubernetes manifests
- Awareness of regulatory compliance frameworks (SOC 2, ISO 27001, NIST, HIPAA)