Part of the global technology organization Tieto, MentorMate creates durable technical solutions that deliver digital transformation at scale by blending strategic insights and thoughtful design with brilliant engineering. With mature and established practices in enterprise web and mobile development, quality engineering, technical architecture, human-centered design, cloud, DevOps, data, and analytics, the company provides its people with the opportunity to work on impactful, global projects for recognizable brands.
We’re looking to hire a Senior DevOps Engineer with AWS to help modernize and scale a global digital health SaaS platform used for data-driven and AI-powered healthcare solutions. In this role, you’ll lead the evolution of the cloud, ML, and Kubernetes infrastructure - driving reliability, automation, cost efficiency, and operational excellence across mission-critical workloads. You’ll work hands-on with a modern AWS stack (Fargate, EKS, Lambda, S3, RDS), ML platforms (SageMaker, Kubeflow, MLflow), and CI/CD tooling (ArgoCD, GitHub Actions), enabling engineering and data science teams to deliver faster, safer, and at scale.
Key Responsibilities
Lead operations for multi-tenant SaaS workloads on AWS, ensuring scalability, high availability, and cost efficiency
Design, implement, and maintain reliable infrastructure for production, data, and AI/ML workloads
Own incident response, postmortems, and operational runbooks to improve system reliability and reduce MTTR
Manage and enhance CI/CD pipelines supporting both application and ML deployment workflows
Build and maintain infrastructure automation using Infrastructure as Code (AWS CDK or Terraform)
Enable self-service capabilities for engineering and data science teams
Monitor and optimize cloud usage across compute, GPU, and storage resources, implementing cost controls and forecasting
Support and automate ML pipelines, including training, testing, and deployment using AWS SageMaker, Kubeflow, or MLflow
Manage GPU and compute clusters (EKS, ECS, EC2) for model training and inference workloads
Develop and maintain monitoring, alerting, observability, and security best practices
Collaborate closely with Engineering, Data, AI/ML, and PlatformOps teams to ensure smooth cross-team delivery
Required Experience & Qualifications
7+ years of experience in DevOps/CloudOps/SRE
Solid hands-on experience with AWS (Fargate, EKS, EC2, S3, RDS, Lambda, IAM, CloudWatch, CloudTrail), Kubernetes and containerized workloads
Proficiency with CI/CD tools, Infrastructure as Code (IaC), infrastructure automation, and scripting (Python, Bash, or similar)
Proven experience with AI/ML platforms (AWS SageMaker, Kubeflow, MLflow, or equivalent), and cost-efficient GPU/compute optimization
Working knowledge of MongoDB operations, monitoring, and performance tuning
Solid understanding of FinOps principles, cloud cost monitoring, and right-sizing strategies
Experience with production monitoring & incident management (Splunk, Grafana, OpenTelemetry)
Exposure to multi-tenant SaaS architectures and security or compliance frameworks is a plus
Strong collaboration, mentoring, and communication skills, with the ability to thrive in a fast-paced, evolving environment
Excellent spoken and written English language skills