We are looking for a DevOps Engineer to join our Data Department, which is building Big Data and AI Platforms from scratch.
You will play a key role in designing, deploying, and scaling the core infrastructure that powers big data processing, machine learning pipelines, and enterprise analytics.
This position involves working extensively with Microsoft Azure and collaborating with data engineers, ML engineers, and analytics teams to build a secure, automated, and cost-efficient foundation for the organization's future Data & AI ecosystem.
Key Responsibilities
- Design, deploy, and operate core infrastructure for a new Data and AI Platform, covering data ingestion, transformation, ML model training, and analytical workloads.
- Architect and manage Azure resources (including subscriptions, IAM, networking, monitoring, and FinOps governance) to support large-scale data and ML environments.
- Build and operate Kubernetes-based platforms (AKS) to orchestrate microservices across data, ML, and analytics layers.
- Implement and maintain microservices and event-driven architectures, leveraging Ingress controllers, service meshes, and distributed load balancing.
- Develop and maintain Infrastructure as Code (IaC) using Terraform and Terragrunt, building modular, reusable, and environment-specific components that follow the DRY principle.
- Establish GitOps workflows with Argo CD and Azure DevOps, ensuring fully automated, auditable, and consistent deployments across all environments.
- Implement monitoring and observability stacks (Prometheus, Grafana, Azure Monitor) for end-to-end visibility into data, compute, and network layers.
- Apply FinOps principles: perform cost analysis, resource tagging, budgeting, and cost optimization.
- Collaborate with Data and ML teams to deploy and manage core platforms and tools such as Databricks, MLflow, and vector-enabled databases for AI workloads.
- Manage high-performance load balancers for real-time ML inference and large-scale data services.
- Ensure secure network architecture across hybrid environments, managing VNets, subnets, private endpoints, DNS (Azure DNS Private Resolver), and VPN routing.
- Contribute to the design of scalable, cost-effective, and reliable data infrastructure, from concept to production.