We are seeking a Principal SRE Engineer to lead our cloud transformation journey. In this role, you will be responsible for migrating our on-premises IIS-hosted applications to AWS Cloud, ensuring reliability, scalability, security, and operational excellence at every stage of the migration and beyond.
Responsibilities
- Lead the design and end-to-end execution of the migration of on-premises IIS-hosted applications, across high-availability production and non-production environments, to AWS Cloud.
- Architect and implement scalable, secure, and cost-efficient cloud infrastructure.
- Drive adoption and implementation of infrastructure-as-code.
- Establish best practices for monitoring, logging, alerting, and incident response.
- Collaborate with development, security, and operations teams to ensure seamless deployment pipelines (CI/CD).
- Own and improve system reliability, performance, and scalability across critical services.
- Champion disaster recovery, backup, and high-availability strategies in AWS.
- Proactively identify opportunities for process improvement and efficiency gains.
- Create and maintain technical documentation.
- Provide second- and third-line incident support to ensure prompt and efficient issue resolution.
- Participate in an on-call rotation.
- Facilitate effective communication between technical teams and key stakeholders, providing advice and support on relevant topics.
- Mentor and guide engineering teams on AWS and SRE best practices to foster their professional growth and development.
Experience
- 8+ years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles.
- 5+ years of proven expertise with AWS services.
- Strong background in migrating complex applications (preferably IIS/.NET-based) from on-premises to AWS Cloud.
- Deep knowledge of infrastructure-as-code, CI/CD, and automation.
- Experience with Windows Server, IIS, and .NET application hosting.
- Experience with monitoring and observability tools (Datadog, OpenTelemetry, Elastic, etc.).
- Experience with container orchestration platforms, specifically Kubernetes, including managing large-scale clusters at the enterprise level.
- Ability to optimize cloud costs and maintain cost awareness.
- Knowledge of security best practices in cloud environments.
- Experience working in a high-availability, 24/7 production environment.
- Excellent leadership, communication, and cross-functional collaboration skills, including mentoring other team members.
- Passion for understanding problems and delivering effective solutions.
- Good networking skills.
- Fluent in English (both written and spoken).
- Self-driven personality with a strong eagerness to learn and deliver.