Experience: 11 to 13 years
Key Responsibilities
Design and implement robust observability architectures for backend services
Drive adoption and integration of the OpenTelemetry standard across services
Define and evolve best practices based on the 4 Golden Signals
Lead setup, tuning, and production operation of the OpenTelemetry Collector
Create dashboards from customer requirements in monitoring and visualization tools such as Grafana or similar
Help define observability strategies across services and present them to both technical and non-technical stakeholders
Mentor teams and provide thought leadership in adopting the OpenTelemetry standard
Skills
Expertise in backend observability
Deep knowledge of OpenTelemetry, especially the Collector
Expertise with monitoring and visualization tools such as Grafana, Loki, Prometheus, or similar
Solid understanding of the 4 Golden Signals
Strong communication and presentation skills
Hands-on experience operating and optimizing large-scale production systems
Work closely with our product development teams to understand their product requirements and how they build, test, and deploy their software applications
Demonstrable experience in containerization (Docker) and orchestration (Kubernetes): mandatory
Demonstrable experience in CI/CD tools such as Azure DevOps, Helm, and Argo CD: mandatory
Experience with Infrastructure as Code, preferably Terraform
Good to have: experience in development and deployment of microservices
Knowledge of and proven hands-on experience with large-scale databases and distributed technologies such as Kafka and Confluent Platform
Leverage expert knowledge of Azure Cloud to design, deploy, and manage cloud-based infrastructure, and use a deep understanding of Azure networking to optimize performance and security
Experience in monitoring and analysing infrastructure performance using standard performance monitoring tools, including implementing and managing monitoring and observability solutions with tools like Grafana, Prometheus, Elasticsearch, OpenTelemetry, and Kibana
Serve as the primary point of contact responsible for the overall health, performance, and capacity of customer-facing services in cloud infrastructure
Ability to develop deep knowledge of our complex applications
Assist in the rollout and deployment of new product features and installations to new cloud infrastructure
Good understanding of network routing, load balancing, and networking protocols: a base knowledge of TCP/IP with an understanding of HTTP and DNS
Kubernetes CKA or CKAD certification is nice to have
Strong interpersonal communication skills, including listening, speaking, and writing, and the ability to work well in a diverse, team-focused environment with other developers, product managers, etc.
Familiarity with Agile methodologies
B.Tech/B.E. degree in Electronics, Telecommunications, or Computer Science