As an Observability Engineer, you will be part of a team responsible for managing and operating our observability stack, ensuring end-to-end monitoring, metrics collection, logging, and tracing across our infrastructure and applications. You will collaborate with other stakeholders to improve system visibility, detect issues proactively, and drive performance optimization. This role requires strong technical expertise and hands-on experience with application analysis and with infrastructure as code (IaC) tools such as Terraform.
Your key tasks
Observability Stack Design & Deployment
- Design and Implement: build a robust observability stack encompassing logging, metrics collection, monitoring, alerting, and tracing systems tailored for cloud environments
- Integration: seamlessly connect observability tools with cloud services and infrastructure to achieve comprehensive monitoring and visibility
- IaC Development: use Terraform to automate the provisioning and deployment of observability tools and infrastructure, ensuring consistency and efficiency
Monitoring and Optimization
- Monitoring Standards: define and enforce organization-wide monitoring and alerting standards for real-time incident detection
- Optimization: continuously refine the observability stack to enhance system performance, minimize downtime, and optimize resource utilization
End-to-End Monitoring Practices
- Comprehensive Tracking: implement end-to-end monitoring solutions that provide insights into the performance, availability, and reliability of IT workloads
- Standardization: establish best practices for metrics, logs, and traces, ensuring holistic visibility across the technology stack
- Automated Alerts: develop automated alerting systems for proactive issue identification and resolution
Technical Collaboration
- Cross-Team Integration: work closely with DevOps, SRE, and application development teams to align observability strategies with operational objectives
- Stakeholder Engagement: communicate complex technical insights clearly to stakeholders, enabling informed decision-making