At Freshworks, uptime is sacred. As a Lead Site Reliability Engineer (SRE), you'll be the engineer behind the curtain, designing for resilience, automating recovery, and ensuring our systems stay fast, stable, and observable at scale. You'll partner closely with engineering, platform, and product teams to shift reliability left and set the standard for performance and availability.
If you live for clean telemetry, root cause resolution, and engineering chaos into confidence, this is your playground.
Responsibilities
- Design and implement tools to improve availability, latency, scalability, and system health.
- Define SLIs/SLOs, manage error budgets, and drive performance engineering efforts.
- Build and maintain automated monitoring, alerting, and remediation pipelines.
- Collaborate with engineering teams to improve reliability by design.
- Lead incident response, root cause analysis, and blameless postmortems.
- Champion observability across services: logs, metrics, and traces.
- Contribute to infrastructure architecture, automation, and reliability roadmaps.
- Advocate for SRE best practices across teams and functions.