About the Role:
We’re looking for a seasoned Performance Test Engineer with 5+ years of hands-on experience in performance, reliability, and resilience engineering for modern cloud-native systems. You’ll design and execute end‑to‑end performance testing strategies spanning Load, Stress, Spike, Endurance, Capacity, Failover, and DR testing, as well as Chaos Engineering, Fault Injection, and Game Day exercises. You’ll be deeply technical and comfortable with tools like Apache JMeter, Locust, Distributed Load Testing on AWS (DLT), and AWS Fault Injection Service (FIS, formerly Fault Injection Simulator).
Key Responsibilities
Performance & Scalability
- Define and execute performance test strategies covering Load, Stress, Spike, Endurance, and Capacity testing.
- Build, maintain, and optimize test scripts using JMeter and Locust; parameterize, correlate, and modularize for reusability.
Reliability, Resilience & Recovery
- Design and run Failover, Disaster Recovery (DR), and Resilience tests to validate RTO/RPO, multi‑AZ/region failover, and recovery runbooks.
- Use AWS FIS to inject faults (network latency/packet loss, CPU/memory stress, AZ disruptions, instance terminations) and validate blast radius containment.
- Partner with SRE/Platform teams to define chaos hypotheses, measure steady state, and implement auto‑remediation and graceful degradation patterns.
Cloud & Observability
- Use AWS DLT to orchestrate scalable distributed load tests; automate test runs via Amazon EventBridge (formerly CloudWatch Events), Step Functions, or CI/CD pipelines.
- Instrument and analyze performance using Amazon CloudWatch, X‑Ray, and APM tools (Dynatrace/AppDynamics), plus Prometheus/Grafana.
Engineering & Automation
- Integrate performance tests into CI/CD (GitHub Actions/Azure DevOps/Jenkins/GitLab), with auto‑gating based on thresholds.
- Build utilities/frameworks for data generation, environment setup, and report automation (Python/Java).
- Analyze results (percentiles, tail latency, saturation), identify bottlenecks (DB, cache, network, I/O), and drive fixes with DevOps/Engineering.
Game Days & Readiness
- Plan and facilitate Game Day exercises simulating peak traffic, dependency failures, regional outages, and DR drills.
- Document test plans, hypotheses, playbooks, and post‑incident reviews with measurable improvements.
Nice to Have
- AWS Certification (Solutions Architect/Developer/SysOps Administrator/DevOps Engineer).
- Experience with Kubernetes/EKS, service meshes (Istio), HPA/auto‑scaling, and pod/resource tuning.
- Knowledge of CDNs (CloudFront), Edge compute/Lambda@Edge, and WAF impacts on performance.
- Familiarity with security/perf trade‑offs, TLS tuning, connection pooling, and circuit breakers.