As a NOC Engineer, you will play a vital role in ensuring the health, stability, and uptime of our production systems. This is a hands-on, operational role requiring a deep understanding of system administration, networking, and incident response. You'll act as the first line of defense during outages and performance issues, with responsibility for real-time monitoring, troubleshooting, and driving incident resolution in a 24/7 environment. If you enjoy working with infrastructure at scale and thrive in fast-paced environments, this is the role for you.
Roles & Responsibilities
- Monitor production systems and applications to ensure consistent uptime, performance, and availability
- Respond to and manage incidents, alerts, and outages in real time, coordinating appropriate responses
- Conduct root cause analysis (RCA) and implement corrective and preventive actions
- Troubleshoot system, application, and network issues escalated by monitoring systems or support teams
- Participate in 24/7 shift rotations, including weekends and holidays, to ensure continuous support
- Collaborate with engineering and product teams to improve observability and monitoring frameworks
- Develop and update SOPs, runbooks, and internal knowledge bases to ensure process consistency
- Maintain compliance with internal security, audit, and operational standards
- Recommend and implement automation and monitoring improvements to increase efficiency and reduce incident frequency
- Participate in post-incident reviews, help run blameless postmortems, and drive process improvement initiatives