As a NOC Engineer, you will monitor, triage, and resolve infrastructure and application alerts across Windows and Linux/Ubuntu environments in on-prem and cloud platforms. You will ensure high availability, perform incident response, escalate effectively, and drive continuous improvement of monitoring and operational processes using PRTG and Jira Service Management.
- Monitor and respond to alerts/events using PRTG and other monitoring/logging tools; maintain SLA compliance and uptime targets.
- Perform L1/L2 incident triage and troubleshooting across:
  - Windows Server (AD/DNS/DHCP, IIS, RDP, Windows services, event logs)
  - Linux/Ubuntu (systemd, journald, networking, storage, permissions, cron, package management)
- Investigate connectivity and performance issues (latency, packet loss, CPU/memory, disk I/O, filesystem capacity).
- Operate and troubleshoot virtualization workloads in Hyper-V (VM state, snapshots/checkpoints, storage, replication, vSwitch/networking).
- Support cloud operations in AWS:
  - Troubleshoot EC2, EBS, VPC, Security Groups/NACLs, IAM basics, CloudWatch alarms/logs.
  - Assist with backup/restore and availability checks; coordinate escalations with cloud/platform teams.
- Use Jira Service Management for incident, service request, and problem workflows:
  - Create/route tickets, document troubleshooting, communicate status, manage escalations and handoffs.
- Execute SOPs/runbooks; contribute updates and knowledge base articles.
- Perform routine operations tasks (patch coordination, health checks, certificate checks, job monitoring, backup verification).
- Participate in shift operations (24x7 rotation, if applicable), on-call duty, and major incident bridges; provide timely stakeholder updates.