The Senior Cloud Engineer (Windows) is a key technical member of Rhapsody's global Cloud Operations team, responsible for the stability, reliability, and operational excellence of our Windows-based workloads running on AWS. This role blends hands-on Windows systems engineering with cloud infrastructure automation, deep troubleshooting, deployment support, and customer-facing collaboration. You will partner with Engineering, SRE, Security, and the Global Operations team to ensure high availability, rapid incident response, and continuous improvement across Windows services and supporting cloud components.
Key Responsibilities
- Deploy, operate, and harden Windows Server workloads on AWS (EC2, ASG, Launch Templates, EBS, AMIs).
- Manage Windows services: Active Directory Domain Services (AD DS), AWS Directory Service (AWS Managed Microsoft AD), Group Policy, DNS, IIS, SMB file services (Amazon FSx for Windows File Server), certificates/PKI, and RDP access.
- Implement infrastructure as code using Terraform, Windows scripting, and the AWS CLI; maintain consistent golden images and configuration baselines.
- Use AWS Systems Manager (Run Command, Patch Manager, Inventory) to manage and standardize Windows fleets.
- Build and maintain PowerShell scripts/modules for provisioning, configuration, maintenance, and diagnostics.
- Contribute to shared Terraform modules and CI/CD deployments to reduce manual operations.
- Monitor Windows server performance and cloud metrics using CloudWatch, Datadog, Event Logs, and performance counters.
- Perform deep troubleshooting across Windows OS, IIS, Active Directory, Group Policy, Kerberos/NTLM authentication, TLS/certificates, DNS, and firewall/routing issues.
- Diagnose hybrid connectivity issues involving VPNs, load balancers, security groups, route tables, and TLS termination.
- Enforce secure configuration baselines and monthly patch/update cycles for Windows environments.
- Collaborate with the Security team on detections, log analysis, and endpoint protection.
- Maintain up-to-date documentation (SOPs, runbooks, standards, diagrams) and follow change, incident, and problem management processes.
- Work directly with internal and external customers to troubleshoot application issues, connectivity, and environment configuration.
- Support onboarding, migrations, deployments, and post-incident reviews.
- Collaborate with SRE/Engineering on observability, tuning, resiliency, and cost optimization.