In this role, you'll make an impact in the following ways:
- Be hands-on with enterprise-grade NVIDIA AI infrastructure, supporting GPU-based compute, high-performance storage, and network systems designed for ML/AI at scale.
- Deploy, monitor, and troubleshoot containerized AI workloads using Kubernetes, Docker, and GPU orchestration tools like Run:AI and NVIDIA BCM.
- Own the observability of our AI platforms: monitor health, identify performance bottlenecks, and make strategic recommendations to drive platform reliability and maturity.
- Automate infrastructure operations and provisioning using Python, Bash, and tools like Terraform or Ansible to reduce manual toil and accelerate experimentation.
- Maintain and scale AI training and inference pipelines, integrating infrastructure workflows into CI/CD systems to enable seamless, automated deployment of AI workloads.
To be successful in this role, we're seeking the following:
- Bachelor's degree in computer science or a related discipline, or equivalent work experience, required; advanced degree preferred.
- 8-10 years of related experience required; experience in the securities or financial services industry is a plus.
- Experience with Linux administration (RHEL/Ubuntu), shell scripting, and system-level debugging.
- Proven experience running distributed systems in Kubernetes and containerized environments using Docker.
- Familiarity with GPU resource management, including NVIDIA GPU Operator and device plugin lifecycle.
- Experience with CI/CD workflows and infrastructure automation tools such as GitLab CI, Jenkins, Terraform, Helm, or Ansible.
- Knowledge of networking fundamentals and persistent storage systems.
- Exposure to cloud platforms (AWS, GCP, Azure) and hybrid GPU environments.
- Ability to read and support Python code focused on ML/AI pipeline integration.
- Strong analytical and troubleshooting skills with a collaborative mindset.
- Effective communication skills and proactive ownership of platform reliability and performance.
Regards,
Mohammed Ilyas
Phone: 229-264-4024 | Text: 229-469-1455 | Or share your updated resume at Mohammed@vtekis.com