We are looking for a Solutions Architect – AI Factory with experience designing, building, and maintaining large-scale AI factories to join our team at NVIDIA. As Solutions Architects on the AI Factory team, we help NVIDIA AI Factory solutions bring the benefits of large-scale AI to leading enterprise customers.
Requirements
- MS or PhD in Engineering, Computer Science, or a related field (or equivalent experience)
- Established track record working with AI and HPC clusters, both on-premises and cloud-based
- 8+ years of proven experience with cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible
- Hands-on experience with data center MEP, networking, storage, and cluster configuration and debugging
- Strong analytical and problem-solving skills, along with an ability to articulate what you know to others
- Ability to multitask efficiently in a dynamic environment
- Strong coding and debugging skills, including experience with CUDA, Python, C/C++, Bash, AI frameworks, and Linux utilities
- Demonstrated expertise through projects or open-source contributions involving GPU workloads, Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions
- Hands-on experience with NVIDIA Enterprise software products, Base Command Manager, Run:ai, and NVIDIA NIMs
- Willingness and ability to learn quickly and solve advanced problems
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Four Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance
Originally posted on Himalayas