Job Title: HPC Admin / Cloud Engineer
Job ID: 26821
Location: Chennai, India (Onsite)
Experience Required: 7–14 Years
Salary Range: ₹20 – ₹50 LPA
Work Type: Full-Time
Notice Period: Immediate to 60 Days
Role Summary
We are seeking an experienced HPC Admin / Cloud Engineer to lead the design, implementation, and support of high-performance computing (HPC) clusters. This role requires in-depth knowledge of Linux systems, cluster management, storage, networking, and automation tools. You will be part of a technical team driving innovation and performance at scale.
Key Responsibilities
- Design, deploy, and support high-performance computing (HPC) clusters
- Work with CPU/GPU architectures, scalable storage, high-speed interconnects, and cloud-based compute systems
- Create hardware BOMs for HPC clusters, manage vendor relationships, and oversee hardware release processes
- Configure and manage Linux-based systems (e.g., SUSE, Red Hat, Rocky, Ubuntu) for HPC environments
- Ensure alignment of system design with performance and functional specifications
- Support new product releases to manufacturing and end-users, including golden images, scripts, documentation, and training
- Troubleshoot network and system-level issues and optimize cluster performance
Must-Have Qualifications
- Minimum 7 years of experience in HPC systems, cluster configuration, and Linux system administration
- Strong knowledge of:
  - Linux systems (SUSE, Red Hat, Rocky, Ubuntu)
  - HPC hardware (servers, GPUs, networking, storage, BIOS, BMC)
  - TCP/IP fundamentals and network protocols (DNS, DHCP, HTTP, LDAP, SMTP)
  - Scripting with Shell and Python
- Experience with configuration management tools like Salt, Chef, or Puppet
- Degree Requirement:
  - BE/BTech, MSc, MCA, or MS in Computer Engineering, Electrical Engineering, or related disciplines
  - Candidates holding only a diploma or a 3-year degree (BSc/BCA) will not be considered
Preferred Qualifications
- Exposure to DevOps practices (CI/CD pipelines, Git, Jenkins)
- Containerization experience (Docker, Singularity)
- Familiarity with Kubernetes, Prometheus, Grafana
- Experience with reverse proxies/load balancers (Apache, NGINX, HAProxy)
- Proven ability to create and support scalable infrastructure in a production setting
Skills: networking, linux, kubernetes, tcp/ip fundamentals and network protocols (dns, dhcp, http, ldap, smtp), reverse proxies/load balancers (apache, nginx, haproxy), scripting with shell and python, linux systems (suse, red hat, rocky, ubuntu), prometheus, cloud, hpc hardware (servers, gpus, networking, storage, bios, bmc), containerization (docker, singularity), management, devops practices (ci/cd pipelines, git, jenkins), configuration management tools (salt, chef, puppet), design, grafana