Lead HPC Systems Engineer

7 years

20 - 50 Lacs

Posted: 5 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities: 26821

  • Design, implement, and support HPC clusters (CPU/GPU-based) with scalable storage and high-bandwidth interconnects.
  • Generate hardware BOMs, manage vendors, and oversee hardware release and integration.
  • Use expert-level Linux system administration skills to configure and tune HPC environments (RedHat, SuSE, Ubuntu, Rocky, etc.).
  • Assemble project specifications and performance requirements at both system and subsystem levels.
  • Drive timely execution of project deliverables across cross-functional teams.
  • Develop and maintain shell/Python scripts, golden images, procedures, and automation for deployment and monitoring.
  • Support release of new hardware/software products into manufacturing with proper documentation and knowledge transfer.
  • Configure and maintain robust storage solutions, netboot/PXE environments, and Linux HA clusters.

Required Qualifications

  • Bachelor's or Master's degree (BE/BTech/MS/MCA/MSc) in Computer Engineering or Electrical Engineering.
  • Minimum 7 years of experience in:
    • High Performance Computing (HPC) environments
    • Cluster management, deployment, and optimization
    • Linux Systems (SuSE, RedHat, Rocky, Ubuntu)
    • Server, GPU, BIOS, BMC, Networking, and Storage hardware
    • TCP/IP fundamentals, DNS, DHCP, HTTP, LDAP, SMTP
    • Shell and Python scripting
  • Strong experience with systemd, PXE boot, and high-availability clusters.
  • Familiarity with configuration management tools: Salt, Chef, Puppet, etc.

Preferred Qualifications

  • DevOps mindset with experience in CI/CD pipelines (Jenkins) and Git-based repository systems.
  • Exposure to containerization tools (Singularity, Docker).
  • Working knowledge of Kubernetes, Prometheus, Grafana, and observability tools.
  • Understanding of web/proxy technologies like Apache/Nginx, reverse proxies, and HAProxy for load balancing.
  • Experience with cloud-based compute architectures and hybrid models (on-prem + cloud).

Skills & Abilities

  • Strong problem-solving skills and troubleshooting abilities.
  • Exceptional team collaboration and communication skills.
  • Ability to manage multiple tasks, prioritize efficiently, and meet project deadlines.
  • Adaptable in fast-paced, evolving technology environments.
  • Strong documentation and process-oriented mindset.

Skills: High Performance Computing (HPC), cluster management, high-availability clusters, Linux systems (SuSE, RedHat, Rocky, Ubuntu), systemd, PXE boot, storage, TCP/IP fundamentals, DNS, DHCP, HTTP, LDAP, SMTP, shell scripting, Python scripting, configuration management tools (Salt, Chef, Puppet), CI/CD pipelines (Jenkins), Git-based repo systems, containerization tools (Singularity, Docker), Kubernetes, Prometheus, Grafana, web/proxy technologies (Apache, Nginx, HAProxy), cloud, project, management, documentation
