
11 Infiniband Jobs


6.0 - 11.0 years

8 - 13 Lacs

Chennai, Bengaluru

Work from Office

Your Opportunity: As an HPC Architect, you will architect high-performance computing solutions from scratch and design and optimize all aspects (compute, memory, networking, storage) for a better cost of ownership.
Roles and Responsibility: As an architect, you will be responsible for designing HPC infrastructure solutions, including compute, networking, storage, and workload management components. You will work closely with cross-functional teams, including hardware, software, product management, and business stakeholders, to understand compute workloads and translate them into platform architecture and designs that meet business needs. You will create and maintain detailed system architecture diagrams and specifications. You will evaluate and select appropriate hardware and software components for HPC environments. You will install, configure, and maintain HPC systems, including hardware, software, and networking components. You will develop and implement automation scripts for system management and deployment. You will be a subject matter expert who unblocks dependent teams in the HPC domain. You will be expected to develop system benchmarks, profile systems to understand bottlenecks, and optimize workflows and processes to improve cost of ownership. You will identify and mitigate technical risks and issues throughout the HPC development life cycle and ensure that the compute cluster is resilient, reliable, and maintainable. You will be expected to stay abreast of the latest HPC technologies, including hardware, software, and networking solutions. Your primary focus will be to understand the compute workload and design an HPC cluster with the right combination of nodes, CPU/GPU, memory, interconnects, and storage to achieve optimum performance at minimum cost of ownership.
Our Ideal Candidate: Someone who has the drive and passion to learn quickly and the ability to multitask and switch contexts based on business needs.
Qualifications: In-depth experience with Linux system administration and hardware/software configuration. Strong knowledge of HPC technologies, including cluster computing, high-speed interconnects (InfiniBand, RoCE), and parallel filesystems (Lustre, GPFS, BeeGFS, etc.). Experience in creating and maintaining operating system images with different installation and boot schemes. Extremely good with automation tools like Ansible, Chef, and SaltStack and with scripting languages (Python and Bash). Experience in creating and maintaining storage solutions with different RAID configurations. Ability to design storage solutions for different IOPS and access patterns (random vs. sequential read/write) and to tune storage and filesystems for better performance. Good knowledge of networking concepts, including IP addressing, routing, protocols, switch configuration for RDMA, VLAN configuration, network bonding, etc. Good knowledge of virtualization and of hardware and software hypervisors. Good knowledge of containerization technologies like Docker and Singularity. Experience in software-defined networking and storage. Experience in setting up remote management protocols like IPMI, Redfish, etc. Experience in setting up and using monitoring systems like Prometheus and Grafana. Experience in system profiling and custom tuning for the target workload, for higher performance and lower cost of ownership. Very good written and verbal communication skills. Very good at technical documentation meant to serve as manuals for non-experts in the field.
Additional Qualifications: Experience in HPC cluster management and workload orchestration software (e.g., SLURM, Torque, LSF). Experience in setting up deep-learning training/inference solutions. Experience in private cloud infrastructure like Kubernetes, OpenStack, CloudStack, etc. Experience in distributed high-performance computing and parallel programming frameworks. Good knowledge of low-latency, high-throughput data transfer technologies (RDMA on RoCE, InfiniBand).
Education: Bachelor's degree or higher in Computer Science or related disciplines.

Posted 1 week ago


10.0 - 20.0 years

40 - 80 Lacs

Bengaluru, Delhi / NCR, Mumbai (All Areas)

Work from Office

Bachelor's degree in Computer Science with 10+ years of experience with HPC environments. Experience in HPC architecture and design, with a proven track record of delivering complex HPC solutions. Experience in designing and implementing HPC solutions on public cloud, private cloud, and on-premises infrastructure. Knowledge of HPC technologies, including MPI, OpenMP, InfiniBand, GPFS, Lustre, and other file systems; cluster management tools such as Slurm, Torque, or LSF; and scheduling software such as PBS Pro. Excellent communication skills, including the ability to communicate technical concepts to both technical and non-technical audiences. Experience with virtualization and containerization technologies such as Docker, Kubernetes, and Singularity. Strong understanding of networking technologies and protocols, including TCP/IP, InfiniBand, and RDMA. Familiarity with one or more programming languages such as C, C++, Fortran, Python, or Java. Experience working in a multi-vendor, multi-cloud environment. Strong problem-solving skills and the ability to work under pressure in a fast-paced environment. Suitable candidates may forward their updated profiles in strict confidence to hr33@hectorandstreak.com or call 9699224920.

Posted 1 week ago


6.0 - 12.0 years

0 - 40 Lacs

Delhi, India

On-site

Operating systems: Linux (RHEL, CentOS, SUSE)
Languages: C, C++, Bash, Python
Schedulers & Resource Managers: PBS, LSF, SLURM, Grid Engine
Provisioning: xCAT, Bright Cluster Manager, Warewulf
Monitoring: Grafana, Nagios, Zabbix, Ganglia
Compilers & Libraries: GNU, Intel, Open MPI, CUDA, MPI
Networking: InfiniBand/RoCE
AI/MLOps: RStudio, JupyterHub, NVIDIA AI Ent
Parallel File Systems: GPFS, Lustre, Weka, BeeGFS
Benchmarking tools: IOR, Linpack

Posted 1 week ago


8.0 - 12.0 years

8 - 12 Lacs

Bengaluru / Bangalore, Karnataka, India

On-site

8+ years of experience in managing Linux setups. 4+ years of experience in HPC/Linux clusters. Install, administer, and maintain hardware, system software, networking, accounts, and security measures on VMware configurations. Diagnose and correct system issues, whether they concern correct operation or performance. Reinstate the integrity of the system as quickly as possible following an outage in order to minimize downtime. Triage and solve user-submitted tickets, especially when they relate to the infrastructure. Track resource usage using monitoring and queuing software. Actively participate in Knowledge Management by creating new technical documents. Patch system firmware and software as needed. Peer assistance is an added trait.
What you need to bring:
Technical Skills: Demonstrated expertise with Linux system administration, including OS, networking, storage, and security. Expertise with high-speed networking such as InfiniBand and 10/40 Gigabit Ethernet. Expertise with high-speed file transfer tools such as FileCatalyst. Familiarity with large storage systems. Some experience in a scripting language. Proven expertise in hypervisors. Knowledge of Horizon is preferred. Experience with Linux clusters. Knowledge of troubleshooting ESXi and vCenter performance issues. Knowledge of virtual machine snapshots and VMware VDP. Understanding of VMware Site Recovery Manager for disaster recovery.
Business Skills: Demonstrate strong written and verbal communication skills. Interact and collaborate across different technology teams within HPE. Must work towards achieving HPE's vision for our customers. Affinity for and a thorough understanding of support processes defined within HPE. Ability to work in a 24x7 environment in rotating shifts. Exhibit a Customer First and Customer Last attitude consistently. Ability to drive cases to closure and provide a case summary. Demonstrate a high level of technical and communication skills. Take responsibility for end-to-end problem ownership and its solutions.
Mandatory Key Skills: Ethernet, FileCatalyst, Hypervisor, VMware ESXi, VMware VDP, VMware Site Recovery Manager, networking, Linux system administration*, InfiniBand*, High-Performance Computing

Posted 2 weeks ago


3.0 - 5.0 years

9 - 19 Lacs

Bengaluru

Work from Office

Job Summary As a Storage Support Engineer, you provide support to customers, customer support personnel, and field support staff that is focused on diagnosing, troubleshooting, repairing and debugging NetApp products. You respond to situations where first-line product support has failed to isolate or fix problems in hardware or software products, and you ensure delivery of optimal results. You must be a “take charge” professional with demonstrated technical problem-solving skills; and a subject matter expert; and have a strong customer service orientation and experience. You'll also be happy to come into the office either full time or hybrid (minimum two days a week). • Respond to situations where NetApp product support has been unable to solve customer’s technical issues. • Collaborate with or escalate cases with other NetApp Technical Support teams and/or Escalation Engineers when the problem is too complex or falls out of your specific area of expertise in order to most quickly facilitate solutions for customers. • Work collaboratively with customers in potentially stressful situations, while providing professional and courteous technical expertise. • Create new knowledge base articles to share information and best practices for reuse throughout the Technical Support Center • Focus on E-series and StorageGRID specialization and build deep technical expertise in these areas. Job Requirements • Storage and Object based storage experience • Ability to troubleshoot difficult technical issues with strong commitment to deliver excellent customer service experience. • Passion and ability to learn new technologies in a fast-pace environment. • Work well in a team environment and be a proactive contributor to team development projects. • Creative approach to problem solving and demonstrate a ‘can-do’ attitude. • High ability to multi-task, manage workload and define priorities based on business impact of issues. 
• Strong aptitude for learning new technologies and understanding how to utilize them in a customer facing environment. • Ability to follow standard engineering principles and practices.
Strong Understanding of the Following: • Hardware principles, RAID, and iSCSI • Object-based storage and S3 protocol • Distributed databases (Cassandra) • Network troubleshooting experience (Wireshark, TCP) • Linux administration and scripting • Virtualization (VMware, Docker containers) • Data protection and understanding of T10 Protection Information (PI) concepts • Experience with NetApp Data ONTAP 9.0+ (a plus), Fibre Channel, InfiniBand, NVMe-oF, HTTP/RestAP
Education: • Typically requires a minimum of 2-5 years related experience within a similar Technical Support role • A Bachelor of Science in Engineering or Computer Science; or equivalent related experience is required

Posted 1 month ago


4.0 - 6.0 years

10 - 12 Lacs

Hyderabad

Work from Office

Seeking a Senior HPC Administrator to manage and optimize high-performance computing systems. Responsibilities include cluster management, performance tuning, and user support. Requires 5+ years' experience with HPC, Linux, and job schedulers. Required candidate profile: notice period of immediate or 30 days max.

Posted 1 month ago


2.0 - 4.0 years

3 - 6 Lacs

Mumbai, Hyderabad, Bengaluru

Work from Office

Hiring an HPC Administrator to manage and support high-performance computing systems. Responsibilities include cluster setup, maintenance, monitoring, and user support. Requires 3+ years' experience with HPC environments, Linux, and schedulers. Required candidate profile: notice period of immediate or 30 days max.

Posted 1 month ago


3 - 6 years

9 - 14 Lacs

Bengaluru

Work from Office

Job Title: Lead Engineer – Virtualization. Location: Bengaluru. Work Employment: Full time. Department: Wireless. Domain: Testing. Reporting to: Staff Engineer. About Us: Tejas Networks is a global broadband, optical and wireless networking company, with a focus on technology, innovation and R&D. We design and manufacture high-performance wireline and wireless networking products for telecommunications service providers, internet service providers, utilities, defence and government entities in over 75 countries. Tejas has an extensive portfolio of leading-edge telecom products for building end-to-end telecom networks based on the latest technologies and global standards with IPR ownership. We are a part of the Tata Group, with Panatone Finvest Ltd. (a subsidiary of Tata Sons Pvt. Ltd.) being the majority shareholder. Tejas has a rich portfolio of patents and has shipped more than 900,000 systems across the globe with an uptime of 99.999%. Our product portfolio encompasses wireless technologies (4G/5G based on 3GPP and O-RAN standards), fiber broadband (GPON/XGS-PON), carrier-grade optical transmission (DWDM/OTN), packet switching and routing (Ethernet, PTN, IP/MPLS) and Direct-to-Mobile and Satellite-IoT communication platforms. Our unified network management suite simplifies network deployments and service implementation across all our products with advanced capabilities for predictive fault detection and resolution. As an R&D-driven company, we recognize that human intelligence is a core asset that drives the organization’s long-term success. With over 60% of our employees in R&D, we are reshaping telecom networks, one innovation at a time. Why join Tejas: We are on a journey to connect the world with some of the most innovative products and solutions in the wireless and wireline optical networking domains. Would you like to be part of this journey and do something truly meaningful? 
Challenge yourself by working in Tejas’ fast-paced, autonomous learning environment and see your output and contributions become a part of live products worldwide. At Tejas, you will have the unique opportunity to work with cutting-edge technologies, alongside some of the industry’s brightest minds. From 5G to DWDM/ OTN, Switching and Routing, we work on technologies and solutions that create a connected society. Our solutions power over 500 networks across 75+ countries worldwide, and we’re constantly pushing boundaries to achieve more. If you thrive on taking ownership, have a passion for learning and enjoy challenging the status quo, we want to hear from you! Who we are This team is responsible for Platform and software validation for the entire product portfolio. They will develop automation Framework for the entire product portfolio. Team will develop and deliver customer documentation and training solutions. Compliance with technical certifications such as TL9000 and TSEC is essential for ensuring industry standards and regulatory requirements are met. Team works closely with PLM, HW and SW architects, sales and customer account teams to innovate and develop network deployment strategy for a broad spectrum of networking products and software solutions. As part of this team, you will get an opportunity to validate, demonstrate and influence new technologies to shape future optical, routing, fiber broadband and wireless networks. What you work: Design, build, and manage private cloud environments using technologies such as OpenStack, VMware ESXi, or CloudStack. Implement and maintain virtualization infrastructure, ensuring high availability, scalability, and disaster recovery. Develop and maintain automation scripts using Ansible and Python to streamline routine tasks including provisioning, monitoring, and configuration management. Implement Infrastructure-as-Code principles to ensure consistent and repeatable deployments. 
Monitor the performance of virtualization infrastructure, identifying and addressing bottlenecks. Optimize resource allocation and implement proactive measures to maintain optimal system performance. Implement security best practices in virtualized environments and conduct regular security audits. Address vulnerabilities and ensure compliance with relevant standards and policies. Provide expert technical support and troubleshooting for complex virtualization and infrastructure issues. Collaborate with cross-functional teams to resolve escalated incidents and ensure smooth operation of virtualized systems. Mandatory skills: BE/B.Tech./M.Tech. (EC/EE/CS) Degree with 6+ yrs. Building private clouds using platforms such as OpenStack, VMware ESXi, CloudStack, etc. Strong background in Linux/Unix system administration. Proven experience in managing and maintaining server environments. Hands-on experience with server virtualization technologies such as VMware, Xen, or KVM. Solid understanding of cloud-related concepts including virtualization, hypervisors, networking, and storage. Extensive knowledge of IP networking in both physical and virtual environments. Proficiency in scripting languages such as Python and Shell. Experience with automation tools like Ansible for provisioning, monitoring, and configuration management. Experience in monitoring and optimizing virtualization infrastructure performance. Familiarity with implementing security best practices and conducting regular security audits. Proven ability to troubleshoot complex virtualization and infrastructure issues. Relevant certifications such as RHCE, Red Hat OpenStack Administration, or Red Hat Certified OpenShift Administrator. Desired skills: Experience with 5G Core or similar telco-scale network environments. Knowledge of advanced networking technologies such as OVS, SRIOV, QEMU, InfiniBand Preferred Qualifications Experience: 6 to 10 years’ experience from Telecommunication or Networking background. 
Education: B.Tech/BE (CSE/ECE/EEE/IS) or any other equivalent degree. Diversity and Inclusion Statement: Tejas Networks is an equal opportunity employer. We celebrate diversity and are committed to creating an all-inclusive environment for all employees. We welcome applicants of all backgrounds regardless of race, color, religion, gender, sexual orientation, age or veteran status. Our goal is to build a workforce that reflects the diverse communities we serve and to ensure every employee feels valued and respected.

Posted 1 month ago


3 - 6 years

8 - 12 Lacs

Bengaluru

Work from Office

Job Title: Senior Engineer – Virtualization. Location: Bengaluru. Work Employment: Full time. Department: Wireless. Domain: Testing. Reporting to: Lead Engineer. About Us: Tejas Networks is a global broadband, optical and wireless networking company, with a focus on technology, innovation and R&D. We design and manufacture high-performance wireline and wireless networking products for telecommunications service providers, internet service providers, utilities, defence and government entities in over 75 countries. Tejas has an extensive portfolio of leading-edge telecom products for building end-to-end telecom networks based on the latest technologies and global standards with IPR ownership. We are a part of the Tata Group, with Panatone Finvest Ltd. (a subsidiary of Tata Sons Pvt. Ltd.) being the majority shareholder. Tejas has a rich portfolio of patents and has shipped more than 900,000 systems across the globe with an uptime of 99.999%. Our product portfolio encompasses wireless technologies (4G/5G based on 3GPP and O-RAN standards), fiber broadband (GPON/XGS-PON), carrier-grade optical transmission (DWDM/OTN), packet switching and routing (Ethernet, PTN, IP/MPLS) and Direct-to-Mobile and Satellite-IoT communication platforms. Our unified network management suite simplifies network deployments and service implementation across all our products with advanced capabilities for predictive fault detection and resolution. As an R&D-driven company, we recognize that human intelligence is a core asset that drives the organization’s long-term success. With over 60% of our employees in R&D, we are reshaping telecom networks, one innovation at a time. Why join Tejas: We are on a journey to connect the world with some of the most innovative products and solutions in the wireless and wireline optical networking domains. Would you like to be part of this journey and do something truly meaningful? 
Challenge yourself by working in Tejas’ fast-paced, autonomous learning environment and see your output and contributions become a part of live products worldwide. At Tejas, you will have the unique opportunity to work with cutting-edge technologies, alongside some of the industry’s brightest minds. From 5G to DWDM/ OTN, Switching and Routing, we work on technologies and solutions that create a connected society. Our solutions power over 500 networks across 75+ countries worldwide, and we’re constantly pushing boundaries to achieve more. If you thrive on taking ownership, have a passion for learning and enjoy challenging the status quo, we want to hear from you! Who we are This team is responsible for Platform and software validation for the entire product portfolio. They will develop automation Framework for the entire product portfolio. Team will develop and deliver customer documentation and training solutions. Compliance with technical certifications such as TL9000 and TSEC is essential for ensuring industry standards and regulatory requirements are met. Team works closely with PLM, HW and SW architects, sales and customer account teams to innovate and develop network deployment strategy for a broad spectrum of networking products and software solutions. As part of this team, you will get an opportunity to validate, demonstrate and influence new technologies to shape future optical, routing, fiber broadband and wireless networks. What you work: Design, build, and manage private cloud environments using technologies such as OpenStack, VMware ESXi, or CloudStack. Implement and maintain virtualization infrastructure, ensuring high availability, scalability, and disaster recovery. Develop and maintain automation scripts using Ansible and Python to streamline routine tasks including provisioning, monitoring, and configuration management. Implement Infrastructure-as-Code principles to ensure consistent and repeatable deployments. 
Monitor the performance of virtualization infrastructure, identifying and addressing bottlenecks. Optimize resource allocation and implement proactive measures to maintain optimal system performance. Implement security best practices in virtualized environments and conduct regular security audits. Address vulnerabilities and ensure compliance with relevant standards and policies. Provide expert technical support and troubleshooting for complex virtualization and infrastructure issues. Collaborate with cross-functional teams to resolve escalated incidents and ensure smooth operation of virtualized systems. Mandatory skills: BE/B.Tech./M.Tech. (EC/EE/CS) Degree with 3+ yrs. Building private clouds using platforms such as OpenStack, VMware ESXi, CloudStack, etc. Strong background in Linux/Unix system administration. Proven experience in managing and maintaining server environments. Hands-on experience with server virtualization technologies such as VMware, Xen, or KVM. Solid understanding of cloud-related concepts including virtualization, hypervisors, networking, and storage. Extensive knowledge of IP networking in both physical and virtual environments. Proficiency in scripting languages such as Python and Shell. Experience with automation tools like Ansible for provisioning, monitoring, and configuration management. Experience in monitoring and optimizing virtualization infrastructure performance. Familiarity with implementing security best practices and conducting regular security audits. Proven ability to troubleshoot complex virtualization and infrastructure issues. Relevant certifications such as RHCE, Red Hat OpenStack Administration, or Red Hat Certified OpenShift Administrator. Desired skills: Experience with 5G Core or similar telco-scale network environments. Knowledge of advanced networking technologies such as OVS, SRIOV, QEMU, InfiniBand Preferred Qualifications Experience: 3 to 6 years’ experience from Telecommunication or Networking background. 
Education: B.Tech/BE (CSE/ECE/EEE/IS) or any other equivalent degree. Diversity and Inclusion Statement: Tejas Networks is an equal opportunity employer. We celebrate diversity and are committed to creating an all-inclusive environment for all employees. We welcome applicants of all backgrounds regardless of race, color, religion, gender, sexual orientation, age or veteran status. Our goal is to build a workforce that reflects the diverse communities we serve and to ensure every employee feels valued and respected.

Posted 1 month ago


5 - 10 years

5 - 9 Lacs

Gurugram

Work from Office

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation. At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD. We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived. We embrace all candidates that will contribute to the diversification and enrichment of ideas and perspectives at AHEAD. The High-Performance Computing Infrastructure Engineer is primarily responsible for the overall health and maintenance of storage technologies in our managed services customers' environments. Our HPC Infrastructure Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.
Roles & Responsibilities: Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities. Plan and perform maintenance activities. Assess customer environments for performance and design issues and propose resolutions. Work across technical teams to troubleshoot complex infrastructure issues. Create and maintain detailed documentation. Serve as a subject matter expert and escalation point for storage technologies. Work with vendors to resolve storage issues. Communicate with customers and the internal team with transparency. Participate in the on-call rotation. Complete training and certification as assigned to further skills and knowledge.
Skills Required: Bachelor's degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education. 5+ years of expert-level experience managing infrastructure in high-performance computing environments, including configuration, troubleshooting, and best practice. 1+ years of experience with NVIDIA DGX preferred. Experience with high-performance computing (HPC) schedulers (e.g., SLURM, PBS, Torque) required. Experience configuring, maintaining, and troubleshooting Kubernetes. Experience with storage technology (e.g., Ceph, Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS). Experience with machine learning or data science workflows in HPC/AI environments. Advanced experience with Linux operating systems. Experience configuring, maintaining, and troubleshooting NVIDIA/Mellanox (Cumulus OS) switches a plus. Experience with both Ethernet and InfiniBand networking a plus. 1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus. 1+ years working with an enterprise ITSM system; ServiceNow is a bonus. Previous experience with automation tools such as Ansible, Puppet, or Chef a plus. Managed Services or consulting experience is required. Strong background in customer service. High-level problem-solving and communication skills. Strong oral and written communication skills. Related network certifications are a bonus.
Why AHEAD: Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between. We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning. USA Employment Benefits include: Medical, Dental, and Vision Insurance; 401(k); paid company holidays; paid time off; paid parental and caregiver leave; plus more. See https://www.aheadbenefits.com/ for additional details. The compensation range indicated in this posting reflects the On-Target Earnings (OTE) for this role, which includes a base salary and any applicable target bonus amount. This OTE range may vary based on the candidate's relevant experience, qualifications, and geographic location.

Posted 1 month ago


5 - 10 years

4 - 8 Lacs

Gurugram

Work from Office

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation. At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD. We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived. We embrace all candidates that will contribute to the diversification and enrichment of ideas and perspectives at AHEAD. The High-Performance Computing Storage Engineer is primarily responsible for the overall health and maintenance of storage technologies in our managed services customers' environments. Our Storage Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.
Key Responsibilities: Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities. Plan and perform maintenance activities. Assess customer environments for performance and design issues and propose resolutions. Work across technical teams to troubleshoot complex infrastructure issues. Create and maintain detailed documentation. Serve as a subject matter expert and escalation point for storage technologies. Work with vendors to resolve storage issues. Communicate with customers and the internal team with transparency. Participate in the on-call rotation. Complete training and certification as assigned to further skills and knowledge.
Skills Required: Bachelor's degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education. 5+ years of expert-level experience managing storage infrastructure in high-performance computing environments, including file systems, storage appliances, and data workflows. Experience configuring, maintaining, and tuning Ceph clusters. Experience configuring, maintaining, and tuning distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS). Experience with InfiniBand networking preferred. 1+ years working with monitoring platforms; Elastic Observability is a bonus. 1+ years working with an enterprise ITSM system; ServiceNow is a bonus. Familiarity with high-performance computing (HPC) schedulers (e.g., SLURM, PBS, Torque) and their interaction with data storage systems. Understanding of data protection mechanisms, including data replication, backup strategies, and disaster recovery in HPC environments. Experience with containerization (Docker, Singularity) in an HPC context for data processing and application deployment. Solid working knowledge of Linux and scripting a plus. Experience with machine learning or data science workflows in HPC environments a plus. Managed Services or consulting experience is required. Strong background in customer service. High-level problem-solving and communication skills. Strong oral and written communication skills. Related storage certifications are a bonus.
Why AHEAD: Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between. We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning. USA Employment Benefits include: Medical, Dental, and Vision Insurance; 401(k); paid company holidays; paid time off; paid parental and caregiver leave; plus more. See https://www.aheadbenefits.com/ for additional details. The compensation range indicated in this posting reflects the On-Target Earnings (OTE) for this role, which includes a base salary and any applicable target bonus amount. This OTE range may vary based on the candidate's relevant experience, qualifications, and geographic location.

Posted 1 month ago

