AI Infrastructure Engineer - Golang, Kubernetes, Docker, Linux

7 - 12 years

20 - 25 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Impact

Cisco is seeking an experienced and innovative Control Plane Engineer to develop the control plane for its next-generation AI infrastructure. This role focuses on designing and implementing scalable, reliable, and efficient control plane components to manage AI workloads. The ideal candidate will have a strong background in microservices architecture, Kubernetes, and distributed systems, along with expertise in modern programming languages and cloud-native technologies.

As an AI Control Plane Engineer, your work will have a significant impact on:

  • Enabling seamless orchestration and management of AI workloads across distributed environments.
  • Improving the reliability, scalability, and performance of AI control plane services.
  • Ensuring operational simplicity and ease of debugging for control plane services.
  • Driving innovation in control plane architecture to improve efficiency and performance of AI infrastructure.

Your contributions will empower Cisco to deliver best-in-class AI infrastructure solutions and help customers scale AI workloads with confidence.

Key Responsibilities:

  • Design and implement control plane components using Golang and Python.
  • Leverage Kubernetes (K8s) concepts, CRDs, and operator patterns (e.g., Kubebuilder) to build control plane services.
  • Develop scalable and highly available (HA) microservices that span multiple regions, ensuring reliability at scale.
  • Build and maintain gRPC and REST APIs, as well as CLI tools, for seamless integration and control of AI infrastructure.
  • Address operational challenges of running applications as SaaS, focusing on ease of deployment and lifecycle management.
  • Establish and follow best practices for release management, including CI/CD pipelines and version control hygiene.
  • Design and implement strategies for live upgrades to minimize downtime and ensure service continuity.
  • Develop and implement telemetry collection mechanisms using eBPF and other tools to provide insights into system performance and health.
  • Define and monitor SLA/SLO metrics to ensure the reliability of control plane services.
  • Design and manage stateful applications, ensuring high performance and reliability of underlying databases.
  • Build systems with debuggability in mind, simplifying troubleshooting and remediation for operational teams.
  • Collaborate with other teams to ensure control plane compatibility with GPU/CUDA-related technologies.

Minimum Qualifications:

  • Proficiency in Golang and Python.
  • Strong expertise in Kubernetes (K8s), including CRDs, the operator pattern, and tools like Kubebuilder and Helm.
  • Experience with API design and implementation (gRPC, REST APIs, and CLI).
  • Proven track record of designing and scaling highly available microservices across regions. Strong understanding of distributed systems architecture and fundamentals.
  • Familiarity with telemetry tools and SLA/SLO monitoring for large-scale systems.
  • Strong debugging skills and experience building systems with easy remediation capabilities.
  • Passion for learning and staying updated on the latest trends in AI infrastructure and cloud-native technologies.
  • Bachelor's degree or higher and 7+ years of relevant engineering work experience.

Preferred Qualifications:

  • Proficiency in programming languages such as C++ and Golang.
  • Hands-on experience with eBPF for collecting insights and optimizing system performance.
  • Knowledge of GPU/CUDA technologies and their integration into infrastructure systems.
  • Knowledge of release management best practices, including versioning and rollback mechanisms.
  • Familiarity with SaaS operational models and challenges.

Cisco

Software Development

San Jose, CA
