Site Reliability Engineering Technical Leader, Network Assurance Data Platform Cisco ThousandEyes

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at Cisco ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of the cloud and big data platforms. Your role will involve representing the NADP SRE team, contributing to the technical roadmap, and collaborating with cross-functional teams to design, build, and maintain SaaS systems operating at multi-region scale. Your efforts will be crucial in supporting machine learning (ML) and AI initiatives by ensuring the platform infrastructure is robust, efficient, and aligned with operational excellence.

You will design, build, and optimize cloud and data infrastructure to ensure high availability, reliability, and scalability of big-data and ML/AI systems. This will involve implementing SRE principles such as monitoring, alerting, error budgets, and fault analysis. You will also collaborate with various teams to create secure and scalable solutions, troubleshoot technical problems, lead the architectural vision, and shape the technical strategy and roadmap. The role also covers mentoring and guiding teams, fostering a culture of engineering and operational excellence, engaging with customers and stakeholders to understand use cases and feedback, and using your strong programming skills to integrate software and systems engineering. In addition, you will develop strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at enterprise scale while enforcing engineering best practices.

To be successful in this role, you should have 8-12 years of relevant experience and a bachelor's engineering degree in computer science or its equivalent. You should be able to design and implement scalable solutions, and you should have hands-on experience in cloud (preferably AWS), Infrastructure as Code skills, experience with observability tools, proficiency in programming languages such as Python or Go, and a good understanding of Unix/Linux systems and client-server protocols. Experience in building cloud, big data, and/or ML/AI infrastructure is essential, along with a sense of ownership and accountability in architecting software and infrastructure at scale. Additional qualifications that would be advantageous include experience with the Hadoop ecosystem, certifications in cloud and security domains, and experience in building or managing a cloud-based data platform.

Cisco encourages individuals from diverse backgrounds to apply, as the company values perspectives and skills that emerge from employees with varied experiences. Cisco believes in unlocking potential and creating diverse teams that are better equipped to solve problems, innovate, and make a positive impact.

Posted 1 week ago

Site Reliability Engineering Technical Leader Cisco ThousandEyes

8.0 - 12.0 years

0 Lacs

Karnataka

On-site

As a Site Reliability Engineering (SRE) Technical Leader on the Network Assurance Data Platform (NADP) team at ThousandEyes, you will be responsible for ensuring the reliability, scalability, and security of cloud and big data platforms. Your role will involve representing the NADP SRE team, working in a dynamic environment, and providing technical leadership in defining and executing the team's technical roadmap. Collaborating with cross-functional teams, including software development, product management, customers, and security teams, is essential. Your contributions will directly impact the success of machine learning (ML) and AI initiatives by ensuring a robust and efficient platform infrastructure aligned with operational excellence.

In this role, you will design, build, and optimize cloud and data infrastructure to ensure high availability, reliability, and scalability of big-data and ML/AI systems. You will collaborate with cross-functional teams to create secure, scalable solutions that support ML/AI workloads and enhance operational efficiency through automation. Key responsibilities include troubleshooting complex technical problems, conducting root cause analyses, and contributing to continuous improvement efforts. You will lead the architectural vision, shape the team's technical strategy and roadmap, and act as a mentor and technical leader to foster a culture of engineering and operational excellence. You will also engage with customers and stakeholders to understand use cases and feedback, translate them into actionable insights, and effectively influence stakeholders at all levels. You will use strong programming skills to integrate software and systems engineering, building core data platform capabilities and automation to meet enterprise customer needs. Developing strategic roadmaps, processes, plans, and infrastructure to efficiently deploy new software components at enterprise scale while enforcing engineering best practices is also part of the role.

Qualifications for this position include 8-12 years of relevant experience and a bachelor's engineering degree in computer science or its equivalent. Candidates should be able to design and implement scalable solutions with a focus on streamlining operations. Strong hands-on experience in cloud, preferably AWS, is required, along with Infrastructure as Code skills, ideally with Terraform and EKS or Kubernetes. Proficiency in observability tools like Prometheus, Grafana, Thanos, CloudWatch, OpenTelemetry, and the ELK stack is necessary. Writing high-quality code in Python, Go, or equivalent programming languages is essential, as is a good understanding of Unix/Linux systems, system libraries, file systems, and client-server protocols. Experience in building cloud, big data, and/or ML/AI infrastructure, experience architecting software and infrastructure at scale, and certifications in cloud and security domains are beneficial qualifications for this role.

Cisco emphasizes diversity and encourages candidates to apply even if they do not meet every single qualification. Diverse perspectives and skills are valued, and Cisco believes that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

Posted 2 weeks ago
