Manager, Site Reliability Engineering

5 - 12 years

0 Lacs

Posted: 5 days ago | Platform: Shine

Work Mode

On-site

Job Type

Full Time

Job Description

Role Overview:

As an experienced, hands-on Cloud SRE Manager at Palo Alto Networks, you will lead high-severity incident and problem management across GCP-centric platforms. The role combines deep technical troubleshooting with process ownership to ensure rapid recovery, root cause elimination, and long-term reliability improvements. You will be responsible for L3 on-call duties, driving post-incident learning, and advocating for automation and operational excellence.

Key Responsibilities:

- Implement and lead post-mortem processes within SLAs, identify root causes, and drive corrective actions to reduce repeat incidents.
- Rapidly diagnose and resolve failures across Kubernetes, Terraform, and GCP using advanced troubleshooting frameworks.
- Implement automation and enhanced monitoring to proactively detect issues and reduce incident frequency.
- Work with GCP/AWS TAMs and other vendors to request new features and follow up on updates.
- Coach and elevate SRE and DevOps teams, promoting best practices in reliability and incident/problem management.
- Envision the future of SRE with AI/ML by leveraging modern technologies.

Qualifications Required:

- 12+ years of experience in SRE/DevOps/Infrastructure roles, with a strong foundation in GCP cloud-based environments.
- 5+ years of proven experience managing SRE/DevOps teams, preferably focusing on Google Cloud Platform (GCP).
- Deep hands-on knowledge of Terraform, Kubernetes (GKE), GitLab CI/CD, and modern observability practices (e.g., Prometheus, OpenTelemetry).
- Strong knowledge of data platforms such as BigQuery, Cassandra, Kafka, PostgreSQL, and MySQL is mandatory.
- Proficiency with cloud platforms such as GCP and AWS.
- Experience in managing incident response and postmortems, reducing MTTR, and driving proactive reliability improvements.
- Expertise in SLI/SLO/SLA design and implementation, and in driving operational maturity through data.
- Strong interpersonal and leadership skills, with the ability to coach, mentor, and inspire teams.
- Effective communicator, capable of translating complex technical concepts for non-technical stakeholders.
- Committed to inclusion, collaboration, and creating a culture where every voice is heard and respected.

Palo Alto Networks

Cybersecurity

Santa Clara
