Posted: 1 week ago
Work from Office
Full Time
Are you an SRE Operations specialist with automation in your DNA? Do you thrive in fast-paced SaaS environments where cloud meets global infrastructure? We are looking for a top-tier SRE to drive Logs, Metrics, and Alerting, with a deep focus on Alerting automation at massive scale.
Why This Role Is Unique: Our SaaS platform is hybrid, running across public cloud and a global network of 50+ PoPs (points of presence) that deliver terabits of capacity. Our infrastructure spans cloud-native services and physical networking gear (routers, switches, firewalls), creating a uniquely challenging and exciting observability landscape. The Analytics & Observability platform will reach deeply across these layers, ensuring reliability, security, and performance at scale.
What You'll Do:
- Be the Force Behind Observability & Stability
- Own & Automate Operations
- Lead Incident Response & Operational Excellence
- Collaborate & Mentor
What Makes You a Great Fit:
- Deep expertise in Logs, Metrics, and Alerting, with a strong focus on Alerting automation.
- Experience in hybrid SaaS environments spanning cloud-native and global infrastructure.
- Strong background in Kubernetes, Infrastructure-as-Code (Terraform), Golang, AWS/GCP, and networking observability.
- Proven track record of eliminating toil and improving operational efficiency through automation.
- Passion for deep observability, networking-scale analytics, and automation at the edge.
If you love solving reliability challenges at global scale, automating everything, and working in a hybrid cloud + networking environment, we want to talk to you!
Must-Have:
- Observability & Alerting Expertise: Strong experience with Logs, Metrics, and Alerts, with a focus on high-fidelity alerting and automation.
- Automation & Infrastructure as Code: Deep knowledge of Terraform, ArgoCD, Helm, Kubernetes, and Golang for automation.
- Cloud & Hybrid SaaS Experience: Hands-on experience managing cloud-native (AWS/GCP) and edge infrastructure.
- Incident Response & Reliability Engineering: Strong on-call experience, with a track record of reducing MTTR (mean time to recovery) through automation.
- Kubernetes Mastery: Hands-on experience deploying, managing, and troubleshooting Kubernetes in production environments.
Nice-to-Have:
- Networking & Edge Observability: Familiarity with monitoring routers, switches, and firewalls in a global PoP environment.
- Data & Analytics in Observability: Experience with time-series databases and observability tooling (Prometheus, Grafana, OpenTelemetry, etc.).
- Security & Compliance Awareness: Understanding of secure-by-design principles for monitoring and alerting.
- Mentorship & Collaboration: Ability to mentor junior engineers and work cross-functionally with SREs, application teams, and network engineers.
- High Availability & Disaster Recovery: Experience with HA/DR and migrations.
F5