Associate Manager SRE

5 years

0 Lacs

Posted: 2 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Overview

We are seeking a self-driven, inquisitive Site Reliability Engineer (SRE) to drive reliability, availability, performance, and security across our global digital product ecosystem. This role is central to ensuring a seamless and resilient experience for our users by blending deep engineering expertise with operational excellence and automation.

You will be part of a global SRE practice supporting a portfolio of 260+ modern cloud-native applications across consumer, commercial, supply chain, and enablement functions. Your mission: prevent incidents before they occur, ensure rapid recovery when they do, and build scalable systems that evolve with our growing business.

Responsibilities

  • Champion reliability, observability, and operational excellence across mission-critical applications.
  • Develop and maintain service-level indicators (SLIs), objectives (SLOs), and error budgets to measure and improve system performance.
  • Implement automated monitoring, alerting, and recovery mechanisms to reduce manual intervention and improve response times.
  • Collaborate closely with software engineering, platform, and operations teams to embed SRE practices across the development lifecycle.
  • Lead and participate in incident response, root cause analysis, and postmortem reviews to drive long-term improvements.
  • Identify and eliminate sources of toil through automation, tooling, and process refinement.
  • Continuously improve resiliency design, capacity planning, and release management in production systems.
  • Influence engineering teams with best practices on cloud-native architecture, observability, and deployment strategies.

Qualifications

Required Skills:
  • 5+ years of experience in production engineering, DevOps, or SRE roles.
  • Strong foundation in Linux systems, networking, and cloud platforms (Azure, AWS, or GCP).
  • Hands-on experience with observability tools (e.g., AppDynamics, Prometheus, Grafana, ELK, FullStory).
  • Proficiency in scripting or programming (e.g., Python, Bash, Go) and automation frameworks (e.g., Ansible, Terraform).
  • Deep understanding of CI/CD pipelines, release strategies, and deployment automation.
  • Experience in managing high-scale, distributed systems in cloud-native environments.
  • Strong analytical skills and a passion for continuous improvement.

Preferred Skills:
  • Familiarity with microservices, Kubernetes, containers, and service mesh architecture.
  • Exposure to incident and problem management frameworks (e.g., ITIL, RCA practices).
  • Experience working in global teams supporting mission-critical applications.
