Site Reliability Engineer (SRE)

1 - 6 years

7 - 17 Lacs

Posted: 1 day ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Job Summary

Site Reliability Engineers (SREs) work at the intersection of software engineering and systems administration. In other words, they can both write code and manage the infrastructure on which that code runs. This is a very wide skill set, but the end goal of an SRE is always the same: to ensure that all SLAs are met, but not exceeded, so as to balance performance and reliability against operational costs.

As a Site Reliability Engineer II, you will be learning our systems, improving your craft as an engineer, and taking on tasks that improve the overall reliability of the VP platform.

Key Responsibilities:

  • Design, implement, and maintain robust monitoring and alerting systems.
  • Lead observability initiatives by improving metrics, logging, and tracing across services and infrastructure.
  • Collaborate with development and infrastructure teams to instrument applications and ensure visibility into system health and performance.
  • Write Python scripts and tools for automation, infrastructure management, and incident response.
  • Participate in and improve the incident management and on-call process, driving down Mean Time to Resolution (MTTR).
  • Conduct root cause analysis and postmortems following incidents and champion efforts to prevent recurrence.
  • Optimize systems for scalability, performance, and cost-efficiency in cloud and containerized environments.
  • Advocate and implement SRE best practices, including SLOs/SLIs, capacity planning, and reliability reviews.

Required Skills & Qualifications:

  • 1+ years of experience in a Site Reliability Engineer or similar role.
  • Excellent communication skills in English.
  • Proficiency in Python for automation and tooling.
  • Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, New Relic, or OpenTelemetry.
  • Experience with log aggregation and analysis tools like ELK Stack (Elasticsearch, Logstash, Kibana) or Fluentd.
  • Good understanding of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
  • Familiarity with infrastructure-as-code (Terraform, Ansible, or similar).
  • Strong debugging and incident response skills.
  • Knowledge of CI/CD pipelines and release engineering practices.

Cloud Angles Digital Transformation

Information Technology

Tech City
