Sr. Site Reliability Engineer (SRE)

4 - 6 years

15 - 16 Lacs

Posted: 5 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

About the Role
We are seeking a highly skilled Sr. Site Reliability Engineer (SRE) to lead the implementation, optimization, and management of our observability stack across cloud infrastructure. You will play a key role in ensuring the reliability, scalability, and performance of our platform, which spans microservices on Kubernetes and EC2 as well as mission-critical systems. This role requires strong problem-solving skills, an automation mindset, and a proactive approach to incident management.

Key Responsibilities
- Design, implement, and manage monitoring, logging, and alerting systems across production and non-production environments.
- Lead incident response, root cause analysis, and post-mortem practices for continuous improvement.
- Define and implement disaster recovery strategies, with regular testing.
- Collaborate with development teams to define and track SLAs/SLOs for critical services.
- Optimize AWS cloud infrastructure for cost efficiency, reliability, and scalability.
- Build and maintain automation frameworks for deployment, scaling, and recovery using Terraform, GitLab CI/CD, and Kubernetes.
- Administer Kubernetes clusters, troubleshoot performance bottlenecks, and ensure high availability.
- Manage databases (PostgreSQL or similar), including replication and disaster recovery strategies.
- Contribute to infrastructure security, compliance, and best practices.
- Participate in the on-call rotation and handle high-priority incidents under pressure.

Required Skills & Experience
- 4+ years of experience in an SRE, DevOps, or similar role.
- Strong hands-on experience with AWS services such as EC2, EKS, RDS, Cognito, and CloudWatch.
- Proven expertise in Kubernetes administration in production environments.
- Proficiency in scripting and configuration management: Python, Bash, Chef (recipes, cookbooks), and Ansible.
- Strong knowledge of Infrastructure as Code (Terraform/CloudFormation).
- Deep experience with observability tools: Prometheus, Grafana, the ELK stack, and distributed tracing.
- Database administration experience with PostgreSQL or similar systems.
- Understanding of network protocols, load balancing, and security best practices.
- Experience with CI/CD pipelines and GitOps workflows.
- Ability to handle multiple incidents and prioritize effectively under pressure.
- Exposure to monitoring solutions such as Splunk, Datadog, or Dynatrace.

Preferred Qualifications
- AWS Certified Solutions Architect or AWS Certified DevOps Engineer certification.
- Certified Kubernetes Administrator (CKA).

Why Join Us
- Be part of a fast-growing HealthTech startup transforming healthcare technology.
- Work with modern tools, cutting-edge infrastructure, and a collaborative team.
- Own end-to-end infrastructure reliability and automation.
- Competitive salary and growth opportunities.

Wits Innovation Lab

Technology/Innovation

Johannesburg