We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team. This role involves ensuring the reliability, scalability, and efficiency of cloud infrastructure and applications while implementing SRE best practices for deployment, monitoring, and automation. As a senior member, you will lead efforts in system reliability, mentor junior engineers, and drive improvements in infrastructure automation.

Key Responsibilities:

Design, build, and maintain scalable and reliable cloud infrastructure.

Ensure System Reliability: Maintain uptime, scalability, and performance across production environments.

Monitoring & Alerting Setup: Configure real-time monitoring, alerting, and observability dashboards.

Automate Everything: Reduce toil by scripting repetitive tasks, building CI/CD pipelines, and implementing self-healing mechanisms.

Incident Response & RCA: Own on-call rotations, resolve P1/P2 incidents, and write blameless postmortems.

Optimize Costs & Performance: Work on cloud cost optimization (FinOps), database tuning, and caching strategies.

Security & Compliance: Enforce least-privilege access and encryption, and run vulnerability assessments.

Infrastructure as Code (IaC): Deploy and manage infrastructure with Terraform, Ansible, and Helm.

Capacity Planning & Scaling: Plan capacity and ensure effective load balancing, horizontal scaling, and traffic routing.

Process Documentation: Maintain detailed SOPs, incident response guides, and architecture diagrams.

Lead the implementation of CI/CD pipelines for application deployments.

Manage and optimize Kubernetes clusters and containerized workloads.

Collaborate with development and operations teams to ensure smooth deployment of applications.

Troubleshoot and resolve incidents, ensuring minimal downtime for production services.

Mentor and provide guidance to junior engineers, fostering a culture of reliability and automation.

Required Skills & Qualifications:

7+ years of experience in Site Reliability Engineering (SRE), DevOps, or cloud infrastructure roles.

Hands-on experience with cloud platforms (Azure).

Strong experience with CI/CD tools (GitHub Actions, Jenkins, or Azure Pipelines).

Proficiency in Python, Bash, or PowerShell for automation.

Extensive experience with Infrastructure as Code (Terraform).

Expertise in monitoring tools such as Datadog.

Strong understanding of networking, security, and containerization (Docker, Kubernetes).

Proven track record in leading and mentoring teams.

More Jobs at Harbinger Systems Private Limited

Sr. Tech Lead - Site Reliability

Pune, Maharashtra, India

8.0 - 13.0 yrs

INR 15 - 30 Lacs

