Sr. Tech Lead - Site Reliability

Experience: 8 - 13 years

Salary: 15 - 30 Lacs

Posted: 4 days ago | Platform: Foundit

Work Mode: On-site

Job Type: Full Time

Job Description

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team. In this role, you will ensure the reliability, scalability, and efficiency of our cloud infrastructure and applications while implementing SRE best practices for deployment, monitoring, and automation. As a senior member of the team, you will lead system reliability efforts, mentor junior engineers, and drive improvements in infrastructure automation.

Key Responsibilities:

Design, build, and maintain scalable and reliable cloud infrastructure.

Ensure System Reliability: Maintain uptime, scalability, and performance across production environments.

Monitoring & Alerting Setup: Configure real-time monitoring, alerting, and observability dashboards.

Automate Everything: Reduce toil by scripting repetitive tasks and building CI/CD pipelines and self-healing mechanisms.

Incident Response & RCA: Own on-call rotations, resolve P1/P2 incidents, and create blameless postmortems.

Optimize Costs & Performance: Work on cloud cost optimization (FinOps), database tuning, and caching strategies.

Security & Compliance: Implement least-privilege access and encryption, and conduct vulnerability assessments.

Infrastructure as Code (IaC): Deploy and manage infrastructure with Terraform, Ansible, and Helm.

Capacity Planning & Scaling: Plan capacity and implement load balancing, horizontal scaling, and traffic routing.

Process Documentation: Maintain detailed SOPs, incident response guides, and architecture diagrams.

Lead the implementation of CI/CD pipelines for application deployments.

Manage and optimize Kubernetes clusters and containerized workloads.

Collaborate with development and operations teams to ensure smooth deployment of applications.

Troubleshoot and resolve incidents, ensuring minimal downtime for production services.

Mentor and provide guidance to junior engineers, fostering a culture of reliability and automation.

Required Skills & Qualifications:

7+ years of experience in Site Reliability Engineering (SRE), DevOps, or cloud infrastructure roles.

Hands-on experience with cloud platforms, specifically Azure.

Strong experience with CI/CD tools (GitHub Actions, Jenkins, or Azure Pipelines).

Proficiency in Python, Bash, or PowerShell for automation.

Extensive experience with Infrastructure as Code (Terraform).

Expertise in monitoring tools such as Datadog.

Strong understanding of networking, security, and containerization (Docker, Kubernetes).

Proven track record in leading and mentoring teams.
