Lead Software Engineer - Site Reliability

10 - 12 years

32 - 37 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Key Responsibilities

  • Design and implement tools to improve availability, latency, scalability, and system health.
  • Define SLIs/SLOs, manage error budgets, and drive performance engineering efforts.
  • Build and maintain automated monitoring, alerting, and remediation pipelines.
  • Collaborate with engineering teams to improve reliability by design.
  • Lead incident response, root cause analysis, and blameless postmortems.
  • Champion observability across services: logs, metrics, and traces.
  • Contribute to infrastructure architecture, automation, and reliability roadmaps.
  • Advocate for SRE best practices across teams and functions.

Requirements

  • 4-12 years of experience in SRE, DevOps, or Production Engineering roles.
  • Coding Proficiency: Develop clear, efficient, and well-structured code.
  • Linux Expertise: In-depth knowledge of Linux for system administration and advanced troubleshooting.
  • Containerization & Orchestration: Practical experience with Docker and Kubernetes for application deployment and management.
  • CI/CD Management: Design, implement, and maintain Continuous Integration and Continuous Delivery pipelines.
  • Security & Compliance: Understanding of security best practices and compliance requirements for infrastructure.
  • High Availability & Scalability: Design and implement highly available, scalable, and resilient distributed systems.
  • Infrastructure as Code (IaC) & Automation: Proficiency with IaC tools and with automating infrastructure provisioning and management.
  • Disaster Recovery (DR) & High Availability (HA): Deep knowledge of, and practical experience with, various DR and HA strategies.
  • Observability: Implement and use monitoring, logging, and tracing tools to track system health.
  • System Design (Distributed Systems): Design complex distributed systems with a focus on reliability and operability.
  • Problem-Solving & Troubleshooting: Excellent analytical and diagnostic skills for resolving complex system issues.

Qualifications

Technical Skills & Experience

  • Extensive hands-on experience (7-12 years) with relational databases (e.g., MySQL, PostgreSQL, SQL Server) and distributed NoSQL systems.

Freshworks

Software / SaaS

Chennai