8 - 10 years

14 - 15 Lacs

Posted: 1 week ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

We are looking for an experienced Lead Site Reliability Engineer (SRE) based in India to embed into our U.S.-based SRE and development teams. This is a hands-on engineering role, with at least 50% of the time expected to be spent building automation, writing scripts, reviewing logs, improving infrastructure code, or analyzing system performance. You will be accountable for platform observability, uptime, and incident response across a suite of high-scale applications. This role demands operational excellence, deep systems knowledge, and the ability to collaborate with U.S.-based engineers during critical situations. In addition to traditional SRE responsibilities, candidates must be capable of reading and analyzing application code (Java/Spring) during root cause investigations, providing a deeper level of diagnostic capability than typical offshore support models. Experience with Salesforce systems is a strong plus.

Key Responsibilities:

  • Collaborate with U.S.-based counterparts to define and monitor service SLOs, SLAs, and key performance indicators.
  • Perform root cause analysis, run blameless postmortems, and drive reliability improvements across environments.
  • Serve as the senior technical contributor in a cross-border team focused on Loyalty, Mobile App, CRM, and Customer Pickup.
  • Review application code (primarily Java/Spring) to assist in identifying defects and systemic performance issues.
  • Automate deployment pipelines, recovery workflows, and runbook processes to minimize manual intervention.
  • Build and manage dashboards, alerts, and health checks using tools like Datadog, Azure Monitor, Prometheus, and Grafana.
  • Contribute to architectural decisions with a lens on performance and operability.
  • Guide and mentor offshore team members in incident response and production readiness.
  • Participate in 24x7 support rotations aligned with EST coverage expectations.

Required Experience & Skills:

  • 8-10 years of experience in SRE, DevOps, or platform engineering, ideally supporting U.S. enterprise systems.
  • Strong hands-on experience with Java/Spring Boot applications, with the ability to assist in code-level troubleshooting.
  • Cloud infrastructure knowledge (Azure preferred) and container orchestration (Kubernetes).
  • Proficiency with logging/monitoring stacks (Datadog, ELK, Azure Monitor, Dynatrace, Splunk).
  • Experience with ServiceNow (SNOW) for ITSM processes.
  • Experience with Terraform or ARM templates, CI/CD automation, and scripting (Python, Bash).
  • Familiarity with Salesforce systems highly preferred.
  • Excellent communication skills and outstanding problem-solving ability in distributed environments.
  • Demonstrated history of improving stability, availability, and delivery velocity for large-scale platforms.

Systems Plus Solutions Pvt Ltd

IT Services and IT Consulting

Dallas, Texas
