Site Reliability Engineer

5 - 10 years

7 - 12 Lacs

Posted: 6 days ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

Overview
We are seeking a forward-thinking Cloud Ops Observability Lead to drive the strategy, implementation, and continuous improvement of observability across our cloud environments (AWS and Azure). This role will lead efforts to ensure our systems are measurable, reliable, and transparent, enabling proactive operations and rapid incident response.

Key Responsibilities
  • Lead the design and implementation of observability frameworks including logging, metrics, tracing, and event correlation.
  • Own the strategy and tooling for cloud-native monitoring across AWS and Azure, integrating with operational workflows.
  • Collaborate with Cloud Engineering, Platform, and Security teams to ensure observability is embedded in infrastructure and applications.
  • Establish and maintain SLOs, SLIs, and error budgets to drive reliability and performance improvements.
  • Drive incident response readiness, including alerting strategies, runbooks, and post-incident analysis.
  • Champion a culture of proactive operations, using data to identify trends, prevent outages, and optimize performance.

Required Qualifications
  • 5+ years of experience in cloud operations, site reliability engineering, or observability roles.
  • Strong expertise in monitoring and observability tools (e.g., Datadog, Prometheus, Grafana, CloudWatch, Azure Monitor).
  • Deep understanding of AWS and Azure architectures, including networking, compute, and managed services.
  • Experience with SRE principles, incident management, and operational analytics.
  • Proficiency in scripting and automation (e.g., Python, PowerShell, Bash).
  • Strong communication and stakeholder engagement skills.

Preferred Qualifications
  • Experience implementing OpenTelemetry, distributed tracing, and log aggregation pipelines.
  • Familiarity with AIOps, anomaly detection, and predictive analytics.
  • Exposure to FinOps and cost-aware observability practices.
  • Experience with chaos engineering and resilience testing.

Globalfoundries

Semiconductor Manufacturing

Malta NY

Hyderabad, Bengaluru, Mumbai (all areas)