Observability Engineer / AWS DevOps Engineer

7 - 12 years

9 - 19 Lacs

Posted: 2 days ago | Platform: Naukri

Work Mode: Hybrid

Job Type: Full Time

Job Description

Job Summary:

Observability Engineer

Key Responsibilities:

  • Design & Implement Observability Solutions: Develop and maintain monitoring, logging, and tracing solutions using industry-leading tools (Prometheus, Grafana, Datadog, New Relic, Splunk, etc.).
  • Performance Monitoring & Optimization: Ensure proactive identification and resolution of performance bottlenecks in distributed systems.
  • Logging & Tracing: Set up and manage centralized logging solutions (ELK/EFK stack, Fluentd, OpenTelemetry).
  • Alerting & Incident Management: Configure alerting mechanisms using tools like PagerDuty, Opsgenie, or VictorOps for proactive issue detection.
  • SRE Practices: Implement Site Reliability Engineering (SRE) principles to enhance system reliability and reduce MTTR (Mean Time to Resolution).
  • Automation & Infrastructure as Code (IaC): Automate observability setup and configurations using Terraform, Ansible, or similar tools.
  • Cloud & Kubernetes Monitoring: Implement observability best practices for cloud platforms (AWS, Azure, GCP) and containerized environments (Kubernetes, Docker).
  • Collaboration: Work closely with development, SRE, and operations teams to ensure end-to-end observability of applications and services.
  • Compliance & Security: Ensure logging and monitoring solutions adhere to security and compliance requirements.

Required Skills & Qualifications:

  • 6-10 years of experience in DevOps, SRE, or Observability engineering.
  • Strong hands-on experience with observability tools like Prometheus, Grafana, New Relic, Datadog, Splunk, ELK/EFK, OpenTelemetry, AppDynamics, etc.
  • Experience in setting up distributed tracing solutions (Jaeger, Zipkin, OpenTelemetry).
  • Expertise in Kubernetes monitoring using Prometheus, Thanos, Loki, or similar tools.
  • Strong proficiency in scripting (Python, Bash, Shell) for automation.
  • Hands-on experience with Terraform, Ansible, Helm, or CloudFormation for infrastructure automation.
  • Proficiency in CI/CD pipelines and GitOps methodologies using Jenkins, GitLab CI, ArgoCD, or Flux.
  • Experience in public cloud environments (AWS, Azure, GCP) and monitoring cloud-native services.
  • Strong troubleshooting and root cause analysis (RCA) skills.
  • Understanding of SLIs, SLOs, and error budgets as part of SRE best practices.
  • Familiarity with log management, anomaly detection, and AI-based observability solutions is a plus.

Data Economy

IT Services and IT Consulting

Data City
