Posted: 1 week ago | Platform: Foundit

Work Mode

On-site

Job Type

Full Time

Job Description

Key Responsibilities:

  • Monitoring & Alerting: Develop, maintain, and enhance monitoring and alerting systems using Datadog to proactively identify and address potential issues, ensuring optimal system performance.
  • CI/CD Pipelines: Participate in the design and implementation of CI/CD pipelines using Azure DevOps, enabling automated and reliable software delivery.
  • Incident Response: Lead incident response and troubleshooting efforts to quickly diagnose and resolve production incidents, minimizing downtime and impact on users.
  • Reliability Initiatives: Take ownership of reliability initiatives by identifying areas for improvement, conducting root cause analysis, and implementing solutions to prevent recurrence of incidents.
  • Collaboration: Collaborate with cross-functional teams to ensure security, compliance, and performance standards are met throughout the development lifecycle.
  • On-call Support: Participate in on-call rotations and provide 24/7 support for critical incidents, ensuring rapid response and resolution.
  • SLOs & SLIs: Work with development teams to define and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain system reliability.
  • Documentation: Contribute to the documentation of processes, procedures, and best practices to enhance knowledge sharing within the team.

Qualifications:

  • Education: Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
  • Experience: Minimum of 4 years of experience as a Site Reliability Engineer or in a similar role, managing cloud-based infrastructure on AWS with EKS.
  • AWS Expertise: Strong expertise in AWS services, especially EKS, including cluster provisioning, scaling, and management.
  • Monitoring & Observability: Proficiency with monitoring and observability tools, with hands-on experience in Datadog or similar tools for tracking system performance and generating meaningful alerts.
  • CI/CD Experience: Experience implementing CI/CD pipelines using Azure DevOps or similar tools to automate software deployment and testing.
  • Containerization & Orchestration: Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and their role in modern application architectures.
  • Troubleshooting: Excellent troubleshooting skills and the ability to analyze complex issues, determine root causes, and implement effective solutions.
  • Scripting & Automation: Strong scripting and automation skills (e.g., Python, Bash).
  • Infrastructure as Code (IaC): Familiarity with IaC tools such as Terraform or CloudFormation.
  • Incident Management: Experience with incident management, post-incident analysis, and implementing improvements based on lessons learned.
  • Security & Compliance: Good understanding of security best practices and compliance standards in cloud environments.
  • Communication: Exceptional communication skills and the ability to collaborate effectively with cross-functional teams.
  • On-call Rotations: Willingness to participate in on-call rotations and provide off-hours support when necessary.

Preferred Qualifications:

  • Relevant certifications such as:
    • AWS Certified DevOps Engineer
    • AWS Certified SRE
    • Kubernetes certifications
  • Experience with other cloud platforms (e.g., Azure, Google Cloud Platform).
  • Familiarity with microservices architecture and service mesh technologies.
  • Prior experience with application performance tuning and optimization.
