Coffeebeans.io - Site Reliability Engineer II

8 - 12 years

0 Lacs

Posted: 3 days ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

About The Job

We're looking for a highly skilled and self-driven Site Reliability Engineer (SRE-2) to join our team in Hyderabad. This is a full-time, work-from-office role (5 days a week), suited to someone with 8-12 years of experience who thrives on challenges and is passionate about building robust, scalable, and highly available systems.

You'll play a crucial role in ensuring the reliability, performance, and efficiency of our critical infrastructure and applications, with a particular focus on Kubernetes, DevOps, and observability. If you have hands-on experience with ML applications, GPU optimization, and Big Data systems, you'll be an ideal fit.

Key Responsibilities

As a Site Reliability Engineer (SRE-2), you will:
  • Design, deploy, and manage highly available and scalable Kubernetes clusters and robust DevOps pipelines.
  • Troubleshoot and resolve complex infrastructure and application issues across various environments.
  • Implement, maintain, and enhance comprehensive observability solutions, with a strong emphasis on Thanos and related monitoring and alerting tools.
  • Provide expert support for machine learning (ML) workflows, leveraging tools like MLflow and Kubeflow.
  • Optimize applications to maximize performance in GPU-accelerated environments.
  • Contribute individually to projects and proactively learn and adopt new technologies to stay ahead of industry trends.
  • Automate repetitive tasks and streamline operational processes using scripting and automation tools, including Python, Ansible, Groovy, and Shell scripting.

Qualifications

To be successful in this role, you should have:
  • Strong, hands-on experience with Kubernetes and a deep understanding of core DevOps principles and tools.
  • Proven expertise in observability and monitoring solutions, with a strong preference for experience with Thanos.
  • Demonstrable experience working with ML platforms and optimizing applications for GPU-based environments.
  • CKS (Certified Kubernetes Security Specialist) certification (preferred).
  • Experience with Big Data systems (a significant plus).
  • Proficiency in multiple scripting and automation tools: Python, Ansible, Groovy, and Shell scripting.
  • Hands-on experience with CI/CD tools such as Jenkins, Ansible, and ArgoCD.
(ref:hirist.tech)
