Site Reliability Engineer

8 - 10 years

14 - 22 Lacs

Posted: 16 hours ago | Platform: Naukri

Work Mode

Work from Office

Job Type

Full Time

Job Description

We have an open requirement for an "SRE Lead Engineer".

Client: MNC.

Product-based US company.

Role & responsibilities

  • Architect, design, and deploy end-to-end infrastructure solutions for a multi-tenant microservices-based SaaS application with a focus on AI/ML model integration.

  • Ensure system reliability, scalability, performance, and security, specifically enhancing AI/ML processing pipelines and workflows.

  • Utilize Terraform scripting for on-demand environment provisioning within the AWS cloud, optimized for AI/ML workloads.

  • Implement and refine monitoring and alerting systems across application, network, and OS layers to support AI model operations and data processing.

  • Diagnose, support, and resolve production issues and alerts, participating in a 24/7 on-call rotation to maintain seamless AI/ML service operations.

Qualifications:

  • 8+ years of experience in Site Reliability Engineering (SRE) and DevOps roles with a track record of managing large-scale enterprise SaaS services in production, including 1+ year in AI/ML infrastructure.

  • Demonstrated expertise with AWS public cloud technologies, including extensive experience in deploying and managing large-scale container clusters using AWS EKS. Skilled in Infrastructure as Code (IaC) using Terraform, and container technologies such as Docker and Kubernetes.

  • Proficient in scripting and programming for automation (Python, Bash, etc.), with strong Linux OS and networking fundamentals relevant to AI/ML workloads.

  • Experience in establishing monitoring systems to ensure high availability, performance, and security integrity, using tools like ELK Stack, CloudWatch, and others tailored for AI/ML monitoring.

  • Hands-on experience managing microservices-architecture SaaS products, enabling RESTful web services and SSO integration (Okta, Auth0), and using cloud databases like EC2-RDS, MySQL, and Elasticsearch, especially in AI/ML deployments.

  • Proficient in backup and disaster recovery strategies specific to AI/ML data resources like RDS and Elasticsearch.

  • AWS Certified Solutions Architect is strongly preferred.

  • Self-driven, proactive, and adaptable to thrive in an early-stage startup environment, with a keen interest in integrating AI/ML technologies into modern SaaS solutions.

Preferred candidate profile

Interested candidates, please share your profiles with HR@TECHXLNC.AI

Notice period: Immediate to 30 days

Location: Hyderabad
