
Experience: 0 years

Salary: 0 Lacs

Posted: 2 days ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description

*Who you are*


You’re the person whose fingertips know the difference between spinning up a GPU cluster and spinning down a stale inference node. You love the “infrastructure behind the magic” of LLMs. You’ve built CI/CD pipelines that automatically version models, log inference metrics, and alert on drift. You’ve containerized GenAI services in Docker, deployed them on Kubernetes clusters (AKS or EKS), and used Terraform or ARM templates to manage infrastructure as code. You monitor cloud costs like a hawk, optimize GPU workloads, and sometimes trade cost for performance, but never the other way around. You’re fluent in Python and Bash, can script tests for REST endpoints, and build automated feedback loops for model retraining. You’re comfortable working in Azure (OpenAI, Azure ML, Azure DevOps Pipelines) but cloud-agnostic enough to cover AWS or GCP if needed. You read MLOps/LLMOps blog posts or arXiv summaries on the weekend and implement improvements on Monday. You think of yourself as a self-driven engineer: no playbooks, no spoon-feeding, just solid automation, reliability, and a hunger to scale GenAI from prototype to production.
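The drift-alerting the paragraph mentions can be sketched with a minimal, stdlib-only check. This is an illustrative assumption, not a method from the posting: the function, metric values, and z-score threshold are all hypothetical, and a real pipeline would use proper statistical tests and an alerting backend.

```python
import statistics

def drift_alert(baseline: list[float], recent: list[float],
                z_threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean of a logged inference metric (e.g. a
    confidence score) moves more than z_threshold baseline standard
    deviations away from the baseline mean. Illustrative sketch only."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return statistics.mean(recent) != mu
    z = abs(statistics.mean(recent) - mu) / sigma
    return z > z_threshold

baseline = [0.92, 0.91, 0.93, 0.90, 0.92]
print(drift_alert(baseline, [0.91, 0.92, 0.93]))  # → False (stable)
print(drift_alert(baseline, [0.40, 0.35, 0.45]))  # → True (large shift)
```

In practice a check like this would run on a schedule against logged metrics and page on-call or open a retraining ticket when it fires.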


---


*What you will actually do*


You’ll architect and build deployment platforms for internal LLM services, starting with containerizing models and building CI/CD pipelines for inference microservices. You’ll write IaC (Terraform or ARM) to spin up clusters, endpoints, GPUs, storage, and logging infrastructure. You’ll integrate Azure OpenAI and Azure ML endpoints, pushing models via pipelines, versioning them, and enabling automatic retraining triggers. You’ll build monitoring and observability around latency, cost, error rates, drift, and prompt-health metrics. You’ll optimize deployments (autoscaling, spot/GPU nodes, invalidation policies) to balance cost and performance. You’ll set up automated QA pipelines that validate model outputs (e.g., semantic similarity, hallucination detection) before merging. You’ll collaborate with ML, backend, and frontend teams to package components into release-ready backend services. You’ll manage alerts and rollbacks on failure, and ensure 99% uptime. You’ll create reusable tooling (CI templates, deployment scripts, infra modules) to make future projects plug-and-play.
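The pre-merge QA gate described above could look like the following sketch. Token-level Jaccard overlap is used here as a cheap, hedged stand-in for embedding-based semantic similarity; the function names and the 0.5 threshold are illustrative assumptions, not part of the posting.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity: a lightweight stand-in for
    embedding-based semantic similarity (illustrative only)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def qa_gate(reference: str, candidate: str, min_similarity: float = 0.5) -> bool:
    """Pass only when the model output stays close enough to a reference
    answer; a real pipeline would add hallucination and safety checks."""
    return jaccard_similarity(reference, candidate) >= min_similarity

print(qa_gate("the deployment succeeded", "the deployment succeeded"))  # → True
print(qa_gate("the deployment succeeded", "bananas are yellow"))        # → False
```

Wired into CI, a failing gate would block the merge and surface the divergent outputs for review.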


---


*Skills and knowledge*


Strong scripting skills in Python and Bash for automation and pipelines

Fluent in Docker, Kubernetes (especially AKS), containerizing LLM workloads

Infrastructure-as-code expertise: Terraform (Azure provider) or ARM templates

Experience with Azure DevOps or GitHub Actions for CI/CD of models and services

Knowledge of Azure OpenAI, Azure ML, or equivalent cloud LLM endpoints

Familiar with setting up monitoring (Azure Monitor, Prometheus/Grafana) to track latency, errors, drift, and costs

Cost-optimization tactics: spot nodes, autoscaling, GPU utilization tracking

Basic LLM understanding: inference latency/cost, deployment patterns, model versioning

Ability to build lightweight QA checks or integrate with QA pipelines

Cloud-agnostic awareness: experience with AWS or GCP as fallback platforms

Comfortable establishing production-grade Ops pipelines, automating deployments end-to-end

Self-starter mentality: no playbooks required, ability to pick up new tools and drive infrastructure independently
