Senior LLM Infrastructure & SRE Engineer

6 - 10 years

35 - 65 Lacs

Posted: 8 hours ago | Platform: Naukri


Skills Required

SRE, RAG & enterprise data integration, ML/LLM/NLP workloads, LLM infrastructure, fine-tuning & data pipelines, auth, REST/gRPC, reliability & operations at scale, PEFT, MLOps, Llama, DeepSeek, RBAC, LoRA, Mistral, QLoRA, rate limiting, gateway

Work Mode

Hybrid

Job Type

Full Time

Job Description

Key Responsibilities

1. LLM Infrastructure, Deployment & Runtime

  • Design, deploy, and operate open-source LLMs (e.g., Llama, DeepSeek, Mistral) on cloud (AWS/Azure/GCP/Oracle) and on-prem infrastructure.
  • Build and maintain high-performance inference microservices and APIs for LLM workloads (REST/gRPC, gateway, auth, RBAC, rate limiting).
  • Implement efficient runtime optimizations: quantization, tensor parallelism, batching, caching, and appropriate serving frameworks (e.g., vLLM, TGI, TensorRT-LLM, Ray Serve); a serving sketch follows this list.
  • Ensure secure network, storage, and data paths aligned with enterprise requirements (VPCs, private links, firewalls, secrets management).
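
By way of illustration, a minimal serving sketch of the kind this role would own, using vLLM with tensor parallelism and a quantized checkpoint (the model name and flag values are assumptions and depend on the hardware and vLLM version in use):

    # Minimal vLLM serving sketch (illustrative, not a prescribed setup).
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumed AWQ-quantized checkpoint
        quantization="awq",            # must match how the checkpoint was quantized
        tensor_parallel_size=2,        # shard weights across two GPUs
        gpu_memory_utilization=0.90,   # leave headroom for KV-cache growth
        max_model_len=8192,            # cap context length to bound KV-cache memory
    )

    params = SamplingParams(temperature=0.2, max_tokens=256)
    outputs = llm.generate(["Summarize the on-call runbook for the inference gateway."], params)
    print(outputs[0].outputs[0].text)

In production this engine would typically sit behind the REST/gRPC gateway described above, with auth, RBAC, and rate limiting enforced at that layer rather than inside the model server.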

2. SRE, Reliability & Operations at Scale

  • Define and own SLOs/SLIs for LLM services (latency, availability, error rates, throughput, and cost targets); a burn-rate sketch follows this list.
  • Implement monitoring, logging, tracing, and alerting across the stack using modern observability tools.
  • Lead incident response, root-cause analysis, and postmortems for production issues affecting LLM workloads.
  • Design strategies for autoscaling, failover, blue/green and canary deployments, and capacity planning for large-scale inference.
  • Establish runbooks, playbooks, and operational standards for LLM services in production.
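
As one concrete framing of the SLO work, a small error-budget burn-rate sketch (the availability target, window sizes, and fetch_error_ratio helper are hypothetical; in practice this logic usually lives in Prometheus/Grafana alert rules rather than application code):

    # Hypothetical multi-window burn-rate check for an LLM inference SLO.
    SLO_AVAILABILITY = 0.999                  # 99.9% of requests succeed within the latency target
    ERROR_BUDGET = 1.0 - SLO_AVAILABILITY

    def fetch_error_ratio(window_minutes: int) -> float:
        """Placeholder: fraction of bad requests over the window, from the metrics backend."""
        raise NotImplementedError

    def burn_rate(window_minutes: int) -> float:
        """How fast the error budget is being consumed relative to the allowed rate."""
        return fetch_error_ratio(window_minutes) / ERROR_BUDGET

    def should_page() -> bool:
        # Page only when both a short and a long window burn fast,
        # so brief spikes do not wake the on-call engineer.
        return burn_rate(5) > 14.4 and burn_rate(60) > 14.4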

3. MLOps, Fine-Tuning & Data Pipelines

  • Collaborate with ML/LLM engineers to operationalize fine-tuning and retraining of LLMs on domain-specific data, using techniques such as LoRA, QLoRA, PEFT, and full-parameter tuning where needed; a LoRA configuration sketch follows this list.
  • Build and maintain CI/CD and MLOps pipelines for training, validation, deployment, rollback, and monitoring of models and features.
  • Implement data pipelines for large training and evaluation datasets: ingestion, cleaning, labeling, augmentation, anonymization, and quality checks.
  • Support continuous delivery of models with robust versioning, promotion strategies (dev/stage/prod), and rollback.
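
To make the fine-tuning side concrete, a minimal LoRA adapter setup sketch using Hugging Face transformers and peft (the base checkpoint, target modules, and hyperparameters are assumptions; QLoRA would additionally load the base model in 4-bit via bitsandbytes):

    # Minimal LoRA configuration sketch (illustrative hyperparameters, not a prescription).
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Llama-3.1-8B"            # assumed base checkpoint
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    lora_cfg = LoraConfig(
        r=16,                                    # adapter rank
        lora_alpha=32,                           # scaling factor
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],     # attention projections typical for Llama-style models
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()           # sanity check: only adapter weights are trainable
    # Training, evaluation, and dataset handling sit in the surrounding MLOps pipeline.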

4. RAG & Enterprise Data Integration

  • Support teams implementing RAG (Retrieval-Augmented Generation) by deploying and scaling vector databases and search infrastructure (e.g., pgvector, Milvus, Chroma, Pinecone, Weaviate, Elasticsearch/OpenSearch); a pgvector retrieval sketch follows this list.
  • Ensure performance, reliability, and security of embedding generation and retrieval pipelines.
  • Work with architects to align retrieval and RAG architectures with enterprise IAM, data governance, and network constraints.
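
As a small illustration of the retrieval path, a pgvector nearest-neighbour query sketch (the table name, connection string, and embed() helper are assumptions; any of the vector stores listed above would play the same role):

    # Sketch of a pgvector similarity search for a RAG retrieval pipeline.
    import psycopg

    def embed(text: str) -> list[float]:
        """Placeholder for the embedding model call (hosted or self-hosted encoder)."""
        raise NotImplementedError

    def top_k_chunks(query: str, k: int = 5) -> list[tuple[str, float]]:
        vec = "[" + ",".join(str(x) for x in embed(query)) + "]"
        with psycopg.connect("dbname=rag user=rag") as conn:
            rows = conn.execute(
                # <=> is pgvector's cosine-distance operator; an HNSW or IVFFlat index
                # on the embedding column keeps this query fast at scale.
                "SELECT chunk_text, embedding <=> %s::vector AS distance "
                "FROM doc_chunks ORDER BY distance LIMIT %s",
                (vec, k),
            ).fetchall()
        return rows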

5. Leadership & Collaboration

  • Provide technical leadership and mentorship to junior engineers and LLM-focused developers.
  • Collaborate with solution architects, security/governance teams, and client stakeholders to design production-ready architectures.
  • Contribute to standards, best practices, and reference architectures for LLM operations across client engagements.

Required Qualifications

  • 10+ years of experience in infrastructure / platform engineering / SRE / DevOps roles.
  • 2+ years of hands-on experience with ML / LLM / NLP workloads (PoCs or production deployments).
  • Strong expertise in cloud platforms (Azure, AWS, or GCP) and on-prem / hybrid deployments.
  • Hands-on experience with containers and orchestration: Docker, Kubernetes, and related tooling such as Helm and Kustomize.
  • Strong Python skills; familiarity with Go/Java/TypeScript for services is a plus.
  • Experience deploying high-throughput, low-latency services and tuning performance at both the infrastructure and application layers.
  • Familiarity with LLM fine-tuning techniques (LoRA, QLoRA, etc.) and core deep learning stacks (PyTorch preferred).
  • Strong understanding of observability, SRE practices, and incident management.
  • Solid grasp of security and governance concepts: secrets management, RBAC, network segregation, logging, and audit trails.

Nice to Have

  • Experience with MLOps tools: MLflow, Kubeflow, Weights & Biases, Airflow/Prefect, etc.
  • Experience with LLM orchestration frameworks (LangChain, LlamaIndex) from the infra perspective.
  • Exposure to industrial / OT / manufacturing / energy / regulated environments.
  • Prior experience in consulting or client-facing roles.

iBeris Software | Software Development | Barcelona