Experience: 8 - 10 years
Salary: 3 - 6 Lacs
Posted: 2 weeks ago
Work from Office
Full Time
Responsibilities
- Design, implement, and maintain end-to-end MLOps pipelines for model training, validation, deployment, and monitoring.
- Build and manage LLMOps pipelines for fine-tuning, evaluating, and deploying large language models (e.g., OpenAI, Hugging Face Transformers, custom LLMs).
- Use Kubeflow and Kubernetes to orchestrate reproducible, scalable ML/LLM workflows.
- Implement CI/CD pipelines for ML projects using GitHub Actions, Argo Workflows, or Jenkins.
- Automate infrastructure provisioning using Terraform, Helm, or similar IaC tools.
- Integrate model registry and artifact management with tools like MLflow, Weights & Biases, or DVC.
- Manage containerization with Docker and container orchestration via Kubernetes.
- Set up monitoring, logging, and alerting for production models using tools like Prometheus, Grafana, and the ELK Stack.
- Collaborate closely with Data Scientists and DevOps engineers to ensure seamless integration of models into production systems.
- Ensure model governance, reproducibility, auditability, and compliance with enterprise and legal standards.
- Conduct performance profiling, load testing, and cost optimization for LLM inference endpoints.

Required Skills and Experience

Core MLOps/LLMOps Expertise
- 5+ years of hands-on experience in MLOps/DevOps for AI/ML.
- 2+ years working with LLMs in production (e.g., fine-tuning, inference optimization, safety evaluations).
- Strong experience with Kubeflow Pipelines, KServe, and MLflow.
- Deep knowledge of CI/CD pipelines with GitHub Actions, GitLab CI, or CircleCI.
- Expert in Kubernetes, Helm, and Terraform for container orchestration and infrastructure as code.

Programming & Frameworks
- Proficient in Python, with experience in ML libraries such as scikit-learn, TensorFlow, PyTorch, and Hugging Face Transformers.
- Familiarity with FastAPI, Flask, or gRPC for building ML model APIs (see the illustrative serving sketch after this listing).

Cloud & DevOps
- Hands-on with AWS, Azure, or GCP (preferred: EKS, S3, SageMaker, Vertex AI, Azure ML).
- Knowledge of model serving using Triton Inference Server, TorchServe, or ONNX Runtime.

Monitoring & Logging
- Tools: Prometheus, Grafana, ELK, OpenTelemetry, Sentry.
- Model drift detection and A/B testing in production environments.

Soft Skills
- Strong problem-solving and debugging skills.
- Ability to mentor junior engineers and collaborate with cross-functional teams.
- Clear communication, documentation, and Agile/Scrum proficiency.

Preferred Qualifications
- Experience with LLMOps platforms like Weights & Biases, TruEra, PromptLayer, LangSmith.
- Experience with multi-tenant LLM serving or agentic systems (LangChain, Semantic Kernel).
- Prior exposure to Responsible AI practices (bias detection, explainability, fairness).
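As a minimal sketch of the model-serving work referenced above (FastAPI wrapping a Hugging Face Transformers pipeline): the model name, route, and request schema below are illustrative assumptions, not part of the role description.

```python
# Illustrative sketch only: a FastAPI endpoint wrapping a Hugging Face
# Transformers pipeline. Model name, route, and schema are assumptions
# chosen for demonstration, not requirements of this role.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI(title="example-model-endpoint")

# Load the model once at startup; this small sentiment model is a stand-in
# for whatever model a production pipeline would actually serve.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    # Run inference and return label and score; in production this is where
    # request logging, metrics, and drift checks would typically hook in.
    result = classifier(req.text)[0]
    return {"label": result["label"], "score": float(result["score"])}
```

Run locally with uvicorn app:app --reload (assuming the file is saved as app.py); the same container image would typically be deployed behind KServe or a Kubernetes Deployment in the workflows described above.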
Cirruslabs
Locations: Kolkata, Mumbai, New Delhi, Hyderabad, Pune, Chennai, Bengaluru