MLOps & AI Infrastructure Engineer

Experience: 5 years


Posted: 4 days ago | Platform: LinkedIn


Work Mode: On-site

Job Type: Full Time

Job Description



About Calaxis by Credartha: The Future of AI is Built on Trust

The $15 trillion promise of artificial intelligence is currently being held hostage by a single, pervasive bottleneck: the quality of domain-specific data. Today, building trustworthy, specialized AI is an artisanal, slow, and prohibitively expensive process reserved for tech giants with billion-dollar budgets.  

Calaxis is on a mission to change this. We are a deep-tech venture building a foundational platform to automate the end-to-end creation of flawless, high-quality datasets for any AI application. Our core innovation is a proprietary, self-improving system that uses a cascade of specialized AI models to systematically validate data for accuracy, compliance, and insight. By solving the data quality problem at its core, we are moving the AI industry from a capital-intensive to a method-intensive paradigm, democratizing the development of high-stakes AI for every vertical.  


What You Will Do:


  • Architect the AI Flywheel: Design and build the end-to-end MLOps infrastructure for our entire platform. This includes creating automated pipelines for training, validation, deployment, and the crucial feedback loop that makes our system self-improving.
  • Build a Multi-Tenant PaaS: Engineer a scalable, secure, and efficient multi-tenant architecture on AWS to support our customer-facing services. This includes managing on-demand compute for customer-driven fine-tuning (SFT & RL) and model deployment jobs.
  • Automate Everything (CI/CD/CT): Implement and manage a sophisticated CI/CD/CT (Continuous Integration/Continuous Deployment/Continuous Training) system for our suite of AI models and backend services, ensuring rapid and reliable updates.
  • Optimize LLM Serving: Deploy and manage high-throughput, low-latency model-serving infrastructure for our internal AI validators and for customer-deployed models.
  • Master GPU Resources: Develop and manage systems for efficient scheduling, allocation, and monitoring of GPU resources across multiple training and inference workloads.
  • Ensure Production-Grade Reliability: Implement comprehensive monitoring, logging, and alerting for the entire platform using tools like AWS CloudWatch to ensure high availability and performance.
  • Champion Infrastructure as Code (IaC): Use tools like Terraform or AWS CloudFormation to define and manage our infrastructure, ensuring it is version-controlled, repeatable, and scalable.

 

Who You Are: The Expert We Need


Required Qualifications:

  • 5+ years of professional experience in a DevOps, SRE, or MLOps role, with a proven track record of building and managing production infrastructure for scalable applications.
  • Deep expertise in cloud services, particularly AWS (e.g., EC2, S3, EKS/ECS, Lambda, RDS, API Gateway).  
  • Strong, hands-on experience with containerization (Docker) and container orchestration (Kubernetes).
  • Proven experience designing and implementing CI/CD pipelines for complex applications (e.g., Jenkins, GitLab CI, AWS CodePipeline).  
  • Proficiency in scripting and automation, with strong skills in Python.
  • A deep understanding of networking, security, and infrastructure best practices.
Preferred Qualifications (Bonus Points):

  • Direct experience building MLOps pipelines for training and deploying Large Language Models (LLMs).
  • Familiarity with LLM-specific serving frameworks (e.g., vLLM, Text Generation Inference, Triton).
  • Experience with ML platforms and tools like Kubeflow, MLflow, or Airflow.
  • Experience building infrastructure for multi-tenant SaaS or PaaS products.
  • Knowledge of advanced fine-tuning techniques like Reinforcement Learning from Human Feedback (RLHF) or Direct Preference Optimization (DPO) and their infrastructure requirements.
  • AWS certifications (e.g., DevOps Engineer, Solutions Architect).


Why Join Credartha?


  • Build from the Ground Up: This is a rare greenfield opportunity to be the founding infrastructure architect for a deep-tech company. Your design choices will have a lasting impact on the entire platform.
  • Solve Mission-Critical Challenges: You will be working on complex, interesting problems at the intersection of distributed systems, cloud infrastructure, and cutting-edge AI.
  • Massive Impact and Ownership: You won't be maintaining legacy systems. You will have unparalleled ownership and the opportunity to build the operational foundation of a platform poised to disrupt a $15 trillion market.
  • A Culture of Excellence: Join a passionate founding team that values technical rigor, innovation, and collaboration.
  • Competitive Compensation: We offer a highly competitive salary, significant equity, and comprehensive benefits to ensure you are rewarded for your foundational contributions.


If you are a world-class infrastructure engineer who is excited by the challenge of building the engine for the future of AI, we want to hear from you.


How to Apply:

Please submit your resume and a brief cover letter or message highlighting your experience building scalable, production-grade infrastructure and why you are excited about the mission at Calaxis.

