Senior AI/ML Testing & Evaluation Engineer

Experience: 5 years

Posted: 6 days ago | Platform: LinkedIn

Work Mode: On-site

Job Type: Full Time

Job Description

At Adeptiv.AI, we're building the most advanced AI Governance Platform for enterprises. Our flagship Real-Time Evaluation module empowers businesses to test, evaluate, and trust their AI systems.


Senior AI Evaluation Engineer


This role is ideal for someone who lives and breathes AI/ML evaluation, loves digging deep into models, and can bridge the gap between theory and production-grade software.


Key Responsibilities:

  • Design and lead the implementation of evaluation frameworks for ML and generative AI systems.
  • Define and guide the use of AI/ML evaluation metrics such as accuracy, precision, recall, AUC, BLEU, ROUGE, and METEOR (see the metrics sketch after this list).
  • Develop strategies for model robustness, bias detection, and fairness evaluations.
  • Integrate tools such as SHAP, LIME, Captum, DeepChecks, Foolbox, Evidently AI, and Alibi Detect.
  • Define pipelines for automated test case execution, continuous evaluation, and report generation.
  • Guide and mentor full-stack and backend engineers in integrating AI/ML testing logic into production-ready services.
  • Establish standards for test dataset generation, edge-case simulation, and benchmarking.
  • Validate the correctness of evaluations across supported AI use cases.
  • Stay ahead of the curve on emerging research in AI evaluations and bring insights into the product.
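
For illustration, a minimal sketch of the classification and text-generation metrics named above, using scikit-learn and NLTK; the labels, scores, and sentences are hypothetical:

    # Hypothetical outputs from a binary classifier under evaluation.
    from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]
    y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7]  # predicted probabilities

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("AUC      :", roc_auc_score(y_true, y_score))

    # BLEU on a hypothetical reference/candidate pair; smoothing avoids
    # zero scores on short sentences.
    reference = [["the", "model", "passed", "every", "test"]]
    candidate = ["the", "model", "passed", "all", "tests"]
    smooth = SmoothingFunction().method1
    print("BLEU     :", sentence_bleu(reference, candidate, smoothing_function=smooth))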


Must-Have Skills & Experience:

  • 5+ years in AI/ML, focused on evaluating and testing ML systems
  • Deep expertise in evaluation metrics for traditional ML, computer vision, and generative AI
  • Strong familiarity with explainability tools (SHAP, LIME, Integrated Gradients, etc.; see the SHAP sketch after this list)
  • Experience evaluating models in one or more domains: NLP, Computer Vision, Tabular Data, Reinforcement Learning
  • Hands-on experience with libraries such as scikit-learn, Hugging Face Transformers, the OpenAI SDK, LangChain, TorchMetrics, and Evidently
  • Experience collaborating with engineering teams to productize evaluation pipelines
  • Strong Python development and scripting capabilities
  • Solid understanding of AI reliability, robustness, fairness, and auditability
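
As a concrete example of the explainability work above, a minimal SHAP sketch on a toy model; the synthetic data and feature count are hypothetical:

    # Train a toy classifier and inspect global feature attributions with SHAP.
    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))                  # 200 rows, 4 synthetic features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # label driven by features 0 and 1

    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # TreeExplainer gives fast, exact SHAP values for tree ensembles.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # For binary classifiers the return shape varies by SHAP version:
    # a list [class0, class1] or a (rows, features, classes) array.
    vals = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
    print("mean |SHAP| per feature:", np.abs(vals).mean(axis=0))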


Good to Have:

  • Experience with LLM evaluation, hallucination detection, and prompt scoring
  • Prior contributions to AI testing or monitoring tools or open-source projects
  • Understanding of MLOps/LLMOps workflows
  • Familiarity with running model evaluations in CI/CD pipelines (see the pytest sketch after this list)
  • Awareness of AI compliance and audit frameworks (e.g., the EU AI Act, NIST AI RMF)
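
One way such a CI evaluation gate can look in practice, sketched as a pytest check; the stand-in model, cases, and 0.90 threshold are all hypothetical:

    # Run with `pytest`: the build fails if accuracy drops below the floor.
    from sklearn.metrics import accuracy_score

    ACCURACY_FLOOR = 0.90  # hypothetical release threshold

    def evaluate(model, cases):
        """Score a model against {'input': ..., 'expected': ...} cases."""
        preds = [model(c["input"]) for c in cases]
        return accuracy_score([c["expected"] for c in cases], preds)

    def test_model_meets_accuracy_floor():
        # Stand-in model and cases; a real pipeline would load both from storage.
        model = lambda x: int(x >= 0)  # hypothetical classifier
        cases = [{"input": v, "expected": int(v >= 0)} for v in (-2, -1, 0, 1, 2)]
        assert evaluate(model, cases) >= ACCURACY_FLOOR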


What You'll Bring

  • A rigorous scientific mindset paired with a builder's attitude
  • A passion for making AI trustworthy for enterprises
  • Strong communication skills to work cross-functionally with product & engineering
  • High ownership to shape a strategic product module from scratch


Why Join Us?

  • Be part of a cutting-edge product solving real challenges in AI Governance.
  • Work directly with the founding team and make a massive impact on enterprise AI.
  • Opportunity to influence the future of AI evaluation and reliability.

