LLM Evaluation Engineer (GenAI QE)

6 years

0 Lacs

Posted: 1 week ago | Platform: LinkedIn

Work Mode

On-site

Job Type

Full Time

Job Description

Note:


Please apply only if you:

  • Have 6 years or more of relevant experience (excluding internships)
  • Are comfortable working 5 days a week from Gurugram, Haryana
  • Are an immediate joiner or currently serving your notice period


About Eucloid

At Eucloid, innovation meets impact. As a leader in AI and Data Science, we create solutions that redefine industries—from Hi-tech and D2C to Healthcare and SaaS. With partnerships with giants like Databricks, Google Cloud, and Adobe, we’re pushing boundaries and building next-gen technology.


Join our talented team of engineers, scientists, and visionaries from top institutes like IITs, IIMs, and NITs. At Eucloid, growth is a promise, and your work will drive transformative results for Fortune 100 clients.


What You’ll Do

  • Design and implement robust frameworks for evaluating large language models (LLMs) across dimensions like accuracy, safety, hallucination, and reasoning.
  • Build modular pipelines for automated, semi-automated, and human-in-the-loop evaluations.
  • Integrate GenAI testing tools such as Giskard, RAGAS, DeepEval, TruLens, Opik/Comet, and LangSmith.
  • Define and implement custom evaluation metrics tailored to use cases like RAG, agents, and safety guardrails.
  • Curate or generate high-quality evaluation datasets across domains (e.g., legal, medical, QA, coding).
  • Collaborate with developers to instrument tracing and logging for real-world model behavior capture.
  • Build dashboards and reporting mechanisms to visualize performance, regressions, and model comparisons.
  • Conduct prompt-based testing, chain-of-thought evaluations, adversarial testing, and A/B comparisons.
  • Contribute to red-teaming and stress-testing efforts to uncover vulnerabilities and ethical risks.


What Makes You a Fit

Academic Background:

  • Bachelor’s or Master’s degree in Computer Science, Data Science, Artificial Intelligence, or a related field.


Technical Expertise:

  • Minimum 6 years of hands-on experience in building, testing, or evaluating AI/ML systems, with a strong focus on LLMs or Generative AI applications.
  • Proficiency in Python, along with experience using ML/NLP libraries such as Hugging Face, LangChain, OpenAI SDK, or Cohere.
  • Experience in building evaluation pipelines or benchmarks for LLM performance across metrics like accuracy, robustness, safety, and hallucination.
  • Deep understanding of prompt engineering, retrieval-augmented generation (RAG), and agentic evaluation techniques.
  • Hands-on familiarity with evaluation tools such as Giskard, RAGAS, DeepEval, TruLens, LangSmith, Opik/Comet, or similar.
  • Working knowledge of vector databases like FAISS, Pinecone, or Weaviate, and embedding-based evaluation methods.
  • Experience with CI/CD pipelines, unit/integration testing for LLM apps, and model versioning for reproducibility.
  • Ability to define custom evaluation metrics tailored to specific use cases (e.g., RAG performance, guardrail compliance, hallucination detection).
  • Strong grasp of model instrumentation techniques for tracing/logging model behavior in real-world flows.


Extra Skills:

  • Experience in developing LLM-based applications such as chatbots, copilots, or RAG systems.
  • Exposure to designing or evaluating AI safety systems (e.g., jailbreaking prevention, content filters).
  • Open-source contributions to GenAI tooling or evaluation libraries.
  • Strong communication and documentation skills.
  • Comfort working in fast-paced, research-heavy environments.



Why You’ll Love It Here


  • Innovate with the Best Tech: Work on groundbreaking projects using AI, GenAI, LLMs, and massive-scale data platforms. Tackle challenges that push the boundaries of innovation.
  • Impact Industry Giants: Deliver business-critical solutions for Fortune 100 clients across Hi-tech, D2C, Healthcare, SaaS, and Retail. Partner with platforms like Databricks, Google Cloud, and Adobe to create high-impact products.
  • Collaborate with a World-Class Team: Join exceptional professionals from IITs, IIMs, NITs, and global leaders like Walmart, Amazon, Accenture, and ZS. Learn, grow, and lead in a team that values expertise and collaboration.
  • Accelerate Your Growth: Access our Centres of Excellence to upskill and work on industry-leading innovations. Your professional development is a top priority.
  • Work in a Culture of Excellence: Be part of a dynamic workplace that fosters creativity, teamwork, and a passion for building transformative solutions. Your contributions will be recognized and celebrated.


About Our Leadership


Anuj Gupta –

Raghvendra Kushwah


Key Benefits


  • Competitive salary and performance-based bonus.
  • Comprehensive benefits package, including health insurance and flexible work hours.
  • Opportunities for professional development and career growth.


Location: Gurugram, Haryana (on-site)


Application: Role Name.


Eucloid is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment.
