Job Summary
We are partnering with one of the foundational Large Language Model (LLM) companies to help enhance next-generation AI systems. As a Python Developer, you will play a critical role in generating high-quality proprietary datasets, designing evaluation frameworks, and refining AI outputs. This role focuses on data-driven contributions to Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), enabling measurable improvements in LLM performance and reliability.
Please note this is a contract position requiring an immediate start.
Key Responsibilities
- Develop and maintain high-quality Python code for dataset creation, evaluation, and automation.
- Design and execute evaluation strategies (Evals) to benchmark AI model performance.
- Generate, rank, and critique AI responses across technical and general domains.
- Build task-specific datasets for Supervised Fine-Tuning (SFT) and support RLHF pipelines.
- Collaborate with annotators, researchers, and product teams to refine reward models.
- Provide clear, well-documented rationales for model evaluations and feedback.
- Conduct peer reviews of code and documentation, driving adherence to best practices.
- Continuously explore new tools and methods to enhance AI training workflows.
Required Skills and Experience
- 3+ years of strong hands-on experience with Python.
- Proficiency in multi-threading, async programming, and debugging concurrency/memory issues.
- Strong knowledge of Python testing frameworks (unit, integration, property-based testing).
- Ability to refactor code and work with architectural patterns.
- Industry experience in maintaining code quality, formatting, and clean design.
- Excellent analytical and reasoning skills to evaluate LLM outputs.
- Fluency in written and spoken English.
Type of Projects & Hands-On Experience
- AI Training Data Generation: writing code, prompts, and responses for SFT.
- Evaluation Frameworks: designing processes to measure and benchmark model accuracy, safety, and alignment.
- RLHF Projects: comparing outputs of different LLM versions, ranking quality, and providing human feedback.
- Production-Quality Coding: writing maintainable, tested, and scalable Python solutions.
Expected depth: hands-on coding, dataset design, evaluation strategy creation, and active contribution to LLM training loops.
Preferred Qualifications
- Experience with AI/ML workflows (fine-tuning, eval pipelines, reward models).
- Familiarity with PyTorch, Hugging Face, or similar ML frameworks.
- Exposure to AI ethics, alignment research, or model safety practices.
- Advanced degree in Computer Science, Data Science, or a related field (optional).
Location & Shift Details
- Remote (Global) – fully distributed team.
- Flexible engagement with a required 4-hour overlap with PST.
- Options available: 20, 30, or 40 hours/week.