Job Summary
We are partnering with one of the foundational Large Language Model (LLM) companies to help enhance next-generation AI systems. As a Python Developer, you will play a critical role in generating high-quality proprietary datasets, designing evaluation frameworks, and refining AI outputs. This role focuses on data-driven contributions to Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), enabling measurable improvements in LLM performance and reliability.
Please note this is a contract position requiring an immediate start.
Key Responsibilities
- Develop and maintain high-quality Python code for dataset creation, evaluation, and automation.
- Design and execute evaluation strategies (Evals) to benchmark AI model performance.
- Generate, rank, and critique AI responses across technical and general domains.
- Build task-specific datasets for Supervised Fine-Tuning (SFT) and support RLHF pipelines.
- Collaborate with annotators, researchers, and product teams to refine reward models.
- Provide clear, well-documented rationales for model evaluations and feedback.
- Conduct peer reviews of code and documentation, driving adherence to best practices.
- Continuously explore new tools and methods to enhance AI training workflows.
Required Skills and Experience
- 3+ years of strong hands-on experience with Python.
- Proficiency in multi-threading, async programming, and debugging concurrency/memory issues.
- Strong knowledge of Python testing frameworks (unit, integration, property-based testing).
- Ability to refactor code and work with architectural patterns.
- Industry experience in maintaining code quality, formatting, and clean design.
- Excellent analytical and reasoning skills to evaluate LLM outputs.
- Fluency in written and spoken English.
Type of Projects & Hands-On Experience
- AI Training Data Generation: writing code, prompts, and responses for SFT.
- Evaluation Frameworks: designing processes to measure and benchmark model accuracy, safety, and alignment.
- RLHF Projects: comparing outputs of different LLM versions, ranking quality, and providing human feedback.
- Production-Quality Coding: writing maintainable, tested, and scalable Python solutions.
Expected depth: hands-on coding, dataset design, evaluation strategy creation, and active contribution to LLM training loops.
Preferred Qualifications
- Experience with AI/ML workflows (fine-tuning, eval pipelines, reward models).
- Familiarity with PyTorch, Hugging Face, or similar ML frameworks.
- Exposure to AI ethics, alignment research, or model safety practices.
- Advanced degree in Computer Science, Data Science, or a related field (optional).
Location & Shift Details
- Remote (Global) – fully distributed team.
- Flexible engagement with a required 4-hour overlap with PST.
- Options available: 20, 30, or 40 hours/week.