AI Data Engineer - Synthetic Data Generation

1 - 5 years

15 - 25 Lacs

Platform: Naukri


Work Mode: Work from Office

Job Type: Full Time

Job Description

Key Responsibilities

Synthetic Data Generation & Quality Assurance

  • Design and implement scalable synthetic data generation systems to support model training
  • Develop and maintain data quality validation pipelines ensuring synthetic data meets training requirements
  • Build automated testing frameworks for synthetic data generation workflows
  • Collaborate with ML teams to optimize synthetic data for model performance

APIs & Integration

  • Develop and maintain REST API integrations across multiple enterprise platforms
  • Implement robust data exchange, transformation, and synchronisation logic between systems
  • Ensure error handling, retries, and monitoring for all integration workflows

Data Quality & Testing

  • Implement automated data validation and testing frameworks for ETL and synthetic data workflows
  • Translate data quality feedback from stakeholders into pipeline or generation process improvements
  • Proactively monitor and maintain data consistency across systems

Multi-System Integration & MCP Development

  • Build and maintain tool registries for Model Context Protocol (MCP) integration across multiple enterprise systems
  • Develop robust APIs for multi-system communication through MCP frameworks
  • Design and implement workflows that coordinate multi-system interactions
  • Ensure reliable data flow and error handling across distributed system architectures

Cross-Functional Collaboration & Production Integration

  • Partner with domain specialists to translate plan execution feedback into actionable insights
  • Work closely with Product Managers to align synthetic data generation with business requirements
  • Collaborate with Core Engineering teams to ensure seamless production deployment
  • Establish feedback mechanisms between synthetic data systems and production environments

Required Qualifications

Technical Skills

  • Programming: Proficiency in Python; TypeScript (optional)
  • Data Engineering: Experience with data engineering frameworks and libraries (Pandas, Apache Airflow, Prefect)
  • APIs & Integration: Strong background in REST APIs and system integration
  • Databases: Experience with relational and NoSQL databases (PostgreSQL, MongoDB)
  • Cloud Platforms: Hands-on experience with AWS/GCP/Azure

Experience Requirements

  • 2+ years of experience building production-scale data pipelines and orchestration systems
  • Demonstrated success in cross-functional collaboration in technical environments

Preferred Qualifications

  • Familiarity with managing Kubernetes-based production workloads and workflow orchestration (Argo)
  • Familiarity with containerisation and orchestration tools such as Docker and Kubernetes
  • Familiarity with synthetic or large-scale data generation
  • Background in enterprise software integration
  • Experience with Model Context Protocol (MCP) or similar orchestration frameworks
  • Knowledge of automated testing frameworks for data pipelines

What We Offer

  • Lots of learning: many systems are being built from the ground up, with no existing references or open-source projects to rely on. This will be a first not just for you, but for the industry as well.
  • Opportunity to work at the forefront of enterprise-scale synthetic data generation
  • Collaborative environment with product teams, engineering, and domain specialists
  • Competitive compensation and comprehensive benefits
  • Professional development opportunities in cutting-edge data engineering and Kubernetes orchestration

Team Structure

You'll report to the AI Engineering Lead and work closely with:

  • ML Engineers developing foundation models
  • Product Managers defining business requirements
  • Product Specialists providing domain expertise
  • Backend Engineers handling production infrastructure

This role offers significant impact on our data capabilities and the opportunity to shape how we generate and utilize synthetic data for training enterprise systems.

Chennai, Tamil Nadu, India