Lead Data Engineer

6 - 11 years

20 - 35 Lacs

Posted: 1 day ago | Platform: Naukri


Work Mode

Work from Office

Job Type

Full Time

Job Description

Senior/Lead Data Engineer (Warehouse Architecture & PySpark)

Location:

Experience:

The opportunity

Own the end-to-end build of a modern analytics foundation: design clean warehouse models, craft high-quality PySpark transformations, and ship reliable pipelines feeding BI/ML at scale.

What you’ll do

  • Model the warehouse: Define dimensional/star schemas (SCD1/2, snapshots), conformed dimensions, and clear grains across core domains.
  • Author robust transformations: Build performant PySpark jobs for batch/near-real-time; handle nested JSON, schema evolution, late data, and idempotent re-runs.
  • Ingestion & CDC: Operate change-data-capture from relational and document stores; incremental patterns, backfills, and auditability.
  • Orchestrate & automate: Own DAGs, retries, SLAs, and deployments in a modern scheduler with infra-as-code and CI/CD.
  • Quality, lineage, observability: Freshness/uniqueness/RI tests, lineage, and monitoring for drift/skew and job SLAs.
  • Performance & cost: Partitioning, clustering/sort, join strategies, file sizing, columnar formats, workload management.
  • Security & governance: RBAC, masking/tokenization for PII, data contracts with producers/consumers.
  • Partner across functions: Work with product/engineering/finance/analytics to define SLAs, KPIs, and domain boundaries.
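To make "idempotent re-runs" concrete: a minimal sketch in plain Python (field names like `order_id` and `updated_at` are illustrative, not from this role's actual schemas) of a latest-timestamp-wins upsert, where replaying the same batch leaves the target unchanged. A production version would be a PySpark/warehouse MERGE, but the invariant is the same.

```python
# Hypothetical sketch: each row carries a natural key and an update timestamp;
# the latest timestamp wins, so re-running the same batch is a no-op.
def idempotent_upsert(target, batch, key="order_id", ts="updated_at"):
    """Merge batch rows into target; later timestamps win."""
    state = {row[key]: row for row in target}
    for row in batch:
        current = state.get(row[key])
        if current is None or row[ts] >= current[ts]:
            state[row[key]] = row
    return sorted(state.values(), key=lambda r: r[key])

target = [{"order_id": 1, "status": "placed", "updated_at": 1}]
batch = [
    {"order_id": 1, "status": "shipped", "updated_at": 2},  # late-arriving update
    {"order_id": 2, "status": "placed", "updated_at": 2},
]
once = idempotent_upsert(target, batch)
twice = idempotent_upsert(once, batch)  # replaying the batch changes nothing
```

The same keyed-merge discipline is what lets backfills and retries run safely without duplicating rows.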

Must-have qualifications

  • Warehouse modeling depth: 5+ years designing dimensional models at multi-TB scale; strong with grains, surrogate keys, SCD2, snapshots.
  • PySpark expertise: Solid grasp of Spark execution (shuffles, skew mitigation, AQE), windowing, and UDF/UDTF trade-offs.
  • Pipelines from mixed sources: Ingest from RDBMS and document/NoSQL systems; handle nested structures and schema evolution.
  • Cloud DW proficiency: Hands-on with a Redshift-class warehouse and lake/lakehouse table formats (Parquet/Delta/Iceberg).
  • Orchestration & CI/CD: Production experience with an Airflow-class scheduler, Git workflows, environment promotion, and IaC (Terraform/CDK).
  • Data quality & lineage: Practical use of Great Expectations/Deequ (or equivalent) and lineage tooling; an incident-prevention mindset.
  • Streaming & CDC: Production experience with Kafka-class streams and Debezium-style CDC (topics/partitions, offset management, schema registry, compaction).
  • Semantic layer & ELT: Working experience with dbt (or equivalent) and a metrics/semantic layer (e.g., MetricFlow/LookML-style).
  • Cost governance: Workload management, queue/WLM tuning, and price/performance optimization in cloud DWs.
  • Privacy & compliance: Exposure to GDPR/DPDP concepts; secure-by-design patterns for PII.
  • Ownership & communication: Clear docs/design reviews; ability to translate ambiguous asks into resilient datasets.
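Candidates unsure what "SCD2" means in practice may find this minimal Python sketch useful (column names such as `nk` and `city` are illustrative): on an attribute change, the current dimension row is closed out with an end date and a new current row is inserted, preserving full history.

```python
# Hypothetical SCD2 sketch: close the current row on change, insert a new
# current row; unchanged rows pass through untouched.
HIGH_DATE = "9999-12-31"

def apply_scd2(dim_rows, change, effective_date):
    """Apply one change record {'nk': ..., 'city': ...} to an SCD2 dimension."""
    out, needs_insert = [], True
    for row in dim_rows:
        if row["nk"] == change["nk"] and row["is_current"]:
            if row["city"] == change["city"]:
                needs_insert = False  # no tracked-attribute change: keep as-is
                out.append(row)
            else:
                out.append({**row, "end_date": effective_date, "is_current": False})
        else:
            out.append(row)
    if needs_insert:
        out.append({"nk": change["nk"], "city": change["city"],
                    "start_date": effective_date, "end_date": HIGH_DATE,
                    "is_current": True})
    return out

dim = [{"nk": "C1", "city": "Pune", "start_date": "2023-01-01",
        "end_date": HIGH_DATE, "is_current": True}]
dim = apply_scd2(dim, {"nk": "C1", "city": "Hyderabad"}, "2024-06-01")
```

At warehouse scale the same close-and-insert logic is typically expressed as a MERGE over a surrogate-keyed dimension table.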

Interview signals

  • Can sketch a star schema from messy OLTP + event streams and justify grain/keys/SCD choices.
  • Reads a Spark plan and explains shuffle boundaries plus a concrete skew fix.
  • Describes Kafka + Debezium CDC patterns (outbox, schema evolution, retries, exactly-once/at-least-once trade-offs).
  • Shows dbt modeling discipline (naming, tests, exposures, contracts) and how it fits with PySpark transforms.
  • Demonstrates a real cost/perf win (partitioning/sort keys/file sizing/WLM) with before/after metrics.
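One "concrete skew fix" worth rehearsing is key salting. A plain-Python sketch of the idea (key and salt names are illustrative): rows for a hot join key are fanned out over N sub-keys, and the smaller join side is replicated once per salt so every salted row still finds its match.

```python
# Hypothetical salting sketch: fact side gets a random salt suffix, the
# dimension side is exploded across all salt values so the join still matches.
import random

N_SALTS = 8
_rng = random.Random(0)  # seeded only so this sketch is reproducible

def salt_fact_key(key, n_salts=N_SALTS):
    """Fact side: append a random salt so a hot key fans out over n_salts."""
    return f"{key}#{_rng.randrange(n_salts)}"

def explode_dim_key(key, n_salts=N_SALTS):
    """Dimension side: emit every salted variant of the key."""
    return [f"{key}#{i}" for i in range(n_salts)]

# 1000 rows for one hot key now land on up to N_SALTS join partitions.
salted = {salt_fact_key("HOT_CUSTOMER") for _ in range(1000)}
```

In Spark the same pattern is applied with a salt column plus an explode on the dimension side, trading a bounded blow-up of the small table for an even shuffle.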

Tech familiarity

PySpark

How to apply:

Akrivia HCM

Software Development

San Francisco, California
