A fast-scaling provider of analytics & data engineering services within the enterprise software and digital transformation sector in India seeks an onsite PySpark Engineer to build and optimize high-volume data pipelines on modern big-data platforms.

Role & Responsibilities

Design, develop, and maintain PySpark-based batch and streaming pipelines for data ingestion, cleansing, transformation, and aggregation.
Optimize Spark jobs for performance and cost by tuning partitioning, caching strategies, and join execution plans.
Integrate diverse data sources—RDBMS, NoSQL, cloud storage, and REST APIs—into unified, consumable datasets for analytics and reporting teams.
Implement robust data quality checks, error handling, and lineage tracking using Spark SQL, Delta Lake, and metadata tools.
Collaborate with Data Architects and BI teams to translate analytical requirements into scalable data models.
Follow Agile delivery practices, write unit and integration tests, and automate deployments through Git-driven CI/CD pipelines.

Skills & Qualifications

Must-Have
3+ years of hands-on PySpark development in production environments.
Deep knowledge of Spark SQL, DataFrames, RDD optimizations, and performance tuning.
Proficiency in Python 3, object-oriented design, and writing reusable modules.
Experience with the Hadoop ecosystem, Hive/Impala, and cloud object storage such as S3, ADLS, or GCS.
Strong SQL skills and understanding of star/snowflake schema modeling.
Preferred
Exposure to Delta Lake, Apache Airflow, or Kafka for orchestration and streaming.
Experience deploying on Databricks or EMR and configuring autoscaling clusters.
Knowledge of Docker or Kubernetes for containerized data workloads.

Benefits & Culture Highlights

Hands-on work with modern open-source tech stacks and leading cloud platforms.
Mentorship from senior data engineers and architects, fostering rapid skill growth.
Performance-based bonuses, skill-development stipends, and a collaborative, innovation-driven environment.

Skills: sql, hadoop ecosystem, pyspark engineer, scala, python 3, performance tuning, problem solving, apache airflow, hive, emr, kubernetes, agile, impala, pyspark, dataframes, delta lake, rdd optimizations, object-oriented design, python, spark, data modeling, databricks, spark sql, docker, hadoop, etl, kafka

Viraaj HR Solutions Private Limited

Pyspark Engineer