Get alerts for new jobs matching your selected skills, preferred locations, and experience range. Manage Job Alerts
6.0 - 10.0 years
30 - 35 Lacs
Bengaluru
Work from Office
We are seeking an experienced PySpark Developer / Data Engineer to design, develop, and optimize big data processing pipelines using Apache Spark and Python (PySpark). The ideal candidate should have expertise in distributed computing, ETL workflows, data lake architectures, and cloud-based big data solutions. Key Responsibilities: Develop and optimize ETL/ELT data pipelines using PySpark on distributed computing platforms (Hadoop, Databricks, EMR, HDInsight). Work with structured and unstructured data to perform data transformation, cleansing, and aggregation. Implement data lake and data warehouse solutions on AWS (S3, Glue, Redshift), Azure (ADLS, Synapse), or GCP (BigQuery, Dataflow). Optimize PySpark jobs for performance tuning, partitioning, and caching strategies. Design and implement real-time and batch data processing solutions. Integrate data pipelines with Kafka, Delta Lake, Iceberg, or Hudi for streaming and incremental updates. Ensure data security, governance, and compliance with industry best practices. Work with data scientists and analysts to prepare and process large-scale datasets for machine learning models. Collaborate with DevOps teams to deploy, monitor, and scale PySpark jobs using CI/CD pipelines, Kubernetes, and containerization. Perform unit testing and validation to ensure data integrity and reliability. Required Skills & Qualifications: 6+ years of experience in big data processing, ETL, and data engineering. Strong hands-on experience with PySpark (Apache Spark with Python). Expertise in SQL, DataFrame API, and RDD transformations. Experience with big data platforms (Hadoop, Hive, HDFS, Spark SQL). Knowledge of cloud data processing services (AWS Glue, EMR, Databricks, Azure Synapse, GCP Dataflow). Proficiency in writing optimized queries, partitioning, and indexing for performance tuning. Experience with workflow orchestration tools like Airflow, Oozie, or Prefect. Familiarity with containerization and deployment using Docker, Kubernetes, and CI/CD pipelines. Strong understanding of data governance, security, and compliance (GDPR, HIPAA, CCPA, etc.). Excellent problem-solving, debugging, and performance optimization skills.
Posted 4 days ago
Upload Resume
Drag or click to upload
Your data is secure with us, protected by advanced encryption.
Browse through a variety of job opportunities tailored to your skills and preferences. Filter by location, experience, salary, and more to find your perfect fit.
We have sent an OTP to your contact. Please enter it below to verify.
Accenture
22645 Jobs | Dublin
Wipro
12405 Jobs | Bengaluru
EY
8519 Jobs | London
Accenture in India
7136 Jobs | Dublin 2
Uplers
6955 Jobs | Ahmedabad
Amazon
6685 Jobs | Seattle,WA
IBM
6478 Jobs | Armonk
Oracle
6281 Jobs | Redwood City
Muthoot FinCorp (MFL)
5249 Jobs | New Delhi
Capgemini
4637 Jobs | Paris,France