Responsibilities
Design and implement complex ETL/ELT pipelines using PySpark and Airflow for large-scale data processing on GCP
Lead data migration initiatives, including automating the movement of Teradata tables to BigQuery, ensuring data accuracy and consistency
Develop robust frameworks to streamline batch and streaming data ingestion workflows, leveraging Kafka, Dataflow, and NiFi
Collaborate with data scientists to build ML-ready data layers and support analytics solutions
Conduct proofs of concept (POCs) and document performance benchmarks for data throughput and velocity, ensuring optimized data workflows
Enhance CI/CD pipelines using Jenkins and GitLab for efficient deployment and monitoring of data solutions
Collaborate in agile teams on product development and delivery
Work independently to design data integrations and data quality frameworks
Requirements
Strong proficiency in Python and SQL for data engineering tasks
Strong understanding of and experience with distributed computing principles and frameworks such as Hadoop and Apache Spark
Advanced experience with GCP services, including BigQuery, Dataflow, Cloud Composer (Airflow), and Dataproc
Expertise in data modeling, ETL/ELT pipeline development, and workflow orchestration using Airflow DAGs
Hands-on experience with data migration from legacy systems (Teradata, Hive) to cloud platforms (BigQuery)
Familiarity with streaming data ingestion tools like Kafka and NiFi
Strong problem-solving skills and experience with performance optimization in large-scale data environments
Proficiency in CI/CD tools (Jenkins, GitLab) and version control systems (Git)
GCP Professional Data Engineer certification