Senior Data Engineer

8 - 13 years

3 - 6 Lacs

Posted: 3 days ago | Platform: Foundit


Skills Required

Work Mode

On-site

Job Type

Full Time

Job Description

  • We are looking for a highly skilled and experienced Senior Data Engineer to join our dynamic data engineering team.
  • The ideal candidate will be responsible for building and maintaining scalable, high-performance data pipelines and cloud infrastructure, with a focus on managing vast amounts of data efficiently in real-time and batch processing environments. The role requires expertise in advanced ETL processes, AWS services such as Glue, Lambda, S3, Redshift, and EMR, and hands-on experience with big data technologies like Apache Spark, Kafka, Kinesis, and Apache Airflow.
  • You will work closely with data scientists, software engineers, and analysts to ensure that data is accessible, clean, and reliable for business-critical operations and advanced analytics.

Key Responsibilities:

  • Design & Architect Scalable Data Pipelines: Architect, build, and optimize high-throughput ETL pipelines using AWS Glue, Lambda, and EMR to handle large datasets and complex data workflows. Ensure the pipeline scales efficiently and handles real-time and batch processing.
  • Cloud Data Infrastructure Management: Implement, monitor, and maintain a cloud-native data infrastructure using AWS services like S3 for data storage, Redshift for data warehousing, and EMR for big data processing. Build robust, cost-effective solutions for storing, processing, and querying large datasets efficiently.
  • Data Transformation & Processing: Develop highly performant data transformation processes using Apache Spark on EMR for distributed data processing and parallel computation. Write optimized Spark jobs in Python (PySpark) for efficient data transformation (a short illustrative sketch follows this list).
  • Real-time Data Streaming Solutions: Design and implement real-time data ingestion and streaming systems using AWS Kinesis or Apache Kafka to handle event-driven architectures, process continuous data streams, and support real-time analytics.
  • Orchestration & Automation: Use Apache Airflow to schedule and orchestrate complex ETL workflows. Automate data pipeline processes, ensuring reliability, data integrity, and ease of monitoring. Implement self-healing workflows to recover from failures automatically.
  • Data Warehouse Optimization & Management: Develop and optimize data models, schemas, and queries in Amazon Redshift to ensure low-latency querying and scalable analytics. Apply best practices for data partitioning, indexing, and query optimization to increase performance and minimize costs.
  • Containerization & Orchestration: Leverage Docker to containerize data engineering applications for better portability and consistent runtime environments. Use AWS Fargate for running containerized applications in a serverless environment, ensuring easy scaling and reduced operational overhead.
  • Monitoring & Debugging: Build automated monitoring and alerting systems to proactively detect and troubleshoot pipeline issues, ensuring data quality and operational efficiency. Use tools like CloudWatch, Prometheus, or other logging frameworks to ensure end-to-end visibility of data pipelines.
  • Collaboration with Cross-functional Teams: Work closely with data scientists, analysts, and application developers to design data models and ensure proper data availability. Collaborate in the development of solutions that meet the business's data needs, from experimentation to production.
  • Security & Compliance: Implement data governance policies, security protocols, and compliance measures for handling sensitive data, including encryption, auditing, and IAM role-based access control in AWS.
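
As a rough, illustrative sketch of the kind of PySpark transformation work described in the Data Transformation & Processing item above (the bucket, paths, and column names are hypothetical, not a prescribed implementation):

```python
# Read raw JSON events from an S3 data lake, clean them, and write partitioned
# Parquet back to the curated zone. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

raw = spark.read.json("s3://example-data-lake/raw/orders/2024-01-01/")

cleaned = (
    raw.dropDuplicates(["order_id"])               # de-duplicate on the business key
       .filter(F.col("order_status").isNotNull())  # drop malformed records
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
)

(cleaned.write.mode("overwrite")
        .partitionBy("order_date")                 # partition for downstream query pruning
        .parquet("s3://example-data-lake/curated/orders/"))
```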

Required Skills & Qualifications:

  • Experience: 5+ years of hands-on experience in building, maintaining, and optimizing data pipelines, ideally in a cloud-native environment.
  • ETL Expertise: Solid understanding of ETL/ELT processes and experience with tools like AWS Glue for building serverless ETL pipelines. Expertise in designing data transformation logic to move and process data efficiently across systems.
  • AWS Services: Deep experience working with AWS cloud services:
    • S3: Designing data lakes, ensuring scalability and performance.
    • AWS Glue: Writing custom jobs for transforming data.
    • Lambda: Writing event-driven functions to process and transform data on-demand.
    • Redshift: Optimizing data warehousing operations for efficient query performance.
    • EMR (Elastic MapReduce): Running distributed processing frameworks like Apache Spark or Hadoop to process large datasets.
  • Big Data Technologies: Expertise in using Apache Spark for distributed data processing at scale. Experience with real-time data processing using Apache Kafka and AWS Kinesis for building streaming data pipelines.
  • Data Orchestration: Strong experience with Apache Airflow or similar workflow orchestration tools for scheduling, monitoring, and managing ETL jobs and data workflows (a short orchestration sketch follows this list).
  • Programming & Scripting: Proficiency in the Python programming language for building custom data pipelines and Spark jobs. Knowledge of coding best practices for high performance, maintainability, and reliability.
  • SQL & Query Optimization: Advanced knowledge of SQL and experience in query optimization, partitioning, and indexing for working with large datasets in Redshift and other data platforms.
  • CI/CD & DevOps Tools: Experience with version control systems like Git and with CI/CD pipelines, using infrastructure-as-code tools like Terraform or AWS CloudFormation to automate deployment and infrastructure management.
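
As a rough, illustrative sketch of the kind of Airflow orchestration referred to above (assumes Airflow 2.4+; the DAG id, schedule, and tasks are hypothetical placeholders):

```python
# A daily DAG with two placeholder tasks run in sequence: extract to the data
# lake, then load into the warehouse. Real pipelines would use Glue/EMR/Redshift
# operators instead of these print stubs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders():
    print("extracting raw orders to the data lake")  # placeholder extract step


def load_to_redshift():
    print("loading curated orders into Redshift")    # placeholder load step


with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load  # load runs only after extract succeeds
```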

Preferred Qualifications:

  • Data Streaming: Experience in designing and building real-time data streaming solutions using Kafka or Kinesis for real-time analytics and event processing (a short ingestion sketch follows this list).
  • Data Governance & Security: Familiarity with data governance practices, data cataloging, and data lineage tools to ensure the quality and security of data.
  • Advanced Data Analytics Support: Knowledge of supporting machine learning pipelines and building data systems that can scale to meet the requirements of AI/ML workloads.
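
As a rough, illustrative sketch of the kind of real-time ingestion described in the Data Streaming item above (assumes boto3; the stream name, region, and event shape are hypothetical):

```python
# Publish JSON events to a Kinesis data stream; records that share a partition
# key keep their ordering within a shard. All names below are placeholders.
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")


def publish_event(event: dict) -> None:
    kinesis.put_record(
        StreamName="orders-events",
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["order_id"]),
    )


publish_event({"order_id": 123, "status": "CREATED", "amount": 49.90})
```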

Certifications:

  • AWS certifications such as AWS Certified Big Data - Specialty or AWS Certified Solutions Architect are highly desirable.
