ABOUT THE ROLE
Role Description:
We are seeking a highly skilled and detail-oriented Data Engineering Test Automation Engineer to ensure the quality, reliability, and performance of our data pipelines and platforms. The ideal candidate has strong experience in data testing, ETL validation, and automation frameworks, and will work closely with data engineers, analysts, and DevOps to build robust test suites. This role involves designing and executing both manual and automated tests for real-time and batch data pipelines across AWS and Databricks platforms, while applying QA best practices in test planning, defect tracking, and lifecycle management to ensure high-quality data delivery. In addition, the candidate should have hands-on expertise in front-end Node.js testing and performance testing of infrastructure components, ensuring that user-facing systems are validated for responsiveness, scalability, and reliability. Experience with modern front-end test automation frameworks and load/performance testing tools is highly desirable.
Roles & Responsibilities:
- Design, develop, and maintain automated test scripts for data pipelines, ETL jobs, and data integrations.
- Validate data accuracy, completeness, transformations, and integrity across multiple systems.
- Collaborate with data engineers to define test cases and establish data quality metrics.
- Develop reusable test automation frameworks and CI/CD integrations (e.g., Selenium, Jenkins, GitHub Actions).
- Perform performance and load testing for data systems.
- Maintain test data management and data mocking strategies.
- Identify and track data quality issues, ensuring timely resolution.
- Perform root cause analysis and drive corrective actions.
- Contribute to QA ceremonies (standups, planning, retrospectives) and drive continuous improvement in QA processes and culture.
- Build and automate end-to-end data pipeline validations across ingestion, transformation, and consumption layers using Databricks, Apache Spark, and AWS services such as S3, Glue, Athena, and Lake Formation (an illustrative validation sketch follows this list).
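As a rough illustration of what such an automated validation could look like, here is a minimal PySpark/pytest sketch. The bucket path, table name, and column names (order_id, amount) are hypothetical placeholders, and the checks shown are a starting point rather than a prescribed framework.

```python
# Minimal sketch of automated source-to-target validation with PySpark + pytest.
# All paths, table names, and columns below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-validation").getOrCreate()

# Hypothetical raw landing zone (source) and curated table (target).
source_df = spark.read.parquet("s3://example-bucket/raw/orders/")
target_df = spark.read.table("curated.orders")


def test_row_count_matches():
    """Completeness: every source record should land in the target."""
    assert source_df.count() == target_df.count()


def test_no_null_keys():
    """Integrity: the primary key must never be null after transformation."""
    assert target_df.filter(F.col("order_id").isNull()).count() == 0


def test_amount_totals_match():
    """Accuracy: aggregate measures should be preserved by the transformation."""
    src_total = source_df.agg(F.sum("amount")).collect()[0][0]
    tgt_total = target_df.agg(F.sum("amount")).collect()[0][0]
    assert src_total == tgt_total
```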
Basic Qualifications and Experience:
- Master's degree with 4-6 years of experience in Computer Science, IT, or a related field, OR
- Bachelor's degree with 6-8 years of experience in Computer Science, IT, or a related field
Functional Skills:
Must-Have Skills:
- Experience in QA roles, with strong exposure to data pipeline validation and ETL testing.
- Hands-on expertise in front-end Node.js testing and performance testing of infrastructure components.
- Ability to validate data accuracy, transformations, schema compliance, and completeness across systems using PySpark and SQL.
- Strong hands-on experience with Python, and optionally PySpark, for developing automated data validation scripts.
- Proven experience in validating ETL workflows, with a solid understanding of data transformation logic, schema comparison, and source-to-target mapping.
- Experience working with data integration and processing platforms such as Databricks, Snowflake, AWS EMR, Redshift, etc.
- Experience in manual and automated testing of data pipeline executions, for both batch and real-time pipelines.
- Experience performing performance testing of large-scale, complex data engineering pipelines.
- Ability to troubleshoot data issues independently and collaborate with engineering teams on root cause analysis.
- Strong understanding of QA methodologies, test planning, test case design, and defect lifecycle management.
- Hands-on experience with API testing using Postman, pytest, or custom automation scripts (a minimal pytest sketch follows this list).
- Experience integrating automated tests into CI/CD pipelines using tools like Selenium, JUnit, Jenkins, GitHub Actions, or similar.
- Knowledge of cloud platforms such as AWS, Azure, and GCP.
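A minimal pytest sketch for the kind of API checks referenced above is shown below; the base URL, endpoints, and expected fields are hypothetical placeholders rather than a specific system's API.

```python
# Minimal pytest sketch for API validation; the base URL, endpoints, and
# expected fields below are hypothetical placeholders.
import requests

BASE_URL = "https://api.example.com"  # placeholder


def test_health_endpoint_returns_ok():
    """The service health check should respond with HTTP 200."""
    resp = requests.get(f"{BASE_URL}/health", timeout=10)
    assert resp.status_code == 200


def test_order_payload_has_expected_fields():
    """Schema compliance: required fields must be present in the response."""
    resp = requests.get(f"{BASE_URL}/orders/123", timeout=10)
    assert resp.status_code == 200
    body = resp.json()
    for field in ("order_id", "status", "amount"):
        assert field in body
```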
Good-to-Have Skills:
- Certifications in Databricks, AWS, Azure, or data QA (e.g., ISTQB).
- Understanding of data privacy, compliance, and governance frameworks.
- Knowledge of UI test automation frameworks such as Selenium, JUnit, or TestNG.
- Familiarity with monitoring/observability tools such as Datadog, Prometheus, or CloudWatch.
- Experience with front-end test frameworks like Mocha/Jest and performance testing tools like JMeter or k6.
Professional Certifications (Preferred):
- AWS Certified Data Engineer or Data Analyst (certifications in Databricks or cloud environments preferred)
Soft Skills:
- Excellent critical-thinking and problem-solving skills
- Strong communication and collaboration skills
- Demonstrated ability to work effectively in a team setting
- Demonstrated presentation skills