Responsibilities:
- Design, develop, and maintain automated data quality checks for complex data pipelines handling petabyte-scale datasets.
- Implement scalable data validation frameworks using PySpark, PyTest, Python, SQL, and Hive, ensuring comprehensive test coverage.
- Collaborate with Data Engineers and DevOps teams to integrate automated tests into CI/CD workflows.
- Analyze test results, identify data anomalies, and provide actionable insights to resolve data quality issues.
- Develop monitoring and alerting solutions for data quality in production environments.
- Document test processes, standards, and best practices, and mentor junior engineers on data quality automation.
- Continuously improve test frameworks and processes to optimize performance, scalability, and reliability.
This is a hybrid position. The expected number of days in the office will be confirmed by your hiring manager.