The Data Integration Analyst will play a key role in developing scalable and automated data‑processing workflows to support ongoing data science initiatives. This role involves building ingestion pipelines in Azure Databricks, creating Python and PySpark‑based data‑cleaning and validation workflows, and implementing corporate standards to ensure full traceability, clear data lineage, and reproducible processes.
What you’re responsible for:
- Developing automated data-cleaning and validation workflows using Python and PySpark notebooks, along with Databricks pipelines, to support a data science project (a minimal sketch appears after this list).
- Building robust ingestion pipelines in Databricks to efficiently load, process, and prepare data for downstream analytics and modeling.
- Ensuring full traceability of data-cleaning methodologies by designing workflows that follow the Medallion Architecture (Bronze → Silver → Gold), maintaining clear lineage and reproducibility (see the second sketch below).
- Implementing corporate standards for data‑cleaning notebooks to improve readability, consistency, maintainability, and ease of handoff across teams.
- Developing reusable, well-documented functions where needed that are readable, modular, and include robust error handling, supporting scalable and reliable data processing (see the final sketch below).
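
For context, here is a minimal sketch of the kind of cleaning-and-validation step described in the first point. The table and column names (`raw_events`, `event_id`, `event_ts`) are hypothetical placeholders, not an actual project schema:

```python
# A minimal cleaning-and-validation sketch in PySpark.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.table("raw_events")  # hypothetical source table

# Basic cleaning: trim identifiers, drop duplicates, enforce a parseable timestamp.
cleaned = (
    raw.withColumn("event_id", F.trim(F.col("event_id")))
       .dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
)

# Simple validation: quarantine rows that fail required-field checks
# rather than silently discarding them.
is_valid = F.col("event_id").isNotNull() & F.col("event_ts").isNotNull()
valid = cleaned.filter(is_valid)
rejected = cleaned.filter(~is_valid)

valid.write.mode("overwrite").saveAsTable("clean_events")        # hypothetical target
rejected.write.mode("append").saveAsTable("rejected_events")     # quarantine table
```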
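
A second sketch illustrates one Bronze → Silver hop of the Medallion Architecture on Databricks. The landing path and table names are illustrative, and the lineage columns (`_ingested_at`, `_source_file`) are one common way, not the only one, to keep Silver rows traceable back to their Bronze source:

```python
# A sketch of a Bronze -> Silver hop in the Medallion Architecture.
# Paths and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw data as-is, stamped with ingestion metadata for traceability.
bronze = (
    spark.read.format("json").load("/mnt/landing/events/")  # hypothetical landing path
         .withColumn("_ingested_at", F.current_timestamp())
         .withColumn("_source_file", F.input_file_name())
)
bronze.write.format("delta").mode("append").saveAsTable("bronze.events")

# Silver: apply cleaning rules while preserving the lineage columns,
# so every Silver row can be traced to its Bronze source file.
silver = (
    spark.read.table("bronze.events")
         .filter(F.col("event_id").isNotNull())
         .dropDuplicates(["event_id"])
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.events")
```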
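
Finally, a sketch of the reusable, error-handled helper style the last point refers to. The function name and the specific check are hypothetical; the point is the documented interface and the explicit failure mode:

```python
# A reusable validation helper with explicit error handling.
# Names and checks are illustrative.
import logging
from pyspark.sql import DataFrame

logger = logging.getLogger(__name__)

def require_columns(df: DataFrame, required: list[str]) -> DataFrame:
    """Validate that `df` contains every column in `required`.

    Raises ValueError with a descriptive message instead of letting a
    downstream job fail later with an opaque analysis error.
    """
    missing = [c for c in required if c not in df.columns]
    if missing:
        logger.error("Validation failed; missing columns: %s", missing)
        raise ValueError(f"Input is missing required columns: {missing}")
    return df
```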