- Design, develop, and operate scalable and maintainable data pipelines in the Azure Databricks environment
- Develop all technical artifacts as code, implemented in professional IDEs, with full version control and CI/CD automation
- Enable data-driven decision-making in Human Resources (HR), Purchasing (PUR) and Finance (FIN) by ensuring high data availability, quality, and reliability
- Implement data products and analytical assets using software engineering principles in close alignment with business domains and functional IT
- Apply rigorous software engineering practices such as modular design, test-driven development, and artifact reuse in all implementations
- Global delivery footprint; cross-functional data engineering support across HR, PUR & FIN domains
- Collaboration with business stakeholders, functional IT partners, product owners, architects, ML/AI engineers, and Power BI developers
- Agile, product-team structure embedded in an enterprise-scale Azure environment
Main Tasks:
• Design scalable batch and streaming pipelines in Azure Databricks using PySpark and/or Scala
• Implement ingestion from structured and semi-structured sources (e.g., SAP, APIs, flat files)
• Build bronze/silver/gold data layers following the defined lakehouse layering architecture & governance
• Implement use-case driven dimensional models (star/snowflake schema) tailored to HR, PUR & FIN needs
• Ensure compatibility with reporting tools (e.g., Power BI) via curated data marts and semantic models
• Implement enterprise-level data warehouse models (domain-driven 3NF models) for HR, PUR & FIN data, in close alignment with data engineers from other business domains
• Develop and apply master data management strategies (e.g., Slowly Changing Dimensions)
• Develop automated data validation tests using established testing frameworks
• Monitor pipeline health, identify anomalies, and implement quality thresholds
• Establish data quality transparency by defining meaningful data quality rules together with source-system owners and business stakeholders, implementing those rules, and building the related reports
• Develop and structure pipelines using modular, reusable code in a professional IDE
• Apply test-driven development (TDD) principles with automated unit, integration, and validation tests
• Integrate tests into CI/CD pipelines to enable fail-fast deployment strategies
• Commit all artifacts to version control with peer review and CI/CD integration
• Work closely with Product Owners to refine user stories and define acceptance criteria
• Translate business requirements into data contracts and technical specifications
• Participate in agile events such as sprint planning, reviews, and retrospectives
• Document pipeline logic, data contracts, and technical decisions in markdown or auto-generated docs from code
• Align designs with governance and metadata standards (e.g., Unity Catalog)
• Track lineage and audit trails through integrated tooling
• Profile and tune data transformation performance
• Reduce job execution times and optimize cluster resource usage
• Refactor legacy pipelines or inefficient transformations to improve scalability
continental