Job Description
Position Overview
We are seeking an exceptional Principal / Head of Data Engineering to establish and lead our data engineering function from the ground up. This role reports to the Head of Data and AI Engineering and is responsible for the complete design, development, and implementation of a world-class modern data platform. You will drive the strategic evolution of our data infrastructure, enabling both structured and unstructured data workflows at scale. You will spearhead the modernization of our existing Azure Data Factory pipelines to next-generation orchestration tools, implement efficient data ingress and egress patterns, establish AI/LLM-native data capabilities through advanced vector indexing and streaming architectures, and build a high-performing data engineering organization. You will collaborate closely with the cloud engineering, network engineering, and data products teams to architect a unified data lake and a comprehensive data governance framework that supports diverse analytical and operational needs across our portfolio.
Key Responsibilities
Organization Building & Team Leadership
Build and scale the data engineering organization from inception, defining team structure, roles, and responsibilities across the function
Establish engineering culture emphasizing technical excellence, collaboration, ownership, and continuous learning
Recruit, mentor, and develop high-performing data engineers with expertise in modern data platforms, ETL/ELT, orchestration, streaming, and vector databases
Partner with Human Resources on recruitment strategy, hiring processes, and organizational scaling as the firm grows
Strategic Vision & Roadmap
Establish a comprehensive, multi-year data engineering strategy aligned with firm objectives
Define technical roadmaps for data infrastructure, platform capabilities, and technology adoption
Establish governance frameworks for data engineering decisions, standards, and best practices
Lead technology evaluation and vendor selection processes with clear ROI and strategic fit
Platform Architecture & Modernization
Design and architect a modern, scalable data platform leveraging Databricks on Azure that supports both structured and unstructured data at petabyte scale
Lead the modernization of legacy Azure Data Factory (ADF) pipelines to production-grade orchestration platforms such as Prefect or Apache Airflow
Develop a comprehensive upgrade and migration roadmap for ETL/ELT pipelines, ensuring zero data loss, minimal downtime, and improved observability
Lead the implementation of serverless and Zero ETL patterns to eliminate infrastructure management overhead and reduce time-to-insight
Own cost optimization initiatives across the data platform, balancing performance, reliability, and operational efficiency
ETL/ELT & Orchestration Excellence
Build deep expertise in Directed Acyclic Graph (DAG) principles and modern workflow orchestration patterns for reliable, scalable pipeline management
Evaluate, select, and implement best-in-class orchestration tools (Prefect, Airflow) that provide superior visibility, error handling, and data lineage tracking
Establish patterns for dynamic DAG generation, conditional execution, and advanced error recovery strategies
Design and enforce data quality frameworks within orchestration tools to catch issues at the pipeline level
Create monitoring, alerting, and observability solutions that provide full visibility into pipeline health and data freshness
Data Movement & Integration Patterns
Architect efficient data ingress patterns supporting high-volume, real-time, and batch data inflows from diverse sources (APIs, databases, cloud services, SaaS platforms)
Design sophisticated data egress patterns enabling secure, efficient data distribution to downstream systems, analytics tools, and external stakeholders
Implement change data capture (CDC) patterns and incremental processing strategies to optimize resource usage and reduce latency
Establish governance frameworks for data movement including encryption, authentication, and audit trails
Streaming & Real-Time Data Capabilities
Evaluate and implement streaming platforms (Kafka, Event Hubs, Kinesis) to support real-time analytics and operational use cases
Design event-driven architectures that enable low-latency decision-making and automated workflows
Build streaming ingestion pipelines that efficiently funnel data into the lakehouse while maintaining data quality and lineage
AI & LLM-Native Data Infrastructure
Design and build vector database infrastructure to support LLM applications, including efficient indexing, similarity search, and retrieval-augmented generation (RAG) workflows
Establish patterns for embedding generation, vector storage optimization, and integration with vector databases
Build data pipelines that prepare unstructured data (documents, images, audio) for embedding and LLM consumption
Create governance and provenance tracking for embeddings and vector data to ensure transparency and compliance
Data Lake & Catalog Implementation
Lead the development and governance of a unified data lake, establishing data quality standards, lineage tracking, and compliance frameworks
Support implementation of a modern data catalog solution that enables data discovery, governance, and self-service analytics across the enterprise
Establish data engineering best practices, testing frameworks, production deployment pipelines, and operational standards
Cross-Functional Collaboration & Stakeholder Management