Data Pipeline Development
- Design, build, and maintain ETL/ELT pipelines in Databricks to ingest, clean, and transform data from diverse product sources.
- Construct gold layer tables in the Lakehouse architecture that serve both machine learning model training and real-time APIs.
- Monitor data quality, lineage, and reliability using Databricks best practices.
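The clean-and-aggregate pattern behind a gold-layer table can be sketched in a few lines. In Databricks this would typically be PySpark or Delta Live Tables; the pure-Python sketch below only illustrates the shape of the work, and the field names (`user_id`, `amount`) are hypothetical.

```python
from collections import defaultdict

def clean(records):
    """Silver-style quality step: drop rows missing required fields
    (hypothetical schema with user_id and amount)."""
    return [r for r in records if r.get("user_id") and r.get("amount") is not None]

def to_gold(records):
    """Gold-style step: aggregate per user, ready for model training
    or serving through an API."""
    totals = defaultdict(float)
    for r in clean(records):
        totals[r["user_id"]] += r["amount"]
    return dict(totals)

raw = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "u1", "amount": 5.0},
    {"user_id": None, "amount": 3.0},  # fails the quality check, dropped
]
print(to_gold(raw))  # {'u1': 15.0}
```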
AI-Driven Data Access Enablement
- Collaborate with AI/ML teams to ensure data is modeled and structured to support natural-language prompts and semantic retrieval across first- and third-party data sources, using vector search and Unity Catalog metadata.
- Help build data interfaces and agent tools that let AI agents retrieve and analyze structured customer data under role-based permissions.
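One way an agent tool can enforce role-based permissions is to filter columns against a per-role policy before any data reaches the agent. The sketch below assumes a hypothetical column-level policy table; in practice Unity Catalog grants would back this.

```python
# Hypothetical role-to-column policy; real enforcement would live in
# Unity Catalog grants rather than application code.
POLICIES = {
    "analyst": {"allowed_columns": {"customer_id", "segment"}},
    "admin":   {"allowed_columns": {"customer_id", "segment", "email"}},
}

def fetch_for_agent(role, rows):
    """Return only the columns the caller's role may see."""
    allowed = POLICIES[role]["allowed_columns"]
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

rows = [{"customer_id": 1, "segment": "smb", "email": "a@example.com"}]
print(fetch_for_agent("analyst", rows))  # email is stripped for analysts
```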
API & Serverless Backend Integration
- Work with backend engineers to design and implement serverless APIs (e.g., via AWS Lambda with TypeScript) that expose gold tables to frontend applications.
- Ensure APIs are performant, scalable, and designed with data security and compliance in mind.
- Use Databricks and other platform APIs to implement provisioning, deployment, security, and monitoring frameworks that scale data pipelines, AI endpoints, and multi-tenant security models.
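A serverless endpoint over a gold table reduces to a small handler: look up a row, apply access rules, return JSON. The posting names AWS Lambda with TypeScript; this sketch uses Python for consistency with the examples above, and the in-memory `GOLD_TABLE` stands in for a real query against the Lakehouse.

```python
import json

# Stand-in for a gold-table lookup; a real handler would query the Lakehouse.
GOLD_TABLE = {"u1": {"lifetime_value": 15.0}}

def handler(event, context=None):
    """Lambda-style handler exposing one gold-table row per request."""
    user_id = (event.get("pathParameters") or {}).get("user_id")
    row = GOLD_TABLE.get(user_id)
    if row is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(row)}

print(handler({"pathParameters": {"user_id": "u1"}}))
```

Keeping the handler a thin read path over precomputed gold tables is what makes it cheap to secure and scale: authorization and tenancy checks sit in front of a lookup, not a pipeline.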