Our stack runs almost exclusively on Google Cloud Platform. You will work in an environment made up of data lakes (BigQuery, etc.), orchestration and scheduling platforms (Airflow), container-oriented deployment and management platforms (Docker, Kubernetes, Jenkins X), SQL, and data quality tools (dbt, Sifflet). You will also take part in data modeling and in the design of data flows, from conception through implementation and support in production.
- Expose data through various means, such as datamarts and flat files, for both internal and external users.
- Build complex and efficient SQL queries that transform data in our data lake into reliable business entities and reporting aggregates. Identify and manage the dependencies between these transformations and schedule them with tools like Airflow (an illustrative sketch follows this list).
- Investigate discrepancies and quality issues in the data, and address performance issues.
- Design optimized and cost-efficient data models in BigQuery that address business use cases (see the partitioning sketch after this list).
- Ensure data cleanliness, consistency, and availability by performing data quality checks and implementing monitoring (see the assertion-style check after this list).
- Catalog and document various aspects of the data, including business entities, datamarts, dimensions, metrics, and business rules.
- Serve as a subject matter expert on business entities and datamarts, and train users on SQL and analytics best practices (in collaboration with Business Insight).
- Innovate by proposing new tools, processes, and documentation, and by exploring emerging technologies during designated cool-down periods.
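
To make the transformation work above concrete, here is a minimal sketch of the kind of reporting aggregate described in the SQL bullet. All dataset, table, and column names (`lake.raw_orders`, `reporting.daily_revenue`, `amount_eur`, ...) are hypothetical; in practice a statement like this would be wired into an Airflow DAG or a dbt model so that its upstream dependencies are scheduled explicitly.

```sql
-- Hypothetical example: build a daily revenue aggregate from a raw orders table.
-- Dataset, table, and column names are illustrative only.
CREATE OR REPLACE TABLE reporting.daily_revenue AS
SELECT
  DATE(order_ts)           AS order_date,
  country_code,
  COUNT(DISTINCT order_id) AS orders,
  SUM(amount_eur)          AS revenue_eur
FROM lake.raw_orders
WHERE order_status = 'COMPLETED'
GROUP BY order_date, country_code;
```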
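
On cost-efficient BigQuery modeling, a common lever is partitioning and clustering, so that dashboards and downstream queries scan only the slices they filter on. A minimal sketch, reusing the hypothetical table above:

```sql
-- Hypothetical example: partitioning by date and clustering by country keeps
-- BigQuery scans (and therefore cost) small for date- and country-filtered queries.
CREATE TABLE reporting.daily_revenue_by_day
PARTITION BY order_date
CLUSTER BY country_code AS
SELECT * FROM reporting.daily_revenue;
```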
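
For the data quality checks, one lightweight pattern, used for example by dbt singular tests, is an assertion query that should return zero rows; any returned row signals a quality incident. Again, table and column names are hypothetical:

```sql
-- Hypothetical assertion-style check: the check passes when zero rows are returned.
-- It flags negative amounts and duplicated order_ids in the raw orders table.
SELECT order_id
FROM lake.raw_orders
WHERE amount_eur < 0

UNION ALL

SELECT order_id
FROM lake.raw_orders
GROUP BY order_id
HAVING COUNT(*) > 1;
```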