ABOUT THE JOB
You’re a problem solver at heart. You thrive at the intersection of engineering, computer science, and data, you’re motivated by questions that don’t have obvious answers, and you love to architect and build reliable systems that enable whole organizations to deliver world-class analytics. You bring a background in engineering, computer science, physics, applied math, or another hard science discipline, and you enjoy applying that technical foundation to real-world data challenges.
You are energized by ambiguity, obsessed with understanding how complex systems behave, and capable of breaking down big problems into tractable iterations. You ask great questions, validate assumptions with data, and are relentless in your pursuit of signal over noise.
WHAT YOU’LL DO
- Build and maintain ETL/ELT pipelines using Python and SQL.
- Develop ingestion workflows with AWS Firehose, S3, and related services.
- Create and optimize dbt models, tests, and incremental logic.
- Tune Snowflake queries and warehouse usage for cost and performance.
- Operate and improve Airflow DAGs for reliable execution and monitoring.
- Maintain high data quality and integrity, and meet pipeline SLA commitments.
- Bring clarity to ambiguous requirements and propose practical solutions.
- Build feature pipelines to support ML workflows.
- Support model deployment, monitoring, and automated retraining.
- Add data validation and quality checks across ML and analytics pipelines.
REQUIRED QUALIFICATIONS
- 3+ years of experience in data engineering or software engineering.
- Strong Python and SQL skills.
- Hands-on experience with Snowflake, AWS Firehose/S3, Airflow, and dbt.
- Ability to work independently and execute in a dynamic environment.
- Strong problem-solving skills and attention to detail.
PREFERRED QUALIFICATIONS
- Experience with geospatial data (e.g., spatial joins, geometry processing, or geospatial libraries).
- Experience with ML or MLOps pipelines.
- Knowledge of Snowflake streams, tasks, and performance tuning.
- Experience with large-scale or semi-structured datasets.