🚀 Data Engineer (Python, SQL, ETL, Airflow, Snowflake, BigQuery)
Full-Time | Remote | U.S. Business Hours
💡 About the Role
We’re hiring a highly technical Data Engineer to build and maintain scalable data pipelines, cloud data infrastructure, and analytics-ready datasets that power business decision-making.
This role is focused on:
✅ ETL/ELT pipeline development
✅ Data warehouse architecture
✅ SQL optimization
✅ Cloud-based data infrastructure
✅ Pipeline reliability & monitoring
✅ Scalable analytics systems
You’ll work closely with:
- Data Analysts
- Data Scientists
- Engineering Teams
- BI & Leadership Teams
to ensure the organization always has accurate, clean, and trustworthy data.
If you:
- enjoy building robust data systems,
- love optimizing pipelines and queries,
- and care deeply about data quality and scalability,
this role is a strong fit.
🔥 What You’ll Own
ETL / ELT Pipeline Development
- Build and maintain scalable ETL/ELT pipelines in Python and SQL
- Ingest data from:
  - APIs
  - SaaS platforms
  - relational databases
  - cloud applications
  - streaming systems
- Develop reliable workflows for:
  - data extraction
  - transformation
  - loading
  - validation
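The extract, transform, load, and validate steps above compose into a simple loop. A minimal sketch in plain Python (the function names, sample payload, and in-memory "warehouse" are illustrative, not from this posting):

```python
def extract(records):
    """Pull raw rows from a source (an in-memory stand-in for an API here)."""
    return list(records)

def transform(rows):
    """Normalize field names and types."""
    return [{"user_id": int(r["id"]), "email": r["email"].strip().lower()}
            for r in rows]

def validate(rows):
    """Drop rows that would corrupt downstream datasets."""
    return [r for r in rows if r["user_id"] > 0 and "@" in r["email"]]

def load(rows, warehouse):
    """Append clean rows to the target table; return the count loaded."""
    warehouse.setdefault("users", []).extend(rows)
    return len(rows)

raw = [{"id": "1", "email": " Ana@Example.com "}, {"id": "0", "email": "bad"}]
warehouse = {}
loaded = load(validate(transform(raw)), warehouse)
print(loaded)  # 1 valid row loaded; the malformed row is rejected
```

In production each stage would read from and write to real systems, but the separation of stages is what makes pipelines testable and monitorable.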
Workflow Orchestration & Automation
- Manage orchestration platforms such as:
  - Apache Airflow
  - Prefect
  - Dagster
  - Luigi
- Monitor:
  - pipeline health
  - failed jobs
  - scheduling reliability
- Build automated workflows with:
  - retries
  - alerting
  - dependency management
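Orchestrators like Airflow and Prefect provide retries and alerting as built-in task settings; the underlying pattern they automate looks roughly like this pure-Python sketch (the flaky task and the `alert` callback are hypothetical):

```python
import time

def run_with_retries(task, retries=3, delay=0.0, alert=print):
    """Run a task, alerting on each failure and re-raising once retries are spent."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as exc:
            alert(f"attempt {attempt} failed: {exc}")
            if attempt == retries:
                raise
            time.sleep(delay)

calls = {"n": 0}
def flaky_extract():
    """Simulated task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

result = run_with_retries(flaky_extract)
print(result)  # succeeds on the third attempt
```

In Airflow the equivalent is declarative (e.g. per-task retry counts and failure callbacks) rather than hand-rolled, which is exactly why orchestration platforms are worth managing well.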
Data Warehousing & Modeling
- Design and optimize cloud data warehouses using:
  - Snowflake
  - BigQuery
  - Redshift
- Develop:
  - star schemas
  - snowflake schemas
  - analytics-ready data models
- Improve:
  - query performance
  - clustering
  - partitioning
  - warehouse efficiency
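A star schema centers a fact table of measures on surrogate keys into dimension tables. A minimal in-memory illustration (table and column names are made up for the example):

```python
# Dimension table: one row per unique customer, keyed by a surrogate key.
dim_customer = {1: {"name": "Acme", "region": "US"},
                2: {"name": "Globex", "region": "EU"}}

# Fact table: measures plus foreign keys into the dimension.
fact_orders = [{"customer_key": 1, "amount": 120.0},
               {"customer_key": 1, "amount": 80.0},
               {"customer_key": 2, "amount": 200.0}]

# A typical analytics query: revenue by region, i.e. a fact-to-dimension
# join followed by a group-by.
revenue_by_region = {}
for row in fact_orders:
    region = dim_customer[row["customer_key"]]["region"]
    revenue_by_region[region] = revenue_by_region.get(region, 0.0) + row["amount"]

print(revenue_by_region)  # {'US': 200.0, 'EU': 200.0}
```

In a real warehouse the same shape is expressed as SQL tables, and clustering or partitioning the fact table on frequently filtered columns is what keeps these joins fast at scale.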
Data Quality & Governance
- Implement:
  - validation checks
  - anomaly detection
  - logging systems
  - lineage tracking
- Ensure:
  - consistent naming conventions
  - clean transformations
  - audit-ready datasets
- Support compliance requirements:
  - GDPR
  - HIPAA
  - industry-specific governance standards
Streaming & Real-Time Data
- Build and maintain streaming pipelines
- Support:
  - real-time ingestion
  - event-driven processing
  - low-latency analytics workflows
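Event-driven processing typically means consuming events from a queue and updating derived state incrementally, rather than recomputing from scratch in batches. A minimal in-memory sketch (the event shapes are made up; production systems would consume from a message broker instead of a local deque):

```python
from collections import deque

# Stand-in for a message queue of incoming events.
events = deque([{"type": "page_view", "user": "u1"},
                {"type": "purchase", "user": "u1", "amount": 30.0},
                {"type": "page_view", "user": "u2"}])

# Incrementally maintained aggregates, updated once per event —
# this is what makes low-latency analytics possible.
counts = {}
revenue = 0.0
while events:
    event = events.popleft()
    counts[event["type"]] = counts.get(event["type"], 0) + 1
    if event["type"] == "purchase":
        revenue += event["amount"]

print(counts, revenue)  # {'page_view': 2, 'purchase': 1} 30.0
```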
Infrastructure & DevOps
- Containerize services
- Build CI/CD workflows with:
  - GitHub Actions
  - Jenkins
  - GitLab CI
- Manage cloud infrastructure
- Improve scalability, reliability, and deployment automation
Cross-Functional Collaboration
- Partner with:
  - analysts
  - data scientists
  - BI teams
  - product teams
- Deliver curated datasets for:
  - dashboards
  - analytics
  - machine learning workflows
- Support BI tools with clean, well-modeled data
- Maintain documentation for:
  - pipelines
  - schemas
  - workflows
  - data definitions
✅ Required Experience & Skills
- 3+ years of data engineering or backend engineering experience
- Strong proficiency with Python and SQL
- Experience with:
  - Snowflake
  - BigQuery
  - Redshift
- Familiarity with:
  - Airflow
  - Prefect
  - other workflow orchestration tools
- Strong understanding of:
  - ETL pipelines
  - data modeling
  - cloud infrastructure
  - warehouse optimization
⭐ Ideal Experience
- Experience using:
  - dbt
  - Great Expectations
  - data lineage tools
- Experience with streaming data pipelines
- Experience with:
  - AWS Glue
  - GCP Dataflow
  - Azure Data Factory
- Background in:
  - healthcare
  - fintech
  - other regulated environments
- Experience optimizing large-scale warehouse costs and performance
🧠 What Makes You a Great Fit
- You care deeply about clean and reliable data
- You enjoy debugging complex pipeline and infrastructure issues
- You think about scalability and long-term maintainability
- You combine engineering rigor with analytical thinking
- You communicate effectively across technical and non-technical teams
📅 What a Typical Day Looks Like
- Review Airflow/Prefect pipeline health and resolve failures
- Build connectors for new APIs or SaaS platforms
- Optimize SQL queries and warehouse performance
- Collaborate with analysts and data scientists on datasets
- Improve validation and monitoring systems
- Document pipelines and warehouse structures
- Reduce warehouse costs and improve pipeline reliability
In short:
You build the data infrastructure that powers analytics, reporting, automation, and business intelligence across the organization.
📊 Key Success Metrics (KPIs)
- Pipeline uptime ≥ 99%
- Data freshness within SLA
- Zero critical data quality issues reaching production
- Query performance & warehouse cost optimization
- Reliable and scalable pipeline infrastructure
- Positive feedback from analysts, BI teams, and leadership
🌟 Why This Role Stands Out
- Work on modern cloud-native data infrastructure
- Build scalable ETL and analytics systems
- Exposure to:
  - streaming pipelines
  - cloud data platforms
  - orchestration frameworks
  - warehouse optimization
- Opportunity to grow into:
  - Senior Data Engineer
  - Analytics Engineering
  - Platform Engineering
  - Data Architecture
- Fully remote flexibility with collaborative engineering teams
🧪 Interview Process
- Initial Phone Screen
- Video Interview with Pavago Recruiter
- Technical Task (build a small ETL pipeline or optimize a SQL query)
- Client Interview with Engineering/Data Team
- Offer & Background Verification
👉 Apply Now
If you:
- love building scalable data systems,
- enjoy solving complex pipeline problems,
- and want to work with modern data infrastructure,
this role is a strong fit for you.