Position Type: Full-Time, Remote
Working Hours: U.S. client business hours (with flexibility for pipeline monitoring, deployments, and data refresh cycles)
Our client is seeking a Data Engineer to design, build, and maintain scalable data infrastructure and reliable data pipelines that power analytics, reporting, and operational decision-making across the business.
This role requires strong software engineering fundamentals, deep experience with modern data stacks, and a passion for building clean, reliable, and high-performance data systems. The Data Engineer will ensure data flows seamlessly from source systems into warehouses, dashboards, and downstream applications while maintaining high standards for quality, governance, and scalability.
The ideal candidate is analytical, detail-oriented, and comfortable working across engineering, analytics, and business teams to deliver trustworthy and actionable data.

Key Responsibilities:

Data Pipelines & Ingestion:
• Build, maintain, and optimize ETL/ELT pipelines using Python, SQL, or Scala
• Orchestrate workflows with Airflow, Prefect, Dagster, or similar tools (see the sketch after this list)
• Ingest structured and unstructured data from APIs, SaaS platforms, databases, files, and streaming systems
• Develop scalable connectors and automated ingestion workflows
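
To make this concrete, here is a minimal sketch of the kind of ingestion DAG this role owns, written against recent Airflow's TaskFlow API; the endpoint URL, DAG name, and load step are illustrative placeholders, not the client's actual systems:

    from datetime import datetime

    import requests
    from airflow.decorators import dag, task

    @dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
    def orders_ingestion():
        @task
        def extract() -> list:
            # Pull one page of records from a hypothetical SaaS API.
            resp = requests.get("https://api.example.com/v1/orders", timeout=30)
            resp.raise_for_status()
            return resp.json()

        @task
        def load(records: list) -> None:
            # In practice this step would upsert into the warehouse (e.g., Snowflake).
            print(f"Loaded {len(records)} records")

        load(extract())

    orders_ingestion()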

Data Warehousing & Modeling:
• Manage and optimize cloud data warehouses such as Snowflake, BigQuery, or Redshift
• Design scalable schemas using star and snowflake modeling techniques
• Implement partitioning, clustering, indexing, and performance optimization strategies (see the sketch after this list)
• Build clean, analytics-ready datasets for business intelligence and reporting use cases
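
As one illustration of the partitioning and clustering work above, here is a minimal sketch using the BigQuery Python client; the dataset, table, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default GCP credentials are configured

    # Partition a hypothetical fact table by event date and cluster by customer,
    # so date-bounded, per-customer queries scan far less data.
    ddl = """
    CREATE TABLE IF NOT EXISTS analytics.fct_orders (
      order_id    STRING,
      customer_id STRING,
      order_ts    TIMESTAMP,
      amount_usd  NUMERIC
    )
    PARTITION BY DATE(order_ts)
    CLUSTER BY customer_id
    """
    client.query(ddl).result()  # blocks until the DDL job completes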

Data Quality & Governance:
• Implement validation checks, anomaly detection, logging, and monitoring to ensure data integrity (see the sketch after this list)
• Enforce naming conventions, lineage tracking, and documentation standards using tools such as dbt or Great Expectations
• Maintain audit-ready data processes and ensure compliance with GDPR, HIPAA, or industry-specific requirements
• Monitor pipeline health and proactively resolve failures or inconsistencies
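
Frameworks like Great Expectations handle this at scale; as a lightweight illustration of the validation checks above, a hand-rolled pre-publication gate might look like this (column names are hypothetical):

    import pandas as pd

    def validate_orders(df: pd.DataFrame) -> list[str]:
        """Return failure descriptions; an empty list means the batch is publishable."""
        failures = []
        if df["order_id"].isnull().any():
            failures.append("null order_id values")
        if df["order_id"].duplicated().any():
            failures.append("duplicate order_id values")
        if (df["amount_usd"] < 0).any():
            failures.append("negative order amounts")
        return failures

    batch = pd.DataFrame({"order_id": ["a1", "a2"], "amount_usd": [19.99, 5.00]})
    assert validate_orders(batch) == []  # a clean batch passes every check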

Streaming & Real-Time Data:
• Build and manage real-time data pipelines using Kafka, Kinesis, Pub/Sub, or similar platforms (see the sketch after this list)
• Support low-latency ingestion and event-driven architectures for time-sensitive applications
• Monitor streaming infrastructure and optimize throughput and reliability
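
A minimal consumer sketch, assuming the kafka-python client; the broker address, topic name, and print-as-sink are placeholders for real infrastructure:

    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "orders.events",                      # hypothetical topic
        bootstrap_servers="localhost:9092",   # placeholder broker
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
        enable_auto_commit=False,             # commit manually so failures get reprocessed
    )

    for message in consumer:
        event = message.value
        print(f"offset={message.offset} event={event}")  # stand-in for a real sink
        consumer.commit()  # commit only after the downstream write succeeds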

Collaboration & Enablement:
• Partner closely with analysts, data scientists, and business stakeholders to deliver reliable datasets
• Support dashboard and reporting initiatives across Tableau, Looker, or Power BI
• Translate business requirements into scalable data solutions and models
• Maintain clear technical documentation for pipelines, schemas, and workflows

DevOps & Infrastructure:
• Containerize data services using Docker and manage deployments through Kubernetes when applicable
• Automate deployments using CI/CD pipelines such as GitHub Actions, Jenkins, or GitLab CI (see the workflow sketch after this list)
• Manage cloud infrastructure using Terraform, CloudFormation, or similar Infrastructure-as-Code tools
• Continuously optimize performance, scalability, reliability, and cloud costs
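
A representative CI/CD flow for this stack, sketched as a GitHub Actions workflow (YAML, since pipeline config is declarative); the test suite and deploy script are placeholders for whatever the client's repository actually runs:

    name: deploy-data-pipelines
    on:
      push:
        branches: [main]
    jobs:
      test-and-deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.11"
          - run: pip install -r requirements.txt
          - run: pytest tests/                  # placeholder test suite
          - run: python scripts/deploy_dags.py  # hypothetical deploy script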

Who You Are:
• Passionate about building clean, reliable, and scalable data systems
• Strong debugging and problem-solving mindset with high attention to detail
• Balance of software engineering discipline and analytical thinking
• Comfortable working cross-functionally with technical and non-technical stakeholders
• Proactive communicator who takes ownership of data quality and reliability

Requirements:
• 3+ years of experience in Data Engineering, Back-End Engineering, or Data Infrastructure roles
• Strong proficiency in Python and SQL
• Experience with at least one modern data warehouse (Snowflake, Redshift, BigQuery)
• Hands-on experience with orchestration tools such as Airflow or Prefect
• Strong understanding of ETL/ELT pipelines, data modeling, and data transformation workflows
• Familiarity with cloud platforms such as AWS, GCP, or Azure

Nice to Have:
• Experience with dbt for data modeling and transformation management
• Streaming and event-driven data pipeline experience (Kafka, Kinesis, Pub/Sub)
• Experience with cloud-native data services such as AWS Glue, GCP Dataflow, or Azure Data Factory
• Familiarity with Docker, Kubernetes, Terraform, or CI/CD workflows
• Background in regulated industries such as healthcare, fintech, or enterprise SaaS
• Experience optimizing warehouse costs and query performance at scale (see the sketch after this list)
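
One concrete tactic behind that last item: dry-run a query to see how many bytes it would scan before paying to execute it. A minimal sketch with the BigQuery client, reusing the hypothetical table from earlier:

    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

    # A dry run validates the query and reports bytes scanned without running it;
    # on-demand BigQuery pricing bills by bytes scanned.
    job = client.query(
        "SELECT customer_id, SUM(amount_usd) "
        "FROM analytics.fct_orders "
        "WHERE DATE(order_ts) = '2024-01-01' "
        "GROUP BY customer_id",
        job_config=config,
    )
    print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")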

A Day in the Life:
A Data Engineer’s day revolves around maintaining reliable pipelines, improving data quality, and giving teams scalable access to trustworthy data. You will:
• Monitor pipeline health and troubleshoot failed jobs in Airflow or related orchestration systems (see the freshness sketch after this section)
• Build and maintain ingestion pipelines for APIs, SaaS platforms, and operational databases
• Optimize SQL queries and warehouse performance to improve efficiency and reduce cloud costs
• Collaborate with analysts and data scientists to provide curated datasets for reporting and modeling
• Implement validation checks and monitoring to prevent downstream data quality issues
• Document data models, transformations, and workflows to ensure scalability and maintainability
In essence: you ensure the organization has accurate, timely, and reliable data powering operational, analytical, and strategic decisions.
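
For instance, the freshness SLAs mentioned above can be monitored with a check as simple as comparing the latest load timestamp against the agreed window; a minimal sketch with illustrative values:

    from datetime import datetime, timedelta, timezone

    def is_fresh(last_loaded_at: datetime, sla: timedelta) -> bool:
        """True if the most recent load landed within the agreed SLA window."""
        return datetime.now(timezone.utc) - last_loaded_at <= sla

    # e.g., a reporting table allowed to be at most 2 hours stale
    last_load = datetime.now(timezone.utc) - timedelta(minutes=45)  # illustrative
    print(is_fresh(last_load, sla=timedelta(hours=2)))  # True: within SLA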

What Success Looks Like:
• Pipeline uptime ≥ 99%
• Data freshness maintained within agreed SLAs
• Zero critical data quality issues reaching downstream reporting systems
• Improved warehouse query performance and cost optimization
• Timely delivery of scalable and reliable datasets
• Positive feedback from analysts, data scientists, and business stakeholders

Interview Process:
• Initial Phone Screen
• Video Interview with Pavago Recruiter
• Technical Assessment (e.g., build a small ETL pipeline or optimize a SQL query)
• Client Interview with Engineering/Data Team
• Offer & Background Verification
#DataEngineer #ETL #DataPipelines #BigQuery #Snowflake #Redshift #Airflow #Python #SQL #CloudData #AnalyticsEngineering #DataInfrastructure #RemoteWork #DataEngineeringJobs