Role Overview
We are looking for a highly skilled Machine Learning Engineer who can design, build, and own end-to-end ML systems in production. This role requires a strong blend of machine learning expertise, backend engineering, and full-stack development, with a focus on building reliable, scalable platforms used by leadership and critical business functions.
Key Responsibilities
- Design, develop, and maintain end-to-end machine learning pipelines, including data ingestion, training, evaluation, deployment, monitoring, and retraining.
- Build and own production-grade ML services that are reliable, scalable, and fault-tolerant.
- Architect and manage asynchronous workflows and API-driven systems for ML and data services.
- Integrate ML solutions into complex production environments and distributed systems.
- Design robust systems with a strong focus on failure modes, observability, and guardrails to ensure reliability.
- Develop internal analytical tools used by leadership and cross-functional teams for decision-making.
- Develop interactive internal ML tools and dashboards using Streamlit for model insights, monitoring, and experimentation.
- Deploy and operate ML systems on cloud platforms (AWS, GCP, Azure).
- Collaborate with data scientists and stakeholders to deliver impactful solutions.
Required Skills & Qualifications
Core Engineering Skills
- Strong proficiency in Python, SQL, and building RESTful APIs
- Experience with asynchronous programming and workflows
- Solid understanding of software engineering best practices: version control (Bitbucket), unit and integration testing, and code quality and maintainability
Machine Learning & MLOps
- Experience building or integrating data ingestion pipelines (batch or streaming)
- Experience performing exploratory data analysis (EDA) and interpreting the results.
- Proven experience managing the full ML lifecycle.
- Hands-on experience with MLOps practices and tools:
  - Experiment tracking
  - Model versioning
  - Automated training and deployment pipelines
  - CI/CD for ML systems
Systems, Infrastructure & Orchestration
- Experience building scalable and reliable ML systems in production
- Familiarity with:
  - Containerization (Docker)
  - Orchestration platforms (e.g., Kubernetes, Airflow, Prefect, Dagster)
  - Infrastructure as Code (IaC)
- Experience with distributed data processing systems (e.g., Spark)
- Understanding of workflow orchestration and scheduling for ML pipelines
Full Stack Development
- Experience developing end-to-end applications, including:
  - Backend pipelines and services
  - Frontend/UI components
- Hands-on experience building internal ML dashboards and tools using Streamlit
- Ability to create intuitive interfaces for monitoring models, exploring data, and enabling stakeholder interaction