Posted 3 years ago
Responsibilities
- Design and build effective, user-friendly infrastructure, tooling, and automation to accelerate Machine Learning
- Collaborate with teams to drive the ML technical roadmap
- Collaborate with Machine Learning Engineers and Product Managers to develop tools that support experimentation, training, and production operations
- Build and maintain data pipelines using tools like Hadoop, Python, Airflow, and Kafka
- Support and troubleshoot the ML pipeline while continuously improving its stability
- Build and maintain systems employing an Infrastructure-as-Code approach
- Own the AWS stack, which comprises all ML resources
- Establish standards and practices around MLOps, including governance, compliance, and data security
- Collaborate on managing ML infrastructure costs
Skills and Qualifications
- 3+ years of experience with ML infrastructure and ML DevOps
- 5+ years of overall engineering experience in distributed systems and data infrastructure
- 3+ years of experience coding in Python (preferred) or another language such as Java, C#, or Go
- Experience working with ML engineers to build tooling and automation to support the entire ML engineering lifecycle, from experimentation to production operations
- Experience with Kubernetes and ML CI/CD workflows
- 3+ years of experience with AWS or another public cloud platform (GCP, Azure, etc.)
- Excellent verbal and written communication skills