Who We Are
Cortex empowers internal teams to efficiently leverage ML by providing a platform and by unifying, educating, and advancing the state of the art in ML technologies within Twitter. We win when our customers win, by helping our users stay informed and share and discuss what matters, and by serving the public conversation. We’re building an AI-first company and every major initiative is increasingly dependent on the successful application of machine learning. Cortex is at the nexus of this evolution.
Our team of ML software engineers is constructing one of the strongest machine learning platforms in the world by marrying the latest ML industry practices with engineering excellence and the need to perform at Twitter scale. Our customers are all the ML engineers at Twitter, and our goal is to provide a unified tooling ecosystem that allows these engineers to focus on what they are good at, building ML models with novel approaches, and that abstracts away the complexities of bringing these models into a production environment.
We care deeply about:
Engineering excellence such as good design abstractions, API stability, unit testing, leading best practices for other engineers to follow, and solid documentation.
Staying abreast of, and compatible with, a quickly shifting technology landscape for ML platform components and related open source solutions.
Creating the best ML Platform environment for Twitter that provides an exceptional developer experience for our engineering customers.
Encouraging engineering creativity and innovative solutions.
Our current projects include:
Establishing Kubeflow as a managed offering at Twitter
Enabling and sustaining GCP Infra/Platform components for broader use in the Cortex platform, e.g. AI Platform, Dataflow, Dataproc, etc.
Improving Operations of essential ML Platform services
What You'll Do
If this sounds like a team you want to be part of, great! We are looking for engineers who are passionate about writing code, have a desire to learn new technologies, love working in collaborative teams, and are committed to serving their customers.
Your responsibilities include:
Informing and accelerating GCP Infrastructure adoption best practices (sustaining and improving User Onboarding, IAM, Image Management, Twitter Systems Integrations, Security, etc.)
Absorbing existing SRE/Operational support scopes (GPU Cluster Management, OS/Kernel Upgrades, RPM/Python Dependency Management, Bare Metal Host Management/Puppet Manifests, etc.)
Partnering with and supporting existing Cortex Platform teams with Operational guidance and expertise on various project initiatives
Creating tools and automation for Operational support and management for DS/ML use cases
Supporting various users and developers with operational issues (e.g. “I’m having trouble scheduling GPU jobs with Persistent Volumes”)
Maintaining version updates for TensorFlow, PyTorch, and similar frameworks
Partnering with Twitter’s Platform and Data Platform orgs to improve, enhance and influence direction and integration opportunities
Partnering with teams to improve, enhance and integrate with the company’s GCP Adoption & Management strategy