Who We Are
Twitter Site Reliability Engineering (SRE) scales Twitter to serve the public conversation around the globe. We inspire engineering confidence by systematically making services reliable and efficient, and by ensuring that changes are safe and fast.
Compute SRE is responsible for maintaining the availability and reliability of Twitter’s computing platforms. We believe that reliability is the most important feature; without it, other features don’t matter. We use a blend of systems engineering, software development, and architectural skill to do our work, and deeply value collaboration and empathy.
What You’ll Do
Twitter offers engineers the unique opportunity to personally make a noticeable difference at a company that makes a difference in the world. In this role, you will support Twitter’s mission by ensuring the successful operation and continuous improvement of our internal computing platforms. You will operate at scale, understanding engineering needs and working continually to automate for the future. Your contributions to the team will help maintain a high operational standard through effective monitoring, SLO development, and incident response.
Your responsibilities include, but are not limited to:
Serve as a steward of Twitter’s production environment by providing on-call support, incident response, and collaborative debugging, and by learning continuously through blameless postmortems.
Implement systemic improvements to the reliability and operational excellence of Twitter’s compute clusters.
Collaborate with engineers in peer teams to develop solutions that work effectively in Twitter’s ecosystem.