Site Reliability Engineer - Compute
Who We Are:
SREs work on improving the availability, scalability, performance and reliability of Twitter’s production services. Come join us. We work shoulder-to-shoulder with our engineering teams to design and build the next generation of cloud and systems infrastructure, focusing on automation, availability and performance, and above all efficiency at ‘reach every user on the planet’ scale.
What You’ll Do:
As a Site Reliability Engineer for the Compute Team at Twitter, you will work to improve the reliability and performance of the compute platform that underlies most of the services that run Twitter. You will dive deep into gnarly operational issues from software, systems, automation, and process perspectives. You will understand the challenges around integrating disparate infrastructures into new facilities, processes and procedures.
You will work with open-source technologies and the SRE community.
You will actively participate in the vision to move away from high operational cost tasks such as break/fix, cluster migrations, new service buildouts, abuse, etc. You will contribute to services that can shrink and expand based on demand, self-heal, roll out automatically, etc.
Your responsibilities include but are not limited to:
- Performing deep dives into both systemic and latent reliability issues; partnering with software and systems engineers across the organization to produce and roll out fixes.
- Troubleshooting issues across the entire stack: hardware, software, application and network.
- Driving standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
- Mentoring SREs on standard methodology & best practices for everything from monitoring to troubleshooting complex code issues.
- Evaluating emerging technologies and potentially integrating them into the Twitter stack.
- Identifying and driving opportunities to improve automation for the company; scoping and creating automation for deployment, management and visibility of our services.
- Participating in code reviews for projects primarily written in Python, Java and Scala, built on open source libraries such as Finagle, and running on both physical and virtualized platforms.
- Representing the SRE organization in design reviews and operational readiness exercises for new and existing services.
Who You Are:
- Solid understanding of systems and application design, including the operational trade-offs of various designs.
- Practical knowledge of various aspects of service design like messaging protocols & behavior, caching strategies and software design practices.
- Practical, solid knowledge of shell scripting and at least one higher-level language (Python preferred; practical experience in Java and Scala is most welcome).
- Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
- Strong understanding of Linux servers, specifically RHEL/CentOS.
- Comfortable configuring DNS, DHCP, and LAN/WAN technologies.
- 2+ years of experience handling services in a large-scale environment.
- Ability to work well with and influence a myriad of personalities at all levels.
- Ability to prioritize tasks and work independently.
- Adaptability and a focus on the simplest, most efficient and reliable solutions.
- Track record of successful practical problem solving, excellent written and social communication, and documentation skills.
- Experience with existing projects such as Kubernetes, Scribe, ZooKeeper, Mesos and Aurora.
- Experience with current cloud offerings like GCP and AWS.
- B.S. in computer science or similar field or equivalent experience.
- Ability to lead technical teams through design and implementation across an organization.
We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected status.
San Francisco applicants: Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Engineering Hiring Process
Once your application is received, a recruiter will reach out if your qualifications are a match for the role.
If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.
If the phone interviews go well or your work sample is strong, the final step includes interviews with 5-6 people held onsite in our office.