Site Reliability Engineer - Core Infrastructure Services
Who We Are
SREs work on improving the availability, scalability, performance and reliability of Twitter’s production services.
Twitter is looking for a Site Reliability Engineer to join our Core Infrastructure Services SRE (CISS) team. Our team builds, owns, and operates services such as Puppet, LDAP, Kerberos, NTP, Internal DNS, and a suite of Provisioning tools and services.
We enable our large clusters of servers and the teams that manage them to be built and operated safely, securely, and expertly. Our mission is to improve infrastructure effectiveness and increase efficiency for all core services used by the various infrastructure teams at Twitter.
Recently completed projects on the team include building a reusable bare-metal monitoring framework that lets service owners easily monitor their systems with our in-house Monitoring platform; upgrading our Puppet infrastructure to a more modern version and toolset; making our Internal DNS infrastructure more reliable through tooling and configuration changes; and automating day-to-day operational tasks through self-serve tools, documentation, and cross-team training.
What You'll Do
- You will perform analysis, troubleshooting, and introspection on core infrastructure components
- You will partner with teams from across the organization to help tackle hard problems
- You will help drive standardization efforts across multiple disciplines
- You will ensure reliability of the existing core infrastructure systems to guarantee 99.99% uptime
- You will tackle issues across the entire stack: hardware, software, network and application
- You will develop new software-based solutions to infrastructure engineering problems
Who You Are
- You have an expert understanding of Linux systems and services
- You understand and have a strong interest in systems and application design
- You have knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies, and software design practices
- You are familiar with and have practically applied shell scripting and at least one higher-level language to real-world problems
- You are able to prioritize tasks and work independently
- You can adapt and focus on the simplest, most efficient & reliable solutions
- You have excellent written communication, interpersonal communication, and documentation skills
- Public Cloud experience with AWS, GCP, or Rackspace
- Advanced knowledge of Python to be able to build, write, and support complex services
- Functional knowledge of bootstrapping tools like PXE or cloud-init that enable effective hardware lifecycle management
- Experience with configuration management tools: Puppet, Chef, or Ansible
Come Join Us
Do you enjoy working with customers to identify problems and propose solutions that fix them in both the short term and the long term?
Are you able to hold a high standard for code review and code quality for infrastructure while balancing the need to ship and iterate?
If you like working in an independent environment where you get to define requirements, work directly with other teams, and drive projects from conception to completion and long-term ownership, come join our team.
We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status or any legally protected status.
San Francisco applicants: Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Engineering Hiring Process
Once your application is received, a recruiter will reach out if your qualifications match the role.
If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.
If the phone interviews go well or your work sample is strong, the final step is a round of onsite interviews with 5-6 people at our office.