Senior Site Reliability Engineer - Platform Infrastructure
SREs are Software and Systems Engineers who specialize in large-scale distributed systems, low-level systems, and associated automation, tooling, and processes. They possess a breadth and depth of knowledge about Twitter’s production environment that allows them to design and write software with appropriate operability, reliability, scalability, and performance considerations to reduce operational overhead.
As a Site Reliability Engineer (SRE) on Twitter’s Platform team, you will work to improve the reliability and performance of the next generation of distributed systems. You will partner with our product engineering teams to design, build, operate, and automate the distributed storage systems at the heart of Twitter’s infrastructure, which are used by millions of people.
• Build tooling to improve the automation of operations. This includes automatic failure detection and remediation, application deployment, OS/Kernel/JVM/Firmware deployment, capacity planning, and fleet management.
• Diagnose and troubleshoot complex distributed systems handling millions of queries per second and petabytes of data, and develop solutions that have a significant impact at our massive scale.
• Collaborate with SWE teams to sustain and optimize the availability, reliability, and performance of production services.
• Collaborate with hardware, software, and networking teams throughout the company to design next-generation distributed storage platforms.
• Troubleshoot issues across the entire stack - hardware, software, application and network.
• Participate in a 24x7 on-call rotation.
• 5+ years of experience managing services in a distributed, internet-scale *nix environment.
• Practical knowledge of at least one programming language (Python, Go, Java, Ruby, C++).
• Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, and disk/storage technologies.
• Familiarity with systems management tools (Puppet, Chef, Capistrano, Ansible, etc.).
• Hands-on operational experience managing and performance-tuning JVM-based services.
• Ability to prioritize tasks and work independently.
• Track record of practical problem solving, and excellent communication and documentation skills.
• BS degree in Computer Science or Engineering, or equivalent experience.
By applying for this role, you could choose to work in the following locations:
Engineering Hiring Process
Once your application is received, a recruiter will reach out if your qualifications are a match for the role.
If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.
If the phone interviews go well or your work sample is strong, the final step is a round of onsite interviews with 5-6 people at our office.