Senior Site Reliability Engineer, Storage Infrastructure (Cache)

The Position

Twitter developed and continually improves a large-scale storage platform. SREs ensure the availability of the environment, with a watchful eye on security, capacity, and performance. This group writes software to improve service reliability and handle platform growth. Our tools and services reduce operational overhead and improve performance.

As a Site Reliability Engineer (SRE) in Twitter’s Storage Infrastructure team, you will work to improve the reliability and performance of the next generation of distributed systems and containerized deployments. This team ensures the availability of in-memory data services (including Redis and Memcached), caching content from foundation storage platforms. You will partner with product engineering teams to design, build, operate, and automate distributed storage services at the heart of Twitter’s infrastructure used by millions of people.

We are looking for software engineers who are passionate about reliability, performance, and efficiency, and who have experience building tools, services, and automation to manage and improve production services.

Responsibilities

  • Build tooling to improve operations automation, including automatic failure detection and remediation, application deployment, OS/kernel deployment, capacity planning, and fleet management.
  • Diagnose and troubleshoot complex distributed systems that handle millions of queries per second and petabytes of data, and develop solutions that have a significant impact at our massive scale.
  • Collaborate with software engineers to sustain and optimize service availability, reliability, and performance.
  • Collaborate with the diverse hardware, software, and networking teams throughout the company to craft next-generation distributed storage platforms.
  • Troubleshoot issues across the entire stack: hardware, software, application, and network.
  • Maintain compliance with data privacy and service security requirements.
  • Participate in a 24x7 on-call rotation.

Qualifications

  • 5+ years of experience managing services in a distributed, internet-scale *nix environment.
  • Practical knowledge of at least one programming language (Python, Go, Java, C).
  • Proven knowledge of Linux operating system internals and TCP/IP networking; experience with containerization is a plus.
  • Familiarity with systems management tools (Puppet, Chef, Ansible, etc.).
  • Ability to prioritize tasks and work independently.
  • A track record of practical problem solving, plus excellent communication and documentation skills.
  • BS degree in Computer Science or Engineering, or equivalent experience.

Company Description

Twitter is what’s happening and what people are talking about right now. For us, life's not about a job, it's about purpose. We feel real change starts with conversation. Here, your voice matters. Come as you are and together we'll do what's right (not what's easy) to serve the public conversation.

Additional Information

A few other things we value:

  • Challenge - We solve some of the industry’s hardest problems. Come to be challenged, learn, and thrive as an engineer.
  • Diversity - Diversity makes us a better organization and team. We value diverse backgrounds, ideas, and experiences.
  • Work-Life Balance - We work hard, but we believe hard work should come with balance.

We will ensure that individuals with disabilities are provided a reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request an accommodation.

Team

Infrastructure Engineering, Software Engineering

Location

Seattle, Remote US, Atlanta, New York City, Los Angeles, Chicago, Sacramento
