Site Reliability Engineer - Data Infrastructure / Hadoop



Site Reliability Engineers work on improving the availability, scalability, performance and reliability of Twitter’s production services. Come join us.

Who We Are:

  • As a member of the organization you will be dedicated to improving the reliability of our end-to-end data infrastructure. Your work will integrate directly with Twitter's products.

  • Our core infrastructure receives hundreds of millions of tweets per day and serves tens of billions of API requests. We also serve over 2+ billion search queries per day, render millions of ad impressions, and process hundreds of terabytes of log and interaction data daily.

  • We dive deep into gnarly operational issues; from the software, systems, automation, and process perspectives. We will understand the challenges around integrating disparate infrastructures into a new facility, processes and procedures.

  • We work with open-source technologies and get involved with SRE and Hadoop community.

  • We actively participate in the vision to move away from high operational cost tasks such as break/fix, cluster migrations, new service buildouts, abuse, etc. You will contribute to services that can shrink and expand based on demand, self heal, automatically rollout, etc.

  • We will train and invest in our team members to make sure that they are successful in supporting large variety of system and products that Twitter use.

Your responsibilities include but are not limited to:

  • You will use your expertise to improve the reliability and performance of Hadoop clusters and data management services.

  • You will participate in and build tools to diagnose, and fix complex distributed systems handling 10s of petabytes of data and drive opportunities to improve automation for the company, scope and create automation for deployment, management and transparency of our services.

  • You will tackle issues across the entire stack - hardware, software, application and network.

  • You will test, monitor, administer, and operate multiple clusters across data centers, primarily in Python and Java.

  • You will take part in 24x7 on-call / support rotation.

Who You Are:

  • Minimum 3+ years of handling services in a large scale distributed systems environment, preferably Hadoop.

  • Familiarity with systems management tools (Puppet, Chef, Capistrano, etc)

  • Knowledge of Linux operating system internals, filesystems, disk/storage technologies and storage protocols and networking stack.

  • Proven knowledge of systems programming (bash and shell tools) and/or at least one scripting language (Python, Ruby, Perl, Scala).

  • Track record of practical problem solving, excellent communication, and documentation skills

  • Proven understanding of systems and application design, including the operational trade-offs of various designs.

  • Work well with and be able to influence a myriad of personalities at all levels.

  • Be adaptable and able to focus on the simplest, most efficient & reliable solutions.


  • Experience with HDFS, YARN and related hadoop technologies.

  • Ability to lead technical teams through design and implementation across an organization.

  • B.S. in computer science or similar field.


We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status or any legally protected status.


Hiring Process

Step 1

After you apply, a recruiter may reach out to you for an introductory call.

Step 2

If your background is a match for the role, you may phone interview with 1-2 people.

Step 3

If you continue through the process, you will come onsite 1-2 times to interview with a total of 5-10 people.


Personal Information

This field is required.
This field is required.
This field is required.
This field is required.
Required field. PDFs only; max file size is 1MB.
Required field. PDFs only; max file size is 1MB.

Twitter does not accept and unsolicited resumes from recruiting agencies and will not pay fees associated with any such resumes. Agencies, please do not send resumes to any Twitter location, employee, or email address.

Thanks for applying!
Submission failed. Please make sure all fields are correctly formatted.

Don't see the right fit?

Check out other opportunities at Twitter.