Site Reliability Engineer - Hadoop / Data Platforms
Site Reliability Engineers work on improving the availability, scalability, performance and reliability of Twitter’s production services. Come join us.
Who We Are:
- As a member of the organization you will be dedicated to improving the reliability of our end-to-end data infrastructure. Your work will integrate directly with Twitter's products.
- Our core infrastructure receives hundreds of millions of tweets per day and serves tens of billions of API requests. We also serve over 2 billion search queries per day, render millions of ad impressions, and process hundreds of terabytes of log and interaction data daily.
- We dive deep into gnarly operational issues from the software, systems, automation, and process perspectives. We work to understand the challenges of integrating disparate infrastructures into new facilities, processes, and procedures.
- We work with open-source technologies and get involved with the SRE and Hadoop communities.
- We actively participate in the vision to move away from tasks with high operational cost, such as break/fix, cluster migrations, new service buildouts, abuse, etc. You will contribute to services that can shrink and expand based on demand, self-heal, roll out automatically, etc.
- We will train and invest in our team members to make sure that they are successful in supporting the large variety of systems and products that Twitter uses.
Your responsibilities include but are not limited to:
- You will use your expertise to improve the reliability and performance of Hadoop clusters and data management services.
- You will participate in and build tools to diagnose and fix complex distributed systems handling tens of petabytes of data. You will also drive opportunities to improve automation for the company, and scope and create automation for the deployment, management, and transparency of our services.
- You will tackle issues across the entire stack - hardware, software, application and network.
- You will test, monitor, administer, and operate multiple clusters across data centers, primarily in Python and Java.
- You will take part in a 24x7 on-call/support rotation.
Who You Are:
- 3+ years of experience handling services in a large-scale distributed systems environment, preferably Hadoop.
- Familiarity with systems management tools (Puppet, Chef, Capistrano, etc.).
- Knowledge of Linux operating system internals, filesystems, disk/storage technologies, storage protocols, and the networking stack.
- Proven knowledge of systems programming (bash and shell tools) and/or at least one scripting language (Python, Ruby, Perl, Scala).
- Track record of practical problem solving, with excellent communication and documentation skills.
- Proven understanding of systems and application design, including the operational trade-offs of various designs.
- Work well with and be able to influence a myriad of personalities at all levels.
- Be adaptable and able to focus on the simplest, most efficient & reliable solutions.
- B.S. in computer science or similar field or equivalent experience.
- Experience with HDFS, YARN, and related Hadoop technologies.
- Ability to lead technical teams through design and implementation across an organization.
We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected status.
San Francisco applicants: Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Engineering Hiring Process
Once your application is received, a recruiter will reach out if your qualifications are a match for the role.
If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.
If the phone interviews go well or your work sample is strong, the final step includes interviews with 5-6 people held onsite in our office.