Site Reliability Engineer

Bengaluru

Who We Are:

SREs work on improving the availability, scalability, performance and reliability of Twitter’s production services. Come join us.

What You’ll Do:

As a member of the organization, you will be dedicated to improving the reliability of our end-to-end platform. Your work will integrate directly with Twitter's products.

Our core infrastructure receives hundreds of millions of tweets per day and serves tens of billions of API requests. We also serve more than 2 billion search queries per day, render hundreds of millions of ad impressions, and process hundreds of terabytes of log and interaction data daily.

You will dive deep into gnarly operational issues from the software, systems, automation, and process perspectives. You will understand the challenges of integrating disparate infrastructures into a new facility and into new processes and procedures.

You will work with open-source technologies and the SRE community.

You will actively participate in the vision to move away from high-operational-cost tasks such as break/fix, cluster migrations, new service buildouts, abuse handling, etc. You will contribute to services that can shrink and expand based on demand, self-heal, roll out automatically, etc.

SRE is hiring for the following application teams:

  • App Services - our core services handling users, tweets and more

  • Storage Infrastructure - our next-generation distributed cache and storage systems

  • Core Infrastructure Systems - our internal core infrastructure services (provision engineering stack, DNS, Puppet, LDAP, Subversion, Kerberos, etc.)

  • Database Engineering - our relational stores like MySQL, PostgreSQL and Vertica

  • Engineering Effectiveness - our tools and services related to build, test and deployment systems

  • Hadoop/Data Platform - our Hadoop clusters, data management services and the surrounding ecosystem (YARN, Scalding, Parquet, HBase, etc.)

  • M&A - help our acquired companies manage their infrastructure

  • Mesos/Aurora - our compute platforms, which the rest of Twitter runs on

  • Platform - our API/frontend services

  • Traffic Engineering - our traffic management systems

Your responsibilities include but are not limited to:

  • You will perform deep dives into both systemic and latent reliability issues, and partner with software and systems engineers across the organization to produce and roll out fixes.

  • You will troubleshoot issues across the entire stack: hardware, software, application and network.

  • You will drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.

  • You will mentor SREs on standard methodology for everything from monitoring to troubleshooting complex code issues.

  • You will identify and drive opportunities to improve automation across the company, and scope and build automation for the deployment, management and visibility of our services.

  • You will participate in code reviews for projects primarily written in Java and Scala, built on open source libraries such as Finagle, and running on both physical and virtualized platforms.

  • You will represent the SRE organization in design reviews and operational readiness exercises for new and existing services.

Who You Are:

  • Solid understanding of systems and application design, including the operational trade-offs of various designs.

  • Practical knowledge of various aspects of service design, such as messaging protocols and behavior, caching strategies and software design practices.

  • Practical, solid knowledge of shell scripting and at least one higher-level language (Python or Ruby preferred).

  • Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.

  • Expert-level understanding of Linux servers, specifically RHEL/CentOS.

  • Comfortable configuring DNS, DHCP, and LAN/WAN technologies.

  • At least 3 years of experience running services in a large-scale environment.

  • Ability to work well with, and influence, a wide range of personalities at all levels.

  • Ability to prioritize tasks and work independently.

  • Adaptable, with a focus on the simplest, most efficient and most reliable solutions.

  • Track record of successful practical problem solving, excellent written and interpersonal communication, and strong documentation skills.

Desired:

  • Practical experience in Java or Scala.

  • Ability to lead technical teams through design and implementation across an organization.

  • Experience with existing open source projects such as Scribe, ZooKeeper and Apache Mesos.

  • B.S. in computer science or similar field.

Engineering Hiring Process

Step 1

Once your application is received, a recruiter will reach out if your qualifications are a match for the role.

Step 2

If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.

Step 3

If the phone interviews go well or your work sample is strong, the final step is a series of onsite interviews with 5-6 people at our office.


Twitter does not accept any unsolicited resumes from recruiting agencies and will not pay fees associated with any such resumes. Agencies, please do not send resumes to any Twitter location, employee, or email address.


Don't see the right fit?

Check out other opportunities at Twitter.