Senior Site Reliability Engineer (SRE) - Database Infrastructure
Site Reliability Engineers and Database Engineers work on improving the availability, scalability, performance and reliability of Twitter’s production services. Come join us.
Who We Are:
As a member of the organization you will be dedicated to improving the reliability of our end-to-end database infrastructure. Your work will integrate directly with Twitter's products.
Our core infrastructure receives hundreds of millions of tweets per day and serves tens of billions of API requests. We also serve over 2 billion search queries per day, render hundreds of millions of ad impressions, and process hundreds of terabytes of log and interaction data daily.
You will dive deep into gnarly operational issues from the software, systems, automation, and process perspectives. You will understand the challenges of integrating disparate infrastructures into a new facility with new processes and procedures.
You will work with open-source technologies and the SRE and Database community.
You will actively participate in the vision to move away from tasks with a high operational cost, such as break/fix, cluster migrations, new service buildouts, abuse, etc. You will contribute to services that can shrink and expand based on demand, self-heal, roll out automatically, etc.
Your responsibilities include but are not limited to:
- You will use your expertise to tune and push our databases beyond their normal limit
- You will work closely with engineering teams to design, build, and maintain systems, and help them with database selection, schema design, and query tuning
- You will troubleshoot issues across the entire stack: hardware, software, application and network
- You will mentor other SREs and DBEs on standard methodology for everything from monitoring to troubleshooting complex code and database issues.
- You will identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services.
- You will take part in a 24x7 on-call rotation and an 8x5 customer support rotation
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
- Participate in on-call rotation and periodic conference calls with other specialists from other time zones including but not limited to our headquarters in San Francisco, CA USA.
Who You Are:
- You have solid experience with MySQL databases on-prem or in the cloud.
- You have a solid understanding of systems and application design, including the operational trade-offs of various designs.
- You have practical, solid knowledge of shell scripting and at least one higher-level language (Python or Go)
- You have an expert understanding of Linux systems, services, optimization, storage subsystems, and file systems
- You have a minimum of 5 years of experience handling services in a large-scale environment
- You work well with, and are able to influence, a myriad of personalities at all levels.
- You are able to prioritize tasks and work independently.
- You are adaptable and able to focus on the simplest, most efficient & reliable solutions.
- You have a track record of successful practical problem solving, excellent written and social communication, and documentation skills.
- B.S. in computer science or similar field.
- Ability to lead technical teams through design and implementation across an organization.
- Experience with open-source projects like Vitess, Orchestrator, Percona, Airflow, and other database tools.
Engineering Hiring Process
Once your application is received, a recruiter will reach out if your qualifications are a match for the role.
If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample depending on the role.
If the phone interviews go well or your work sample is strong, the final step includes interviews with 5-6 people held onsite in our office.