Site Reliability Engineer, Revenue Engineering

The Position

Who We Are

Twitter’s Revenue organization operates services at extensive scale. We are looking for a Site Reliability Engineer (SRE) to join our Revenue Engineering team!

Our Revenue SREs provide a healthy user experience while serving highly relevant and personalized ads. We also help advertisers translate their marketing strategy to an audience on Twitter while protecting the safety of their brand.

At Twitter, we use state-of-the-art open-source and proprietary technologies, and operate some of the world’s largest and most sophisticated distributed systems. We embed deeply with development teams, sharing on-call with a focus on up-leveling services and increasing automation.

(More information: Resilient Ad Serving at Twitter Scale)

How You’ll Work

  • Work closely with Software Engineering (SWE) counterparts and take an active role as a co-owner of production services to ensure services are built, maintained, and operated in a reliable and scalable way. You will be part of the successful delivery of new features and services, as well as the day-to-day successful operation of existing services.
  • Stay deeply involved with application services throughout the software development lifecycle, serving as the local SRE domain specialist and point of contact. Through this involvement you will gain a deep understanding of the technology stack and be empowered to contribute meaningfully to design documents, code reviews, and other technical discussions.
  • Collaborate with the software engineering teams to drive operational health improvements, root cause analysis, postmortem discussions, and their associated remediations, all of which serve to improve reliability and sublinearly scale operations.
  • Partner with others to use tools, processes, and techniques to sublinearly scale operations and reduce business risk, in areas that include: infrastructure & configuration management, deploys, capacity modeling & planning, and incident mitigation.
  • Identify common patterns in the challenges of operating services in production, and collaborate with other SRE teams to design and implement reusable solutions and other multi-functional work that drives down the complexity, difficulty, costs, and risks of operating the business.

What You’ll Do

  • Actively participate in and contribute to code reviews and technical design documents, with an eye toward identifying performance and reliability bottlenecks.
  • Work with SWE counterparts to identify and mitigate production issues; validate, document, and exercise failover/disaster recovery plans, graceful degradation mechanisms, policies, and standard methodologies.
  • Perform capacity planning and analysis, and manage infrastructure changes (including tuning, reshaping, resizing, and migrating infrastructure) for services and their immediate downstreams.
  • Join SWE service owners on large in-progress engineering projects, including migrating to the latest Twitter technologies and adopting related standard methodologies.
  • Productionalize new services and features and improve the production landscape for existing services, providing SRE expertise and implementing standard methodologies for CI/CD, dashboard integrity, and the ongoing selection and evaluation of the right alerts, SLOs, and error budgets for each service.
  • Attend team meetings, standups, and on-call handoffs.
  • Participate in team on-call rotation.

Qualifications

  • 3+ years of experience managing, diagnosing, and debugging large-scale distributed systems in production.
  • Practical knowledge of at least one higher-level language (Python, Go, Ruby, or similar).
  • Thorough understanding of Linux servers, specifically RHEL/CentOS.
  • Detailed understanding of tools, methodologies, and analysis techniques in a distributed systems environment.
  • Experience developing infrastructure, configuration, and deployment scripting and automation for large-scale, high-complexity services in a microservice environment.
  • Experience with large data sets that informs your approach to building robust data pipelines and architectures, and to tuning Java applications.
  • Experience using containerization and orchestration software such as Mesos, Kubernetes, Docker, or LXC.
  • Experience with Lucene-based search systems and scatter-gather query patterns is desirable.
  • B.S. in Computer Science or equivalent experience.

Company Description

Twitter is what’s happening and what people are talking about right now. For us, life's not about a job, it's about purpose. We believe real change starts with conversation. Here, your voice matters. Come as you are and together we'll do what's right (not what's easy) to serve the public conversation.

Additional Information

A few other things we value:

  • Challenge - We solve some of the industry’s hardest problems. Come to be challenged, learn, and thrive as an engineer.
  • Diversity - Diversity makes us a better organization and team. We value diverse backgrounds, ideas, and experiences.
  • Work-Life Balance - We work hard, but we believe that with hard work should come balance.

We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status or any legally protected status.

Team

Infrastructure Engineering, Software Engineering

Location

Remote Canada, Toronto

 
