Software Engineer - Data Infrastructure
Who We Are:
Twitter’s globally distributed, real-time communications network generates trillions of events and petabytes of data per day. As engineers on the Data Infrastructure team, our mission is to build the fastest, most reliable, and largest-scale data processing technologies in the world - able to cope with ever-increasing volumes of data in real time - and then apply them to the company’s most critical and fundamental data problems.
What You’ll Do:
As a member of the team, you will build systems to manage hundreds of petabytes of data and process tens of millions of events per second in real time. The services you build will integrate directly with Twitter’s products, opening the door to new and cutting-edge features. You will work with open-source technologies such as Hadoop, Scalding, Heron, and Presto and be an active member of the open-source community. You will empower dozens of engineering teams, hundreds of co-workers, and millions of users to dream of new insights and new possibilities.
Data Infrastructure is hiring for the following teams:
Hadoop Infrastructure - one of the industry’s largest-scale deployments, with over 36,000 total nodes, 700 internal users, 500 petabytes of storage, and a track record of active contributions to the Apache Hadoop code base
Data I/O - technologies to ingest, archive, and replicate source data at scales of trillions of messages per day
Real-Time Compute Infrastructure - cutting-edge streaming and interactive compute technologies, including Presto and Twitter Heron
Core Data Libraries - open source abstraction libraries, such as Scalding, Summingbird, and Parquet, that power almost all of Twitter’s batch and streaming data applications
Data Pipeline - tools and services that simplify data discovery, data management, and job scheduling for engineers and data scientists
Help us solve some of our biggest challenges!
Scale Hadoop clusters beyond 10,000 nodes
Integrate Scalding with next-generation frameworks such as Spark and Tez
Optimize our batch and real-time compute stacks to improve efficiency and reliability
Understand how Twitter’s data is used and “what it all means”
Work with hardware, network, and SiteOps teams to design next-generation storage and compute platforms
Who You Are:
You want to be part of a community of the most talented, forward-thinking engineers in the industry. You want to optimize existing real-time and batch applications and save the company millions of dollars. You want to learn, work with, and contribute to cutting-edge open-source technologies. The ideal candidate has experience with and/or a history of contributions to Hadoop, Spark, Hive, Scalding, Parquet, or similar technologies. You are a strong Java, Scala, or C++ developer. You have experience in distributed systems, database internals, Linux and networking fundamentals, or performance analysis.
BS, MS, or PhD in computer science or a related field, or equivalent work experience
We are committed to an inclusive and diverse Twitter. Twitter is an equal opportunity employer. We do not discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected status.
Engineering Hiring Process
Once your application is received, a recruiter will reach out if your qualifications are a match for the role.
If your background is a match, you may have 1-2 technical phone interviews or be given the chance to provide a work sample, depending on the role.
If the phone interviews go well or your work sample is strong, the final step is a round of onsite interviews with 5-6 people at our office.