Thursday, August 20 • 2:00pm - 2:40pm
Twitter’s Production Scale: Mesos and Aurora Operations - Joe Smith, Twitter

Twitter has used Aurora and Mesos to scale from only tens of nodes to tens of thousands. Along the way, the configuration, deployment, and operational procedures for this system have evolved and improved significantly. This talk offers an operations perspective on managing a Mesos+Aurora cluster and covers many of the cluster management best practices that have emerged at Twitter from real-world production experience. It explores methods that decrease operational overhead and uses examples of outages and incidents to illustrate various failure domains. Finally, the talk highlights current and planned safeguards in Aurora and Mesos that mitigate the impact of such failures.

Joe Smith

SRE Tech Lead for Aurora and Mesos, Twitter
Joe Smith is a Senior Site Reliability Engineer at Twitter who grew the Aurora and Mesos cluster from tens of nodes to tens of thousands. As the Mesos and Aurora SRE, he automated the build, deployment, management, repair, and maintenance of the production clusters. In addition, he spearheaded the migration of Twitter’s production services from bare metal machines onto Aurora and Mesos. He presented at SCaLE, the Reliability Meetup in...

Thursday August 20, 2015 2:00pm - 2:40pm
Grand Ballroom B
