Konstantinos Karanasos

Title: Scheduling and tuning an exabyte-scale data infrastructure at Microsoft

Abstract:

Microsoft’s internal big-data infrastructure is one of the largest in the world: over 300K machines run billions of tasks from over half a million daily jobs that process exabytes of data. Operating this infrastructure is costly and complex, so efficiency is paramount.

In the first part of the talk, I will present Hydra, the resource manager that we use to schedule this massive workload. Over the last few years, Hydra has scheduled nearly one trillion tasks that manipulated close to a zettabyte of production data. We built Hydra by leveraging, extending, and contributing our code to Apache Hadoop YARN. I will describe research, open-source, and production findings associated with building and deploying Hydra.

In the second part, I will discuss KEA, a multi-year effort for the data-driven tuning of our infrastructure. KEA leverages machine learning models to capture our clusters’ dynamic behavior. These models power automated optimization procedures for parameter tuning and inform our leadership’s critical decisions, such as hardware and data center design and software investments. KEA is on track to save Microsoft tens of millions of dollars per year.

Bio: Konstantinos Karanasos is a Principal Scientist Manager at Microsoft’s Gray Systems Lab (GSL), Azure Data’s applied research group. He is the manager of the Bay Area branch of GSL and tech lead for several systems-for-ML efforts in the group. Konstantinos’ work at Microsoft previously focused on resource management for the company’s production analytics clusters. This work was deployed on over 300K machines across Microsoft and was key to enabling the company to operate the world’s largest YARN clusters. He has also contributed a large part of his work at Microsoft to open-source projects: he is a committer and member of the Project Management Committee (PMC) of Apache Hadoop, and a contributor to ONNX Runtime. Before joining Microsoft, he was a postdoctoral researcher at IBM Almaden Research Center. Konstantinos holds a PhD from Inria and the University Paris-Sud, France, and a Diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece.