First is the huge diversity of scale. This is not just a matter of comparing inference workloads with training, but of looking across the spectrum from federated learning and edge devices up to models with 100B+ parameters that must be distributed simply to hold their weights in memory. Second is the difficulty of identifying abstractions. We have techniques such as pipeline parallelism and tensor parallelism, but their implementations can be tightly coupled with the models themselves, and optimized alongside them. Third, it is tempting to design fresh alternatives that decouple distribution techniques from models; however, deploying these into an evolving ecosystem is itself a challenge.
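To make the coupling point concrete, here is a minimal sketch of tensor (intra-layer) parallelism for a single linear layer, simulated on one host with NumPy: the weight matrix is split column-wise across notional "devices", each computes a partial matmul, and the outputs are gathered. All names and shapes here are illustrative assumptions, not taken from any particular framework.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # a batch of activations (batch=4, features=8)
W = rng.standard_normal((8, 6))   # full weight matrix of one linear layer

# Tensor parallelism: shard W column-wise, one shard per "device".
shards = np.split(W, 2, axis=1)

# Each device performs its local matmul on its own weight shard.
partials = [x @ w for w in shards]

# Gather the partial outputs (the role of an all-gather collective).
y_parallel = np.concatenate(partials, axis=1)

# Reference: the unsharded computation gives the same result.
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

Even in this toy form, the sharding decision (which axis, how many shards, where the gather happens) is entangled with the layer's shape, which hints at why real implementations end up coupled to the models they serve.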
Short Bio: Tim Harris is a Principal Architect at Microsoft, where he currently works on distributed training of PyTorch models in the ONNX Runtime. Prior to that he was with AWS, working on large-scale storage performance and data analytics with Amazon S3. Further back, he led the Oracle Labs group in Cambridge, UK, working on runtime systems for in-memory graph analytics and on the confluence of "big data" with ideas from high-performance computing. Before joining Oracle he was with Microsoft (2004–2012), and on the faculty of the University of Cambridge Computer Laboratory (2000–2004), where he led the department's research on concurrent data structures and contributed to the Xen virtual machine monitor project.