Distributed cluster schedulers are becoming increasingly popular. They provide a good abstraction for running workloads at “warehouse scale” on public and private clouds by decoupling workloads from compute, network, and storage resources.
In this talk, we will cover the operational challenges of running a cluster scheduler to serve highly available services across multiple geographies and in a heterogeneous runtime environment. We will go into detail about what operators need from a cluster scheduler: managing multiple runtime and virtualization platforms, providing observability, running maintenance on hardware and software, and so on.
Lastly, we will talk about Nomad, an open source distributed cluster scheduler. We will briefly describe its architecture and show how Nomad addresses these operational challenges and provides a highly scalable environment for running applications.
Diptanu is a Senior Engineer at HashiCorp, where he works on large-scale distributed systems, cluster schedulers, service discovery, and highly available, high-throughput systems on the public cloud. He is a core committer to the Nomad cluster scheduler, which has a parallel and distributed scheduler and supports heterogeneous virtualized workloads.
Prior to HashiCorp, Diptanu was in the Cloud Platform group at Netflix, where he worked on the core platform infrastructure that powered Netflix's microservices. He worked on Apache Mesos, wrote a cluster scheduler for running clusters of Docker containers on AWS, and contributed to various reactive IPC and service discovery infrastructure projects.