Targeted CPU Scheduling for a Serverless Platform
This page describes my recent work on custom CPU scheduling, built with the SchedExt framework, to improve the performance of a serverless platform written in Rust.
SchedExt Framework
As of Sep 2024, SchedExt is a new framework that is slated to become part of the Linux 6.12 kernel. It introduces a new scheduling class, alongside the default fair class (EEVDF), whose scheduling decisions are made by BPF programs. A BPF program can register the following callbacks (only a few basic ones are listed).
* init/exit_task - a new task is created or an existing task is destroyed
* select_cpu - select a CPU for a task
* dispatch - a task needs to be dispatched to the local dispatch queues (DSQs) of the SchedExt core
* init/exit_scheduler - a new scheduler is created or an existing scheduler is destroyed
With these callbacks, a BPF program can implement an arbitrary scheduling policy.
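To make the callback flow concrete, here is a small user-space model in Rust. It is not the SchedExt API: real schedulers are BPF programs loaded into the kernel. The `Policy` trait, the `RoundRobin` type, and every signature below are hypothetical and only mirror the callbacks listed above.

```rust
use std::collections::VecDeque;

type Pid = u32;
type Cpu = usize;

/// Hypothetical user-space mirror of the callbacks listed above.
trait Policy {
    fn init_task(&mut self, pid: Pid);
    fn exit_task(&mut self, pid: Pid);
    fn select_cpu(&mut self, pid: Pid, prev_cpu: Cpu) -> Cpu;
    /// Called when `cpu` needs work; returns the next task for it, if any.
    fn dispatch(&mut self, cpu: Cpu) -> Option<Pid>;
}

/// Toy policy: spread tasks over CPUs round-robin, one FIFO queue per CPU.
struct RoundRobin {
    next_cpu: Cpu,
    queues: Vec<VecDeque<Pid>>,
}

impl RoundRobin {
    fn new(num_cpus: usize) -> Self {
        assert!(num_cpus > 0);
        Self { next_cpu: 0, queues: vec![VecDeque::new(); num_cpus] }
    }
}

impl Policy for RoundRobin {
    fn init_task(&mut self, _pid: Pid) {}

    /// Drop a destroyed task from whichever queue still holds it.
    fn exit_task(&mut self, pid: Pid) {
        for q in &mut self.queues {
            q.retain(|&p| p != pid);
        }
    }

    /// Simplification: this toy also queues the task here, whereas a real
    /// scheduler would do the queueing in a separate step.
    fn select_cpu(&mut self, pid: Pid, _prev_cpu: Cpu) -> Cpu {
        let cpu = self.next_cpu;
        self.next_cpu = (self.next_cpu + 1) % self.queues.len();
        self.queues[cpu].push_back(pid);
        cpu
    }

    fn dispatch(&mut self, cpu: Cpu) -> Option<Pid> {
        self.queues[cpu].pop_front()
    }
}

fn main() {
    let mut policy = RoundRobin::new(2);
    for pid in [10u32, 11, 12] {
        policy.init_task(pid);
        let cpu = policy.select_cpu(pid, 0);
        println!("task {} -> cpu {}", pid, cpu);
    }
    policy.exit_task(11);
    println!("cpu 0 runs {:?}", policy.dispatch(0));
    println!("cpu 1 runs {:?}", policy.dispatch(1));
}
```

Swapping `RoundRobin` for another implementation of the trait changes the policy without touching the driver loop, which is the property the callbacks provide.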
Serverless Control Plane
The proof of concept is built on a research-centric serverless control plane implemented in Rust. See the Ilúvatar project for details about the control plane.
Metadata Driven Policy
The idea is to take advantage of the tight coupling between the control plane and the scheduling policy. In essence, the policy is driven by function metadata, so the scheduler can use information that is normally available only at the higher layers.
Proof of Concept: A task-size interval assignment policy, driven by the execution time of the functions and enforcing locality, works better for short single-threaded functions than for long multi-threaded ones. The POC divides the available cores into buckets and assigns each function to a bucket based on its historical end-to-end time. Metadata is shared between the control plane and the SchedExt scheduler through a pinned eBPF map.
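The bucket assignment can be sketched in a few lines of Rust. This is an illustration under assumptions, not the POC's code: the `FuncMeta` record, the function names, the contiguous core split, and the cutoff values are all hypothetical. `FuncMeta` stands in for the kind of fixed-layout record the control plane might publish through the pinned map.

```rust
use std::ops::Range;

/// Hypothetical per-function record shared with the scheduler.
/// `repr(C)` keeps the layout predictable for a consumer written in C.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FuncMeta {
    /// Historical end-to-end time in milliseconds.
    pub e2e_ms: u64,
    /// Index of the core bucket chosen for this function.
    pub bucket: u32,
}

/// Split `num_cores` cores into `num_buckets` contiguous ranges.
/// When the division is uneven, earlier buckets get one extra core.
pub fn core_buckets(num_cores: usize, num_buckets: usize) -> Vec<Range<usize>> {
    assert!(num_buckets > 0 && num_buckets <= num_cores);
    let base = num_cores / num_buckets;
    let extra = num_cores % num_buckets;
    let mut start = 0;
    (0..num_buckets)
        .map(|i| {
            let len = base + usize::from(i < extra);
            let range = start..start + len;
            start += len;
            range
        })
        .collect()
}

/// Size-interval assignment: bucket `i` takes functions whose end-to-end
/// time is at most `cutoffs_ms[i]`; anything longer goes to the last bucket.
/// `cutoffs_ms` is expected to be sorted in ascending order.
pub fn bucket_for(e2e_ms: u64, cutoffs_ms: &[u64]) -> u32 {
    cutoffs_ms
        .iter()
        .position(|&cutoff| e2e_ms <= cutoff)
        .unwrap_or(cutoffs_ms.len()) as u32
}

fn main() {
    // Made-up numbers: 8 cores, 3 buckets, cutoffs at 100 ms and 1 s.
    let buckets = core_buckets(8, 3);
    let cutoffs: [u64; 2] = [100, 1_000];
    for e2e_ms in [20u64, 450, 30_000] {
        let bucket = bucket_for(e2e_ms, &cutoffs);
        let meta = FuncMeta { e2e_ms, bucket };
        println!("{:?} -> cores {:?}", meta, buckets[bucket as usize]);
    }
}
```

Keeping every function of a given size class on the same small set of cores is what provides the locality; the cutoffs would in practice come from the control plane's execution history rather than constants.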