Benchmarks have shown that per-socket runqueues arrangement
behaves better (e.g., we achieve better load balancing)
than the current per-core default.
Here's an example (coming from
https://lists.xen.org/archives/html/xen-devel/2016-06/msg02287.html ):
|=======================================|
| XEN BUILD TIME, LOW LOAD, NO NOISE |
|---------------------------------------|
| runq=core runq=socket |
| 35.200 33.433 |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, NO NOISE | IPERF, HIGH LOAD, NO NOISE |
|---------------------------------------|------------------------------|
| runq=core runq=socket | runq=core runq=socket |
| 18.013 18.530 | 23.200 23.466 |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, LOW LOAD, WITH NOISE |
|------------------------------------- |
| runq=core runq=socket |
| 45.866 39.493 |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, WITH NOISE | IPERF, HIGH LOAD, WITH NOISE |
|---------------------------------------|------------------------------|
| runq=core runq=socket | runq=core runq=socket |
| 36.840 29.080 | 19.967 21.000 |
|=======================================|==============================|
The only reason why we went for per-core, initially, was to
introduce some form of hyperthreading support. Now we have
hyperthreading support, independently from how runqueues
are organized (
9bb9c7388 "xen: credit2: implement true SMT
support"), and thus we can switch to per-socket.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>