There are a lot of places which release a lock before calling
vcpu_sleep_nosync(), which then just grabs the lock again. This is
not only a waste of time, but leads to more code duplication (since
you have to copy-and-paste recipes rather than calling a unified
function), which in turn leads to an increased chance of bugs.
Introduce vcpu_sleep_nosync_locked(), which can be called if you
already hold the schedule lock.