One area where Xen deviates from Linux is that test_bit() forces a volatile
read. This leads to poor code generation, because the optimiser cannot merge
bit operations on the same word.
Drop the use of test_bit(), and write the expressions in regular C. This
removes the include of bitops.h (which is a frequent source of header
tangles), and it offers the optimiser far more flexibility.
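As a rough illustration (not code from this patch; caps is a hypothetical
unsigned int capability array), two checks which happen to land in the same
word can only be folded together once the volatile qualifier is gone:

    /* test_bit()'s volatile semantics: one load of caps[0] per call. */
    if ( test_bit(3, caps) && test_bit(5, caps) )
        ...

    /* Plain C: the optimiser may emit one load and one combined mask test. */
    if ( (caps[0] & ((1u << 3) | (1u << 5))) == ((1u << 3) | (1u << 5)) )
        ...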
Bloat-o-meter reports a net change of:
add/remove: 0/0 grow/shrink: 21/87 up/down: 641/-2751 (-2110)
with half of that in x86_emulate() alone. vmx_ctxt_switch_to() seems to be
the fastpath with the greatest delta at -24, where the optimiser has
successfully removed the branch hidden in cpu_has_msr_tsc_aux.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
#define X86_FEATURE_ALWAYS X86_FEATURE_LM
#ifndef __ASSEMBLY__
-#include <xen/bitops.h>
struct cpuinfo_x86 {
    unsigned char x86; /* CPU family */
extern struct cpuinfo_x86 boot_cpu_data;
-#define cpu_has(c, bit) test_bit(bit, (c)->x86_capability)
-#define boot_cpu_has(bit) test_bit(bit, boot_cpu_data.x86_capability)
+static inline bool cpu_has(const struct cpuinfo_x86 *info, unsigned int feat)
+{
+    return info->x86_capability[cpufeat_word(feat)] & cpufeat_mask(feat);
+}
+
+static inline bool boot_cpu_has(unsigned int feat)
+{
+    return cpu_has(&boot_cpu_data, feat);
+}
#define CPUID_PM_LEAF 6
#define CPUID6_ECX_APERFMPERF_CAPABILITY 0x1
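For reference, the helpers used above split a feature number into a word
index and a bit mask; they are defined elsewhere in the tree, from memory
along the lines of:

    #define cpufeat_word(idx)  ((idx) / 32)
    #define cpufeat_mask(idx)  (1u << ((idx) % 32))

so e.g. boot_cpu_has(X86_FEATURE_XSAVE) compiles to a single ordinary load of
boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVE)] plus a mask
test, which the optimiser is now free to combine with neighbouring checks.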