When offlining a CPU, we currently free the perf_event used for
the hardlockup detector. This sometimes results in an allocation
failure when bringing the CPU back online, because the hotplug
code that brings the CPU back online runs while gfp flags are
masked by pm_restrict_gfp_mask:

[10086.691854] [<ffffffff810a4e80>] warn_alloc_failed+0x10f/0x123
[10086.691864] [<ffffffff810a6560>] ? drain_local_pages+0x16/0x18
[10086.691872] [<ffffffff810a75f8>] __alloc_pages_nodemask+0x697/0x752
[10086.691881] [<ffffffff810a7738>] __get_free_pages+0x17/0x44
[10086.691891] [<ffffffff810ccca0>] kmalloc_order_trace+0x2b/0x5b
[10086.691898] [<ffffffff810cf095>] __kmalloc+0x37/0x151
[10086.691908] [<ffffffff81011a0d>] ? kmalloc_node.isra.0.constprop.3+0xe/0x10
[10086.691917] [<ffffffff81011a0d>] kmalloc_node.isra.0.constprop.3+0xe/0x10
[10086.691925] [<ffffffff810120dc>] reserve_ds_buffers+0xaf/0x30e
[10086.691934] [<ffffffff8100ec2c>] x86_pmu_event_init+0x2d2/0x31c
[10086.691941] [<ffffffff8109e456>] perf_init_event+0x66/0xac
[10086.691948] [<ffffffff8109e6fb>] perf_event_alloc+0x25f/0x38f
[10086.691957] [<ffffffff81076488>] ? touch_nmi_watchdog+0x67/0x67
[10086.691965] [<ffffffff8109eacd>] perf_event_create_kernel_counter+0x26/0xd9
[10086.691973] [<ffffffff810762b0>] watchdog_enable+0x7e/0x1ef
[10086.691981] [<ffffffff81076848>] cpu_callback+0x31/0x3f
[10086.691999] [<ffffffff810769d4>] lockup_detector_bootcpu_resume+0x2e/0x32
[10086.692009] [<ffffffff8105beab>] suspend_devices_and_enter+0x1a3/0x26e
[10086.692018] [<ffffffff8105c075>] pm_suspend+0xff/0x1c2
[10086.692025] [<ffffffff8105b3a1>] state_store+0x99/0xca
[10086.692034] [<ffffffff811d6562>] kobj_attr_store+0xf/0x1b
[10086.692042] [<ffffffff81123280>] sysfs_write_file+0xe9/0x121
[10086.692050] [<ffffffff810d2fca>] vfs_write+0x98/0xda
[10086.692057] [<ffffffff810d3199>] sys_write+0x43/0x73
[10086.692066] [<ffffffff8146ac52>] system_call_fastpath+0x16/0x1b

As a result of the allocation failure, the watchdog sometimes
stops working after suspend/resume. The fix is to only disable
the event when the CPU goes offline and not free it, so that
bringing the CPU back online does not need a new allocation.
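
The difference between the two approaches can be sketched with a small
user-space model. This is only an illustration, not the patch itself:
every name below (model_event, model_slot, model_alloc_allowed and the
model_watchdog_* helpers) is hypothetical, and model_alloc_allowed merely
stands in for allocations that may fail while gfp flags are restricted.

```c
/* User-space model only: none of these names are kernel APIs. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU hardlockup perf_event. */
struct model_event {
	bool enabled;
};

/* Set to false to mimic allocations failing under restricted gfp flags. */
static bool model_alloc_allowed = true;

/* Stand-in for the per-CPU pointer that holds the event. */
static struct model_event *model_slot;

/* Returns 0 on success, -1 if a needed allocation could not be made. */
static int model_watchdog_enable(void)
{
	if (model_slot == NULL) {
		/* No event kept from before: a fresh allocation is required. */
		if (!model_alloc_allowed)
			return -1;
		model_slot = calloc(1, sizeof(*model_slot));
		if (model_slot == NULL)
			return -1;
	}
	/* An existing event is simply switched back on. */
	model_slot->enabled = true;
	return 0;
}

/* Old behaviour: free the event, forcing an allocation on the next enable. */
static void model_watchdog_disable_and_free(void)
{
	free(model_slot);
	model_slot = NULL;
}

/* New behaviour: only disable the event and keep it for reuse. */
static void model_watchdog_disable_only(void)
{
	if (model_slot != NULL)
		model_slot->enabled = false;
}
```

In this model, enabling after model_watchdog_disable_only() succeeds even
when model_alloc_allowed is false, whereas enabling after
model_watchdog_disable_and_free() fails under the same condition.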
BUG=chrome-os-partner:17522
TEST=Verify that detector still works after suspend/resume.
Change-Id: I78c90b13f718d660bd23884968be2f14c7c61860
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-on: https://gerrit.chromium.org/gerrit/42657
Reviewed-by: Sameer Nanda <snanda@chromium.org>