Lyude Paul [Fri, 13 Jul 2018 17:06:33 +0000 (13:06 -0400)]
drm/nouveau: Avoid looping through fake MST connectors
When MST and atomic support were introduced to nouveau, they brought with
them a second structure that can have a drm_connector embedded within it:
struct nv50_mstc. This meant that we could no longer simply loop through
our connector list and assume that nouveau_connector() would return a
proper pointer for each connector, since the assumption that all
connectors coming from nouveau have a full nouveau_connector struct had
become invalid.
Unfortunately, none of the actual code that looped through connectors
ever got updated, which means that we've been causing invalid memory
accesses for quite a while now.
An example that was caught by KASAN:
[ 201.038698] ==================================================================
[ 201.038792] BUG: KASAN: slab-out-of-bounds in nvif_notify_get+0x190/0x1a0 [nouveau]
[ 201.038797] Read of size 4 at addr ffff88076738c650 by task kworker/0:3/718
[ 201.038800]
[ 201.038822] CPU: 0 PID: 718 Comm: kworker/0:3 Tainted: G O 4.18.0-rc4Lyude-Test+ #1
[ 201.038825] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
[ 201.038882] Workqueue: events nouveau_display_hpd_work [nouveau]
[ 201.038887] Call Trace:
[ 201.038894] dump_stack+0xa4/0xfd
[ 201.038900] print_address_description+0x71/0x239
[ 201.038929] ? nvif_notify_get+0x190/0x1a0 [nouveau]
[ 201.038935] kasan_report.cold.6+0x242/0x2fe
[ 201.038942] __asan_report_load4_noabort+0x19/0x20
[ 201.038970] nvif_notify_get+0x190/0x1a0 [nouveau]
[ 201.038998] ? nvif_notify_put+0x1f0/0x1f0 [nouveau]
[ 201.039003] ? kmsg_dump_rewind_nolock+0xe4/0xe4
[ 201.039049] nouveau_display_init.cold.12+0x34/0x39 [nouveau]
[ 201.039089] ? nouveau_user_framebuffer_create+0x120/0x120 [nouveau]
[ 201.039133] nouveau_display_resume+0x5c0/0x810 [nouveau]
[ 201.039173] ? nvkm_client_ioctl+0x20/0x20 [nouveau]
[ 201.039215] nouveau_do_resume+0x19f/0x570 [nouveau]
[ 201.039256] nouveau_pmops_runtime_resume+0xd8/0x2a0 [nouveau]
[ 201.039264] pci_pm_runtime_resume+0x130/0x250
[ 201.039269] ? pci_restore_standard_config+0x70/0x70
[ 201.039275] __rpm_callback+0x1f2/0x5d0
[ 201.039279] ? rpm_resume+0x560/0x18a0
[ 201.039283] ? pci_restore_standard_config+0x70/0x70
[ 201.039287] ? pci_restore_standard_config+0x70/0x70
[ 201.039291] ? pci_restore_standard_config+0x70/0x70
[ 201.039296] rpm_callback+0x175/0x210
[ 201.039300] ? pci_restore_standard_config+0x70/0x70
[ 201.039305] rpm_resume+0xcc3/0x18a0
[ 201.039312] ? rpm_callback+0x210/0x210
[ 201.039317] ? __pm_runtime_resume+0x9e/0x100
[ 201.039322] ? kasan_check_write+0x14/0x20
[ 201.039326] ? do_raw_spin_lock+0xc2/0x1c0
[ 201.039333] __pm_runtime_resume+0xac/0x100
[ 201.039374] nouveau_display_hpd_work+0x67/0x1f0 [nouveau]
[ 201.039380] process_one_work+0x7a0/0x14d0
[ 201.039388] ? cancel_delayed_work_sync+0x20/0x20
[ 201.039392] ? lock_acquire+0x113/0x310
[ 201.039398] ? kasan_check_write+0x14/0x20
[ 201.039402] ? do_raw_spin_lock+0xc2/0x1c0
[ 201.039409] worker_thread+0x86/0xb50
[ 201.039418] kthread+0x2e9/0x3a0
[ 201.039422] ? process_one_work+0x14d0/0x14d0
[ 201.039426] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 201.039431] ret_from_fork+0x3a/0x50
[ 201.039441]
[ 201.039444] Allocated by task 79:
[ 201.039449] save_stack+0x43/0xd0
[ 201.039452] kasan_kmalloc+0xc4/0xe0
[ 201.039456] kmem_cache_alloc_trace+0x10a/0x260
[ 201.039494] nv50_mstm_add_connector+0x9a/0x340 [nouveau]
[ 201.039504] drm_dp_add_port+0xff5/0x1fc0 [drm_kms_helper]
[ 201.039511] drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
[ 201.039518] drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
[ 201.039525] drm_dp_mst_link_probe_work+0x71/0xb0 [drm_kms_helper]
[ 201.039529] process_one_work+0x7a0/0x14d0
[ 201.039533] worker_thread+0x86/0xb50
[ 201.039537] kthread+0x2e9/0x3a0
[ 201.039541] ret_from_fork+0x3a/0x50
[ 201.039543]
[ 201.039546] Freed by task 0:
[ 201.039549] (stack is not available)
[ 201.039551]
[ 201.039555] The buggy address belongs to the object at ffff88076738c1a8
which belongs to the cache kmalloc-2048 of size 2048
[ 201.039559] The buggy address is located 1192 bytes inside of
2048-byte region [ffff88076738c1a8, ffff88076738c9a8)
[ 201.039563] The buggy address belongs to the page:
[ 201.039567] page:ffffea001d9ce200 count:1 mapcount:0 mapping:ffff88084000d0c0 index:0x0 compound_mapcount: 0
[ 201.039573] flags: 0x8000000000008100(slab|head)
[ 201.039578] raw: 8000000000008100 ffffea001da3be08 ffffea001da25a08 ffff88084000d0c0
[ 201.039582] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[ 201.039585] page dumped because: kasan: bad access detected
[ 201.039588]
[ 201.039591] Memory state around the buggy address:
[ 201.039594] ffff88076738c500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 201.039598] ffff88076738c580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 201.039601] >ffff88076738c600: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[ 201.039604] ^
[ 201.039607] ffff88076738c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 201.039611] ffff88076738c700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 201.039613] ==================================================================
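A minimal sketch of the kind of guard needed before downcasting. The
nouveau_connector_is_mstc() predicate below is a hypothetical stand-in for
whatever check the actual patch uses (e.g. comparing connector->funcs
against nouveau's connector funcs):

  struct drm_connector *connector;
  struct nouveau_connector *nv_connector;

  drm_for_each_connector_iter(connector, &conn_iter) {
          /* Fake MST connectors (struct nv50_mstc) don't embed a
           * struct nouveau_connector, so skip them before casting. */
          if (nouveau_connector_is_mstc(connector)) /* hypothetical */
                  continue;

          nv_connector = nouveau_connector(connector);
          nvif_notify_get(&nv_connector->hpd); /* now a valid access */
  }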
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Lyude Paul [Fri, 13 Jul 2018 17:06:32 +0000 (13:06 -0400)]
drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
Every codepath in nouveau that loops through the connector list
currently does so using the old method, which is prone to race
conditions from MST connectors being created and destroyed. This has
been causing a multitude of problems, including memory corruption from
trying to access connectors that have already been freed!
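For reference, a sketch of the old pattern versus the iterator helpers
(both APIs shown are the stock DRM ones):

  /* Old, racy pattern: walks the raw list with no protection against
   * MST connectors being created or destroyed mid-walk. */
  list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
          ...
  }

  /* New pattern: the iterator helpers take the required protection. */
  struct drm_connector_list_iter conn_iter;

  drm_connector_list_iter_begin(dev, &conn_iter);
  drm_for_each_connector_iter(connector, &conn_iter) {
          ...
  }
  drm_connector_list_iter_end(&conn_iter);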
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Cc: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Dan Carpenter [Tue, 3 Jul 2018 12:30:56 +0000 (15:30 +0300)]
drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()
The bo array has req->nr_buffers elements, so the > comparisons should be
>= to ensure we don't read beyond the end of the array.
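Illustratively (context reconstructed around the commit text, not the
exact diff):

  /* bo[] holds req->nr_buffers entries, so an index equal to
   * nr_buffers is already out of range: reject with >=, not >. */
  if (unlikely(r->bo_index >= req->nr_buffers)) {
          ret = -EINVAL;
          break;
  }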
Fixes: a1606a9596e5 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Tue, 3 Jul 2018 00:52:34 +0000 (10:52 +1000)]
drm/nouveau/kms/nv50-: ensure window updates are submitted when flushing mst disables
It was possible for this to be skipped when shutting down MST streams,
leaving the core channel interlocked with a wndw channel update that never
happens - leading to a hung display.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested-By: Lyude Paul <lyude@redhat.com>
Ben Skeggs [Mon, 18 Jun 2018 08:06:13 +0000 (18:06 +1000)]
drm/nouveau/kms/nv50-: cursors always use core channel vram ctxdma
Ctxdmas for cursors from all heads are set up in the core channel, and
because we track allocated handles per-window, we were failing with
-EEXIST on multiple-head setups when trying to allocate duplicate handles.
The cursor code is hardcoded to use the core channel vram ctxdma already,
so just skip ctxdma allocation for cursor fbs to fix the issue.
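A sketch of the shape of the fix; the cursor test and ctxdma helper below
are stand-ins for the actual code, shown only to illustrate the idea:

  /* Cursors always use the core channel's vram ctxdma, so don't
   * allocate (and double-register) a per-fb ctxdma handle for them. */
  if (wndw_is_cursor(wndw)) /* hypothetical predicate */
          return 0;

  ctxdma = nv50_wndw_ctxdma_new(wndw, fb);
  if (IS_ERR(ctxdma))
          return PTR_ERR(ctxdma);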
Fixes: 5bca1621c07 ("drm/nouveau/kms/nv50-: move fb ctxdma tracking into windows")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Arushi Singhal [Tue, 8 May 2018 13:13:09 +0000 (23:13 +1000)]
drm/nouveau/clk: Use list_for_each_entry_from_reverse
It's better to use "list_for_each_entry_from_reverse" for iterating over
the list than an open-coded "for loop", as it makes the code clearer to
read.
This patch replaces the "for loop" with "list_for_each_entry_from_reverse"
and the "start" variable with "cstate", which helps in refactoring the
code; "cstate" is also the variable name more commonly used in the other
functions.
Changes in v2:
The "start" variable is removed instead of "cstate" (which v1 removed),
since "cstate" is the more common name and so is preferred over "start".
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ilia Mirkin [Sun, 22 Apr 2018 21:47:12 +0000 (17:47 -0400)]
drm/nouveau: fix temp/pwm visibility, skip hwmon when no sensors exist
An NV34 GPU was seeing temp and pwm entries in hwmon, which would error
out when read. These should not have been visible, and moreover the whole
hwmon object should never have been registered in the first place.
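A minimal sketch of the visibility half using the standard hwmon_ops
callback; the sensor-presence helpers here are hypothetical stand-ins:

  static umode_t
  nouveau_hwmon_is_visible(const void *data, enum hwmon_sensor_types type,
                           u32 attr, int channel)
  {
          const struct nouveau_drm *drm = data;

          switch (type) {
          case hwmon_temp:
                  /* Hide temp attributes when no sensor exists. */
                  return has_temp_sensor(drm) ? 0444 : 0; /* hypothetical */
          case hwmon_pwm:
                  return has_pwm_control(drm) ? 0644 : 0; /* hypothetical */
          default:
                  return 0;
          }
  }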
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
drm/nouveau: fix nouveau_dsm_get_client_id()'s return type
The method struct vga_switcheroo_handler::get_client_id() is defined
as returning an 'enum vga_switcheroo_client_id' but the implementation
in this driver, nouveau_dsm_get_client_id(), returns an 'int'.
Fix this by returning 'enum vga_switcheroo_client_id' in this driver too.
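The shape of the change (function body elided):

  static enum vga_switcheroo_client_id
  nouveau_dsm_get_client_id(struct pci_dev *pdev)
  {
          ...
          return VGA_SWITCHEROO_IGD; /* or VGA_SWITCHEROO_DIS */
  }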
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The method struct drm_connector_helper_funcs::mode_valid is defined
as returning an 'enum drm_mode_status' but the driver implementation
for this method uses an 'int' for it.
Fix this by using 'enum drm_mode_status' in the driver too.
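A sketch of the corrected prototype, assuming nouveau's connector helper
as the implementation in question (body elided):

  static enum drm_mode_status
  nouveau_connector_mode_valid(struct drm_connector *connector,
                               struct drm_display_mode *mode)
  {
          ...
          return MODE_OK;
  }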
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Tue, 8 May 2018 10:39:47 +0000 (20:39 +1000)]
drm/nouveau/kms/nv50-: store window visibility in state
Window visibility is going to become a little more complicated with the
upcoming LUT changes, so store the calculated value to avoid having to
recalculate the armed state.
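A sketch of the idea; field placement is illustrative, not the exact
layout of the patch:

  struct nv50_wndw_atom {
          struct drm_plane_state state;
          /* Computed once during atomic check; the armed value can then
           * simply be read back instead of being recalculated. */
          bool visible;
          ...
  };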
Ben Skeggs [Tue, 8 May 2018 10:39:47 +0000 (20:39 +1000)]
drm/nouveau/kms/nv50-: simplify tracking of channel interlocks
Instead of windows returning their core channel interlock mask only when
they know the core channel has been modified, the mask is now recorded
unconditionally and used as required when update methods are emitted.
Ben Skeggs [Tue, 8 May 2018 10:39:47 +0000 (20:39 +1000)]
drm/nouveau: remove fence wait code from deferred client work handler
Fences attached to deferred client work items now originate from channels
belonging to the client, meaning we can be certain they've been signalled
before we destroy a client.
This closes a race that could happen if the dma_fence_wait_timeout() call
didn't succeed. When the fence was later signalled, a use-after-free was
possible.
Ben Skeggs [Tue, 8 May 2018 10:39:47 +0000 (20:39 +1000)]
drm/nouveau/gem: tie deferred unmapping of buffers to VMA fence completion
As VMAs are per-client, unlike buffers, this allows us to avoid referencing
foreign fences (those that belong to another client/driver) from the client
deferred work handler, and prevent some not-fun race conditions that can be
triggered when a fence stalls.
The algorithm for GM200 and newer matches RM for all the boards I have, but
I don't have enough data to try and figure something out for earlier boards,
so these will still write zeroes to the table as we did before.
The code in NVGPU isn't helpful here; it appears to handle only specific
cases.
I haven't yet been able to find a fully programmatic way of calculating
the same mapping as NVIDIA for GF100-GF119, so the algorithm partially
depends on data tables for specific configurations.
I couldn't find traces for every possibility, so the algorithm will switch
to a mapping similar to what GK104-GM10x use if it encounters one. We did
the wrong thing before anyway, so this shouldn't matter too much.
The algorithm used in the GK104 implementation was ported from NVGPU.