2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

208 Commits

Author SHA1 Message Date
Maíra Canal
497d370a64
drm/v3d: Fix out-of-bounds read in v3d_csd_job_run()
When enabling UBSAN on Raspberry Pi 5, we get the following warning:

[  387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3
[  387.903868] index 7 is out of range for type '__u32 [7]'
[  387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G        WC         6.10.3-v8-16k-numa #151
[  387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched]
[  387.932525] Call trace:
[  387.935296]  dump_backtrace+0x170/0x1b8
[  387.939403]  show_stack+0x20/0x38
[  387.942907]  dump_stack_lvl+0x90/0xd0
[  387.946785]  dump_stack+0x18/0x28
[  387.950301]  __ubsan_handle_out_of_bounds+0x98/0xd0
[  387.955383]  v3d_csd_job_run+0x3a8/0x438 [v3d]
[  387.960707]  drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
[  387.966862]  process_one_work+0x62c/0xb48
[  387.971296]  worker_thread+0x468/0x5b0
[  387.975317]  kthread+0x1c4/0x1e0
[  387.978818]  ret_from_fork+0x10/0x20
[  387.983014] ---[ end trace ]---

This happens because the UAPI provides only seven configuration
registers and we are reading the eighth position of this u32 array.

Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by
accessing only seven positions on the '__u32 [7]' array. The eighth
register exists indeed on V3D 7.1, but it isn't currently used. That
being so, let's guarantee that it remains unused and add a note that it
could be set in a future patch.

Fixes: 0ad5bc1ce4 ("drm/v3d: fix up register addresses for V3D 7.x")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809152001.668314-1-mcanal@igalia.com
2024-08-12 11:14:21 -03:00
Maxime Ripard
a1ff5a7d78
Merge drm/drm-fixes into drm-misc-fixes
Let's start the new drm-misc-fixes cycle by bringing in 6.11-rc1.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-07-30 09:09:23 +02:00
Tvrtko Ursulin
4ecc24a84d drm/v3d: Validate passed in drm syncobj handles in the performance extension
If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: bae7cb5d68 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal <mcanal@igalia.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711135340.84617-6-tursulin@igalia.com
(cherry picked from commit a546b7e4d7)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-18 15:50:08 +02:00
Tvrtko Ursulin
023d22e8bb drm/v3d: Validate passed in drm syncobj handles in the timestamp extension
If userspace provides an unknown or invalid handle anywhere in the handle
array the rest of the driver will not handle that well.

Fix it by checking handle was looked up successfully or otherwise fail the
extension by jumping into the existing unwind.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 9ba0ff3e08 ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal <mcanal@igalia.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711135340.84617-5-tursulin@igalia.com
(cherry picked from commit 8d1276d1b8)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-18 15:49:55 +02:00
Tvrtko Ursulin
32df4abc44 drm/v3d: Fix potential memory leak in the performance extension
If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: bae7cb5d68 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal <mcanal@igalia.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711135340.84617-4-tursulin@igalia.com
(cherry picked from commit 484de39fa5)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-18 15:49:28 +02:00
Tvrtko Ursulin
0e50fcc20b drm/v3d: Fix potential memory leak in the timestamp extension
If fetching of userspace memory fails during the main loop, all drm sync
objs looked up until that point will be leaked because of the missing
drm_syncobj_put.

Fix it by exporting and using a common cleanup helper.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 9ba0ff3e08 ("drm/v3d: Create a CPU job extension for the timestamp query job")
Cc: Maíra Canal <mcanal@igalia.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711135340.84617-3-tursulin@igalia.com
(cherry picked from commit 753ce4fea6)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-18 15:49:08 +02:00
Tvrtko Ursulin
6ce9efd12a drm/v3d: Prevent out of bounds access in performance query extensions
Check that the number of perfmons userspace is passing in the copy and
reset extensions is not greater than the internal kernel storage where
the ids will be copied into.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: bae7cb5d68 ("drm/v3d: Create a CPU job extension for the reset performance query job")
Cc: Maíra Canal <mcanal@igalia.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711135340.84617-2-tursulin@igalia.com
(cherry picked from commit f32b5128d2)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-18 15:48:37 +02:00
Maíra Canal
1fe1c66274
drm/v3d: Fix Indirect Dispatch configuration for V3D 7.1.6 and later
`args->cfg[4]` is configured in Indirect Dispatch using the number of
batches. Currently, for all V3D tech versions, `args->cfg[4]` equals the
number of batches subtracted by 1. But, for V3D 7.1.6 and later, we must not
subtract 1 from the number of batches.

Implement the fix by checking the V3D tech version and revision.

Fixes several `dEQP-VK.synchronization*` CTS tests related to Indirect Dispatch.

Fixes: 18b8413b25 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240714145243.1223131-2-mcanal@igalia.com
2024-07-15 12:49:52 -03:00
Maíra Canal
7c78fdbace
drm/v3d: Add V3D tech revision to the device information
The V3D tech revision can be a useful information when configuring
jobs. Therefore, expose it in the `struct v3d_dev` with the V3D tech
version.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240714145243.1223131-1-mcanal@igalia.com
2024-07-15 12:45:59 -03:00
Tvrtko Ursulin
9c3951ec27
drm/v3d: Fix perfmon build error/warning
Move static const array into the source file to fix the "defined but not
used" errors.

The fix is perhaps not the prettiest due hand crafting the array sizes
in v3d_performance_counters.h, but I did add some build time asserts to
validate the counts look sensible, so hopefully it is good enough for a
quick fix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 3cbcbe016c ("drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405211137.hueFkLKG-lkp@intel.com/Cc: Maíra Canal <mcanal@igalia.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240604160210.24073-1-tursulin@igalia.com
2024-06-05 10:44:51 +02:00
Maxime Ripard
375c4d1583
Merge drm/drm-next into drm-misc-next
Let's start the new release cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-27 11:08:31 +02:00
Maíra Canal
f5b798bdc9
drm/v3d: Use V3D_MAX_COUNTERS instead of V3D_PERFCNT_NUM
V3D_PERFCNT_NUM represents the maximum number of performance counters
for V3D 4.2, but not for V3D 7.1. This means that, if we use
V3D_PERFCNT_NUM, we might go out-of-bounds on V3D 7.1.

Therefore, use the number of performance counters on V3D 7.1 as the
maximum number of counters. This will allow us to create arrays on the
stack with reasonable size. Note that userspace must use the value
provided by DRM_V3D_PARAM_MAX_PERF_COUNTERS.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-6-mcanal@igalia.com
2024-05-20 16:38:03 -03:00
Maíra Canal
f33fe58298
drm/v3d: Create new IOCTL to expose performance counters information
Userspace usually needs some information about the performance counters
available. Although we could replicate this information in the kernel
and user-space, let's use the kernel as the "single source of truth" to
avoid issues in the future (e.g. list of performance counters is updated
in user-space, but not in the kernel, generating invalid requests).

Therefore, create a new IOCTL to expose the performance counters
information, that is name, category, and description.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-5-mcanal@igalia.com
2024-05-20 16:38:02 -03:00
Maíra Canal
36b75080e6
drm/v3d: Create a new V3D parameter for the maximum number of perfcnt
The maximum number of performance counters can change from version to
version and it's important for userspace to know this value, as it needs
to use the counters for performance queries. Therefore, expose the
maximum number of performance counters to userspace as a parameter.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-4-mcanal@igalia.com
2024-05-20 16:38:01 -03:00
Maíra Canal
c606043ddb
drm/v3d: Different V3D versions can have different number of perfcnt
Currently, even though V3D 7.1 has 93 performance counters, it is not
possible to create counters bigger than 87, as
`v3d_perfmon_create_ioctl()` understands that counters bigger than 87
are invalid.

Therefore, create a device variable to expose the maximum
number of counters for a given V3D version and make
`v3d_perfmon_create_ioctl()` check this variable.

This commit fixes CTS failures in the performance queries tests
`dEQP-VK.query_pool.performance_query.*` [1]

Link: ea1f09a5f2 [1]
Fixes: 6fd9487147 ("drm/v3d: add brcm,2712-v3d as a compatible V3D device")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-3-mcanal@igalia.com
2024-05-20 16:38:00 -03:00
Maíra Canal
3cbcbe016c
drm/v3d: Add Performance Counters descriptions for V3D 4.2 and 7.1
Add name, category and description for each one of the 93 performance
counters available on V3D.

Note that V3D 4.2 has 87 performance counters, while V3D 7.1 has 93.
Therefore, there are two performance counters arrays. The index of the
performance counter for each V3D version is represented by its position
on the array.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240512222655.2792754-2-mcanal@igalia.com
2024-05-20 16:37:59 -03:00
Linus Torvalds
61307b7be4 The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.  Notable
 series include:
 
 - Lucas Stach has provided some page-mapping
   cleanup/consolidation/maintainability work in the series "mm/treewide:
   Remove pXd_huge() API".
 
 - In the series "Allow migrate on protnone reference with
   MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
   MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
   test.
 
 - In their series "Memory allocation profiling" Kent Overstreet and
   Suren Baghdasaryan have contributed a means of determining (via
   /proc/allocinfo) whereabouts in the kernel memory is being allocated:
   number of calls and amount of memory.
 
 - Matthew Wilcox has provided the series "Various significant MM
   patches" which does a number of rather unrelated things, but in largely
   similar code sites.
 
 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
   Weiner has fixed the page allocator's handling of migratetype requests,
   with resulting improvements in compaction efficiency.
 
 - In the series "make the hugetlb migration strategy consistent" Baolin
   Wang has fixed a hugetlb migration issue, which should improve hugetlb
   allocation reliability.
 
 - Liu Shixin has hit an I/O meltdown caused by readahead in a
   memory-tight memcg.  Addressed in the series "Fix I/O high when memory
   almost met memcg limit".
 
 - In the series "mm/filemap: optimize folio adding and splitting" Kairui
   Song has optimized pagecache insertion, yielding ~10% performance
   improvement in one test.
 
 - Baoquan He has cleaned up and consolidated the early zone
   initialization code in the series "mm/mm_init.c: refactor
   free_area_init_core()".
 
 - Baoquan has also redone some MM initializatio code in the series
   "mm/init: minor clean up and improvement".
 
 - MM helper cleanups from Christoph Hellwig in his series "remove
   follow_pfn".
 
 - More cleanups from Matthew Wilcox in the series "Various page->flags
   cleanups".
 
 - Vlastimil Babka has contributed maintainability improvements in the
   series "memcg_kmem hooks refactoring".
 
 - More folio conversions and cleanups in Matthew Wilcox's series
 
 	"Convert huge_zero_page to huge_zero_folio"
 	"khugepaged folio conversions"
 	"Remove page_idle and page_young wrappers"
 	"Use folio APIs in procfs"
 	"Clean up __folio_put()"
 	"Some cleanups for memory-failure"
 	"Remove page_mapping()"
 	"More folio compat code removal"
 
 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
   functions to work on folis".
 
 - Code consolidation and cleanup work related to GUP's handling of
   hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
 
 - Rick Edgecombe has developed some fixes to stack guard gaps in the
   series "Cover a guard gap corner case".
 
 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
   "mm/ksm: fix ksm exec support for prctl".
 
 - Baolin Wang has implemented NUMA balancing for multi-size THPs.  This
   is a simple first-cut implementation for now.  The series is "support
   multi-size THP numa balancing".
 
 - Cleanups to vma handling helper functions from Matthew Wilcox in the
   series "Unify vma_address and vma_pgoff_address".
 
 - Some selftests maintenance work from Dev Jain in the series
   "selftests/mm: mremap_test: Optimizations and style fixes".
 
 - Improvements to the swapping of multi-size THPs from Ryan Roberts in
   the series "Swap-out mTHP without splitting".
 
 - Kefeng Wang has significantly optimized the handling of arm64's
   permission page faults in the series
 
 	"arch/mm/fault: accelerate pagefault when badaccess"
 	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
 
 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
   GUP-fast".
 
 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
   use struct vm_fault".
 
 - selftests build fixes from John Hubbard in the series "Fix
   selftests/mm build without requiring "make headers"".
 
 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
   series "Improved Memory Tier Creation for CPUless NUMA Nodes".  Fixes
   the initialization code so that migration between different memory types
   works as intended.
 
 - David Hildenbrand has improved follow_pte() and fixed an errant driver
   in the series "mm: follow_pte() improvements and acrn follow_pte()
   fixes".
 
 - David also did some cleanup work on large folio mapcounts in his
   series "mm: mapcount for large folios + page_mapcount() cleanups".
 
 - Folio conversions in KSM in Alex Shi's series "transfer page to folio
   in KSM".
 
 - Barry Song has added some sysfs stats for monitoring multi-size THP's
   in the series "mm: add per-order mTHP alloc and swpout counters".
 
 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
   and limit checking cleanups".
 
 - Matthew Wilcox has been looking at buffer_head code and found the
   documentation to be lacking.  The series is "Improve buffer head
   documentation".
 
 - Multi-size THPs get more work, this time from Lance Yang.  His series
   "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
   the freeing of these things.
 
 - Kemeng Shi has added more userspace-visible writeback instrumentation
   in the series "Improve visibility of writeback".
 
 - Kemeng Shi then sent some maintenance work on top in the series "Fix
   and cleanups to page-writeback".
 
 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
   series "Improve anon_vma scalability for anon VMAs".  Intel's test bot
   reported an improbable 3x improvement in one test.
 
 - SeongJae Park adds some DAMON feature work in the series
 
 	"mm/damon: add a DAMOS filter type for page granularity access recheck"
 	"selftests/damon: add DAMOS quota goal test"
 
 - Also some maintenance work in the series
 
 	"mm/damon/paddr: simplify page level access re-check for pageout"
 	"mm/damon: misc fixes and improvements"
 
 - David Hildenbrand has disabled some known-to-fail selftests ni the
   series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
 
 - memcg metadata storage optimizations from Shakeel Butt in "memcg:
   reduce memory consumption by memcg stats".
 
 - DAX fixes and maintenance work from Vishal Verma in the series
   "dax/bus.c: Fixups for dax-bus locking".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
 jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
 nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
 =V3R/
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Kent Overstreet
0069455bcb fix missing vmalloc.h includes
Patch series "Memory allocation profiling", v6.

Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.

Example output:
  root@moria-kvm:~# sort -rn /proc/allocinfo
   127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
    56373248     4737 mm/slub.c:2259 func:alloc_slab_page
    14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
    14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
    13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
    11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
     9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
     4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
     4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
     3940352      962 mm/memory.c:4214 func:alloc_anon_folio
     2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
     ...

Usage:
kconfig options:
 - CONFIG_MEM_ALLOC_PROFILING
 - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
 - CONFIG_MEM_ALLOC_PROFILING_DEBUG
   adds warnings for allocations that weren't accounted because of a
   missing annotation

sysctl:
  /proc/sys/vm/mem_profiling

Runtime info:
  /proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)  && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
    CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:

                        kmalloc                 pgalloc
(1 baseline)            6.764s                  16.902s
(2 default disabled)    6.793s  (+0.43%)        17.007s (+0.62%)
(3 default enabled)     7.197s  (+6.40%)        23.666s (+40.02%)
(4 runtime enabled)     7.405s  (+9.48%)        23.901s (+41.41%)
(5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)
(6 def disabled+memcg)  13.332s (+97.10%)       48.105s (+184.61%)
(7 def enabled+memcg)   13.446s (+98.78%)       54.963s (+225.18%)

Memory overhead:
Kernel size:

   text           data        bss         dec         diff
(1) 26515311	      18890222    17018880    62424413
(2) 26524728	      19423818    16740352    62688898    264485
(3) 26524724	      19423818    16740352    62688894    264481
(4) 26524728	      19423818    16740352    62688898    264485
(5) 26541782	      18964374    16957440    62463596    39183

Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags:           192 kB
PageExts:         262144 kB (256MB)
SlabExts:           9876 kB (9.6MB)
PcpuExts:            512 kB (0.5MB)

Total overhead is 0.2% of total memory.

Benchmarks:

Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
      baseline       disabled profiling           enabled profiling
avg   0.3543         0.3559 (+0.0016)             0.3566 (+0.0023)
stdev 0.0137         0.0188                       0.0077


hackbench -l 10000
      baseline       disabled profiling           enabled profiling
avg   6.4218         6.4306 (+0.0088)             6.5077 (+0.0859)
stdev 0.0933         0.0286                       0.0489

stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/

[2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/


This patch (of 37):

The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.

[kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c]
  Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev
[surenb@google.com: fix arch/x86/mm/numa_32.c]
  Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com
[kent.overstreet@linux.dev: a few places were depending on sizes.h]
  Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev
[arnd@arndb.de: fix mm/kasan/hw_tags.c]
  Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org
[surenb@google.com: fix arc build]
  Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Andreas Hindborg <a.hindborg@samsung.com>
Cc: Benno Lossin <benno.lossin@proton.me>
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Gary Guo <gary@garyguo.net>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:55:49 -07:00
Maíra Canal
6abe93b621
drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler
In V3D, the conclusion of a job is indicated by a IRQ. When a job
finishes, then we update the local and the global GPU stats of that
queue. But, while the GPU stats are being updated, a user might be
reading the stats from sysfs or fdinfo.

For example, on `gpu_stats_show()`, we could think about a scenario where
`v3d->queue[queue].start_ns != 0`, then an interrupt happens, we update
the value of `v3d->queue[queue].start_ns` to 0, we come back to
`gpu_stats_show()` to calculate `active_runtime` and now,
`active_runtime = timestamp`.

In this simple example, the user would see a spike in the queue usage,
that didn't match reality.

In order to address this issue properly, use a seqcount to protect read
and write sections of the code.

Fixes: 09a93cc4f7 ("drm/v3d: Implement show_fdinfo() callback for GPU usage stats")
Reported-by: Tvrtko Ursulin <tursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-7-mcanal@igalia.com
2024-04-23 19:32:49 -03:00
Maíra Canal
12d1624ce3
drm/v3d: Decouple stats calculation from printing
Create a function to decouple the stats calculation from the printing.
This will be useful in the next step when we add a seqcount to protect
the stats.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-6-mcanal@igalia.com
2024-04-23 19:32:48 -03:00
Maíra Canal
da483d079b
drm/v3d: Create function to update a set of GPU stats
Given a set of GPU stats, that is, a `struct v3d_stats` related to a
queue in a given context, create a function that can update this set
of GPU stats.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-5-mcanal@igalia.com
2024-04-23 19:32:47 -03:00
Maíra Canal
b136b1953f
drm/v3d: Create a struct to store the GPU stats
This will make it easier to instantiate the GPU stats variables and it
will create a structure where we can store all the variables that refer
to GPU stats.

Note that, when we created the struct `v3d_stats`, we renamed
`jobs_sent` to `jobs_completed`. This better express the semantics of
the variable, as we are only accounting jobs that have been completed.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-4-mcanal@igalia.com
2024-04-23 19:32:46 -03:00
Maíra Canal
52ce97765c
drm/v3d: Create two functions to update all GPU stats variables
Currently, we manually perform all operations to update the GPU stats
variables. Apart from the code repetition, this is very prone to errors,
as we can see on commit 35f4f8c9fc ("drm/v3d: Don't increment
`enabled_ns` twice").

Therefore, create two functions to manage updating all GPU stats
variables. Now, the jobs only need to call for `v3d_job_update_stats()`
when the job is done and `v3d_job_start_stats()` when starting the job.

Co-developed-by: Tvrtko Ursulin <tursulin@igalia.com>
Signed-off-by: Tvrtko Ursulin <tursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240420213632.339941-3-mcanal@igalia.com
2024-04-23 19:32:45 -03:00
Maíra Canal
35f4f8c9fc
drm/v3d: Don't increment enabled_ns twice
The commit 509433d814 ("drm/v3d: Expose the total GPU usage stats on sysfs")
introduced the calculation of global GPU stats. For the regards, it used
the already existing infrastructure provided by commit 09a93cc4f7 ("drm/v3d:
Implement show_fdinfo() callback for GPU usage stats"). While adding
global GPU stats calculation ability, the author forgot to delete the
existing one.

Currently, the value of `enabled_ns` is incremented twice by the end of
the job, when it should be added just once. Therefore, delete the
leftovers from commit 509433d814 ("drm/v3d: Expose the total GPU usage
stats on sysfs").

Fixes: 509433d814 ("drm/v3d: Expose the total GPU usage stats on sysfs")
Reported-by: Tvrtko Ursulin <tursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240403203517.731876-2-mcanal@igalia.com
2024-04-15 15:00:19 -03:00
Maíra Canal
51b76c1f30
drm/v3d: Enable V3D to use different PAGE_SIZE
Currently, the V3D driver uses PAGE_SHIFT over the assumption that
PAGE_SHIFT = 12, as the PAGE_SIZE = 4KB. But, the RPi 5 is using
PAGE_SIZE = 16KB, so the MMU PAGE_SHIFT is different than the system's
PAGE_SHIFT.

Enable V3D to be used in system's with any PAGE_SIZE by making sure that
everything MMU-related uses the MMU page shift.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240214193503.164462-1-mcanal@igalia.com
2024-02-23 16:37:20 -03:00
Dave Airlie
f8e4806e0d drm-misc-next for v6.9:
UAPI Changes:
 
 virtio:
 - add Venus capset defines
 
 Cross-subsystem Changes:
 
 Core Changes:
 
 - fix drm_fixp2int_ceil()
 - documentation fixes
 - clean ups
 - allow DRM_MM_DEBUG with DRM=m
 - build fixes for debugfs support
 - EDID cleanups
 - sched: error-handling fixes
 - ttm: add tests
 
 Driver Changes:
 
 bridge:
 - ite-6505: fix DP link-training bug
 - samsung-dsim: fix error checking in probe
 - tc358767: fix regmap usage
 
 efifb:
 - use copy of global screen_info state
 
 hisilicon:
 - fix EDID includes
 
 mgag200:
 - improve ioremap usage
 - convert to struct drm_edid
 
 nouveau:
 - disp: use kmemdup()
 - fix EDID includes
 - documentation fixes
 
 panel:
 - ltk050h3146w: error-handling fixes
 - panel-edp: support delay between power-on and enable; use put_sync in
   unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0,
   BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings
 - panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings
 - panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings
 
 qaic:
 - fixes to BO handling
 - make use of DRM managed release
 - fix order of remove operations
 
 rockchip:
 - analogix_dp: get encoder port from DT
 - inno_hdmi: support HDMI for RK3128
 - lvds: error-handling fixes
 
 simplefb:
 - fix logging
 
 ssd130x:
 - support SSD133x plus DT bindings
 
 tegra:
 - fix error handling
 
 tilcdc:
 - make use of DRM managed release
 
 v3d:
 - show memory stats in debugfs
 
 vc4:
 - fix error handling in plane prepare_fb
 - fix framebuffer test in plane helpers
 
 vesafb:
 - use copy of global screen_info state
 
 virtio:
 - cleanups
 
 vkms:
 - fix OOB access when programming the LUT
 - Kconfig improvements
 
 vmwgfx:
 - unmap surface before changing plane state
 - fix memory leak in error handling
 - documentation fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmWgDSsACgkQaA3BHVML
 eiPG5AgAjK42pY25OezV/wmUiffhfDluTbjw+Cjwg/IzZ+RyAMcf1q7R8Oid6G6o
 kG64WfUmBURwdOQ2nvtJGum8YoBx70uvmKyFzhu20v+e9HT/EFizRi7Qvg71Il/0
 l8WQ5BJ1W6PvJKgEU4io+OWWhDyZ1B3GpCvkd30wF2Vfi0CPeeVAvwmeKC5Rj4lE
 lbYa93sAS/neNkTBvhEEUGoz/t9+5FqAMarygM6hYzDVgnhs8Mbm/pRKCUgPvYvp
 871oibBPenQGP6jkWU1h1UWIa5CACvHgkebSGUGy6b5e9jPBRrSmlnRhiviJEY1/
 6x7kGnB69r6I40CdY/1cnTqyhJ25cA==
 =GX2h
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-01-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v6.9:

UAPI Changes:

virtio:
- add Venus capset defines

Cross-subsystem Changes:

Core Changes:

- fix drm_fixp2int_ceil()
- documentation fixes
- clean ups
- allow DRM_MM_DEBUG with DRM=m
- build fixes for debugfs support
- EDID cleanups
- sched: error-handling fixes
- ttm: add tests

Driver Changes:

bridge:
- ite-6505: fix DP link-training bug
- samsung-dsim: fix error checking in probe
- tc358767: fix regmap usage

efifb:
- use copy of global screen_info state

hisilicon:
- fix EDID includes

mgag200:
- improve ioremap usage
- convert to struct drm_edid

nouveau:
- disp: use kmemdup()
- fix EDID includes
- documentation fixes

panel:
- ltk050h3146w: error-handling fixes
- panel-edp: support delay between power-on and enable; use put_sync in
  unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0,
  BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings
- panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings
- panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings

qaic:
- fixes to BO handling
- make use of DRM managed release
- fix order of remove operations

rockchip:
- analogix_dp: get encoder port from DT
- inno_hdmi: support HDMI for RK3128
- lvds: error-handling fixes

simplefb:
- fix logging

ssd130x:
- support SSD133x plus DT bindings

tegra:
- fix error handling

tilcdc:
- make use of DRM managed release

v3d:
- show memory stats in debugfs

vc4:
- fix error handling in plane prepare_fb
- fix framebuffer test in plane helpers

vesafb:
- use copy of global screen_info state

virtio:
- cleanups

vkms:
- fix OOB access when programming the LUT
- Kconfig improvements

vmwgfx:
- unmap surface before changing plane state
- fix memory leak in error handling
- documentation fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240111154902.GA8448@linux-uq9g
2024-02-05 13:50:15 +10:00
Maíra Canal
502756e233
drm/v3d: Show the memory-management stats on debugfs
Dump the contents of the DRM MM allocator of the V3D driver. This will
help us to debug the VA ranges allocated.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240105145851.193492-1-mcanal@igalia.com
2024-01-11 11:45:47 -03:00
Maíra Canal
2ad62d16cd
drm/v3d: Free the job and assign it to NULL if initialization fails
Currently, if `v3d_job_init()` fails (e.g. in the IGT test "bad-in-sync",
where we submit an invalid in-sync to the IOCTL), then we end up with
the following NULL pointer dereference:

[   34.146279] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
[   34.146301] Mem abort info:
[   34.146306]   ESR = 0x0000000096000005
[   34.146315]   EC = 0x25: DABT (current EL), IL = 32 bits
[   34.146322]   SET = 0, FnV = 0
[   34.146328]   EA = 0, S1PTW = 0
[   34.146334]   FSC = 0x05: level 1 translation fault
[   34.146340] Data abort info:
[   34.146345]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[   34.146351]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   34.146357]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   34.146366] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001232e6000
[   34.146375] [0000000000000078] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[   34.146399] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[   34.146406] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash aes_neon_bs aes_neon_blk algif_skcipher af_alg bnep hid_logitech_hidpp brcmfmac_wcc brcmfmac brcmutil hci_uart vc4 btbcm cfg80211 bluetooth bcm2835_v4l2(C) snd_soc_hdmi_codec binfmt_misc cec drm_display_helper hid_logitech_dj bcm2835_mmal_vchiq(C) drm_dma_helper drm_kms_helper videobuf2_v4l2 raspberrypi_hwmon ecdh_generic videobuf2_vmalloc videobuf2_memops ecc videobuf2_common rfkill videodev libaes snd_soc_core dwc2 i2c_brcmstb snd_pcm_dmaengine snd_bcm2835(C) i2c_bcm2835 pwm_bcm2835 snd_pcm mc v3d snd_timer snd gpu_sched drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio i2c_dev drm fuse dm_mod drm_panel_orientation_quirks backlight configfs ip_tables x_tables ipv6
[   34.146556] CPU: 1 PID: 1890 Comm: v3d_submit_csd Tainted: G         C         6.7.0-rc3-g49ddab089611 #68
[   34.146563] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[   34.146569] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   34.146575] pc : drm_sched_job_cleanup+0x3c/0x190 [gpu_sched]
[   34.146611] lr : v3d_submit_csd_ioctl+0x1b4/0x460 [v3d]
[   34.146653] sp : ffffffc083cbbb80
[   34.146658] x29: ffffffc083cbbb90 x28: ffffff81035afc00 x27: ffffffe77a641168
[   34.146668] x26: ffffff81056a8000 x25: 0000000000000058 x24: 0000000000000000
[   34.146677] x23: ffffff81065e2000 x22: ffffff81035afe00 x21: ffffffc083cbbcf0
[   34.146686] x20: ffffff81035afe00 x19: 00000000ffffffea x18: 0000000000000000
[   34.146694] x17: 0000000000000000 x16: ffffffe7989e34b0 x15: 0000000000000000
[   34.146703] x14: 0000000004000004 x13: ffffff81035afe80 x12: ffffffc083cb8000
[   34.146711] x11: cc57e05dfbe5ef00 x10: cc57e05dfbe5ef00 x9 : ffffffe77a64131c
[   34.146719] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f
[   34.146727] x5 : 0000000000000040 x4 : ffffff81fefb03f0 x3 : ffffffc083cbba40
[   34.146736] x2 : ffffff81056a8000 x1 : ffffffe7989e35e8 x0 : 0000000000000000
[   34.146745] Call trace:
[   34.146748]  drm_sched_job_cleanup+0x3c/0x190 [gpu_sched]
[   34.146768]  v3d_submit_csd_ioctl+0x1b4/0x460 [v3d]
[   34.146791]  drm_ioctl_kernel+0xe0/0x120 [drm]
[   34.147029]  drm_ioctl+0x264/0x408 [drm]
[   34.147135]  __arm64_sys_ioctl+0x9c/0xe0
[   34.147152]  invoke_syscall+0x4c/0x118
[   34.147162]  el0_svc_common+0xb8/0xf0
[   34.147168]  do_el0_svc+0x28/0x40
[   34.147174]  el0_svc+0x38/0x88
[   34.147184]  el0t_64_sync_handler+0x84/0x100
[   34.147191]  el0t_64_sync+0x190/0x198
[   34.147201] Code: aa0003f4 f90007e8 f9401008 aa0803e0 (b8478c09)
[   34.147210] ---[ end trace 0000000000000000 ]---

This happens because we are calling `drm_sched_job_cleanup()` twice:
once at `v3d_job_init()` and again when we call `v3d_job_cleanup()`.

To mitigate this issue, we can return to the same approach that we used
to use before 464c61e76d: deallocate the job after `v3d_job_init()`
fails and assign it to NULL. Then, when we call `v3d_job_cleanup()`, job
is NULL and the function returns.

Fixes: 464c61e76d ("drm/v3d: Decouple job allocation from job initiation")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240109142857.1122704-1-mcanal@igalia.com
2024-01-11 11:23:13 -03:00
Maíra Canal
89fe46019a
drm/v3d: Fix support for register debugging on the RPi 4
RPi 4 uses V3D 4.2, which is currently not supported by the register
definition stated at `v3d_core_reg_defs`. We should be able to support
V3D 4.2, therefore, change the maximum version of the register
definition to 42, not 41.

Fixes: 0ad5bc1ce4 ("drm/v3d: fix up register addresses for V3D 7.x")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240109113126.929446-1-mcanal@igalia.com
2024-01-09 14:21:47 -03:00
Harshit Mogalapalli
dce94061f0 drm/v3d: Fix missing error code in v3d_submit_cpu_ioctl()
Smatch warns:
	drivers/gpu/drm/v3d/v3d_submit.c:1222 v3d_submit_cpu_ioctl()
	warn: missing error code 'ret'

When there is no job type or job is submitted with wrong number of BOs
it is an error path, ret is zero at this point which is incorrect
return.

Fix this by changing it to -EINVAL.

Fixes: aafc1a2bea ("drm/v3d: Add a CPU job submission")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231204122102.181298-1-harshit.m.mogalapalli@oracle.com
2023-12-04 21:30:33 -01:00
Maíra Canal
209e8d2695
drm/v3d: Create a CPU job extension for the copy performance query job
A CPU job is a type of job that performs operations that requires CPU
intervention. A copy performance query job is a job that copy the complete
or partial result of a query to a buffer. In order to copy the result of
a performance query to a buffer, we need to get the values from the
performance monitors.

So, create a user extension for the CPU job that enables the creation
of a copy performance query job. This user extension will allow the creation
of a CPU job that copy the results of a performance query to a BO with the
possibility to indicate the availability with a availability bit.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-19-mcanal@igalia.com
2023-12-01 09:47:36 -03:00
Maíra Canal
bae7cb5d68
drm/v3d: Create a CPU job extension for the reset performance query job
A CPU job is a type of job that performs operations that requires CPU
intervention. A reset performance query job is a job that resets the
performance queries by resetting the values of the perfmons. Moreover,
we also reset the syncobjs related to the availability of the query.

So, create a user extension for the CPU job that enables the creation
of a reset performance job. This user extension will allow the creation of
a CPU job that resets the perfmons values and resets the availability syncobj.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-18-mcanal@igalia.com
2023-12-01 09:47:35 -03:00
Maíra Canal
6745f3e44a
drm/v3d: Create a CPU job extension to copy timestamp query to a buffer
A CPU job is a type of job that performs operations that requires CPU
intervention. A copy timestamp query job is a job that copy the complete
or partial result of a query to a buffer. As V3D doesn't provide any
mechanism to obtain a timestamp from the GPU, it is a job that needs
CPU intervention.

So, create a user extension for the CPU job that enables the creation
of a copy timestamp query job. This user extension will allow the creation
of a CPU job that copy the results of a timestamp query to a BO with the
possibility to indicate the timestamp availability with a availability bit.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-17-mcanal@igalia.com
2023-12-01 09:47:31 -03:00
Maíra Canal
34a101e642
drm/v3d: Create a CPU job extension for the reset timestamp job
A CPU job is a type of job that performs operations that requires CPU
intervention. A reset timestamp job is a job that resets the timestamp
queries based on the value offset of the first query. As V3D doesn't
provide any mechanism to obtain a timestamp from the GPU, it is a job
that needs CPU intervention.

So, create a user extension for the CPU job that enables the creation
of a reset timestamp job. This user extension will allow the creation of
a CPU job that resets the timestamp value in the timestamp BO and resets
the availability syncobj.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-16-mcanal@igalia.com
2023-12-01 09:42:47 -03:00
Maíra Canal
9ba0ff3e08
drm/v3d: Create a CPU job extension for the timestamp query job
A CPU job is a type of job that performs operations that requires CPU
intervention. A timestamp query job is a job that calculates the
query timestamp and updates the query availability by signaling a
syncobj. As V3D doesn't provide any mechanism to obtain a timestamp
from the GPU, it is a job that needs CPU intervention.

So, create a user extension for the CPU job that enables the creation
of a timestamp query job. This user extension will allow the creation of
a CPU job that performs the timestamp query calculation and updates the
timestamp BO with the proper value.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-15-mcanal@igalia.com
2023-12-01 09:41:37 -03:00
Maíra Canal
18b8413b25
drm/v3d: Create a CPU job extension for a indirect CSD job
A CPU job is a type of job that performs operations that requires CPU
intervention. An indirect CSD job is a job that, when executed in the
queue, will map the indirect buffer, read the dispatch parameters, and
submit a regular dispatch. Therefore, it is a job that needs CPU
intervention.

So, create a user extension for the CPU job that enables the creation
of an indirect CSD. This user extension will allow the creation of a CSD
job linked to a CPU job. The CPU job will wait for the indirect CSD job
dependencies and, once they are signaled, it will update the CSD job
parameters.

Co-developed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-14-mcanal@igalia.com
2023-12-01 09:40:15 -03:00
Maíra Canal
7c13132c40
drm/v3d: Enable BO mapping
For the indirect CSD CPU job, we will need to access the internal
contents of the BO with the dispatch parameters. Therefore, create
methods to allow the mapping and unmapping of the BO.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-13-mcanal@igalia.com
2023-12-01 09:37:50 -03:00
Melissa Wen
369b059617
drm/v3d: Detach the CSD job BO setup
Detach CSD job setup from CSD submission ioctl to reuse it in CPU
submission ioctl for indirect CSD job.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Co-developed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-12-mcanal@igalia.com
2023-12-01 09:37:49 -03:00
Maíra Canal
1fe0879efc
drm/v3d: Create tracepoints to track the CPU job
Create tracepoints to track the three major events of a CPU job
lifetime:
	1. Submission of a `v3d_submit_cpu` IOCTL
	2. Beginning of the execution of a CPU job
	3. Ending of the execution of a CPU job

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-11-mcanal@igalia.com
2023-12-01 09:37:48 -03:00
Maíra Canal
c5195d001f
drm/v3d: Use v3d_get_extensions() to parse CPU job data
Currently, v3d_get_extensions() only parses multisync data and assigns
it to the `struct v3d_submit_ext`. But, to implement the CPU job with
user extensions, we want v3d_get_extensions() to be able to parse CPU
job data and assign it to the `struct v3d_cpu_job`.

Therefore, allow the function v3d_get_extensions() to use `struct v3d_cpu_job *`
as a parameter. If the `struct v3d_cpu_job *` is assigned to NULL, it means
that the job is a GPU job and CPU job extensions should be rejected.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-10-mcanal@igalia.com
2023-12-01 09:34:25 -03:00
Melissa Wen
aafc1a2bea
drm/v3d: Add a CPU job submission
Create a new type of job, a CPU job. A CPU job is a type of job that
performs operations that requires CPU intervention. The overall idea is
to use user extensions to enable different types of CPU job, allowing the
CPU job to perform different operations according to the type of user
extension. The user extension ID identify the type of CPU job that must
be dealt.

Having a CPU job is interesting for synchronization purposes as a CPU
job has a queue like any other V3D job and can be synchoronized by the
multisync extension.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Co-developed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-9-mcanal@igalia.com
2023-12-01 09:34:19 -03:00
Maíra Canal
464c61e76d
drm/v3d: Decouple job allocation from job initiation
We want to allow the IOCTLs to allocate the job without initiating it.
This will be useful for the CPU job submission IOCTL, as the CPU job has
the need to use information from the user extensions. Currently, the
user extensions are parsed before the job allocation, making it
impossible to fill the CPU job when parsing the user extensions.
Therefore, decouple the job allocation from the job initiation.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-8-mcanal@igalia.com
2023-12-01 09:34:15 -03:00
Maíra Canal
6893deb881
drm/v3d: Don't allow two multisync extensions in the same job
Currently, two multisync extensions can be added to the same job and
only the last multisync extension will be used. To avoid this
vulnerability, don't allow two multisync extensions in the same job.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-7-mcanal@igalia.com
2023-12-01 09:34:12 -03:00
Melissa Wen
8288faaa8b
drm/v3d: Simplify job refcount handling
Instead of checking if the job is NULL every time we call the function,
check it inside the function.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-6-mcanal@igalia.com
2023-12-01 09:34:08 -03:00
Melissa Wen
9032d5f633
drm/v3d: Detach job submissions IOCTLs to a new specific file
We will include a new job submission type, the CPU job submission. For
readability and maintability, separate the job submission IOCTLs and
related operations from v3d_gem.c.

Minor fix in the CSD submission kernel doc:
CSD (texture formatting) -> CSD (compute shader).

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-5-mcanal@igalia.com
2023-12-01 09:34:01 -03:00
Melissa Wen
a8ad9d63a1
drm/v3d: Move wait BO ioctl to the v3d_bo file
IOCTLs related to BO operations reside on the file v3d_bo.c. The wait BO
ioctl is the only IOCTL regarding BOs that is placed in a different file.
So, move it to the v3d_bo.c file.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-4-mcanal@igalia.com
2023-12-01 09:33:56 -03:00
Melissa Wen
780b9463ce
drm/v3d: Remove unused function header
v3d_mmu_get_offset header was added but the function was never defined.
Just remove it.

Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231130164420.932823-3-mcanal@igalia.com
2023-12-01 09:33:52 -03:00
Danilo Krummrich
a78422e9df drm/sched: implement dynamic job-flow control
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.

This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.

However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.

In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
2023-11-10 02:54:29 +01:00
Maíra Canal
509433d814
drm/v3d: Expose the total GPU usage stats on sysfs
The previous patch exposed the accumulated amount of active time per
client for each V3D queue. But this doesn't provide a global notion of
the GPU usage.

Therefore, provide the accumulated amount of active time for each V3D
queue (BIN, RENDER, CSD, TFU and CACHE_CLEAN), considering all the jobs
submitted to the queue, independent of the client.

This data is exposed through the sysfs interface, so that if the
interface is queried at two different points of time the usage percentage
of each of the queues can be calculated.

Co-developed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230905213416.1290219-3-mcanal@igalia.com
2023-11-06 10:09:29 -03:00
Maíra Canal
09a93cc4f7
drm/v3d: Implement show_fdinfo() callback for GPU usage stats
This patch exposes the accumulated amount of active time per client
through the fdinfo infrastructure. The amount of active time is exposed
for each V3D queue: BIN, RENDER, CSD, TFU and CACHE_CLEAN.

In order to calculate the amount of active time per client, a CPU clock
is used through the function local_clock(). The point where the jobs has
started is marked and is finally compared with the time that the job had
finished.

Moreover, the number of jobs submitted to each queue is also exposed on
fdinfo through the identifier "v3d-jobs-<queue>".

Co-developed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230905213416.1290219-3-mcanal@igalia.com
2023-11-06 10:09:23 -03:00