Pull vhost fix from Michael Tsirkin:
"A single fix for a regression in vhost"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: initialize vq->nheads properly
Pull drm fixes from Dave Airlie:
"This is the fixes that built up in the merge window, mostly amdgpu and
xe with one i915 display fix, seems like things are pretty good for
rc1.
i915:
- DP LPFS fixes
xe:
- SRIOV: PF fixes and removal of need of module param
- Fix driver unbind around Devcoredump
- Mark xe driver as BROKEN if kernel page size is not 4kB
amdgpu:
- GC 9.5.0 fixes
- SMU fix
- DCE 6 DC fixes
- mmhub client ID fixes
- VRR fix
- Backlight fix
- UserQ fix
- Legacy reset fix
- Misc fixes
amdkfd:
- CRIU fix
- Debugfs fix"
* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
drm/amdgpu: add missing vram lost check for LEGACY RESET
drm/amdgpu/discovery: fix fw based ip discovery
drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
drm/amdgpu: Update SDMA firmware version check for user queue support
drm/amdgpu: Add NULL check for asic_funcs
drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
drm/amd/display: fix a Null pointer dereference vulnerability
drm/amd/display: Add primary plane to commits for correct VRR handling
drm/amdgpu: update mmhub 3.3 client id mappings
drm/amdgpu: update mmhub 3.0.1 client id mappings
drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
drm/amd/display: Don't overwrite dce60_clk_mgr
drm/amdkfd: Fix checkpoint-restore on multi-xcc
drm/amd: Restore cached manual clock settings during resume
drm/amd: Restore cached power limit during resume
drm/amdgpu: Update external revid for GC v9.5.0
drm/amdgpu: Update supported modes for GC v9.5.0
Mark xe driver as BROKEN if kernel page size is not 4kB
...
Pull fbdev fixes for 6.17-rc1:
- Revert a patch which broke VGA console
- Fix an out-of-bounds access bug which may happen during console
resizing when a console is mapped to a frame buffer
* tag 'fbdev-for-6.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
Revert "vgacon: Add check for vc_origin address range in vgacon_scroll()"
fbdev: Fix vmalloc out-of-bounds write in fast_imageblit
Pull LoongArch updates from Huacai Chen:
- Complete KSave registers definition
- Support the mem=<size> kernel parameter
- Support BPF dynamic modification & trampoline
- Add MMC/SDIO controller nodes in dts
- Some bug fixes and other small changes
* tag 'loongarch-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: vDSO: Remove -nostdlib complier flag
LoongArch: dts: Add eMMC/SDIO controller support to Loongson-2K2000
LoongArch: dts: Add SDIO controller support to Loongson-2K1000
LoongArch: dts: Add SDIO controller support to Loongson-2K0500
LoongArch: BPF: Set bpf_jit_bypass_spec_v1/v4()
LoongArch: BPF: Fix the tailcall hierarchy
LoongArch: BPF: Fix jump offset calculation in tailcall
LoongArch: BPF: Add struct ops support for trampoline
LoongArch: BPF: Add basic bpf trampoline support
LoongArch: BPF: Add dynamic code modification support
LoongArch: BPF: Rename and refactor validate_code()
LoongArch: Add larch_insn_gen_{beq,bne} helpers
LoongArch: Don't use %pK through printk() in unwinder
LoongArch: Avoid in-place string operation on FDT content
LoongArch: Support mem=<size> kernel parameter
LoongArch: Make relocate_new_kernel_size be a .quad value
LoongArch: Complete KSave registers definition
Pull input updates from Dmitry Torokhov:
- updates to several drivers consuming GPIO APIs to use setters
returning error codes
- an infrastructure allowing to define "overlays" for touchscreens
carving out regions implementing buttons and other elements from a
bigger sensors and a corresponding update to st1232 driver
- an update to AT/PS2 keyboard driver to map F13-F24 by default
- Samsung keypad driver got a facelift
- evdev input handler will now bind to all devices using EV_SYN event
instead of abusing id->driver_info
- two new sub-drivers implementing 1A (capacitive buttons) and 21
(forcepad button) functions in Synaptics RMI driver
- support for polling mode in Goodix touchscreen driver
- support for support for FocalTech FT8716 in edt-ft5x06 driver
- support for MT6359 in mtk-pmic-keys driver
- removal of pcf50633-input driver since platform it was used on is
gone
- new definitions for game controller "grip" buttons (BTN_GRIP*) and
corresponding changes to xpad and hid-steam controller drivers
- a new definition for "performance" key
* tag 'input-for-v6.17-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (38 commits)
HID: hid-steam: Use new BTN_GRIP* buttons
Input: add keycode for performance mode key
Input: max77693 - convert to atomic pwm operation
Input: st1232 - add touch-overlay handling
dt-bindings: input: touchscreen: st1232: add touch-overlay example
Input: touch-overlay - add touchscreen overlay handling
dt-bindings: touchscreen: add touch-overlay property
Input: atkbd - correctly map F13 - F24
Input: xpad - use new BTN_GRIP* buttons
Input: Add and document BTN_GRIP*
Input: xpad - change buttons the D-Pad gets mapped as to BTN_DPAD_*
Documentation: Fix capitalization of XBox -> Xbox
Input: synaptics-rmi4 - add support for F1A
dt-bindings: input: syna,rmi4: Document F1A function
Input: synaptics-rmi4 - add support for Forcepads (F21)
Input: mtk-pmic-keys - add support for MT6359 PMIC keys
Input: remove special handling of id->driver_info when matching
Input: evdev - switch matching to EV_SYN
Input: samsung-keypad - use BIT() and GENMASK() where appropriate
Input: samsung-keypad - use per-chip parameters
...
Pull ipmi updates from Corey Minyard:
"Some small fixes for the IPMI driver
Nothing huge, some rate limiting on logs, a strncpy fix where the
source and destination could be the same, and removal of some unused
cruft"
* tag 'for-linus-6.17-1' of https://github.com/cminyard/linux-ipmi:
ipmi: Use dev_warn_ratelimited() for incorrect message warnings
char: ipmi: remove redundant variable 'type' and check
ipmi: Fix strcpy source and destination the same
Pull rdma fix from Jason Gunthorpe:
"Single fix to correct the iov_iter construction in soft iwarp. This
avoids blktest crashes with recent changes to the allocators"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages
Pull VFIO updates from Alex Williamson:
- Fix imbalance where the no-iommu/cdev device path skips too much on
open, failing to increment a reference, but still decrements the
reference on close. Add bounds checking to prevent such underflows
(Jacob Pan)
- Fill missing detach_ioas op for pds_vfio_pci, fixing probe failure
when used with IOMMUFD (Brett Creeley)
- Split SR-IOV VFs to separate dev_set, avoiding unnecessary
serialization between VFs that appear on the same bus (Alex
Williamson)
- Fix a theoretical integer overflow is the mlx5-vfio-pci variant
driver (Artem Sadovnikov)
- Implement missing VF token checking support via vfio cdev/IOMMUFD
interface (Jason Gunthorpe)
- Update QAT vfio-pci variant driver to claim latest VF devices
(Małgorzata Mielnik)
- Add a cond_resched() call to avoid holding the CPU too long during
DMA mapping operations (Keith Busch)
* tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio:
vfio/type1: conditional rescheduling while pinning
vfio/qat: add support for intel QAT 6xxx virtual functions
vfio/qat: Remove myself from VFIO QAT PCI driver maintainers
vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD
vfio/mlx5: fix possible overflow in tracking max message size
vfio/pci: Separate SR-IOV VF dev_set
vfio/pds: Fix missing detach_ioas op
vfio: Prevent open_count decrement to negative
vfio: Fix unbalanced vfio_df_close call in no-iommu mode
We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e82829 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62eedd150f)
Cc: stable@vger.kernel.org
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0333052d90)
Cc: stable@vger.kernel.org
Pull btrfs fix from David Sterba:
"A single btrfs commit. It fixes a problem that people started to hit
since 6.15.3 during log replay (e.g. after a crash).
The bug is old but got more likely to happen since commit
5e85262e54 ("btrfs: fix fsync of files with no hard links not
persisting deletion") got backported to stable (6.15 only)"
* tag 'for-6.17-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix log tree replay failure due to file with 0 links and extents
Pull more SCSI updates from James Bottomley:
"This is mostly fixes and cleanups and code reworks that trickled in
across the merge window and the weeks leading up. The only substantive
update is the Mediatek ufs driver which accounts for the bulk of the
additions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (37 commits)
scsi: libsas: Use a bool for sas_deform_port() second argument
scsi: libsas: Move declarations of internal functions to sas_internal.h
scsi: libsas: Make sas_get_ata_info() static
scsi: libsas: Simplify sas_ata_wait_eh()
scsi: libsas: Refactor dev_is_sata()
scsi: sd: Make sd shutdown issue START STOP UNIT appropriately
scsi: arm64: dts: mediatek: mt8195: Add UFSHCI node
scsi: dt-bindings: mediatek,ufs: add MT8195 compatible and update clock nodes
scsi: dt-bindings: mediatek,ufs: Add ufs-disable-mcq flag for UFS host
scsi: ufs: ufs-mediatek: Add UFS host support for MT8195 SoC
scsi: ufs: ufs-pci: Remove control of UIC Completion interrupt for Intel MTL
scsi: ufs: core: Do not write interrupt enable register unnecessarily
scsi: ufs: core: Set and clear UIC Completion interrupt as needed
scsi: ufs: core: Remove duplicated code in ufshcd_send_bsg_uic_cmd()
scsi: ufs: core: Move ufshcd_enable_intr() and ufshcd_disable_intr()
scsi: ufs: ufs-pci: Remove UFS PCI driver's ->late_init() call back
scsi: ufs: ufs-pci: Fix default runtime and system PM levels
scsi: ufs: ufs-pci: Fix hibernate state transition for Intel MTL-like host controllers
scsi: ufs: host: mediatek: Support FDE (AES) clock scaling
scsi: ufs: host: mediatek: Support clock scaling with Vcore binding
...
If we log a new inode (not persisted in a past transaction) that has 0
links and extents, then log another inode with an higher inode number, we
end up with failing to replay the log tree with -EINVAL. The steps for
this are:
1) create new file A
2) write some data to file A
3) open an fd on file A
4) unlink file A
5) fsync file A using the previously open fd
6) create file B (has higher inode number than file A)
7) fsync file B
8) power fail before current transaction commits
Now when attempting to mount the fs, the log replay will fail with
-ENOENT at replay_one_extent() when attempting to replay the first
extent of file A. The failure comes when trying to open the inode for
file A in the subvolume tree, since it doesn't exist.
Before commit 5f61b96159 ("btrfs: fix inode lookup error handling
during log replay"), the returned error was -EIO instead of -ENOENT,
since we converted any errors when attempting to read an inode during
log replay to -EIO.
The reason for this is that the log replay procedure fails to ignore
the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our
current inode has 0 links and last inode we processed in the previous
stage has a non 0 link count. In other words, the issue is that at
replay_one_extent() we only update wc->ignore_cur_inode if the current
replay stage is LOG_WALK_REPLAY_INODES.
Fix this by updating wc->ignore_cur_inode whenever we find an inode item
regardless of the current replay stage. This is a simple solution and easy
to backport, but later we can do other alternatives like avoid logging
extents or inode items other than the inode item for inodes with a link
count of 0.
The problem with the wc->ignore_cur_inode logic has been around since
commit f2d72f42d5 ("Btrfs: fix warning when replaying log after fsync
of a tmpfile") but it only became frequent to hit since the more recent
commit 5e85262e54 ("btrfs: fix fsync of files with no hard links not
persisting deletion"), because we stopped skipping inodes with a link
count of 0 when logging, while before the problem would only be triggered
if trying to replay a log tree created with an older kernel which has a
logged inode with 0 links.
A test case for fstests will be submitted soon.
Reported-by: Peter Jung <ptr1337@cachyos.org>
Link: https://lore.kernel.org/linux-btrfs/fce139db-4458-4788-bb97-c29acf6cb1df@cachyos.org/
Reported-by: burneddi <burneddi@protonmail.com>
Link: https://lore.kernel.org/linux-btrfs/lh4W-Lwc0Mbk-QvBhhQyZxf6VbM3E8VtIvU3fPIQgweP_Q1n7wtlUZQc33sYlCKYd-o6rryJQfhHaNAOWWRKxpAXhM8NZPojzsJPyHMf2qY=@protonmail.com/#t
Reported-by: Russell Haley <yumpusamongus@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/598ecc75-eb80-41b3-83c2-f2317fbb9864@gmail.com/
Fixes: f2d72f42d5 ("Btrfs: fix warning when replaying log after fsync of a tmpfile")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull ata fixes from Damien Le Moal:
- Cleanup whitespace in messages in libata-core and the pata_pdc2027x,
pata_macio drivers (Colin)
- Fix ata_to_sense_error() to avoid seeing nonsensical sense data for
rare cases where we fail to get sense data from the drive. The
complementary fix to this is to ensure that we always return the
generic "ABORTED COMMAND" sense data for a failed command for which
we have no status or error fields
- The recent changes to link power management (LPM) which now prevent
the user from attempting to set an LPM policy through the
link_power_management_policy caused some regressions in test
environments because of the error that is now returned when writing
to that attribute when LPM is not supported. To allow users to not
trip on this, introduce the new link_power_management_supported
attribute to allow simple testing of a port/device LPM support (me)
* tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: pata_pdc2027x: Remove space before newline and abbreviations
ata: pata_macio: Remove space before newline
ata: libata-core: Remove space before newline
ata: libata-sata: Add link_power_management_supported sysfs attribute
ata: libata-scsi: Return aborted command when missing sense and result TF
ata: libata-scsi: Fix ata_to_sense_error() status handling
Pull Kbuild updates from Masahiro Yamada:
"This is the last pull request from me.
I'm grateful to have been able to continue as a maintainer for eight
years. From the next cycle, Nathan and Nicolas will maintain Kbuild.
- Fix a shortcut key issue in menuconfig
- Fix missing rebuild of kheaders
- Sort the symbol dump generated by gendwarfsyms
- Support zboot extraction in scripts/extract-vmlinux
- Migrate gconfig to GTK 3
- Add TAR variable to allow overriding the default tar command
- Hand over Kbuild maintainership"
* tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (92 commits)
MAINTAINERS: hand over Kbuild maintenance
kheaders: make it possible to override TAR
kbuild: userprogs: use correct linker when mixing clang and GNU ld
kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
kconfig: lxdialog: replace strcpy with snprintf in print_autowrap
kconfig: gconf: refactor text_insert_help()
kconfig: gconf: remove unneeded variable in text_insert_msg
kconfig: gconf: use hyphens in signals
kconfig: gconf: replace GtkImageMenuItem with GtkMenuItem
kconfig: gconf: Fix Back button behavior
kconfig: gconf: fix single view to display dependent symbols correctly
scripts: add zboot support to extract-vmlinux
gendwarfksyms: order -T symtypes output by name
gendwarfksyms: use preferred form of sizeof for allocation
kconfig: qconf: confine {begin,end}Group to constructor and destructor
kconfig: qconf: fix ConfigList::updateListAllforAll()
kconfig: add a function to dump all menu entries in a tree-like format
kconfig: gconf: show GTK version in About dialog
kconfig: gconf: replace GtkHPaned and GtkVPaned with GtkPaned
kconfig: gconf: replace GdkColor with GdkRGBA
...
The venus driver fails to check if dev_pm_opp_find_freq_{ceil,floor}()
returns an error pointer before calling dev_pm_opp_put(). This causes
a crash when OPP tables are not present in device tree.
Unable to handle kernel access to user memory outside uaccess routines
at virtual address 000000000000002e
...
pc : dev_pm_opp_put+0x1c/0x4c
lr : core_clks_enable+0x4c/0x16c [venus_core]
Add IS_ERR() checks before calling dev_pm_opp_put() to avoid
dereferencing error pointers.
Fixes: b179234b5e ("media: venus: pm_helpers: use opp-table for the frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf fixes from Thomas Gleixner:
"Perf fixes for perf_mmap() reference counting to prevent potential
reference count leaks which are caused by:
- VMA splits, which change the offset or size of a mapping, which
causes perf_mmap_close() to ignore the unmap or unmap the wrong
buffer.
- Several internal issues of perf_mmap(), which can cause reference
count leaks in the perf mmap, corrupt accounting or cause leaks in
perf drivers.
The main fix is to prevent VMA splits by implementing the
[may_]split() callback for vm operations.
The other issues are addressed by rearranging code, early returns on
failure and invocation of cleanups.
Also provide a selftest to validate the fixes.
The reference counting should be converted to refcount_t, but that
requires larger refactoring of the code and will be done once these
fixes are upstream"
* tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git:
selftests/perf_events: Add a mmap() correctness test
perf/core: Prevent VMA split of buffer mappings
perf/core: Handle buffer mapping fail correctly in perf_mmap()
perf/core: Exit early on perf_mmap() fail
perf/core: Don't leak AUX buffer refcount on allocation failure
perf/core: Preserve AUX buffer allocation failure result
I'm stepping down as the maintainer of Kbuild/Kconfig.
It was enjoyable to refactor and improve the kernel build system,
but due to personal reasons, I believe it's difficult for me to
continue in this role any further.
I discussed this off-list with Nathan and Nicolas, and they have
kindly agreed to take over the maintenance of Kbuild with Odd Fixes.
I'm grateful to them for stepping in.
As for Kconfig, there are currently no designated reviewers, so the
maintainer position will remain vacant for now. I hope someone will
step up to take on the role.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Nicolas Schier <nicolas@fjasle.eu>
Commit 86cdd2fdc4 ("kheaders: make headers archive reproducible")
introduced a number of options specific to GNU tar to the `tar`
invocation in `gen_kheaders.sh` script. This causes the script to fail
to work on systems where `tar` is not GNU tar. This can occur e.g.
on recent Gentoo Linux installations that support using bsdtar from
libarchive instead.
Add a `TAR` make variable to make it possible to override the tar
executable used, e.g. by specifying:
make TAR=gtar
Link: https://bugs.gentoo.org/884061
Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The userprogs infrastructure does not expect clang being used with GNU ld
and in that case uses /usr/bin/ld for linking, not the configured $(LD).
This fallback is problematic as it will break when cross-compiling.
Mixing clang and GNU ld is used for example when building for SPARC64,
as ld.lld is not sufficient; see Documentation/kbuild/llvm.rst.
Relax the check around --ld-path so it gets used for all linkers.
Fixes: dfc1b168a8 ("kbuild: userprogs: use correct lld when linking through clang")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
strcpy() performs no bounds checking and can lead to buffer overflows if
the input string exceeds the destination buffer size. This patch replaces
it with strncpy(), and null terminates the input string.
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Reviewed-by: Nicolas Schier <nicolas.schier@linux.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
strcpy() does not perform bounds checking and can lead to buffer overflows
if the source string exceeds the destination buffer size. In
print_autowrap(), replace strcpy() with snprintf() to safely copy the
prompt string into the fixed-size tempstr buffer.
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
A large DMA mapping request can loop through dma address pinning for
many pages. In cases where THP can not be used, the repeated vmf_insert_pfn can
be costly, so let the task reschedule as need to prevent CPU stalls. Failure to
do so has potential harmful side effects, like increased memory pressure
as unrelated rcu tasks are unable to make their reclaim callbacks and
result in OOM conditions.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 36-....: (20999 ticks this GP) idle=b01c/1/0x4000000000000000 softirq=35839/35839 fqs=3538
rcu: hardirqs softirqs csw/system
rcu: number: 0 107 0
rcu: cputime: 50 0 10446 ==> 10556(ms)
rcu: (t=21075 jiffies g=377761 q=204059 ncpus=384)
...
<TASK>
? asm_sysvec_apic_timer_interrupt+0x16/0x20
? walk_system_ram_range+0x63/0x120
? walk_system_ram_range+0x46/0x120
? pgprot_writethrough+0x20/0x20
lookup_memtype+0x67/0xf0
track_pfn_insert+0x20/0x40
vmf_insert_pfn_prot+0x88/0x140
vfio_pci_mmap_huge_fault+0xf9/0x1b0 [vfio_pci_core]
__do_fault+0x28/0x1b0
handle_mm_fault+0xef1/0x2560
fixup_user_fault+0xf5/0x270
vaddr_get_pfns+0x169/0x2f0 [vfio_iommu_type1]
vfio_pin_pages_remote+0x162/0x8e0 [vfio_iommu_type1]
vfio_iommu_type1_ioctl+0x1121/0x1810 [vfio_iommu_type1]
? futex_wake+0x1c1/0x260
x64_sys_call+0x234/0x17a0
do_syscall_64+0x63/0x130
? exc_page_fault+0x63/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20250715184622.3561598-1-kbusch@meta.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This was missed during the initial implementation. The VFIO PCI encodes
the vf_token inside the device name when opening the device from the group
FD, something like:
"0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
This is used to control access to a VF unless there is co-ordination with
the owner of the PF.
Since we no longer have a device name in the cdev path, pass the token
directly through VFIO_DEVICE_BIND_IOMMUFD using an optional field
indicated by VFIO_DEVICE_BIND_FLAG_TOKEN.
Fixes: 5fcc26969a ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD")
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v3-bdd8716e85fe+3978a-vfio_token_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Exercise various mmap(), munmap() and mremap() invocations, which might
cause a perf buffer mapping to be split or truncated.
To avoid hard coding the perf event and having dependencies on
architectures and configuration options, scan through event types in sysfs
and try to open them. On success, try to mmap() and if that succeeds try to
mmap() the AUX buffer.
In case that no AUX buffer supporting event is found, only test the base
buffer mapping. If no mappable event is found or permissions are not
sufficient, skip the tests.
Reserve a PROT_NONE region for both rb and aux tests to allow testing the
case where mremap unmaps beyond the end of a mapped VMA to prevent it from
unmapping unrelated mappings.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
The perf mmap code is careful about mmap()'ing the user page with the
ringbuffer and additionally the auxiliary buffer, when the event supports
it. Once the first mapping is established, subsequent mapping have to use
the same offset and the same size in both cases. The reference counting for
the ringbuffer and the auxiliary buffer depends on this being correct.
Though perf does not prevent that a related mapping is split via mmap(2),
munmap(2) or mremap(2). A split of a VMA results in perf_mmap_open() calls,
which take reference counts, but then the subsequent perf_mmap_close()
calls are not longer fulfilling the offset and size checks. This leads to
reference count leaks.
As perf already has the requirement for subsequent mappings to match the
initial mapping, the obvious consequence is that VMA splits, caused by
resizing of a mapping or partial unmapping, have to be prevented.
Implement the vm_operations_struct::may_split() callback and return
unconditionally -EINVAL.
That ensures that the mapping offsets and sizes cannot be changed after the
fact. Remapping to a different fixed address with the same size is still
possible as it takes the references for the new mapping and drops those of
the old mapping.
Fixes: 45bfb2e504 ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-27504
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
After successful allocation of a buffer or a successful attachment to an
existing buffer perf_mmap() tries to map the buffer read only into the page
table. If that fails, the already set up page table entries are zapped, but
the other perf specific side effects of that failure are not handled. The
calling code just cleans up the VMA and does not invoke perf_mmap_close().
This leaks reference counts, corrupts user->vm accounting and also results
in an unbalanced invocation of event::event_mapped().
Cure this by moving the event::event_mapped() invocation before the
map_range() call so that on map_range() failure perf_mmap_close() can be
invoked without causing an unbalanced event::event_unmapped() call.
perf_mmap_close() undoes the reference counts and eventually frees buffers.
Fixes: b709eb872e ("perf: map pages in advance")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
When perf_mmap() fails to allocate a buffer, it still invokes the
event_mapped() callback of the related event. On X86 this might increase
the perf_rdpmc_allowed reference counter. But nothing undoes this as
perf_mmap_close() is never called in this case, which causes another
reference count leak.
Return early on failure to prevent that.
Fixes: 1e0fb9ec67 ("perf: Add pmu callbacks to track event mapping and unmapping")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
Failure of the AUX buffer allocation leaks the reference count.
Set the reference count to 1 only when the allocation succeeds.
Fixes: 45bfb2e504 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
A recent overhaul sets the return value to 0 unconditionally after the
allocations, which causes reference count leaks and corrupts the user->vm
accounting.
Preserve the AUX buffer allocation failure return value, so that the
subsequent code works correctly.
Fixes: 0983593f32 ("perf/core: Lift event->mmap_mutex in perf_mmap()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: stable@vger.kernel.org
Ever since commit c2ff29e99a ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
size_t size)
[...]
/* Calculate the number of bytes we need to push, for this page
* specifically */
size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
/* If we can't splice it, then copy it in, as normal */
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
/* Set the bvec pointing to the page, with len $bytes */
bvec_set_page(&bvec, page[i], bytes, offset);
/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
lock_sock(sk);
/* Sendmsg with $size size (!!!) */
rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and tcp_sendmsg calls
for a while. This has a been a benign bug because sendpage_ok() always
returned true. With the recent slab allocator changes being slowly
introduced into next (that disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
iov_iter_extract_pages
iov_iter_extract_bvec_pages
uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
copy_from_iter
/* this doesn't help */
if (unlikely(iter->count < len))
len = iter->count;
iterate_bvec
... and we run off the bvecs
Fix this by properly setting the iov_iter's byte count, plus sending the
correct byte count to tcp_sendmsg_locked.
Link: https://patch.msgid.link/r/20250729120348.495568-1-pfalcato@suse.de
Cc: stable@vger.kernel.org
Fixes: c2ff29e99a ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull ARM update from Russell King:
"Just one development update this time:
- Finish removing Coresight support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9449/1: coresight: Finish removal of Coresight support in arch/arm/kernel
Pull exfat updates from Namjae Jeon:
- Use generic_write_sync instead of vfs_fsync_range in exfat_file_write_iter.
It will fix an issue where fdatasync would be set incorrectly.
- Fix potential infinite loop by the self-linked chain.
* tag 'exfat-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: add cluster chain loop check for dir
exfat: fdatasync flag should be same like generic_write_sync()
Pull more MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "mseal cleanups" (Lorenzo Stoakes)
Some mseal cleaning with no intended functional change.
- "Optimizations for khugepaged" (David Hildenbrand)
Improve khugepaged throughput by batching PTE operations for large
folios. This gain is mainly for arm64.
- "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)
A bugfix, additional debug code and cleanups to the execmem code.
- "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)
Bugfixes, cleanups and performance improvememnts to the mTHP swapin
code"
* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
mm: mempool: fix crash in mempool_free() for zero-minimum pools
mm: correct type for vmalloc vm_flags fields
mm/shmem, swap: fix major fault counting
mm/shmem, swap: rework swap entry and index calculation for large swapin
mm/shmem, swap: simplify swapin path and result handling
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
mm/shmem, swap: tidy up swap entry splitting
mm/shmem, swap: tidy up THP swapin checks
mm/shmem, swap: avoid redundant Xarray lookup during swapin
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
execmem: drop writable parameter from execmem_fill_trapping_insns()
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
execmem: move execmem_force_rw() and execmem_restore_rox() before use
execmem: rework execmem_cache_free()
execmem: introduce execmem_alloc_rw()
execmem: drop unused execmem_update_copy()
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
mm/rmap: add anon_vma lifetime debug check
mm: remove mm/io-mapping.c
...
Since $(LD) is directly used, hence -nostdlib is unneeded, MIPS has
removed this, we should remove it too.
bdbf2038fb ("MIPS: VDSO: remove -nostdlib compiler flag").
In fact, other architectures also use $(LD) now.
fe00e50b2d ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO")
691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
2ff906994b ("MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO")
2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")
Cc: stable@vger.kernel.org
Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The Loongson-2K2000 integrates one eMMC controller and one SDIO controller.
The module is supported now, enable it.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The Loongson-2K1000 integrates one SDIO controller for SD storage cards
and SDIO cards.
The module is supported now, enable it.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The Loongson-2K0500 integrates two SDIO controllers for SD storage cards
and SDIO cards, supporting SD storage card boot.
The module is supported now, enable it.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
JITs can set bpf_jit_bypass_spec_v1/v4() if they want the verifier to
skip analysis/patching for the respective vulnerability, it is safe to
set both bpf_jit_bypass_spec_v1/v4(), because there is no speculation
barrier instruction for LoongArch.
Suggested-by: Luis Gerhorst <luis.gerhorst@fau.de>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>