2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

20120 Commits

Author SHA1 Message Date
Ingo Molnar
60675d4ca1 Merge branch 'linus' into x86/mm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-20 10:25:44 +01:00
Linus Torvalds
37cb0c76ac hyperv-fixes for v6.13-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmdhxFsTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXmGuB/0ZsXKBh9eCZnmVSZwdVkcSeE/HEHHc
 K5qeIxjqirAzajvLbAtgnTQpi5w0kLYYOT/X/8b6L1nnefcfk1cz5cbyQKxXx7J6
 Zuob5KpjX7FtJbMc7QQtzrLyApw2k2OYe1QPCRxuWkYzQjaus/6kM27ivjjYFqDs
 Qv6IKCcebomAbJTN8IwF38KsFQ2JS3moWr+kcVyna7+Kg1ymEto2QlnI01Z+M1n7
 BUYvLgYUD/5HzbNjqC0HysUuX5PsVQ45OsmTgjkvs/XzLcAGGEHJGrJHJkEccx7H
 AsiNzHoB6wUrFhCAX7IPCSE0g9vjtwI21ozqYllzOTa/Q5KnUE+q9fhJ
 =Fw3U
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf
   Hering, Vitaly Kuznetsov)

 - Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain)

 - Two bug fixes in the Hyper-V utility functions (Michael Kelley)

 - Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers
   (Easwar Hariharan)

* tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  tools/hv: reduce resource usage in hv_kvp_daemon
  tools/hv: add a .gitignore file
  tools/hv: reduce resouce usage in hv_get_dns_info helper
  hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well
  Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet
  Drivers: hv: util: Don't force error code to ENODEV in util_probe()
  tools/hv: terminate fcopy daemon if read from uio fails
  drivers: hv: Convert open-coded timeouts to secs_to_jiffies()
  tools: hv: change permissions of NetworkManager configuration file
  x86/hyperv: Fix hv tsc page based sched_clock for hibernation
  tools: hv: Fix a complier warning in the fcopy uio daemon
2024-12-18 09:55:55 -08:00
Dave Hansen
e5d3a57891 x86/cpu: Make all all CPUID leaf names consistent
The leaf names are not consistent.  Give them all a CPUID_LEAF_ prefix
for consistency and vertical alignment.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com> # for ioatdma bits
Link: https://lore.kernel.org/all/20241213205040.7B0C3241%40davehans-spike.ostc.intel.com
2024-12-18 06:17:46 -08:00
Dave Hansen
588e148d8b x86/fpu: Remove unnecessary CPUID level check
The CPUID level dependency table will entirely zap X86_FEATURE_XSAVE
if the CPUID level is too low.  This code is unreachable.  Kill it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/all/20241213205038.6E71F9A4%40davehans-spike.ostc.intel.com
2024-12-18 06:17:45 -08:00
Dave Hansen
754aaac3bb x86/fpu: Move CPUID leaf definitions to common code
Move the XSAVE-related CPUID leaf definitions to common code.  Then,
use the new definition to remove the last magic number from the CPUID
level dependency table.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205037.43C57CDE%40davehans-spike.ostc.intel.com
2024-12-18 06:17:42 -08:00
Dave Hansen
e558eadf6b x86/tsc: Remove CPUID "frequency" leaf magic numbers.
All the code that reads the CPUID frequency information leaf hard-codes
a magic number.  Give it a symbolic name and use it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205036.4397658F%40davehans-spike.ostc.intel.com
2024-12-18 06:17:40 -08:00
Dave Hansen
030c15b561 x86/tsc: Move away from TSC leaf magic numbers
The TSC code has a bunch of hard-coded references to leaf 0x15.  Change
them over to the symbolic name.  Also zap the 'ART_CPUID_LEAF' definition.
It was a duplicate of 'CPUID_TSC_LEAF'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213205034.B79D6224%40davehans-spike.ostc.intel.com
2024-12-18 06:17:38 -08:00
Dave Hansen
5d82d8e0a9 x86/cpu: Refresh DCA leaf reading code
The DCA leaf number is also hard-coded in the CPUID level dependency
table. Move its definition to common code and use it.

While at it, fix up the naming and types in the probe code.  All
CPUID data is provided in 32-bit registers, not 'unsigned long'.
Also stop referring to "level_9".  Move away from test_bit()
because the type is no longer an 'unsigned long'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205032.476A30FE%40davehans-spike.ostc.intel.com
2024-12-18 06:17:34 -08:00
Dave Hansen
262fba5570 x86/cpu: Remove unnecessary MwAIT leaf checks
The CPUID leaf dependency checker will remove X86_FEATURE_MWAIT if
the CPUID level is below the required level (CPUID_MWAIT_LEAF).
Thus, if you check X86_FEATURE_MWAIT you do not need to also
check the CPUID level.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213205030.9B42B458%40davehans-spike.ostc.intel.com
2024-12-18 06:17:30 -08:00
Dave Hansen
8bd6821c9c x86/cpu: Use MWAIT leaf definition
The leaf-to-feature dependency array uses hard-coded leaf numbers.
Use the new common header definition for the MWAIT leaf.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205029.5B055D6E%40davehans-spike.ostc.intel.com
2024-12-18 06:17:28 -08:00
Dave Hansen
497f702846 x86/cpu: Move MWAIT leaf definition to common header
Begin constructing a common place to keep all CPUID leaf definitions.
Move CPUID_MWAIT_LEAF to the CPUID header and include it where
needed.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205028.EE94D02A%40davehans-spike.ostc.intel.com
2024-12-18 06:17:24 -08:00
Dave Hansen
5366d8965d x86/cpu: Remove 'x86_cpu_desc' infrastructure
All the users of 'x86_cpu_desc' are gone.  Zap it from the tree.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185133.AF0BF2BC%40davehans-spike.ostc.intel.com
2024-12-18 06:09:12 -08:00
Dave Hansen
f3f3251526 x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'
The AMD erratum 1386 detection code uses and old style 'x86_cpu_desc'
table. Replace it with 'x86_cpu_id' so the old style can be removed.

I did not create a new helper macro here. The new table is certainly
more noisy than the old and it can be improved on. But I was hesitant
to create a new macro just for a single site that is only two ugly
lines in the end.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185132.07555E1D%40davehans-spike.ostc.intel.com
2024-12-18 06:09:04 -08:00
Dave Hansen
85b08180df x86/cpu: Expose only stepping min/max interface
The x86_match_cpu() infrastructure can match CPU steppings. Since
there are only 16 possible steppings, the matching infrastructure goes
all out and stores the stepping match as a bitmap. That means it can
match any possible steppings in a single list entry. Fun.

But it exposes this bitmap to each of the X86_MATCH_*() helpers when
none of them really need a bitmap. It makes up for this by exporting a
helper (X86_STEPPINGS()) which converts a contiguous stepping range
into the bitmap which every single user leverages.

Instead of a bitmap, have the main helper for this sort of thing
(X86_MATCH_VFM_STEPS()) just take a stepping range. This ends up
actually being even more compact than before.

Leave the helper in place (renamed to __X86_STEPPINGS()) to make it
more clear what is going on instead of just having a random GENMASK()
in the middle of an already complicated macro.

One oddity that I hit was this macro:

       X86_MATCH_VFM_STEPS(vfm, X86_STEPPING_MIN, max_stepping, issues)

It *could* have been converted over to take a min/max stepping value
for each entry. But that would have been a bit too verbose and would
prevent the one oddball in the list (INTEL_COMETLAKE_L stepping 0)
from sticking out.

Instead, just have it take a *maximum* stepping and imply that the match
is from 0=>max_stepping. This is functional for all the cases now and
also retains the nice property of having INTEL_COMETLAKE_L stepping 0
stick out like a sore thumb.

skx_cpuids[] is goofy. It uses the stepping match but encodes all
possible steppings. Just use a normal, non-stepping match helper.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185129.65527B2A%40davehans-spike.ostc.intel.com
2024-12-17 16:14:49 -08:00
Dave Hansen
b8e10c86e6 x86/cpu: Introduce new microcode matching helper
The 'x86_cpu_id' and 'x86_cpu_desc' structures are very similar and
need to be consolidated.  There is a microcode version matching
function for 'x86_cpu_desc' but not 'x86_cpu_id'.

Create one for 'x86_cpu_id'.

This essentially just leverages the x86_cpu_id->driver_data field
to replace the less generic x86_cpu_desc->x86_microcode_rev field.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185128.8F24EEFC%40davehans-spike.ostc.intel.com
2024-12-17 16:14:39 -08:00
Juergen Gross
7fa0da5373 x86/xen: remove hypercall page
The hypercall page is no longer needed. It can be removed, as from the
Xen perspective it is optional.

But, from Linux's perspective, it removes naked RET instructions that
escape the speculative protections that Call Depth Tracking and/or
Untrain Ret are trying to achieve.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2024-12-17 08:23:42 +01:00
Tom Lendacky
4972808d6f x86/sev: Require the RMPREAD instruction after Zen4
Limit usage of the non-architectural RMP format to Zen3/Zen4 processors.
The RMPREAD instruction, with architectural defined output, is available
and should be used for RMP access beyond Zen4.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/5be0093e091778a151266ea853352f62f838eb99.1733172653.git.thomas.lendacky@amd.com
2024-12-14 10:55:28 +01:00
Juergen Gross
0ef8047b73 x86/static-call: provide a way to do very early static-call updates
Add static_call_update_early() for updating static-call targets in
very early boot.

This will be needed for support of Xen guest type specific hypercall
functions.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2024-12-13 09:28:32 +01:00
Juergen Gross
efbcd61d9b x86: make get_cpu_vendor() accessible from Xen code
In order to be able to differentiate between AMD and Intel based
systems for very early hypercalls without having to rely on the Xen
hypercall page, make get_cpu_vendor() non-static.

Refactor early_cpu_init() for the same reason by splitting out the
loop initializing cpu_devs() into an externally callable function.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-12-13 09:28:10 +01:00
Tony Luck
8e931105ac x86/resctrl: Add write option to "mba_MBps_event" file
The "mba_MBps" mount option provides an alternate method to control memory
bandwidth. Instead of specifying allowable bandwidth as a percentage of
maximum possible, the user provides a MiB/s limit value.

There is a file in each CTRL_MON group directory that shows the event
currently in use.

Allow writing that file to choose a different event.

A user can choose any of the memory bandwidth monitoring events listed in
/sys/fs/resctrl/info/L3_mon/mon_features independently for each CTRL_MON group
by writing to each of the "mba_MBps_event" files.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-8-tony.luck@intel.com
2024-12-12 11:27:37 +01:00
Tony Luck
f5cd0e316f x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories
The "mba_MBps" mount option provides an alternate method to control memory
bandwidth. Instead of specifying allowable bandwidth as a percentage of
maximum possible, the user provides a MiB/s limit value.

In preparation to allow the user to pick the memory bandwidth monitoring event
used as input to the feedback loop, provide a file in each CTRL_MON group
directory that shows the event currently in use. Note that this file is only
visible when the "mba_MBps" mount option is in use.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-7-tony.luck@intel.com
2024-12-12 11:27:28 +01:00
Raag Jadav
3560a023a9 x86/cpu: Fix typo in x86_match_cpu()'s doc
Fix typo in x86_match_cpu()'s description.

  [ bp: Massage commit message. ]

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241030065804.407793-1-raag.jadav@intel.com
2024-12-10 21:40:02 +01:00
Ingo Molnar
05453d36a2 Merge branch 'linus' into x86/cleanups, to resolve conflict
These two commits interact:

 upstream:     73da582a47 ("x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC")
 x86/cleanups: 13148e22c1 ("x86/apic: Remove "disablelapic" cmdline option")

Resolve it.

 Conflicts:
	arch/x86/kernel/cpu/topology.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-10 19:33:03 +01:00
Borislav Petkov (AMD)
13148e22c1 x86/apic: Remove "disablelapic" cmdline option
The convention is "no<something>" and there already is "nolapic". Drop
the disable one.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241202190011.11979-2-bp@kernel.org
2024-12-10 18:55:08 +01:00
Borislav Petkov (AMD)
ab0e7f2076 Documentation: Merge x86-specific boot options doc into kernel-parameters.txt
Documentation/arch/x86/x86_64/boot-options.rst is causing unnecessary
confusion by being a second place where one can put x86 boot options.
Move them into the main one.

Drop removed ones like "acpi=ht", while at it.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241202190011.11979-1-bp@kernel.org
2024-12-10 18:25:40 +01:00
Tony Luck
141cb5c482 x86/resctrl: Make mba_sc use total bandwidth if local is not supported
The default input measurement to the mba_sc feedback loop for memory bandwidth
control when the user mounts with the "mba_MBps" option is the local bandwidth
event. But some systems may not support a local bandwidth event.

When local bandwidth event is not supported, check for support of total
bandwidth and use that instead.

Relax the mount option check to allow use of the "mba_MBps" option for systems
when only total bandwidth monitoring is supported. Also update the error
message.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-6-tony.luck@intel.com
2024-12-10 16:15:14 +01:00
Tony Luck
2c272fadb5 x86/resctrl: Compute memory bandwidth for all supported events
Switching between local and total memory bandwidth events as the input
to the mba_sc feedback loop would be cumbersome and take effect slowly
in the current implementation as the bandwidth is only known after two
consecutive readings of the same event.

Compute the bandwidth for all supported events. This doesn't add
significant overhead and will make changing which event is used
simple.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-5-tony.luck@intel.com
2024-12-10 16:13:48 +01:00
Ard Biesheuvel
35aafa1d41 x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
In __startup_64(), the bool 'la57' can only assume the 'true' value if
CONFIG_X86_5LEVEL is enabled in the build, and generally, the compiler
can make this inference at build time, and elide any references to the
symbol 'level4_kernel_pgt', which may be undefined if 'la57' is false.

As it turns out, GCC 12 gets this wrong sometimes, and gives up with a
build error:

   ld: arch/x86/kernel/head64.o: in function `__startup_64':
   head64.c:(.head.text+0xbd): undefined reference to `level4_kernel_pgt'

even though the reference is in unreachable code. Fix this by
duplicating the IS_ENABLED(CONFIG_X86_5LEVEL) in the conditional that
tests the value of 'la57'.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241209094105.762857-2-ardb+git@google.com
Closes: https://lore.kernel.org/oe-kbuild-all/202412060403.efD8Kgb7-lkp@intel.com/
2024-12-10 11:16:32 +01:00
Tony Luck
481d363748 x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
update_mba_bw() hard codes use of the memory bandwidth local event which
prevents more flexible options from being deployed.

Change this function to use the event specified in the rdtgroup that is
being processed.

Mount time checks for the "mba_MBps" option ensure that local memory
bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-4-tony.luck@intel.com
2024-12-10 11:15:19 +01:00
Tony Luck
3b49c37a2f x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control
Resctrl uses local memory bandwidth event as input to the feedback loop when
the mba_MBps mount option is used. This means that this mount option cannot be
used on systems that only support monitoring of total bandwidth.

Prepare to allow users to choose the input event independently for each
CTRL_MON group by adding a global variable "mba_mbps_default_event" used to
set the default event for each CTRL_MON group, and a new field
"mba_mbps_event" in struct rdtgroup to track which event is used for each
CTRL_MON group.

Notes:

1) Both of these are only used when the user mounts the filesystem with the
   "mba_MBps" option.
2) Only check for support of local bandwidth event when initializing
   mba_mbps_default_event. Support for total bandwidth event can be added
   after other routines in resctrl have been updated to handle total bandwidth
   event.

  [ bp: Move mba_mbps_default_event extern into the arch header. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-3-tony.luck@intel.com
2024-12-10 11:13:48 +01:00
Babu Moger
2937f9c361 x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().

  [ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at
    run time. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20241206163148.83828-2-tony.luck@intel.com
2024-12-09 21:37:01 +01:00
Frederic Weisbecker
135eef38d7 x86/resctrl: Use kthread_run_on_cpu()
Use the proper API instead of open coding it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20240807160228.26206-3-frederic@kernel.org
2024-12-09 20:19:48 +01:00
Naman Jain
bcc80dec91 x86/hyperv: Fix hv tsc page based sched_clock for hibernation
read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is
bigger than the variable hv_sched_clock_offset, which is cached during
early boot, but depending on the timing this assumption may be false
when a hibernated VM starts again (the clock counter starts from 0
again) and is resuming back (Note: hv_init_tsc_clocksource() is not
called during hibernation/resume); consequently,
read_hv_sched_clock_tsc() may return a negative integer (which is
interpreted as a huge positive integer since the return type is u64)
and new kernel messages are prefixed with huge timestamps before
read_hv_sched_clock_tsc() grows big enough (which typically takes
several seconds).

Fix the issue by saving the Hyper-V clock counter just before the
suspend, and using it to correct the hv_sched_clock_offset in
resume. This makes hv tsc page based sched_clock continuous and ensures
that post resume, it starts from where it left off during suspend.
Override x86_platform.save_sched_clock_state and
x86_platform.restore_sched_clock_state routines to correct this as soon
as possible.

Note: if Invariant TSC is available, the issue doesn't happen because
1) we don't register read_hv_sched_clock_tsc() for sched clock:
See commit e5313f1c54 ("clocksource/drivers/hyper-v: Rework
clocksource and sched clock setup");
2) the common x86 code adjusts TSC similarly: see
__restore_processor_state() ->  tsc_verify_tsc_adjust(true) and
x86_platform.restore_sched_clock_state().

Cc: stable@vger.kernel.org
Fixes: 1349401ff1 ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V clocksource for hibernation")
Co-developed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240917053917.76787-1-namjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240917053917.76787-1-namjain@linux.microsoft.com>
2024-12-09 18:42:42 +00:00
Damien Le Moal
aeb6893761 x86: Fix build regression with CONFIG_KEXEC_JUMP enabled
Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:

  ld: vmlinux.o: in function `virtual_mapped':
  linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'

when CONFIG_KEXEC_JUMP is enabled.

This was introduced by commit 07fa619f2a ("x86/kexec: Restore GDT on
return from ::preserve_context kexec") which introduced a use of
saved_context_gdt_desc without a declaration for it.

Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
is defined (indirectly in include/generated/asm-offsets.h which
asm/asm-offsets.h includes).

Fixes: 07fa619f2a ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-09 10:13:28 -08:00
Kirill A. Shutemov
dd4059634d x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
Rename the helper to better reflect its function.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20241202073139.448208-1-kirill.shutemov@linux.intel.com
2024-12-06 20:06:52 +01:00
Sean Christopherson
492077668f x86/CPU/AMD: WARN when setting EFER.AUTOIBRS if and only if the WRMSR fails
When ensuring EFER.AUTOIBRS is set, WARN only on a negative return code
from msr_set_bit(), as '1' is used to indicate the WRMSR was successful
('0' indicates the MSR bit was already set).

Fixes: 8cc68c9c9e ("x86/CPU/AMD: Make sure EFER[AIBRSE] is set")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Z1MkNofJjt7Oq0G6@google.com
Closes: https://lore.kernel.org/all/20241205220604.GA2054199@thelio-3990X
2024-12-06 19:57:05 +01:00
Ricardo Neri
9677be09e5 x86/cacheinfo: Delete global num_cache_leaves
Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".

This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.

init_cache_level() no longer needs to set num_leaves. Also, it never had to
set num_levels as it is unnecessary in x86. Keep checking for zero cache
leaves. Such condition indicates a bug.

  [ bp: Cleanup. ]

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org # 6.3+
Link: https://lore.kernel.org/r/20241128002247.26726-3-ricardo.neri-calderon@linux.intel.com
2024-12-06 13:13:36 +01:00
Thomas Weißschuh
a3eaa2be70 x86/sysfs: Constify 'struct bin_attribute'
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241202-sysfs-const-bin_attr-x86-v1-1-b767d5f0ac5c@weissschuh.net
2024-12-06 11:06:14 +01:00
Juergen Gross
29188c1600 x86/paravirt: Remove the WBINVD callback
The pv_ops::cpu.wbinvd paravirt callback is a leftover of lguest times.
Today it is no longer needed, as all users use the native WBINVD
implementation.

Remove the callback and rename native_wbinvd() to wbinvd().

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241203071550.26487-1-jgross@suse.com
2024-12-06 11:01:36 +01:00
Sohil Mehta
7a470e826d x86/cpufeatures: Free up unused feature bits
Linux defined feature bits X86_FEATURE_P3 and X86_FEATURE_P4 are not
used anywhere. Commit f31d731e44 ("x86: use X86_FEATURE_NOPL in
alternatives") got rid of the last usage in 2008. Remove the related
mappings and code.

Just like all X86_FEATURE bits, the raw bit numbers can be exposed to
userspace via MODULE_DEVICE_TABLE(). There is a very small theoretical
chance of userspace getting confused if these bits got reassigned and
changed logical meaning.  But these bits were never used for a device
table, so it's highly unlikely this will ever happen in practice.

[ dhansen: clarify userspace visibility of these bits ]

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/20241107233000.2742619-1-sohil.mehta%40intel.com
2024-12-06 10:57:44 +01:00
David Woodhouse
5a82223e07 x86/kexec: Mark relocate_kernel page as ROX instead of RWX
All writes to the page now happen before it gets marked as executable
(or after it's already switched to the identmap page tables where it's
OK to be RWX).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241205153343.3275139-14-dwmw2@infradead.org
2024-12-06 10:42:01 +01:00
David Woodhouse
93e489ad7a x86/kexec: Clean up register usage in relocate_kernel()
The memory encryption flag is passed in %r8 because that's where the
calling convention puts it. Instead of moving it to %r12 and then using
%r8 for other things, just leave it in %r8 and use other registers
instead.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-13-dwmw2@infradead.org
2024-12-06 10:42:00 +01:00
David Woodhouse
b7155dfd49 x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
All writes to the relocate_kernel control page are now done *after* the
%cr3 switch via simple %rip-relative addressing, which means the DATA()
macro with its pointer arithmetic can also now be removed.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-12-dwmw2@infradead.org
2024-12-06 10:42:00 +01:00
David Woodhouse
b3adabae8a x86/kexec: Drop page_list argument from relocate_kernel()
The kernel's virtual mapping of the relocate_kernel page currently needs
to be RWX because it is written to before the %cr3 switch.

Now that the relocate_kernel page has its own .data section and local
variables, it can also have *global* variables. So eliminate the separate
page_list argument, and write the same information directly to variables
in the relocate_kernel page instead. This way, the relocate_kernel code
itself doesn't need to copy it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-11-dwmw2@infradead.org
2024-12-06 10:42:00 +01:00
David Woodhouse
8dbec5c77b x86/kexec: Add data section to relocate_kernel
Now that the relocate_kernel page is handled sanely by a linker script
we can have actual data, and just use %rip-relative addressing to access
it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-10-dwmw2@infradead.org
2024-12-06 10:42:00 +01:00
David Woodhouse
cb33ff9e06 x86/kexec: Move relocate_kernel to kernel .data section
Now that the copy is executed instead of the original, the relocate_kernel
page can live in the kernel's .text section. This will allow subsequent
commits to actually add real data to it and clean up the code somewhat as
well as making the control page ROX.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-9-dwmw2@infradead.org
2024-12-06 10:41:59 +01:00
David Woodhouse
eeebbde571 x86/kexec: Invoke copy of relocate_kernel() instead of the original
This currently calls set_memory_x() from machine_kexec_prepare() just
like the 32-bit version does. That's actually a bit earlier than I'd
like, as it leaves the page RWX all the time the image is even *loaded*.

Subsequent commits will eliminate all the writes to the page between the
point it's marked executable in machine_kexec_prepare() the time that
relocate_kernel() is running and has switched to the identmap %cr3, so
that it can be ROX. But that can't happen until it's moved to the .data
section of the kernel, and *that* can't happen until we start executing
the copy instead of executing it in place in the kernel .text. So break
the circular dependency in those commits by letting it be RWX for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-8-dwmw2@infradead.org
2024-12-06 10:41:59 +01:00
David Woodhouse
6a750b4c00 x86/kexec: Copy control page into place in machine_kexec_prepare()
There's no need for this to wait until the actual machine_kexec() invocation;
future changes will need to make the control page read-only and executable,
so all writes should be completed before machine_kexec_prepare() returns.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-7-dwmw2@infradead.org
2024-12-06 10:41:59 +01:00
David Woodhouse
4b5bc2ec9a x86/kexec: Allocate PGD for x86_64 transition page tables separately
Now that the following fix:

  d0ceea662d ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")

stops kernel_ident_mapping_init() from scribbling over the end of a
4KiB PGD by assuming the following 4KiB will be a userspace PGD,
there's no good reason for the kexec PGD to be part of a single
8KiB allocation with the control_code_page.

( It's not clear that that was the reason for x86_64 kexec doing it that
  way in the first place either; there were no comments to that effect and
  it seems to have been the case even before PTI came along. It looks like
  it was just a happy accident which prevented memory corruption on kexec. )

Either way, it definitely isn't needed now. Just allocate the PGD
separately on x86_64, like i386 already does.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-6-dwmw2@infradead.org
2024-12-06 10:41:59 +01:00
David Woodhouse
9e5683e2d0 x86/kexec: Only swap pages for ::preserve_context mode
There's no need to swap pages (which involves three memcopies for each
page) in the plain kexec case. Just do a single copy from source to
destination page.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-5-dwmw2@infradead.org
2024-12-06 10:41:59 +01:00
David Woodhouse
46d4e205e2 x86/kexec: Use named labels in swap_pages in relocate_kernel_64.S
Make the code a little more readable.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-4-dwmw2@infradead.org
2024-12-06 10:41:58 +01:00
David Woodhouse
207bdf7f72 x86/kexec: Clean up and document register use in relocate_kernel_64.S
Add more comments explaining what each register contains, and save the
preserve_context flag to a non-clobbered register sooner, to keep things
simpler.

No change in behavior intended.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-3-dwmw2@infradead.org
2024-12-06 10:41:58 +01:00
Ingo Molnar
fe8ec69baa Merge branch 'x86/urgent' into x86/boot, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-06 10:36:14 +01:00
David Woodhouse
07fa619f2a x86/kexec: Restore GDT on return from ::preserve_context kexec
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.

Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.

Test program:

 #include <unistd.h>
 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <linux/kexec.h>
 #include <linux/reboot.h>
 #include <sys/reboot.h>
 #include <sys/syscall.h>

 int main (void)
 {
        struct kexec_segment segment = {};
	unsigned char purgatory[] = {
		0x66, 0xba, 0xf8, 0x03,	// mov $0x3f8, %dx
		0xb0, 0x42,		// mov $0x42, %al
		0xee,			// outb %al, (%dx)
		0xc3,			// ret
	};
	int ret;

	segment.buf = &purgatory;
	segment.bufsz = sizeof(purgatory);
	segment.mem = (void *)0x400000;
	segment.memsz = 0x1000;
	ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
	if (ret) {
		perror("kexec_load");
		exit(1);
	}

	ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
	if (ret) {
		perror("kexec reboot");
		exit(1);
	}
	printf("Success\n");
	return 0;
 }

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
2024-12-06 10:35:49 +01:00
Fernando Fernandez Mancera
73da582a47 x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC
The rework of possible CPUs management erroneously disabled SMP when the
IO/APIC is disabled either by the 'noapic' command line parameter or during
IO/APIC setup. SMP is possible without IO/APIC.

Remove the ioapic_is_disabled conditions from the relevant possible CPU
management code paths to restore the orgininal behaviour.

Fixes: 7c0edad364 ("x86/cpu/topology: Rework possible CPU management")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241202145905.1482-1-ffmancera@riseup.net
2024-12-05 14:43:32 +01:00
Ard Biesheuvel
a6a4ae9c3f x86/boot: Move .head.text into its own output section
In order to be able to double check that vmlinux is emitted without
absolute symbol references in .head.text, it needs to be distinguishable
from the rest of .text in the ELF metadata.

So move .head.text into its own ELF section.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205112804.3416920-15-ardb+git@google.com
2024-12-05 13:18:55 +01:00
Ard Biesheuvel
35350eb689 x86/kernel: Move ENTRY_TEXT to the start of the image
Since commit:

  7734a0f31e ("x86/boot: Robustify calling startup_{32,64}() from the decompressor code")

it is no longer necessary for .head.text to appear at the start of the
image. Since ENTRY_TEXT needs to appear PMD-aligned, it is easier to
just place it at the start of the image, rather than line it up with the
end of the .text section. The amount of padding required should be the
same, but this arrangement also permits .head.text to be split off and
emitted separately, which is needed by a subsequent change.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205112804.3416920-14-ardb+git@google.com
2024-12-05 13:18:55 +01:00
Ard Biesheuvel
0d9b9a328c x86/boot/64: Avoid intentional absolute symbol references in .head.text
The code in .head.text executes from a 1:1 mapping and cannot generally
refer to global variables using their kernel virtual addresses. However,
there are some occurrences of such references that are valid: the kernel
virtual addresses of _text and _end are needed to populate the page
tables correctly, and some other section markers are used in a similar
way.

To avoid the need for making exceptions to the rule that .head.text must
not contain any absolute symbol references, derive these addresses from
the RIP-relative 1:1 mapped physical addresses, which can be safely
determined using RIP_REL_REF().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205112804.3416920-12-ardb+git@google.com
2024-12-05 13:18:54 +01:00
Ard Biesheuvel
093562198e x86/boot/64: Determine VA/PA offset before entering C code
Implicit absolute symbol references (e.g., taking the address of a
global variable) must be avoided in the C code that runs from the early
1:1 mapping of the kernel, given that this is a practice that violates
assumptions on the part of the toolchain. I.e., RIP-relative and
absolute references are expected to produce the same values, and so the
compiler is free to choose either. However, the code currently assumes
that RIP-relative references are never emitted here.

So an explicit virtual-to-physical offset needs to be used instead to
derive the kernel virtual addresses of _text and _end, instead of simply
taking the addresses and assuming that the compiler will not choose to
use a RIP-relative references in this particular case.

Currently, phys_base is already used to perform such calculations, but
it is derived from the kernel virtual address of _text, which is taken
using an implicit absolute symbol reference. So instead, derive this
VA-to-PA offset in asm code, and pass it to the C startup code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20241205112804.3416920-11-ardb+git@google.com
2024-12-05 13:18:54 +01:00
Len Brown
c9a4b55431 x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
Under some conditions, MONITOR wakeups on Lunar Lake processors
can be lost, resulting in significant user-visible delays.

Add Lunar Lake to X86_BUG_MONITOR so that wake_up_idle_cpu()
always sends an IPI, avoiding this potential delay.

Reported originally here:

	https://bugzilla.kernel.org/show_bug.cgi?id=219364

[ dhansen: tweak subject ]

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/a4aa8842a3c3bfdb7fe9807710eef159cbf0e705.1731463305.git.len.brown%40intel.com
2024-12-04 12:30:14 -08:00
Kirill A. Shutemov
6a5abeea9c x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
Rename the helper to better reflect its function.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/all/20241202073139.448208-1-kirill.shutemov%40linux.intel.com
2024-12-04 10:46:19 -08:00
Aruna Ramakrishna
ae6012d72f x86/pkeys: Ensure updated PKRU value is XRSTOR'd
When XSTATE_BV[i] is 0, and XRSTOR attempts to restore state component
'i' it ignores any value in the XSAVE buffer and instead restores the
state component's init value.

This means that if XSAVE writes XSTATE_BV[PKRU]=0 then XRSTOR will
ignore the value that update_pkru_in_sigframe() writes to the XSAVE buffer.

XSTATE_BV[PKRU] only gets written as 0 if PKRU is in its init state. On
Intel CPUs, basically never happens because the kernel usually
overwrites the init value (aside: this is why we didn't notice this bug
until now). But on AMD, the init tracker is more aggressive and will
track PKRU as being in its init state upon any wrpkru(0x0).
Unfortunately, sig_prepare_pkru() does just that: wrpkru(0x0).

This writes XSTATE_BV[PKRU]=0 which makes XRSTOR ignore the PKRU value
in the sigframe.

To fix this, always overwrite the sigframe XSTATE_BV with a value that
has XSTATE_BV[PKRU]==1.  This ensures that XRSTOR will not ignore what
update_pkru_in_sigframe() wrote.

The problematic sequence of events is something like this:

Userspace does:
	* wrpkru(0xffff0000) (or whatever)
	* Hardware sets: XINUSE[PKRU]=1
Signal happens, kernel is entered:
	* sig_prepare_pkru() => wrpkru(0x00000000)
	* Hardware sets: XINUSE[PKRU]=0 (aggressive AMD init tracker)
	* XSAVE writes most of XSAVE buffer, including
	  XSTATE_BV[PKRU]=XINUSE[PKRU]=0
	* update_pkru_in_sigframe() overwrites PKRU in XSAVE buffer
... signal handling
	* XRSTOR sees XSTATE_BV[PKRU]==0, ignores just-written value
	  from update_pkru_in_sigframe()

Fixes: 70044df250 ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE")
Suggested-by: Rudi Horn <rudi.horn@oracle.com>
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241119174520.3987538-3-aruna.ramakrishna%40oracle.com
2024-12-02 15:25:29 -08:00
Aruna Ramakrishna
6a1853bdf1 x86/pkeys: Change caller of update_pkru_in_sigframe()
update_pkru_in_sigframe() will shortly need some information which
is only available inside xsave_to_user_sigframe(). Move
update_pkru_in_sigframe() inside the other function to make it
easier to provide it that information.

No functional changes.

Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241119174520.3987538-2-aruna.ramakrishna%40oracle.com
2024-12-02 15:25:21 -08:00
Peter Zijlstra
2190966fbc x86: Convert unreachable() to BUG()
Avoid unreachable() as it can (and will in the absence of UBSAN)
generate fallthrough code. Use BUG() so we get a UD2 trap (with
unreachable annotation).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20241128094312.028316261@infradead.org
2024-12-02 12:01:43 +01:00
K Prateek Nayak
e4b4443477 x86/topology: Introduce topology_logical_core_id()
On x86, topology_core_id() returns a unique core ID within the PKG
domain. Looking at match_smt() suggests that a core ID just needs to be
unique within a LLC domain. For use cases such as the core RAPL PMU,
there exists a need for a unique core ID across the entire system with
multiple PKG domains. Introduce topology_logical_core_id() to derive a
unique core ID across the system.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Link: https://lore.kernel.org/r/20241115060805.447565-3-Dhananjay.Ugwekar@amd.com
2024-12-02 12:01:35 +01:00
Linus Torvalds
58ac609b99 - Add a terminating zero end-element to the array describing AMD CPUs affected
by erratum 1386 so that the matching loop actually terminates instead of
   going off into the weeds
 
 - Update the boot protocol documentation to mention the fact that the
   preferred address to load the kernel to is considered in the relocatable
   kernel case too
 
 - Flush the memory buffer containing the microcode patch after applying
   microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns
 
 - Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN has been
   disabled
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdMPvcACgkQEsHwGGHe
 VUo2xw//QvwzIfWU/l+UnZppbpRL5gvLy41EgNOwhMBVDd81Fdx87KImg7luDDvM
 FHsydVpSmqS6gMX0n6JQfr7IMz4HLWHff/yJjq2Pgb5BS7HBk8RyQ8YPCaBbXP33
 NsV2fSL2INgLL6z6iefrnStQouIP2iRp+bN1kXSRe0Yhs+RBj6DyKsD6BdN/x342
 AFkP65rY/1+7jLIftI2YulKEB5RmlbNqa9Nzbq1kOfO6I0TPUZmK5XI1xcRKHiwK
 yFaMKufZq94rULhNsbjwPhNqK5LG34AeQ2xpaiujA1uHdQssChmAnGuJzrK2s3T0
 YUo7WzI5LBsRDw0UGtfKjvl6JMFhDvhiSY9f8stS8B8GIiIeErkwKxkzVqAR5rQM
 JbukE0Di/JABXk5sMzwyamFCJ3TgbuSWivK5ujxsiDTU6d/X89f1CRECU02lZT6u
 fc7GIjZ09voep/YruknmyZbha/hh0EofN3GbIkwBsKX6dsypSKAuSkSBysZfRYXE
 z1hZuVyBGQJj0OQMtbIaGGmKJWPcQq18xiKKo1XYIDOL+Ag0RQ0ZrVYA7Wt96MB7
 ImoeduD1ssvU00IJ9QMx/EPdmrZHxzX3C1XGEm1DyW4fTYc8TPJnowxjXuB9Hir6
 IhCARvXni5/8vAeNOb8xx+izr64jRCy3w2rCcjjebqFW/oEvPrQ=
 =RC2M
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Add a terminating zero end-element to the array describing AMD CPUs
   affected by erratum 1386 so that the matching loop actually
   terminates instead of going off into the weeds

 - Update the boot protocol documentation to mention the fact that the
   preferred address to load the kernel to is considered in the
   relocatable kernel case too

 - Flush the memory buffer containing the microcode patch after applying
   microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns

 - Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN
   has been disabled

* tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Terminate the erratum_1386_microcode array
  x86/Documentation: Update algo in init_size description of boot protocol
  x86/microcode/AMD: Flush patch buffer mapping after application
  x86/mm: Carve out INVLPG inline asm for use by others
  x86/cpu: Fix PPIN initialization
2024-12-01 12:35:37 -08:00
Linus Torvalds
6a34dfa15d Kbuild updates for v6.13
- Add generic support for built-in boot DTB files
 
  - Enable TAB cycling for dialog buttons in nconfig
 
  - Fix issues in streamline_config.pl
 
  - Refactor Kconfig
 
  - Add support for Clang's AutoFDO (Automatic Feedback-Directed
    Optimization)
 
  - Add support for Clang's Propeller, a profile-guided optimization.
 
  - Change the working directory to the external module directory for M=
    builds
 
  - Support building external modules in a separate output directory
 
  - Enable objtool for *.mod.o and additional kernel objects
 
  - Use lz4 instead of deprecated lz4c
 
  - Work around a performance issue with "git describe"
 
  - Refactor modpost
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmdKGgEVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGrFoQAIgioJPRG+HC6bGmjy4tL4bq1RAm
 78nbD12grrAa+NvQGRZYRs264rWxBGwrNfGGS9BDvlWJZ3fmKEuPlfCIxC0nkKk8
 LVLNxSVvgpHE47RQ+E4V+yYhrlZKb4aWZjH3ZICn7vxRgbQ5Veq61aatluVHyn9c
 I8g+APYN/S1A4JkFzaLe8GV7v5OM3+zGSn3M9n7/DxVkoiNrMOXJm5hRdRgYfEv/
 kMppheY2PPshZsaL+yLAdrJccY5au5vYE/v8wHkMtvM/LF6YwjgqPVDRFQ30BuLM
 sAMMd6AUoopiDZQOpqmXYukU0b0MQPswg3jmB+PWUBrlsuydRa8kkyPwUaFrDd+w
 9d0jZRc8/O/lxUdD1AefRkNcA/dIJ4lTPr+2NpxwHuj2UFo0gLQmtjBggMFHaWvs
 0NQRBPlxfOE4+Htl09gyg230kHuWq+rh7xqbyJCX+hBOaZ6kI2lryl6QkgpAoS+x
 KDOcUKnsgGMGARQRrvCOAXnQs+rjkW8fEm6t7eSBFPuWJMK85k4LmxOog8GVYEdM
 JcwCnCHt9TtcHlSxLRnVXj2aqGTFNLJXE1aLyCp9u8MxZ7qcx01xOuCmwp6FRzNq
 ghal7ngA58Y/S4K/oJ+CW2KupOb6CFne8mpyotpYeWI7MR64t0YWs4voZkuK46E2
 CEBfA4PDehA4BxQe
 =GDKD
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add generic support for built-in boot DTB files

 - Enable TAB cycling for dialog buttons in nconfig

 - Fix issues in streamline_config.pl

 - Refactor Kconfig

 - Add support for Clang's AutoFDO (Automatic Feedback-Directed
   Optimization)

 - Add support for Clang's Propeller, a profile-guided optimization.

 - Change the working directory to the external module directory for M=
   builds

 - Support building external modules in a separate output directory

 - Enable objtool for *.mod.o and additional kernel objects

 - Use lz4 instead of deprecated lz4c

 - Work around a performance issue with "git describe"

 - Refactor modpost

* tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits)
  kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms
  gitignore: Don't ignore 'tags' directory
  kbuild: add dependency from vmlinux to resolve_btfids
  modpost: replace tdb_hash() with hash_str()
  kbuild: deb-pkg: add python3:native to build dependency
  genksyms: reduce indentation in export_symbol()
  modpost: improve error messages in device_id_check()
  modpost: rename alias symbol for MODULE_DEVICE_TABLE()
  modpost: rename variables in handle_moddevtable()
  modpost: move strstarts() to modpost.h
  modpost: convert do_usb_table() to a generic handler
  modpost: convert do_of_table() to a generic handler
  modpost: convert do_pnp_device_entry() to a generic handler
  modpost: convert do_pnp_card_entries() to a generic handler
  modpost: call module_alias_printf() from all do_*_entry() functions
  modpost: pass (struct module *) to do_*_entry() functions
  modpost: remove DEF_FIELD_ADDR_VAR() macro
  modpost: deduplicate MODULE_ALIAS() for all drivers
  modpost: introduce module_alias_printf() helper
  modpost: remove unnecessary check in do_acpi_entry()
  ...
2024-11-30 13:41:50 -08:00
Rong Xu
d5dc958361 kbuild: Add Propeller configuration for kernel build
Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.

The support requires a Clang compiler LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the host machine, with AutoFDO and Propeller
   build config
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   then
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

“<autofdo_profile>” is the profile collected when doing a non-Propeller
AutoFDO build. This step builds a kernel that has the same optimization
level as AutoFDO, plus a metadata section that records basic block
information. This kernel image runs as fast as an AutoFDO optimized
kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 500009, for this purpose.
   For Intel platforms:
      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
        -o <perf_file> -- <loadtest>
   For AMD platforms:
      The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2
      # To see if Zen3 support LBR:
      $ cat proc/cpuinfo | grep " brs"
      # To see if Zen4 support LBR:
      $ cat proc/cpuinfo | grep amd_lbr_v2
      # If the result is yes, then collect the profile using:
      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
        -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) Generate Propeller profile:
   $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
     --format=propeller --propeller_output_module_name \
     --out=<propeller_profile_prefix>_cc_profile.txt \
     --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   “create_llvm_prof” is the profile conversion tool, and a prebuilt
   binary for linux can be found on
   https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build
   from source).

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   and
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
        CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>

Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27 09:38:27 +09:00
Sebastian Andrzej Siewior
ff6cdc407f x86/CPU/AMD: Terminate the erratum_1386_microcode array
The erratum_1386_microcode array requires an empty entry at the end.
Otherwise x86_match_cpu_with_stepping() will continue iterate the array after
it ended.

Add an empty entry to erratum_1386_microcode to its end.

Fixes: 29ba89f189 ("x86/CPU/AMD: Improve the erratum 1386 workaround")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20241126134722.480975-1-bigeasy@linutronix.de
2024-11-26 15:12:00 +01:00
David Laight
573f45a9f9 x86: fix off-by-one in access_ok()
When the size isn't a small constant, __access_ok() will call
valid_user_address() with the address after the last byte of the user
buffer.

It is valid for a buffer to end with the last valid user address so
valid_user_address() must allow accesses to the base of the guard page.

[ This introduces an off-by-one in the other direction for the plain
  non-sized accesses, but since we have that guard region that is a
  whole page, those checks "allowing" accesses to that guard region
  don't really matter. The access will fault anyway, whether to the
  guard page or if the address has been masked to all ones - Linus ]

Fixes: 86e6b1547b ("x86: fix user address masking non-canonical speculation issue")
Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-25 12:19:05 -08:00
Borislav Petkov (AMD)
c809b0d0e5 x86/microcode/AMD: Flush patch buffer mapping after application
Due to specific requirements while applying microcode patches on Zen1
and 2, the patch buffer mapping needs to be flushed from the TLB after
application. Do so.

If not, unnecessary and unnatural delays happen in the boot process.

Reported-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Cc: <stable@kernel.org> # f1d84b59cb ("x86/mm: Carve out INVLPG inline asm for use by others")
Link: https://lore.kernel.org/r/ZyulbYuvrkshfsd2@antipodes
2024-11-25 11:43:21 +01:00
Tony Luck
d9bb405446 x86/cpu: Fix PPIN initialization
On systems that enumerate PPIN (protected processor inventory
number) using CPUID, but where the BIOS locked the MSR to
prevent access /proc/cpuinfo reports "intel_ppin" feature as
present on all logical CPUs except for CPU 0.

This happens because ppin_init() uses x86_match_cpu() to
determine whether PPIN is supported. When called on CPU 0
the test for locked PPIN MSR results in:

	clear_cpu_cap(c, info->feature);

This clears the X86 FEATURE bit in boot_cpu_data. When other
CPUs are brought online the x86_match_cpu() fails, and the
PPIN FEATURE bit remains set for those other CPUs.

Fix by using setup_clear_cpu_cap() instead of clear_cpu_cap()
which force clears the FEATURE bit for all CPUS.

Reported-by: Adeel Ashad <adeel.arshad@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241122234212.27451-1-tony.luck@intel.com
2024-11-25 10:11:33 +01:00
Linus Torvalds
5c00ff742b - The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
   This leads to improved memory savings.
 
 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
 
 	- "refine mas_mab_cp()"
 	- "Reduce the space to be cleared for maple_big_node"
 	- "maple_tree: simplify mas_push_node()"
 	- "Following cleanup after introduce mas_wr_store_type()"
 	- "refine storing null"
 
 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.
 
 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping code.
 
 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of shadow
   entries.
 
 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.
 
 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in the
   hugetlb code.
 
 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page into
   small pages.  Instead we replace the whole thing with a THP.  More
   consistent cleaner and potentiall saves a large number of pagefaults.
 
 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.
 
 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to do.
 
 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio size
   rather than as individual pages.  A 20% speedup was observed.
 
 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
 
 - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
   removes the long-deprecated memcgv2 charge moving feature.
 
 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.
 
 - The series "x86/module: use large ROX pages for text allocations" from
   Mike Rapoport teaches x86 to use large pages for read-only-execute
   module text.
 
 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.
 
 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/.  A slow march towards shrinking
   struct page.
 
 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.
 
 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression.  It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.
 
 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
   over to the KUnit framework.
 
 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a single
   VMA, rather than requiring that multiple VMAs be created for this.
   Improved efficiencies for userspace memory allocators are expected.
 
 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.
 
 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.
 
 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP from
   the kernel boot command line.
 
 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.
 
 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep is
   enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
 jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
 xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
 =JfWR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. More
   consistent cleaner and potentiall saves a large number of pagefaults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcgv2 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
2024-11-23 09:58:07 -08:00
Linus Torvalds
5af5d43f84 - Rework some CPU setup code to keep LLVM happy on 32-bit
- Correct RSB terminology in Kconfig text
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmc9OLoACgkQaDWVMHDJ
 krDIwA/5AcBevnfyIzDdMMnUJMFt5ppTqkzDVevdPh9d2Q07DxsHWzX9KCyKm1e3
 +E4FNfzg3JMJ3PmYnHLlhX78g5pYgDzmGVftfU5AmlQLeqZamgZKreyhEBIYZNkd
 CsgmZRBx2SmMfsThwAF+EFbi2Lysqgt1PceGNawVYjNQiv1qS7p091V0q3gqOTAf
 rr/LfiPfPe3nfGTW1CFqDPI9u8fsa87nelykp8/E0ZbOpIEecOAke8SkpjCpSORP
 zthUH4/UzUPuVTKiBztfM9HATqTWuRaGGyklFszFHHwLd/XZhsIKNQ/ptpl0qHIr
 d8f6LIrBPG/4Nyovjk6AYfs3NSc22tZngRbkN6DbaYtx7Io1lUJItd/ZIIuYVBs8
 MQgdmNkf6XU3WzkcCY7sT5WdXFOB95wi6TFdv33xRGmTovbothT5uJJHivotuOvZ
 lH9ym7WQANZ0ggvSh5Y4bzk6OyUE74POZ332c4c+F6nXlG9NkSEzWUVyaJMQ6yMD
 bPkdlxEZGF17xAm5VXT9qFmH6IClZL4j7G+Jh3GjvdGm3FO/pp1w5AkC8G7otQ7x
 b1TFBMeS7xkVvOPAMzwzezU8DsXT/jJGleSuLNN7YI5rzDCAw6/TyJjlWVOW1khr
 mV6xuhjb8P1ku5L+/nI5ghxQcgM8+HxN10/Gg4HQsXmsk17vtKk=
 =02xm
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Dave Hansen:
 "As usual for this branch, these are super random: a compile fix for
  some newish LLVM checks and making sure a Kconfig text reference to
  'RSB' matches the normal definition:

   - Rework some CPU setup code to keep LLVM happy on 32-bit

   - Correct RSB terminology in Kconfig text"

* tag 'x86_misc_for_6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Make sure flag_is_changeable_p() is always being used
  x86/bugs: Correct RSB terminology in Kconfig
2024-11-22 12:52:03 -08:00
Linus Torvalds
be9318cd5a - Use vmalloc_array() instead of vmalloc()
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmc9NkQACgkQaDWVMHDJ
 krCs8g//dtj6LVV+RqAh62Xg3lwIJX3aGfjCJ3sZR22l8sE63FIdHKulmXVOtwUi
 b7APij265sYJwsPQQyD+3TXaR2oDM5SJ3ms/JAi0NiJ80sH7Vl8+wpk4KdKDoFNr
 CvK4cHpjfn05KuMCDT0pOmJui+aiOdMTdrrIvZhJY7C1hHTKrSS4Vvte2MweuQWf
 p40FvF3PIH3fHoB1jiXsgN06LsVv07hwZxbahVqopQT0uQmyTpOx2tzJ4Vq9emjd
 e3xPw4EWP65OF4YTnYBtTOcGPk9HwKLyiNejtou1NNPIXSSqFSMIZQRgEMQqricf
 y+ptxGB932VE8Zi6ys07dAuDbzTDh0J32ly6NxdnZ2aTSumZwz35WnpfgrKXBrAc
 gHxYWgaM9ou8ctIOX5DWm5NCNUBXMh42OQ1aS458DTMsFmQ/GTBgywaU8BVjrtVW
 NTCJg5E1JefkRRzORhEksJUByW+HRL6zPf3kPBZAaMF55cIWowaZU0TuNiI9J2Gf
 42Bw0Jv3YGWDrfaBi6sN4eVXfIKfRine3hrEriKemyP2zUyHR3DPBkYF06dQN8M8
 vXchAORRo2uGmmtFO2ACjmIsFMDrgj/O3QfMmcOo6ApLRN1QRhWhC3kd2yJnjSYK
 LR7eFjPlUMvqcAJW6igXaQVfNxHk04rMK0GoWZ7cplqNhS/zcYw=
 =grp1
 -----END PGP SIGNATURE-----

Merge tag 'x86_sgx_for_6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull sgx update from Dave Hansen:

 - Use vmalloc_array() instead of vmalloc()

* tag 'x86_sgx_for_6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Use vmalloc_array() instead of vmalloc()
2024-11-22 12:50:00 -08:00
Linus Torvalds
28eb75e178 drm for 6.13-rc1
core:
 - split DSC helpers from DP helpers
 - clang build fixes for drm/mm test
 - drop simple pipeline support for gem vram
 - document submission error signaling
 - move drm_rect to drm core module from kms helper
 - add default client setup to most drivers
 - move to video aperture helpers instead of drm ones
 
 tests:
 - new framebuffer tests
 
 ttm:
 - remove swapped and pinned BOs from TTM lru
 
 panic:
 - fix uninit spinlock
 - add ABGR2101010 support
 
 bridge:
 - add TI TDP158 support
 - use standard PM OPS
 
 dma-fence:
 - use read_trylock instead of read_lock to help lockdep
 
 scheduler:
 - add errno to sched start to report different errors
 - add locking to drm_sched_entity_modify_sched
 - improve documentation
 
 xe:
 - add drm_line_printer
 - lots of refactoring
 - Enable Xe2 + PES disaggregation
 - add new ARL PCI ID
 - SRIOV development work
 - fix exec unnecessary implicit fence
 - define and parse OA sync props
 - forcewake refactoring
 
 i915:
 - Enable BMG/LNL ultra joiner
 - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
 - use DSB for plane/color mgmt
 - Arrow lake PCI IDs
 - lots of i915/xe display refactoring
 - enable PXP GuC autoteardown
 - Pantherlake (PTL) Xe3 LPD display enablement
 - Allow fastset HDR infoframe changes
 - write DP source OUI for non-eDP sinks
 - share PCI IDs between i915 and xe
 
 amdgpu:
 - SDMA queue reset support
 - SMU 13.0.6, JPEG 4.0.3 updates
 - Initial runtime repartitioning support
 - rework IP structs for multiple IP instances
 - Fetch EDID from _DDC if available
 - SMU13 zero rpm user control
 - lots of fixes/cleanups
 
 amdkfd:
 - Increase event FIFO size
 - add topology cap flag for per queue reset
 
 msm:
 - DPU:
 - SA8775P support
 - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
 - Enable large framebuffer support
 - Drop MSM8998 and SDM845
 - DP:
 - SA8775P support
 - GPU:
 - a7xx preemption support
 - Adreno A663 support
 
 ast:
 - warn about unsupported TX chips
 
 ivpu:
 - add coredump
 - add pantherlake support
 
 rockchip:
 - 4K@60Hz display enablement
 - generate pll programming tables
 
 panthor:
 - add timestamp query API
 - add realtime group priority
 - add fdinfo support
 
 etnaviv:
 - improve handling of DMA address limits
 - improve GPU hangcheck
 
 exynos:
 - Decon Exynos7870 support
 
 mediatek:
 - add OF graph support
 
 omap:
 - locking fixes
 
 bochs:
 - convert to gem/shmem from simpledrm
 
 v3d:
 - support big/super pages
 - add gemfs
 
 vc4:
 - BCM2712 support refactoring
 - add YUV444 format support
 
 udmabuf:
 - folio related fixes
 
 nouveau:
 - add panic support on nv50+
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmc+efwACgkQDHTzWXnE
 hr6Dyg/9HVVI3lxuWAz9MEt3w+BON5KTJAxg5Zhvc5DwiUbDXghu8sfkUfanDWS5
 /MqyPqLt5srXrtKTRDnzEI0Vf8YHeiDEcaydjpshEpCfteHZ7SADpvem8fp6/otV
 iYt8U6tMcGe9I+M2kwDkOTrKJIiyCKPi5hfBIAkxEAh6806ifPRtLkeMGbaSwBxH
 x6kZTE9ygGWAY7bAgbmVmm3JwrXG9mYDl9dW3cbi9gZ6PGAXHPZRUPvZoHhvfC2A
 UVgROH76Spm4rdWYGI3azj+gW3HsdGgUHcysb+lu37i261E+sT7kuV2UYtnOMzr5
 igO1RlQ+rcfPYLG4n+oNXDMu5d1OQXELrlQzXptym4Konpd7b/GSeVctWV0wHWuv
 nG8g7DWAFFnLAdeWqLZpf1Brze33h5+572D3BioWB4LYSEATjwoTwcBKsdRuc4Wk
 RHxjumCidybTdo/8EB1ElGlH39m/mDQA0scMlVhS/BuiIssfgcBRfltI8S3HzHcW
 YQYq6xH7F9E3shs3/TYbWR4clm66ZTnZV6ClDfGJolzyF/hbV0rsbeSpDelpooE8
 1Js7KuwVa+HvA4jtupY9vqxMTdXWwoGPfuUgKpOAreYibnd1T9Q1zVme/B1bUH05
 518IjiMGCxDnBvFWaPT9DcX4zg7pS3yzjw3hGkdz3reUqat0Gy8=
 =8cUI
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "There's a lot of rework, the panic helper support is being added to
  more drivers, v3d gets support for HW superpages, scheduler
  documentation, drm client and video aperture reworks, some new
  MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
  has some Pantherlake enablement and xe is getting some SRIOV bits, but
  just lots of stuff everywhere.

  core:
   - split DSC helpers from DP helpers
   - clang build fixes for drm/mm test
   - drop simple pipeline support for gem vram
   - document submission error signaling
   - move drm_rect to drm core module from kms helper
   - add default client setup to most drivers
   - move to video aperture helpers instead of drm ones

  tests:
   - new framebuffer tests

  ttm:
   - remove swapped and pinned BOs from TTM lru

  panic:
   - fix uninit spinlock
   - add ABGR2101010 support

  bridge:
   - add TI TDP158 support
   - use standard PM OPS

  dma-fence:
   - use read_trylock instead of read_lock to help lockdep

  scheduler:
   - add errno to sched start to report different errors
   - add locking to drm_sched_entity_modify_sched
   - improve documentation

  xe:
   - add drm_line_printer
   - lots of refactoring
   - Enable Xe2 + PES disaggregation
   - add new ARL PCI ID
   - SRIOV development work
   - fix exec unnecessary implicit fence
   - define and parse OA sync props
   - forcewake refactoring

  i915:
   - Enable BMG/LNL ultra joiner
   - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
   - use DSB for plane/color mgmt
   - Arrow lake PCI IDs
   - lots of i915/xe display refactoring
   - enable PXP GuC autoteardown
   - Pantherlake (PTL) Xe3 LPD display enablement
   - Allow fastset HDR infoframe changes
   - write DP source OUI for non-eDP sinks
   - share PCI IDs between i915 and xe

  amdgpu:
   - SDMA queue reset support
   - SMU 13.0.6, JPEG 4.0.3 updates
   - Initial runtime repartitioning support
   - rework IP structs for multiple IP instances
   - Fetch EDID from _DDC if available
   - SMU13 zero rpm user control
   - lots of fixes/cleanups

  amdkfd:
   - Increase event FIFO size
   - add topology cap flag for per queue reset

  msm:
   - DPU:
      - SA8775P support
      - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
      - Enable large framebuffer support
      - Drop MSM8998 and SDM845
   - DP:
      - SA8775P support
   - GPU:
      - a7xx preemption support
      - Adreno A663 support

  ast:
   - warn about unsupported TX chips

  ivpu:
   - add coredump
   - add pantherlake support

  rockchip:
   - 4K@60Hz display enablement
   - generate pll programming tables

  panthor:
   - add timestamp query API
   - add realtime group priority
   - add fdinfo support

  etnaviv:
   - improve handling of DMA address limits
   - improve GPU hangcheck

  exynos:
   - Decon Exynos7870 support

  mediatek:
   - add OF graph support

  omap:
   - locking fixes

  bochs:
   - convert to gem/shmem from simpledrm

  v3d:
   - support big/super pages
   - add gemfs

  vc4:
   - BCM2712 support refactoring
   - add YUV444 format support

  udmabuf:
   - folio related fixes

  nouveau:
   - add panic support on nv50+"

* tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits)
  drm/xe/guc: Fix dereference before NULL check
  drm/amd: Fix initialization mistake for NBIO 7.7.0
  Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
  drm/amd/display: Fix failure to read vram info due to static BP_RESULT
  drm/amdgpu: enable GTT fallback handling for dGPUs only
  drm/amd/amdgpu: limit single process inside MES
  drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
  drm/amdgpu/mes12: correct kiq unmap latency
  drm/amdgpu: Support vcn and jpeg error info parsing
  drm/amd : Update MES API header file for v11 & v12
  drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
  drm/amdkfd: change kfd process kref count at creation
  drm/amdgpu: Cleanup shift coding style
  drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
  drm/amdgpu: Implement virt req_ras_err_count
  drm/amdgpu: VF Query RAS Caps from Host if supported
  drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
  drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
  drm/amd/display: 3.2.309
  drm/amd/display: Adjust VSDB parser for replay feature
  ...
2024-11-21 14:56:17 -08:00
Linus Torvalds
e6de688e93 Devicetree updates for v6.13:
Bindings:
 
 - Enable dtc "interrupt_provider" warnings for binding examples.
   Fix the warnings in fsl,mu-msi and ti,sci-inta due to this.
 
 - Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton,  and
   altr,fpga-passive-serial to DT schema format
 
 - Add some documentation on the different forms of YAML text blocks
   which are a constant source of review comments
 
 - Fix some schema errors in constraints for arrays
 
 - Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
 
 DT core:
 
 - Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
 
 - Add some warnings on deprecated address handling
 
 - Rework early_init_dt_scan() so the arch can pass in the phys address
   of the DTB as __pa() is not always valid to use. This fixes a warning
   for arm64 with kexec.
 
 - Add and use some new DT graph iterators for iterating over ports and
   endpoints
 
 - Rework reserved-memory handling to be sized dynamically for fixed
   regions
 
 - Optimize of_modalias() to avoid a strlen() call
 
 - Constify struct device_node and property pointers where ever possible
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmc7qaoACgkQ+vtdtY28
 YcN54g/+Ifz4hQTSWV+VBhihovMMPiQUdxZ+MfJfPnPcZ7NJzaTf+zqhZyS4wQou
 v0pdtyR0B1fCM/EvKaYD+1aTTAQFEIT5Dqac+9ePwqaYqSk+yCTxyzW9m+P3rTPV
 THo8SGRss7T+Rs+2WaUGxphTJItMGIRdbBvoqK+82EdKFXXKw2BSD8tlJTWwbTam
 9xkrpUzw7f4FvVY8vVhRyOd5i8/v+FH8D65DMIT6ME9zRn4MzKVzCg6udgYeCBld
 C2XbV+wnyewtjrN2IX+2uQ2mheb7yJu3AEI3iFR5x/sRrsSLpisxrUl38xOOpxrM
 XxYtHgE3omjagQ+y+L2PMthlKvhFrXVXIvhUH8xxje5z1Vyq3VMfiABkHlMpAnys
 5LY4xEhvqDkPNo65UmjMiHxGW/xtcKsmAZBOp+HLerZfCJIFvl380fi8mNg/Sjvz
 7ExCSpzCPsHASZg7QCTplU3BUtg+067Ch/k8Hsn/Og73Pqm3xH4IezQZKwweN9ZT
 LC6OQBI7C3Yt1hom9qgUcA4H4/aaPxTVV7i0DGuAKh8Lon6SaoX2yFpweUBgbsL/
 c9DIW4vbYBIGASxxUbHlNMKvPCKACKmpFXhsnH5Waj+VWSOwsJ8bjGpH8PfMKdFW
 dyJB/r94GqCGpCW7+FC1qGmXiQJGkCo89pKBVjSf4Kj45ht/76o=
 =NCYS
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "Bindings:

   - Enable dtc "interrupt_provider" warnings for binding examples. Fix
     the warnings in fsl,mu-msi and ti,sci-inta due to this.

   - Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
     altr,fpga-passive-serial to DT schema format

   - Add some documentation on the different forms of YAML text blocks
     which are a constant source of review comments

   - Fix some schema errors in constraints for arrays

   - Add compatibles for qcom,sar2130p-pdc and onnn,adt7462

  DT core:

   - Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n

   - Add some warnings on deprecated address handling

   - Rework early_init_dt_scan() so the arch can pass in the phys
     address of the DTB as __pa() is not always valid to use. This fixes
     a warning for arm64 with kexec.

   - Add and use some new DT graph iterators for iterating over ports
     and endpoints

   - Rework reserved-memory handling to be sized dynamically for fixed
     regions

   - Optimize of_modalias() to avoid a strlen() call

   - Constify struct device_node and property pointers where ever
     possible"

* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
  of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
  dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
  of/address: Rework bus matching to avoid warnings
  of: WARN on deprecated #address-cells/#size-cells handling
  of/fdt: Don't use default address cell sizes for address translation
  dt-bindings: Enable dtc "interrupt_provider" warnings
  of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
  dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
  dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
  dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
  media: xilinx-tpg: use new of_graph functions
  fbdev: omapfb: use new of_graph functions
  gpu: drm: omapdrm: use new of_graph functions
  ASoC: audio-graph-card2: use new of_graph functions
  ASoC: audio-graph-card: use new of_graph functions
  ASoC: test-component: use new of_graph functions
  of: property: use new of_graph functions
  of: property: add of_graph_get_next_port_endpoint()
  of: property: add of_graph_get_next_port()
  of: module: remove strlen() call in of_modalias()
  ...
2024-11-20 13:19:25 -08:00
Linus Torvalds
aad3a0d084 ftrace updates for v6.13:
- Merged tag ftrace-v6.12-rc4
 
   There was a fix to locking in register_ftrace_graph() for shadow stacks
   that was sent upstream. But this code was also being rewritten, and the
   locking fix was needed. Merging this fix was required to continue the
   work.
 
 - Restructure the function graph shadow stack to prepare it for use with
   kretprobes
 
   With the goal of merging the shadow stack logic of function graph and
   kretprobes, some more restructuring of the function shadow stack is
   required.
 
   Move out function graph specific fields from the fgraph infrastructure and
   store it on the new stack variables that can pass data from the entry
   callback to the exit callback.
 
   Hopefully, with this change, the merge of kretprobes to use fgraph shadow
   stacks will be ready by the next merge window.
 
 - Make shadow stack 4k instead of using PAGE_SIZE.
 
   Some architectures have very large PAGE_SIZE values which make its use for
   shadow stacks waste a lot of memory.
 
 - Give shadow stacks its own kmem cache.
 
   When function graph is started, every task on the system gets a shadow
   stack. In the future, shadow stacks may not be 4K in size. Have it have
   its own kmem cache so that whatever size it becomes will still be
   efficient in allocations.
 
 - Initialize profiler graph ops as it will be needed for new updates to fgraph
 
 - Convert to use guard(mutex) for several ftrace and fgraph functions
 
 - Add more comments and documentation
 
 - Show function return address in function graph tracer
 
   Add an option to show the caller of a function at each entry of the
   function graph tracer, similar to what the function tracer does.
 
 - Abstract out ftrace_regs from being used directly like pt_regs
 
   ftrace_regs was created to store a partial pt_regs. It holds only the
   registers and stack information to get to the function arguments and
   return values. On several archs, it is simply a wrapper around pt_regs.
   But some users would access ftrace_regs directly to get the pt_regs which
   will not work on all archs. Make ftrace_regs an abstract structure that
   requires all access to its fields be through accessor functions.
 
 - Show how long it takes to do function code modifications
 
   When code modification for function hooks happen, it always had the time
   recorded in how long it took to do the conversion. But this value was
   never exported. Recently the code was touched due to new ROX modification
   handling that caused a large slow down in doing the modifications and
   had a significant impact on boot times.
 
   Expose the timings in the dyn_ftrace_total_info file. This file was
   created a while ago to show information about memory usage and such to
   implement dynamic function tracing. It's also an appropriate file to store
   the timings of this modification as well. This will make it easier to see
   the impact of changes to code modification on boot up timings.
 
 - Other clean ups and small fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZztrUxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnnNAQD6w4q9VQ7oOE2qKLqtnj87h4c1GqKn
 SPkpEfC3n/ATEAD/fnYjT/eOSlHiGHuD/aTA+U/bETrT99bozGM/4mFKEgY=
 =6nCa
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace updates from Steven Rostedt:

 - Restructure the function graph shadow stack to prepare it for use
   with kretprobes

   With the goal of merging the shadow stack logic of function graph and
   kretprobes, some more restructuring of the function shadow stack is
   required.

   Move out function graph specific fields from the fgraph
   infrastructure and store it on the new stack variables that can pass
   data from the entry callback to the exit callback.

   Hopefully, with this change, the merge of kretprobes to use fgraph
   shadow stacks will be ready by the next merge window.

 - Make shadow stack 4k instead of using PAGE_SIZE.

   Some architectures have very large PAGE_SIZE values which make its
   use for shadow stacks waste a lot of memory.

 - Give shadow stacks its own kmem cache.

   When function graph is started, every task on the system gets a
   shadow stack. In the future, shadow stacks may not be 4K in size.
   Have it have its own kmem cache so that whatever size it becomes will
   still be efficient in allocations.

 - Initialize profiler graph ops as it will be needed for new updates to
   fgraph

 - Convert to use guard(mutex) for several ftrace and fgraph functions

 - Add more comments and documentation

 - Show function return address in function graph tracer

   Add an option to show the caller of a function at each entry of the
   function graph tracer, similar to what the function tracer does.

 - Abstract out ftrace_regs from being used directly like pt_regs

   ftrace_regs was created to store a partial pt_regs. It holds only the
   registers and stack information to get to the function arguments and
   return values. On several archs, it is simply a wrapper around
   pt_regs. But some users would access ftrace_regs directly to get the
   pt_regs which will not work on all archs. Make ftrace_regs an
   abstract structure that requires all access to its fields be through
   accessor functions.

 - Show how long it takes to do function code modifications

   When code modification for function hooks happen, it always had the
   time recorded in how long it took to do the conversion. But this
   value was never exported. Recently the code was touched due to new
   ROX modification handling that caused a large slow down in doing the
   modifications and had a significant impact on boot times.

   Expose the timings in the dyn_ftrace_total_info file. This file was
   created a while ago to show information about memory usage and such
   to implement dynamic function tracing. It's also an appropriate file
   to store the timings of this modification as well. This will make it
   easier to see the impact of changes to code modification on boot up
   timings.

 - Other clean ups and small fixes

* tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits)
  ftrace: Show timings of how long nop patching took
  ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash()
  ftrace: Use guard to take the ftrace_lock in release_probe()
  ftrace: Use guard to lock ftrace_lock in cache_mod()
  ftrace: Use guard for match_records()
  fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph()
  fgraph: Give ret_stack its own kmem cache
  fgraph: Separate size of ret_stack from PAGE_SIZE
  ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
  selftests/ftrace: Fix check of return value in fgraph-retval.tc test
  ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros
  ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs
  ftrace: Make ftrace_regs abstract from direct use
  fgragh: No need to invoke the function call_filter_check_discard()
  fgraph: Simplify return address printing in function graph tracer
  function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr()
  function_graph: Support recording and printing the function return address
  ftrace: Have calltime be saved in the fgraph storage
  ftrace: Use a running sleeptime instead of saving on shadow stack
  fgraph: Use fgraph data to store subtime for profiler
  ...
2024-11-20 11:34:10 -08:00
Linus Torvalds
a0e752bda2 Probes update for v6.13:
Kprobes cleanups. Functionality does not change.
 - kprobes: Cleanup the config comment
   Adjust #endif comments.
 - kprobes: Cleanup collect_one_slot() and __disable_kprobe()
   Make fail fast to reduce code nested level.
 - kprobes: Use struct_size() in __get_insn_slot()
   Use struct_size() to avoid special macro.
 - x86/kprobes: Cleanup kprobes on ftrace code
   Use macro instead of direct field access/magic number, and avoid
   redundant instruction pointer setting.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmc6vhwbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bxowIALFYrdLV2ofWRy7/lNkP
 6Bv1DkBQ/Xy/ABZ4lAqdgTZrf7Cz8TdPZUL1UOowxW3Cl09PYcpqlUlw/XldvI5j
 fukkwL9rXNgJfYbau+QG9E5c7mNakexDLBKCZGvnDDuKj0f1aauhwZmpJbNgz1Y6
 dUgfFgDJXSArnVKxfZvOhL1tbxYPJUhzNc339p8PVD8r/OUKEZo2EReds3DM40Zq
 wtwyKqWmawTjRud0ZtgkaWiK1d+QKa07h+GnXi1wUy98A2yGp3fcLuxvjBUMqsCD
 uzWkY3MikXIZJ/ijxUsMGBRisD4ozqozlQ4wIxCuahRntl9b/d9jXqKY7RTvy6Vw
 r+Y=
 =n4ST
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:
 "Kprobes cleanups. Functionality does not change.

   - kprobes: Cleanup the config comment

     Adjust #endif comments.

   - kprobes: Cleanup collect_one_slot() and __disable_kprobe()

     Make fail fast to reduce code nested level.

   - kprobes: Use struct_size() in __get_insn_slot()

     Use struct_size() to avoid special macro.

   - x86/kprobes: Cleanup kprobes on ftrace code

     Use macro instead of direct field access/magic number, and avoid
     redundant instruction pointer setting"

* tag 'probes-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  x86/kprobes: Cleanup kprobes on ftrace code
  kprobes: Use struct_size() in __get_insn_slot()
  kprobes: Cleanup collect_one_slot() and __disable_kprobe()
  kprobes: Cleanup the config comment
2024-11-20 09:36:05 -08:00
Linus Torvalds
0352387523 First step of consolidating the VDSO data page handling:
The VDSO data page handling is architecture specific for historical
   reasons, but there is no real technical reason to do so.
 
   Aside of that VDSO data has become a dump ground for various mechanisms
   and fail to provide a clear separation of the functionalities.
 
   Clean this up by:
 
     * consolidating the VDSO page data by getting rid of architecture
       specific warts especially in x86 and PowerPC.
 
     * removing the last includes of header files which are pulling in other
       headers outside of the VDSO namespace.
 
     * seperating timekeeping and other VDSO data accordingly.
 
   Further consolidation of the VDSO page handling is done in subsequent
   changes scheduled for the next merge window.
 
   This also lays the ground for expanding the VDSO time getters for
   independent PTP clocks in a generic way without making every architecture
   add support seperately.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kyoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVBjD/9awdN2YeCGIM9rlHIktUdNRmRSL2SL
 6av1CPffN5DenONYTXWrDYPkC4yfjUwIs8H57uzFo10yA7RQ/Qfq+O68k5GnuFew
 jvpmmYSZ6TT21AmAaCIhn+kdl9YbEJFvN2AWH85Bl29k9FGB04VzJlQMMjfEZ1a5
 Mhwv+cfYNuPSZmU570jcxW2XgbyTWlLZBByXX/Tuz9bwpmtszba507bvo45x6gIP
 twaWNzrsyJpdXfMrfUnRiChN8jHlDN7I6fgQvpsoRH5FOiVwIFo0Ip2rKbk+ONfD
 W/rcU5oeqRIxRVDHzf2Sv8WPHMCLRv01ZHBcbJOtgvZC3YiKgKYoeEKabu9ZL1BH
 6VmrxjYOBBFQHOYAKPqBuS7BgH5PmtMbDdSZXDfRaAKaCzhCRysdlWW7z48r2R//
 zPufb7J6Tle23AkuZWhFjvlGgSBl4zxnTFn31HYOyQps3TMI4y50Z2DhE/EeU8a6
 DRl8/k1KQVDUZ6udJogS5kOr1J8pFtUPrA2uhR8UyLdx7YKiCzcdO1qWAjtXlVe8
 oNpzinU+H9bQqGe9IyS7kCG9xNaCRZNkln5Q1WfnkTzg5f6ihfaCvIku3l4bgVpw
 3HmcxYiC6RxQB+ozwN7hzCCKT4L9aMhr/457TNOqRkj2Elw3nvJ02L4aI86XAKLE
 jwO9Fkp9qcCxCw==
 =q5eD
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull vdso data page handling updates from Thomas Gleixner:
 "First steps of consolidating the VDSO data page handling.

  The VDSO data page handling is architecture specific for historical
  reasons, but there is no real technical reason to do so.

  Aside of that VDSO data has become a dump ground for various
  mechanisms and fail to provide a clear separation of the
  functionalities.

  Clean this up by:

   - consolidating the VDSO page data by getting rid of architecture
     specific warts especially in x86 and PowerPC.

   - removing the last includes of header files which are pulling in
     other headers outside of the VDSO namespace.

   - seperating timekeeping and other VDSO data accordingly.

  Further consolidation of the VDSO page handling is done in subsequent
  changes scheduled for the next merge window.

  This also lays the ground for expanding the VDSO time getters for
  independent PTP clocks in a generic way without making every
  architecture add support seperately"

* tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  x86/vdso: Add missing brackets in switch case
  vdso: Rename struct arch_vdso_data to arch_vdso_time_data
  powerpc: Split systemcfg struct definitions out from vdso
  powerpc: Split systemcfg data out of vdso data page
  powerpc: Add kconfig option for the systemcfg page
  powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors
  powerpc/pseries/lparcfg: Fix printing of system_active_processors
  powerpc/procfs: Propagate error of remap_pfn_range()
  powerpc/vdso: Remove offset comment from 32bit vdso_arch_data
  x86/vdso: Split virtual clock pages into dedicated mapping
  x86/vdso: Delete vvar.h
  x86/vdso: Access vdso data without vvar.h
  x86/vdso: Move the rng offset to vsyscall.h
  x86/vdso: Access rng vdso data without vvar.h
  x86/vdso: Access timens vdso data without vvar.h
  x86/vdso: Allocate vvar page from C code
  x86/vdso: Access rng data from kernel without vvar
  x86/vdso: Place vdso_data at beginning of vvar page
  x86/vdso: Use __arch_get_vdso_data() to access vdso data
  x86/mm/mmap: Remove arch_vma_name()
  ...
2024-11-19 16:09:13 -08:00
Linus Torvalds
5c2b050848 A set of updates for the interrupt subsystem:
- Tree wide:
 
     * Make nr_irqs static to the core code and provide accessor functions
       to remove existing and prevent future aliasing problems with local
       variables or function arguments of the same name.
 
   - Core code:
 
     * Prevent freeing an interrupt in the devres code which is not managed
       by devres in the first place.
 
     * Use seq_put_decimal_ull_width() for decimal values output in
       /proc/interrupts which increases performance significantly as it
       avoids parsing the format strings over and over.
 
     * Optimize raising the timer and hrtimer soft interrupts by using the
       'set bit only' variants instead of the combined version which checks
       whether ksoftirqd should be woken up. The latter is a pointless
       exercise as both soft interrupts are raised in the context of the
       timer interrupt and therefore never wake up ksoftirqd.
 
     * Delegate timer/hrtimer soft interrupt processing to a dedicated thread
       on RT.
 
       Timer and hrtimer soft interrupts are always processed in ksoftirqd
       on RT enabled kernels. This can lead to high latencies when other
       soft interrupts are delegated to ksoftirqd as well.
 
       The separate thread allows to run them seperately under a RT
       scheduling policy to reduce the latency overhead.
 
   - Drivers:
 
     * New drivers or extensions of existing drivers to support Renesas
       RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
       chips
 
     * Support for multi-cluster GICs on MIPS.
 
       MIPS CPUs can come with multiple CPU clusters, where each CPU cluster
       has its own GIC (Generic Interrupt Controller). This requires to
       access the GIC of a remote cluster through a redirect register block.
 
       This is encapsulated into a set of helper functions to keep the
       complexity out of the actual code paths which handle the GIC details.
 
     * Support for encrypted guests in the ARM GICV3 ITS driver
 
       The ITS page needs to be shared with the hypervisor and therefore
       must be decrypted.
 
     * Small cleanups and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7ggcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaf7D/9G6FgJXx/60zqnpnOr9Yx0hxjaI47x
 PFyCd3P05qyVMBYXfI99vrSKuVdMZXJ/fH5L83y+sOaTASyLTzg37igZycIDJzLI
 FnHh/m/+UA8k2aIC5VUiNAjne2RLaTZiRN15uEHFVjByC5Y+YTlCNUE4BBhg5RfQ
 hKmskeffWdtui3ou13CSNvbFn+pmqi4g6n1ysUuLhiwM2E5b1rZMprcCOnun/cGP
 IdUQsODNWTTv9eqPJez985M6A1x2SCGNv7Z73h58B9N0pBRPEC1xnhUnCJ1sA0cJ
 pnfde2C1lztEjYbwDngy0wgq0P6LINjQ5Ma2YY2F2hTMsXGJxGPDZm24/u5uR46x
 N/gsOQMXqw6f5yvbiS7Asx9WzR6ry8rJl70QRgTyozz7xxJTaiNm2HqVFe2wc+et
 Q/BzaKdhmUJj1GMZmqD2rrgwYeDcb4wWYNtwjM4PVHHxYlJVq0mEF1kLLS8YDyjf
 HuGPVqtSkt3E0+Br3FKcv5ltUQP8clXbudc6L1u98YBfNK12hW8L+c3YSvIiFoYM
 ZOAeANPM7VtQbP2Jg2q81Dd3CShImt5jqL2um+l8g7+mUE7l9gyuO/w/a5dQ57+b
 kx7mHHIW2zCeHrkZZbRUYzI2BJfMCCOVN4Ax5OZxTLnLsL9VEehy8NM8QYT4TS8R
 XmTOYW3U9XR3gw==
 =JqxC
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt subsystem updates from Thomas Gleixner:
 "Tree wide:

   - Make nr_irqs static to the core code and provide accessor functions
     to remove existing and prevent future aliasing problems with local
     variables or function arguments of the same name.

  Core code:

   - Prevent freeing an interrupt in the devres code which is not
     managed by devres in the first place.

   - Use seq_put_decimal_ull_width() for decimal values output in
     /proc/interrupts which increases performance significantly as it
     avoids parsing the format strings over and over.

   - Optimize raising the timer and hrtimer soft interrupts by using the
     'set bit only' variants instead of the combined version which
     checks whether ksoftirqd should be woken up. The latter is a
     pointless exercise as both soft interrupts are raised in the
     context of the timer interrupt and therefore never wake up
     ksoftirqd.

   - Delegate timer/hrtimer soft interrupt processing to a dedicated
     thread on RT.

     Timer and hrtimer soft interrupts are always processed in ksoftirqd
     on RT enabled kernels. This can lead to high latencies when other
     soft interrupts are delegated to ksoftirqd as well.

     The separate thread allows to run them seperately under a RT
     scheduling policy to reduce the latency overhead.

  Drivers:

   - New drivers or extensions of existing drivers to support Renesas
     RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt
     chips

   - Support for multi-cluster GICs on MIPS.

     MIPS CPUs can come with multiple CPU clusters, where each CPU
     cluster has its own GIC (Generic Interrupt Controller). This
     requires to access the GIC of a remote cluster through a redirect
     register block.

     This is encapsulated into a set of helper functions to keep the
     complexity out of the actual code paths which handle the GIC
     details.

   - Support for encrypted guests in the ARM GICV3 ITS driver

     The ITS page needs to be shared with the hypervisor and therefore
     must be decrypted.

   - Small cleanups and fixes all over the place"

* tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  irqchip/riscv-aplic: Prevent crash when MSI domain is missing
  genirq/proc: Use seq_put_decimal_ull_width() for decimal values
  softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
  timers: Use __raise_softirq_irqoff() to raise the softirq.
  hrtimer: Use __raise_softirq_irqoff() to raise the softirq
  riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers
  irqchip: Add T-HEAD C900 ACLINT SSWI driver
  dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device
  irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties
  irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK
  irqchip/mips-gic: Prevent indirect access to clusters without CPU cores
  irqchip/mips-gic: Multi-cluster support
  irqchip/mips-gic: Setup defaults in each cluster
  irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic()
  irqchip/mips-gic: Replace open coded online CPU iterations
  genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show()
  genirq/devres: Don't free interrupt which is not managed by devres
  irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool()
  irqchip/aspeed-intc: Add AST27XX INTC support
  dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC
  ...
2024-11-19 15:54:19 -08:00
Linus Torvalds
89c45f3823 x86 cleanups for v6.13:
- x86/boot: Remove unused function atou() (Dr. David Alan Gilbert)
   - x86/cpu: Use str_yes_no() helper in show_cpuinfo_misc() (Thorsten Blum)
   - x86/platform: Switch back to struct platform_driver::remove() (Uwe Kleine-König)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7gbsRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gk4g//QiV5NS8d5mc+6q/UbVahST5Mf3Yktg7/
 LIgzJyqUMZctkbcASDbyTlFqZXGYfZ/ZY6VPEOuo0ANE57GfSbcuwe+gdvL9kXWQ
 9IN5Tst4BJ4iMC+FXYA+VgRAg5Ufy4xJFpw1ObnNtAQ4f2emCTeCMPonCrN8x737
 2dbHzgnJSBbDgG6lxcEJZ/PsoKoTLiq6uZZ98FRRrHCWmkPkNOn2IQx97ITdAKcj
 LnQkTYnTRIzap/eLBrqgxhhtM2R+3DrDClar3WWRSCoFMYg6c8aptidDBQcJhUKZ
 XIhb2enKBKFVtzahjPVOiqLE2i+JGGiPSyQnCpt7y2yHQUyDXkDWWMPdZKC/6WHf
 +2sLgEdl+pz2o3mGDMCkkG58bDgEf0+5gGva+M5F6h7GslYM9Gp0doOYZDt6l+FH
 3rrmXj2WKwI2kM0SWDwfgdLhhmPEsIm8DMOjOftEfR0pwqyUw5JmPfD1dnudrPX0
 wfpRavlUAzzW0IbWaL+S2VhqQnLhLwJ/T/Gb6Gi6LOkTaRuD42hu0tkctlfJ/VTR
 7Bk+mV2PHpBek3gWnKEUegYQ9LK9DGIucvYi693MWjXVFN6vSyNEsaefRHbT2rGi
 QuLnO2Laecg6SGNBjLYJBWjv72H2CohDRuJdFgzzJAEzebLwdCJE79qjqWRhpKAx
 sEzSy0+6waw=
 =l+KH
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:

 - x86/boot: Remove unused function atou() (Dr. David Alan Gilbert)

 - x86/cpu: Use str_yes_no() helper in show_cpuinfo_misc() (Thorsten
   Blum)

 - x86/platform: Switch back to struct platform_driver::remove() (Uwe
   Kleine-König)

* tag 'x86-cleanups-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Remove unused function atou()
  x86/cpu: Use str_yes_no() helper in show_cpuinfo_misc()
  x86/platform: Switch back to struct platform_driver::remove()
2024-11-19 14:46:39 -08:00
Linus Torvalds
0892d74213 x86/splitlock changes for v6.13:
- Move Split and Bus lock code to a dedicated file (Ravi Bangoria)
  - Add split/bus lock support for AMD (Ravi Bangoria)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7gMERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hEaQ//YRk2Dc3VkiwC+ZE44Bi4ZlztACzjvkL/
 sFjOqX4dSWJLMFDPfISGGEN4e20IFA46uYXwoZQOZEz5RY4tPaJYw+o1aBP5YYEN
 EEv4iRc20FIIYckkyCShP00dKoZlmb6FbxyUysRRwZW0XJuMVLyJnGNmZs0peVvt
 5c8+7erl0CPN9RaR66lULT4YenyvUZ7DChfeB3a1LbazC5+IrEumiIysLJUKj6zN
 075+FeQ084156sFR+LUSjblxLKzY/OqT/727osST2WlMo/HWLIJImCXodHMHG+LC
 dRI0NFFU9zn2G6rGcoltLNsU/TSJfaWoGS8pm6c96kItEZly/BFz5MF1IQIbCfDx
 YFJpil1zJQQeV3FUXldhKGoSio0fv0KWcqC0TLjj/DhqprjdktJGuGIX6ChmkytA
 TDLZPWZxInZdVnWVMBuaJ6defMRBLART02u9DRIoXYEX6aDLjJ1JFTRe5hU9vVab
 cq+GR3ZSeDM9gSGjfW6dGG5746KXX+Wwxv4stxSoygSxmrLPH38CrZ5m66edtKzq
 P+V2/utvhdHZSKawsIpM4Xz5u7fweySkVFQjJyEEeMWyXnfC+alP9OUsVTKS8mFa
 zKbX7mEgnBDcEE9w6O5itL4nIgB3Kooci5uEWDRTAYUee82Hqk09Ycyb5XQkJ7bs
 Cl65CoY+XAA=
 =QpKp
 -----END PGP SIGNATURE-----

Merge tag 'x86-splitlock-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 splitlock updates from Ingo Molnar:

 - Move Split and Bus lock code to a dedicated file (Ravi Bangoria)

 - Add split/bus lock support for AMD (Ravi Bangoria)

* tag 'x86-splitlock-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bus_lock: Add support for AMD
  x86/split_lock: Move Split and Bus lock code to a dedicated file
2024-11-19 14:34:02 -08:00
Linus Torvalds
f41dac3efb Performance events changes for v6.13:
- Uprobes:
     - Add BPF session support (Jiri Olsa)
     - Switch to RCU Tasks Trace flavor for better performance (Andrii Nakryiko)
     - Massively increase uretprobe SMP scalability by SRCU-protecting
       the uretprobe lifetime (Andrii Nakryiko)
     - Kill xol_area->slot_count (Oleg Nesterov)
 
  - Core facilities:
     - Implement targeted high-frequency profiling by adding the ability
       for an event to "pause" or "resume" AUX area tracing (Adrian Hunter)
 
  - VM profiling/sampling:
     - Correct perf sampling with guest VMs (Colton Lewis)
 
  - New hardware support:
     - x86/intel: Add PMU support for Intel ArrowLake-H CPUs (Dapeng Mi)
 
  - Misc fixes and enhancements:
     - x86/intel/pt: Fix buffer full but size is 0 case (Adrian Hunter)
     - x86/amd: Warn only on new bits set (Breno Leitao)
     - x86/amd/uncore: Avoid a false positive warning about snprintf
                       truncation in amd_uncore_umc_ctx_init (Jean Delvare)
     - uprobes: Re-order struct uprobe_task to save some space (Christophe JAILLET)
     - x86/rapl: Move the pmu allocation out of CPU hotplug (Kan Liang)
     - x86/rapl: Clean up cpumask and hotplug (Kan Liang)
     - uprobes: Deuglify xol_get_insn_slot/xol_free_insn_slot paths (Oleg Nesterov)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7eKERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i57A/+KQ6TrIoICVTE+BPlDfUw8NU+N3DagVb0
 dzoyDxlDRsnsYzeXZipPn+3IitX1w+DrGxBNIojSoiFVCLnHIKgo4uHbj7cVrR7J
 fBTVSnoJ94SGAk5ySebvLwMLce/YhXBeHK2lx6W/pI6acNcxzDfIabjjETeqltUo
 g7hmT9lo10pzZEZyuUfYX9khlWBxda1dKHc9pMIq7baeLe4iz/fCGlJ0K4d4M4z3
 NPZw239Np6iHUwu3Lcs4gNKe4rcDe7Bt47hpedemHe0Y+7c4s2HaPxbXWxvDtE76
 mlsg93i28f8SYxeV83pREn0EOCptXcljhiek+US+GR7NSbltMnV+uUiDfPKIE9+Y
 vYP/DYF9hx73FsOucEFrHxYYcePorn3pne5/khBYWdQU6TnlrBYWpoLQsjgCKTTR
 4JhCFlBZ5cDpc6ihtpwCwVTQ4Q/H7vM1XOlDwx0hPhcIPPHDreaQD/wxo61jBdXf
 PY0EPAxh3BcQxfPYuDS+XiYjQ8qO8MtXMKz5bZyHBZlbHwccV6T4ExjsLKxFk5As
 6BG8pkBWLg7drXAgVdleIY0ux+34w/Zzv7gemdlQxvWLlZrVvpjiG93oU3PTpZeq
 A2UD9eAOuXVD6+HsF/dmn88sFmcLWbrMskFWujkvhEUmCvSGAnz3YSS/mLEawBiT
 2xI8xykNWSY=
 =ItOT
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Uprobes:
    - Add BPF session support (Jiri Olsa)
    - Switch to RCU Tasks Trace flavor for better performance (Andrii
      Nakryiko)
    - Massively increase uretprobe SMP scalability by SRCU-protecting
      the uretprobe lifetime (Andrii Nakryiko)
    - Kill xol_area->slot_count (Oleg Nesterov)

  Core facilities:
    - Implement targeted high-frequency profiling by adding the ability
      for an event to "pause" or "resume" AUX area tracing (Adrian
      Hunter)

  VM profiling/sampling:
    - Correct perf sampling with guest VMs (Colton Lewis)

  New hardware support:
    - x86/intel: Add PMU support for Intel ArrowLake-H CPUs (Dapeng Mi)

  Misc fixes and enhancements:
    - x86/intel/pt: Fix buffer full but size is 0 case (Adrian Hunter)
    - x86/amd: Warn only on new bits set (Breno Leitao)
    - x86/amd/uncore: Avoid a false positive warning about snprintf
      truncation in amd_uncore_umc_ctx_init (Jean Delvare)
    - uprobes: Re-order struct uprobe_task to save some space
      (Christophe JAILLET)
    - x86/rapl: Move the pmu allocation out of CPU hotplug (Kan Liang)
    - x86/rapl: Clean up cpumask and hotplug (Kan Liang)
    - uprobes: Deuglify xol_get_insn_slot/xol_free_insn_slot paths (Oleg
      Nesterov)"

* tag 'perf-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/core: Correct perf sampling with guest VMs
  perf/x86: Refactor misc flag assignments
  perf/powerpc: Use perf_arch_instruction_pointer()
  perf/core: Hoist perf_instruction_pointer() and perf_misc_flags()
  perf/arm: Drop unused functions
  uprobes: Re-order struct uprobe_task to save some space
  perf/x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init
  perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
  perf/x86/intel/pt: Add support for pause / resume
  perf/core: Add aux_pause, aux_resume, aux_start_paused
  perf/x86/intel/pt: Fix buffer full but size is 0 case
  uprobes: SRCU-protect uretprobe lifetime (with timeout)
  uprobes: allow put_uprobe() from non-sleepable softirq context
  perf/x86/rapl: Clean up cpumask and hotplug
  perf/x86/rapl: Move the pmu allocation out of CPU hotplug
  uprobe: Add support for session consumer
  uprobe: Add data pointer to consumer handlers
  perf/x86/amd: Warn only on new bits set
  uprobes: fold xol_take_insn_slot() into xol_get_insn_slot()
  uprobes: kill xol_area->slot_count
  ...
2024-11-19 13:34:06 -08:00
Linus Torvalds
9d7d4ad222 Objtool changes for v6.13:
- Detect non-relocated text references for more robust
    IBT sealing (Josh Poimboeuf)
 
  - Fix build error when building stripped down
    UAPI headers (HONG Yifan)
 
  - Exclude __tracepoints data from ENDBR checks to fix
    false positives on clang builds (Peter Zijlstra)
 
  - Fix ORC unwind for newly forked tasks (Zheng Yejian)
 
  - Fix readelf related faddr2line regression (Carlos Llamas)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7FYARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g5nQ//W6Hs6dUlQpHyq5af/QU7OHdxo8EQTvWR
 6i2Y+D/+N+Va69N3IaaCmiYkvO9AETl8IrVjur6XAHhHm5ylFAJ9AhRCs/DUcDqs
 OCzN263x2fH38GHf4WOc6mSUKJhR2/FN6/qfuf3bFKQytOciAZLn7GYRrtQdFd5v
 8rRbETMaqkRexEfFukEJr3gPggy7NGcyBhOz2RZHiEO3aUQxkhVcHkP2Sr2pUDkb
 8e+X30wYMzMbi9ZhBuu4prb4L3GPfDRIuPvBQfldQvMuayKJlZ10NMnlzDfFrFDd
 R34DXhELlheWGxBGGj9Rq2GbQLpjneZDOX7i8XtuaHljiYfUaQyAMdsLkvMiKnmb
 44iHIOEExm0MKcnMO3dWSYXPCT2bBiqnvFnh3J2eWBaXF9i9an7/b54xBnNpC3Al
 KYJWmOiDzr6NZ8UyDclqaNc0Zv31fBPYAji0T5gSFe2qsN6XpyjNOmyl0vH61eEI
 WQfYBjopbVsMW/7Dh95qXwK55D94QUhSba4yZepzwd9meOlgnO3QNNx2MwciGG7i
 G1TKPdsT8ndbkaDuk6iTsrGi5UpGhDyQ45agXM7w/K3EXMQLEP6eBno/U213jvQV
 kPBp0X0EbD7LezDsvC0Q4khfNSDELVq8F493ctQCOzqmQa4ypAwVz0HevZQsXSxO
 7bCNMxA7Klk=
 =eio4
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Detect non-relocated text references for more robust
   IBT sealing (Josh Poimboeuf)

 - Fix build error when building stripped down
   UAPI headers (HONG Yifan)

 - Exclude __tracepoints data from ENDBR checks to fix
   false positives on clang builds (Peter Zijlstra)

 - Fix ORC unwind for newly forked tasks (Zheng Yejian)

 - Fix readelf related faddr2line regression (Carlos Llamas)

* tag 'objtool-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Exclude __tracepoints data from ENDBR checks
  Revert "scripts/faddr2line: Check only two symbols when calculating symbol size"
  x86/unwind/orc: Fix unwind for newly forked tasks
  objtool: Also include tools/include/uapi
  objtool: Detect non-relocated text references
2024-11-19 13:27:52 -08:00
Linus Torvalds
364eeb79a2 Locking changes for v6.13 are:
- lockdep:
     - Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING (Sebastian Andrzej Siewior)
     - Add lockdep_cleanup_dead_cpu() (David Woodhouse)
 
  - futexes:
     - Use atomic64_inc_return() in get_inode_sequence_number() (Uros Bizjak)
     - Use atomic64_try_cmpxchg_relaxed() in get_inode_sequence_number() (Uros Bizjak)
 
  - RT locking:
     - Add sparse annotation PREEMPT_RT's locking (Sebastian Andrzej Siewior)
 
  - spinlocks:
     - Use atomic_try_cmpxchg_release() in osq_unlock() (Uros Bizjak)
 
  - atomics:
     - x86: Use ALT_OUTPUT_SP() for __alternative_atomic64() (Uros Bizjak)
     - x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu() (Uros Bizjak)
 
  - KCSAN, seqlocks:
     - Support seqcount_latch_t (Marco Elver)
 
  - <linux/cleanup.h>:
     - Add if_not_cond_guard() conditional guard helper (David Lechner)
     - Adjust scoped_guard() macros to avoid potential warning (Przemek Kitszel)
     - Remove address space of returned pointer (Uros Bizjak)
 
  - WW mutexes:
     - locking/ww_mutex: Adjust to lockdep nest_lock requirements (Thomas Hellström)
 
  - Rust integration:
     - Fix raw_spin_lock initialization on PREEMPT_RT (Eder Zulian)
 
  - miscellaneous cleanups & fixes:
     - lockdep: Fix wait-type check related warnings (Ahmed Ehab)
     - lockdep: Use info level for initial info messages (Jiri Slaby)
     - spinlocks: Make __raw_* lock ops static (Geert Uytterhoeven)
     - pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase (Qiuxu Zhuo)
     - iio: magnetometer: Fix if () scoped_guard() formatting (Stephen Rothwell)
     - rtmutex: Fix misleading comment (Peter Zijlstra)
     - percpu-rw-semaphores: Fix grammar in percpu-rw-semaphore.rst (Xiu Jianfeng)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7AkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hGqQ/+KWR5arkoJjH/Nf5IyezYitOwqK7YAdJk
 mrWoZcez0DRopNTf8yZMv1m8jyx7W9KUQumEO/ghqJRlBW+AbxZ1t99kmqWI5Aw0
 +zmhpyo06JHeMYQAfKJXX3iRt2Rt59BPHtGzoop6b0e2i55+uPE+DZTNm2+FwCV9
 4vxmfpYyg5/sJB9/v5b0N9TTDe9a8caOHXU5F+HA1yWuxMmqFuDFIcpKrgS/sUeP
 NelOLbh2L3UOPWP6tRRfpajxCQTmRoeZOQQv0L9dd3jYpyQOCesgKqOhqNTCU8KK
 qamTPig2N00smSLp6I/OVyJ96vFYZrbhyq0kwMayaafAU7mB8lzcfUj+8qP0c90k
 1PROtD1XpF3Nobp1F+YUp3sQxEGdCgs+9VeLWWObv2b/Vt3MDZijdEiC/3OkRAUh
 LPCfl/ky41BmT8AlaxRDjkyrN7hH4oUOkGUdVx6yR389J0OR9MSwEX9qNaMw8bBg
 1ALvv9+OR3QhTWyG30PGqUf3Um230oIdWuWxwFrhaoMmDVEVMRZQMtvQahi5hDYq
 zyX79DKWtExEe/f2hY1m/6eNm6st5HE7X7scOba3TamQzvOzJkjzo7XoS2yeUAjb
 eByO2G0PvTrA0TFls6Hyrl6db5OW5KjQnVWr6W3fiWL5YIdh0SQMkWeaGVvGyfy8
 Q3vhk7POaZo=
 =BvPn
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Lockdep:
   - Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING (Sebastian Andrzej
     Siewior)
   - Add lockdep_cleanup_dead_cpu() (David Woodhouse)

  futexes:
   - Use atomic64_inc_return() in get_inode_sequence_number() (Uros
     Bizjak)
   - Use atomic64_try_cmpxchg_relaxed() in get_inode_sequence_number()
     (Uros Bizjak)

  RT locking:
   - Add sparse annotation PREEMPT_RT's locking (Sebastian Andrzej
     Siewior)

  spinlocks:
   - Use atomic_try_cmpxchg_release() in osq_unlock() (Uros Bizjak)

  atomics:
   - x86: Use ALT_OUTPUT_SP() for __alternative_atomic64() (Uros Bizjak)
   - x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu() (Uros
     Bizjak)

  KCSAN, seqlocks:
   - Support seqcount_latch_t (Marco Elver)

  <linux/cleanup.h>:
   - Add if_not_guard() conditional guard helper (David Lechner)
   - Adjust scoped_guard() macros to avoid potential warning (Przemek
     Kitszel)
   - Remove address space of returned pointer (Uros Bizjak)

  WW mutexes:
   - locking/ww_mutex: Adjust to lockdep nest_lock requirements (Thomas
     Hellström)

  Rust integration:
   - Fix raw_spin_lock initialization on PREEMPT_RT (Eder Zulian)

  Misc cleanups & fixes:
   - lockdep: Fix wait-type check related warnings (Ahmed Ehab)
   - lockdep: Use info level for initial info messages (Jiri Slaby)
   - spinlocks: Make __raw_* lock ops static (Geert Uytterhoeven)
   - pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase
     (Qiuxu Zhuo)
   - iio: magnetometer: Fix if () scoped_guard() formatting (Stephen
     Rothwell)
   - rtmutex: Fix misleading comment (Peter Zijlstra)
   - percpu-rw-semaphores: Fix grammar in percpu-rw-semaphore.rst (Xiu
     Jianfeng)"

* tag 'locking-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  locking/Documentation: Fix grammar in percpu-rw-semaphore.rst
  iio: magnetometer: fix if () scoped_guard() formatting
  rust: helpers: Avoid raw_spin_lock initialization for PREEMPT_RT
  kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  seqlock, treewide: Switch to non-raw seqcount_latch interface
  kcsan, seqlock: Support seqcount_latch_t
  time/sched_clock: Broaden sched_clock()'s instrumentation coverage
  time/sched_clock: Swap update_clock_read_data() latch writes
  locking/atomic/x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu()
  locking/atomic/x86: Use ALT_OUTPUT_SP() for __alternative_atomic64()
  cleanup: Add conditional guard helper
  cleanup: Adjust scoped_guard() macros to avoid potential warning
  locking/osq_lock: Use atomic_try_cmpxchg_release() in osq_unlock()
  cleanup: Remove address space of returned pointer
  locking/rtmutex: Fix misleading comment
  locking/rt: Annotate unlock followed by lock for sparse.
  locking/rt: Add sparse annotation for RCU.
  locking/rt: Remove one __cond_lock() in RT's spin_trylock_irqsave()
  locking/rt: Add sparse annotation PREEMPT_RT's sleeping locks.
  locking/pvqspinlock: Convert fields of 'enum vcpu_state' to uppercase
  ...
2024-11-19 12:43:11 -08:00
Linus Torvalds
d8d78a90e7 - Add a feature flag which denotes AMD CPUs supporting workload classification
with the purpose of using such hints when making scheduling decisions
 
 - Determine the boost enumerator for each AMD core based on its type: efficiency
   or performance, in the cppc driver
 
 - Add the type of a CPU to the topology CPU descriptor with the goal of
   supporting and making decisions based on the type of the respective core
 
 - Add a feature flag to denote AMD cores which have heterogeneous topology and
   enable SD_ASYM_PACKING for those
 
 - Check microcode revisions before disabling PCID on Intel
 
 - Cleanups and fixlets
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7q0UACgkQEsHwGGHe
 VUq27Q//TADIn/rZj95OuWLYFXduOpzdyfF6BAOabRjUpIWTGJ5YdKjj1TCA2wUE
 6SiHZWQxQropB3NgeICcDT+3OGdGzE2qywzpXspUDsBPraWx+9CA56qREYafpRps
 88ZQZJWHla2/0kHN5oM4fYe05mWMLAFgIhG4tPH/7sj54Zqar40nhVksz3WjKAid
 yEfzbdVeRI5sNoujyHzGANXI0Fo98nAyi5Qj9kXL9W/UV1JmoQ78Rq7V9IIgOBsc
 l6Gv/h0CNtH9voqfrfUb07VHk8ZqSJ37xUnrnKdidncWGCWEAoZRr7wU+I9CHKIs
 tzdx+zq6JC3YN0IwsZCjk4me+BqVLJxW2oDgW7esPifye6ElyEo4T9UO9LEpE1qm
 ReAByoIMdSXWwXuITwy4NxLPKPCpU7RyJCiqFzpJp0g4qUq2cmlyERDirf6eknXL
 s+dmRaglEdcQT/EL+Y+vfFdQtLdwJmOu+nPPjjFxeRcIDB+u1sXJMEFbyvkLL6FE
 HOdNxL+5n/3M8Lbh77KIS5uCcjXL2VCkZK2/hyoifUb+JZR/ENoqYjElkMXOplyV
 KQIfcTzVCLRVvZApf/MMkTO86cpxMDs7YLYkgFxDsBjRdoq/Mzub8yzWn6kLZtmP
 ANNH4uYVtjrHE1nxJSA0JgYQlJKYeNU5yhLiTLKhHL5BwDYfiz8=
 =420r
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Add a feature flag which denotes AMD CPUs supporting workload
   classification with the purpose of using such hints when making
   scheduling decisions

 - Determine the boost enumerator for each AMD core based on its type:
   efficiency or performance, in the cppc driver

 - Add the type of a CPU to the topology CPU descriptor with the goal of
   supporting and making decisions based on the type of the respective
   core

 - Add a feature flag to denote AMD cores which have heterogeneous
   topology and enable SD_ASYM_PACKING for those

 - Check microcode revisions before disabling PCID on Intel

 - Cleanups and fixlets

* tag 'x86_cpu_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Remove redundant CONFIG_NUMA guard around numa_add_cpu()
  x86/cpu: Fix FAM5_QUARK_X1000 to use X86_MATCH_VFM()
  x86/cpu: Fix formatting of cpuid_bits[] in scattered.c
  x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit
  x86/amd: Use heterogeneous core topology for identifying boost numerator
  x86/cpu: Add CPU type to struct cpuinfo_topology
  x86/cpu: Enable SD_ASYM_PACKING for PKG domain on AMD
  x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES
  x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
  x86/mm: Don't disable PCID when INVLPG has been fixed by microcode
2024-11-19 12:27:19 -08:00
Linus Torvalds
ab713e7099 - Remove the unconditional cache writeback and invalidation after loading the
microcode patch on Intel as this was addressing a microcode bug for which
   there is a concrete microcode revision check instead
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7Vv4ACgkQEsHwGGHe
 VUpZtA//erQJlwD+GLWfVoao76Y24LPUVAd20CjvtExDqvOsmvA7lYID89tFXxx0
 dD3NDQBW8MI0oZ+Upc9fdRLMWE7SEOQDqk4WlMDMckc1+lv54+lqVA5wv6ee7goY
 ksUoMQ30MAMZ8ka4vGyjLFsLiYFhVgNCivuJdyi2igABHCGQFSRyl8zog6gWtUTU
 3YzTjYjNBY1nZAL99Ibda6VpnmWLOyP24X8hqMp8DD/qqHwhKZfnnqTdoIAkalOq
 OBK/7vQVaWj5YyUwH/VoJ2iorTv0/XtrNsBRkrxmQaS996/WR8yN4mHbuvThiATc
 i7b9IEVuoc6KhEuy4sPqt4Yl3QomEed+85/p1NJ5jD6cF4x2+GSnpJF6v/bNRR7T
 j46jStvJ7f15OcugExRGFI9DjC64ZbHumSrumhLWmKmAqjfAqlYChWUOicHi8l3+
 fshTnTRBSUmglGqbkgEVPIRv9e4BkuMpnFCDqAiq8mUm0dzqOZyvvh/gfHTUA+h0
 2HykwBitgHEC/e4EHiPpT/Bnm93uL/qBNOhgOWCmOfLGJgdcAE93qP48urj1cbES
 D5wbNqM5iUvA8jw4frWl222nf7r0FuHVD05pkhJ3m2uvJINWCC3EHFD2Bs3vTEr9
 xSGN+nOuWFR2D65RvsMr0eO9x1OTYo679TSyfeTHChnNMWRBcXQ=
 =wDsa
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loader update from Borislav Petkov:

 - Remove the unconditional cache writeback and invalidation after
   loading the microcode patch on Intel as this was addressing a
   microcode bug for which there is a concrete microcode revision check
   instead

* tag 'x86_microcode_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Remove unnecessary cache writeback and invalidation
2024-11-19 12:13:16 -08:00
Linus Torvalds
5a4b3fbb48 - Add support for 6-node sub-NUMA clustering on Intel
- Cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7UqEACgkQEsHwGGHe
 VUoheQ//RYJFLA8K4JLykSSlzY6ixf4wr6aorO7cn0xPLi73ZKRWHMLpL7jWO1z4
 SgKgWM0w93DjbNK81EcQLmTBQPSSzHnMzACafVJBkfQQiAjdsHxaOcJ6Y7tbFkXB
 BVf8wFmiNamz00OxtXMUv3ZwvKn9xrar1R67yYOdrZVCV1w4A3IzfNPPpmc3FE5r
 n+Z8tZybBzQqyyMnlyirKjHx1/bGURN/fF8+UuRy9T8L7XcI5Y8QUCP9N0K1OiUg
 8XBDoy4Wc0F47XdnKqLxY/eR2D+qqH+KgIyjVThGJ5tJHRRpN5maK2s4s5eeT3Js
 /o8GAInChsXnyr9CAs4rUk4/f4Ai5QlccFGv+LPHBfUxcRfw+tXDe6sBQdIhUFpR
 q87CGnkbfp2yLrADxIjne59l5RqI8S1gGXGTtBJSVzu993Xu/w1Vpfkb52rLBVGZ
 JqQsGzHtbZyEA3mSnbWY1kHOSkwqccQejGjD5ufygdhojO1H0stvOsOxY0doFeII
 DQWlKDY78J1yN0r3RXdwzmD6W5SJ1r+03jbSRUb/3CaQpdKUiAk+8vV6mw4EfHEo
 gnCqU21jGZTqhCRquqehpzD8SK56gndbA6TEC6avx+8eTizA4uNKQWe07lKMM4oe
 glHCZvAf1k+hGvGz/8xioCQSpHgfZvVlZoR4Kv1enOj+05uiohc=
 =7r2j
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache resource control updates from Borislav Petkov:

 - Add support for 6-node sub-NUMA clustering on Intel

 - Cleanup

* tag 'x86_cache_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Support Sub-NUMA cluster mode SNC6
  x86/resctrl: Slightly clean-up mbm_config_show()
2024-11-19 12:11:28 -08:00
Linus Torvalds
c1f2ffe207 - Log and handle twp new AMD-specific MCA registers: SYND1 and SYND2 and
report the Field Replaceable Unit text info reported through them
 
 - Add support for handling variable-sized SMCA BERT records
 
 - Add the capability for reporting vendor-specific RAS error info without
   adding vendor-specific fields to struct mce
 
 - Cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7OlEACgkQEsHwGGHe
 VUpXihAAgVdZExo/1Rmbh6s/259BH38GP6fL+ePaT1SlUzNi770TY2b7I4OYlms4
 xa9t8LAIVMrrIMIg6w6q8JN4YHAQoVdcbRBvHQYB1a24xtoyxaEJxLKQNLA1soUQ
 Jc9asWMHBuXnLfR/4S8Y2vWrzByOSwxqDBzQCu0Ryqvbg7vdRicNt+Hk9oHHIAYy
 cquZpoDGL3W6BA8sXONbEW/6rcQ33JsEQ+Ub4qr1q2g+kNwXrrFuXZlojmz2MxIs
 xgqeYKyrxK6heX0l8dSiipCATA+sOXXWWzbZtdPjFtDGzwIlV3p4yXN3fucrmHm1
 4Fg1gW5a1V82Qosn0FbGiZPojsahhOE2k1bz+yEMDM3Sg2qeRWcK+V3jiS5zKzPd
 WWqUbRtcaxayoEsAXnWrxrp3vxhlUUf1Ivtgk8mlMjhHPLijV5iranrRj+XHEikR
 H0D3Vm0T1LHCPf9AUsbmo0GAfAOeO9DTAB9LJdKv+OJ4ESVgSPJW/9NKWLXKq41p
 hhs7seJTYNw8sp67cL23TnkSp3S+9kd2U7Od3T1kubtd4fVxVnlowu8Fc6kjqd8v
 n+GbdLxhX7GbOgnT0z2OG5Xmc1pNW1JtRbuxSK59NFNia7r6ZkR7BE/OCtL82Rfm
 u7i76z1O0lV91y93GMCyP9DYn8K1ceU7gVCveY6mx/AHgzc87d8=
 =djpG
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Log and handle twp new AMD-specific MCA registers: SYND1 and SYND2
   and report the Field Replaceable Unit text info reported through them

 - Add support for handling variable-sized SMCA BERT records

 - Add the capability for reporting vendor-specific RAS error info
   without adding vendor-specific fields to struct mce

 - Cleanups

* tag 'ras_core_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Add support for FRU text in MCA
  x86/mce/apei: Handle variable SMCA BERT record size
  x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
  tracing: Add __print_dynamic_array() helper
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff
  x86/mce/mcelog: Use xchg() to get and clear the flags
2024-11-19 12:04:51 -08:00
Rik van Riel
209954cbc7 x86/mm/tlb: Update mm_cpumask lazily
On busy multi-threaded workloads, there can be significant contention
on the mm_cpumask at context switch time.

Reduce that contention by updating mm_cpumask lazily, setting the CPU bit
at context switch time (if not already set), and clearing the CPU bit at
the first TLB flush sent to a CPU where the process isn't running.

When a flurry of TLB flushes for a process happen, only the first one
will be sent to CPUs where the process isn't running. The others will
be sent to CPUs where the process is currently running.

On an AMD Milan system with 36 cores, there is a noticeable difference:
$ hackbench --groups 20 --loops 10000

  Before: ~4.5s +/- 0.1s
  After:  ~4.2s +/- 0.1s

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20241114152723.1294686-2-riel@surriel.com
2024-11-19 12:02:46 +01:00
Linus Torvalds
158f238aa6 xen: branch for v6.13-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZzd0YQAKCRCAXGG7T9hj
 vm1JAQDBZKpEkPFrxy969YFNCSUiUj42+rDRI3OJt8y0CGTS2AEA53+B+/DYDm8i
 syw5eaSJv4uohary659ywZUdSljPig4=
 =MHOT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - a series for booting as a PVH guest, doing some cleanups after the
   previous work to make PVH boot code position independent

 - a fix of the xenbus driver avoiding a leak in an error case

* tag 'for-linus-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix the issue of resource not being properly released in xenbus_dev_probe()
  x86/pvh: Avoid absolute symbol references in .head.text
  x86/xen: Avoid relocatable quantities in Xen ELF notes
  x86/pvh: Omit needless clearing of phys_base
  x86/pvh: Use correct size value in GDT descriptor
  x86/pvh: Call C code via the kernel virtual mapping
2024-11-18 18:26:57 -08:00
Linus Torvalds
0f25f0e4ef the bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same
 scope where they'd been created, getting rid of reassignments
 and passing them by reference, converting to CLASS(fd{,_pos,_raw}).
 
 We are getting very close to having the memory safety of that stuff
 trivial to verify.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ
 69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9
 CN18glYmD3wRyU6Bwl4vGODouSJvDgA=
 =gVH3
 -----END PGP SIGNATURE-----

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' class updates from Al Viro:
 "The bulk of struct fd memory safety stuff

  Making sure that struct fd instances are destroyed in the same scope
  where they'd been created, getting rid of reassignments and passing
  them by reference, converting to CLASS(fd{,_pos,_raw}).

  We are getting very close to having the memory safety of that stuff
  trivial to verify"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  deal with the last remaing boolean uses of fd_file()
  css_set_fork(): switch to CLASS(fd_raw, ...)
  memcg_write_event_control(): switch to CLASS(fd)
  assorted variants of irqfd setup: convert to CLASS(fd)
  do_pollfd(): convert to CLASS(fd)
  convert do_select()
  convert vfs_dedupe_file_range().
  convert cifs_ioctl_copychunk()
  convert media_request_get_by_fd()
  convert spu_run(2)
  switch spufs_calls_{get,put}() to CLASS() use
  convert cachestat(2)
  convert do_preadv()/do_pwritev()
  fdget(), more trivial conversions
  fdget(), trivial conversions
  privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  introduce "fd_pos" class, convert fdget_pos() users to it.
  fdget_raw() users: switch to CLASS(fd_raw)
  convert vmsplice() to CLASS(fd)
  ...
2024-11-18 12:24:06 -08:00
Linus Torvalds
f66d6acccb - Make sure a kdump kernel with CONFIG_IMA_KEXEC enabled and booted on an AMD
SME enabled hardware properly decrypts the ima_kexec buffer information
   passed to it from the previous kernel
 
 - Fix building the kernel with Clang where a non-TLS definition of the stack
   protector guard cookie leads to bogus code generation
 
 - Clear a wrongly advertised virtualized VMLOAD/VMSAVE feature flag on some
   Zen4 client systems as those insns are not supported on client
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc54zkACgkQEsHwGGHe
 VUplzw//d1ysRrT+YNFwQrvGUc6mjP3dtOo2NCBxKhgvC2OKzDlj8ZIx20Yn97At
 pboOPbewV+cCSd7PXo9ymNsJMjFD0NAyKasRaHhUEGU6/AvQ2EwwN7d6q444HFso
 eIFLak17zha1VhSyu0ozZJSWjUbebwBcI2I3Re5VzJXGzBybw2nLM/PPQn94tsAQ
 w6ag0Ol+VbIPd/EdMOcRI1VXeFl44V5kiAYNvCunBvpTae9i/pC/k7iHW6tWDJUu
 Ud0hpIsq/Rn9Bj2oG/ovz1CvlcEryhL23HpQFyj/C1Gwl3fjpSqWUDm8U366VIVV
 I0EFSpYpH7CD9ArL37R6ziaCH1yojMcrXZSu1kjtc9rPFoqmcxOBkW0dVidWwKv4
 3hFVrWMkFm/qMW+SPss7ZX48z87bnAwplQpgF/sFBeHJSXr909ne3nOdPAS8Hp4t
 5VMGQkXWQCWGgRNxAqpU8pZaPKsvuFvJIJg6B6JcDe/qSv3i78geg64lT2WNhs2m
 AGH/E+ERF3X5jaPn/2oRXc6o9Kfmoy0+QLE4UvLVGunhE8ZNK0Zg3097Dn7N1flR
 ayhAt6otgeksiRpskfCtnxlfZoWIwnS0ueVkb0qJDS+1PWlMkk48cvxwpikEn6rD
 ebCllm8dBysnxgyf+Mf2rKsDbsnrICT7olPr1MTuEUHwf6Uqs5Q=
 =5i7R
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Make sure a kdump kernel with CONFIG_IMA_KEXEC enabled and booted on
   an AMD SME enabled hardware properly decrypts the ima_kexec buffer
   information passed to it from the previous kernel

 - Fix building the kernel with Clang where a non-TLS definition of the
   stack protector guard cookie leads to bogus code generation

 - Clear a wrongly advertised virtualized VMLOAD/VMSAVE feature flag on
   some Zen4 client systems as those insns are not supported on client

* tag 'x86_urgent_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y
  x86/stackprotector: Work around strict Clang TLS symbol requirements
  x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
2024-11-17 09:35:51 -08:00
Thorsten Blum
f060c89dc1 x86/sgx: Use vmalloc_array() instead of vmalloc()
Use vmalloc_array() instead of vmalloc() to calculate the number of
bytes to allocate.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20241112182633.172944-2-thorsten.blum%40linux.dev
2024-11-12 11:11:42 -08:00
Shivank Garg
f74642d81c x86/cpu: Remove redundant CONFIG_NUMA guard around numa_add_cpu()
Remove unnecessary CONFIG_NUMA #ifdef around numa_add_cpu() since the
function is already properly handled in <asm/numa.h> for both NUMA and
non-NUMA configurations. For !CONFIG_NUMA builds, numa_add_cpu() is
defined as an empty function.

Simplify the code without any functionality change.

Testing: Build CONFIG_NUMA=n

Signed-off-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241112072346.428623-1-shivankg@amd.com
2024-11-12 11:00:50 +01:00
Andy Shevchenko
62e724494d x86/cpu: Make sure flag_is_changeable_p() is always being used
When flag_is_changeable_p() is unused, it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:

arch/x86/kernel/cpu/common.c:351:19: error: unused function 'flag_is_changeable_p' [-Werror,-Wunused-function]
  351 | static inline int flag_is_changeable_p(u32 flag)
      |                   ^~~~~~~~~~~~~~~~~~~~

Fix this by moving core around to make sure flag_is_changeable_p() is
always being used.

See also commit 6863f5643d ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

While at it, fix the argument type to be unsigned long along with
the local variables, although it currently only runs in 32-bit cases.
Besides that, makes it return boolean instead of int. This induces
the change of the returning type of have_cpuid_p() to be boolean
as well.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/all/20241108153105.1578186-1-andriy.shevchenko%40linux.intel.com
2024-11-08 09:08:48 -08:00
Ard Biesheuvel
577c134d31 x86/stackprotector: Work around strict Clang TLS symbol requirements
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.

Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.

However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.

The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,

  10d:       64 a1 00 00 00 00       mov    %fs:0x0,%eax
                     10f: R_386_32   __stack_chk_guard

However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.

Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.

Fixes: 3fb0fdb3bb ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
2024-11-08 13:16:00 +01:00
Mike Rapoport (Microsoft)
9bfc4824fd x86/module: prepare module loading for ROX allocations of text
When module text memory will be allocated with ROX permissions, the memory
at the actual address where the module will live will contain invalid
instructions and there will be a writable copy that contains the actual
module code.

Update relocations and alternatives patching to deal with it.

[rppt@kernel.org: fix writable address in cfi_rewrite_endbr()]
  Link: https://lkml.kernel.org/r/ZysRwR29Ji8CcbXc@kernel.org
Link: https://lkml.kernel.org/r/20241023162711.2579610-7-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07 14:25:16 -08:00
Oscar Salvador
1317a5e7f7 arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappings
We want to stop special casing hugetlb mappings and make them go through
generic channels, so teach arch_get_unmapped_area_{topdown_}vmflags to
handle those.

x86 specific hugetlb function does not set either info.start_gap or
info.align_offset so the same here for compatibility.

Link: https://lkml.kernel.org/r/20241007075037.267650-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06 20:11:10 -08:00
Mario Limonciello
b79276dcac ACPI: processor: Move arch_init_invariance_cppc() call later
arch_init_invariance_cppc() is called at the end of
acpi_cppc_processor_probe() in order to configure frequency invariance
based upon the values from _CPC.

This however doesn't work on AMD CPPC shared memory designs that have
AMD preferred cores enabled because _CPC needs to be analyzed from all
cores to judge if preferred cores are enabled.

This issue manifests to users as a warning since commit 21fb59ab4b
("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"):
```
Could not retrieve highest performance (-19)
```

However the warning isn't the cause of this, it was actually
commit 279f838a61 ("x86/amd: Detect preferred cores in
amd_get_boost_ratio_numerator()") which exposed the issue.

To fix this problem, change arch_init_invariance_cppc() into a new weak
symbol that is called at the end of acpi_processor_driver_init().
Each architecture that supports it can declare the symbol to override
the weak one.

Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the
architectures using the generic arch_topology.c code.

Fixes: 279f838a61 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20241104222855.3959267-1-superm1@kernel.org
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-06 21:31:36 +01:00
Masami Hiramatsu (Google)
4638d7ebef x86/kprobes: Cleanup kprobes on ftrace code
Cleanup kprobes on ftrace code for x86.
 - Set instruction pointer (ip + MCOUNT_INSN_SIZE) after pre_handler only
   when p->post_handler exists.
 - Use INT3_INSN_SIZE instead of 1.
 - Use instruction_pointer/instruction_pointer_set() functions instead of
   accessing regs->ip directly.

Link: https://lore.kernel.org/all/172951436219.167263.18330240454389154327.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-11-07 01:16:59 +09:00
Tony Luck
9bce6e94c4 x86/resctrl: Support Sub-NUMA cluster mode SNC6
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some
Intel platforms.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20241031220213.17991-1-tony.luck@intel.com
2024-11-06 10:49:04 +01:00
Dave Airlie
bf99ceb6e0 Merge tag 'drm-intel-next-2024-11-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull #2 for v6.13:

Features and functionality:

- Pantherlake (PTL) Xe3 LPD display enabling for xe driver (Clint, Suraj,
  Dnyaneshwar, Matt, Gustavo, Radhakrishna, Chaitanya, Haridhar, Juha-Pekka, Ravi)
- Enable dbuf overlap detection on Lunarlake and later (Stanislav, Vinod)
- Allow fastset for HDR infoframe changes (Chaitanya)
- Write DP source OUI also for non-eDP sinks (Imre)

Refactoring and cleanups:
- Independent platform identification for display (Jani)
- Display tracepoint fixes and cleanups (Gustavo)
- Share PCI ID headers between i915 and xe drivers (Jani)
- Use x100 version for full version and release checks (Jani)
- Conversions to struct intel_display (Jani, Ville)
- Reuse DP DPCD and AUX macros in gvt instead of duplication (Jani)
- Use string choice helpers (R Sundar, Sai Teja)
- Remove unused underrun detection irq code (Sai Teja)
- Color management debug improvements and other cleanups (Ville)
- Refactor panel fitter code to a separate file (Ville)
- Use try_cmpxchg() instead of open-coding (Uros Bizjak)

Fixes:
- PSR and Panel Replay fixes and workarounds (Jouni)
- Fix panel power during connector detection (Imre)
- Fix connector detection and modeset races (Imre)
- Fix C20 PHY TX MISC configuration (Gustavo)
- Improve panel fitter validity checks (Ville)
- Fix eDP short HPD interrupt handling while runtime suspended (Imre)
- Propagate DP MST DSC BW overhead/slice calculation errors (Imre)
- Stop hotplug polling for eDP connectors (Imre)
- Workaround panels reporting bad link status after PSR enable (Jouni)
- Panel Replay VRR VSC SDP related workaround and refactor (Animesh, Mitul)
- Fix memory leak on eDP init error path (Shuicheng)
- Fix GVT KVMGT Kconfig dependencies (Arnd Bergmann)
- Fix irq function documentation build warning (Rodrigo)
- Add platform check to power management fuse bit read (Clint)
- Revert kstrdup_const() and kfree_const() usage for clarity (Christophe JAILLET)
- Workaround horizontal odd panning issues in display versions 20 and 30 (Nemesa)
- Fix xe drive HDCP GSC firmware check (Suraj)

Merges:
- Backmerge drm-next to get some KVM changes (Rodrigo)
- Fix a build failure originating from previous backmerge (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

# Conflicts:
#	drivers/gpu/drm/i915/display/intel_dp_mst.c
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87h68ni0wd.fsf@intel.com
2024-11-06 09:08:53 +10:00
Mario Limonciello
a5ca1dc46a x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
A number of Zen4 client SoCs advertise the ability to use virtualized
VMLOAD/VMSAVE, but using these instructions is reported to be a cause
of a random host reboot.

These instructions aren't intended to be advertised on Zen4 client
so clear the capability.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009
2024-11-05 17:48:32 +01:00
Marco Elver
93190bc35d seqlock, treewide: Switch to non-raw seqcount_latch interface
Switch all instrumentable users of the seqcount_latch interface over to
the non-raw interface.

Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-5-elver@google.com
2024-11-05 12:55:35 +01:00
Al Viro
6348be02ee fdget(), trivial conversions
fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Thomas Weißschuh
7175126a6d x86/vdso: Allocate vvar page from C code
Allocate the vvar page through the standard union vdso_data_store
and remove the custom linker script logic.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-14-b64f0842d512@linutronix.de
2024-11-02 12:37:34 +01:00
Yazen Ghannam
e9876dafa2 x86/mce/apei: Handle variable SMCA BERT record size
The ACPI Boot Error Record Table (BERT) is being used by the kernel to report
errors that occurred in a previous boot. On some modern AMD systems, these
very errors within the BERT are reported through the x86 Common Platform Error
Record (CPER) format which consists of one or more Processor Context
Information Structures.

These context structures provide a starting address and represent an x86 MSR
range in which the data constitutes a contiguous set of MSRs starting from,
and including the starting address.

It's common, for AMD systems that implement this behavior, that the MSR range
represents the MCAX register space used for the Scalable MCA feature. The
apei_smca_report_x86_error() function decodes and passes this information
through the MCE notifier chain. However, this function assumes a fixed
register size based on the original HW/FW implementation.

This assumption breaks with the addition of two new MCAX registers viz.
MCA_SYND1 and MCA_SYND2. These registers are added at the end of the MCAX
register space, so they won't be included when decoding the CPER data.

Rework apei_smca_report_x86_error() to support a variable register array size.
This covers any case where the MSR context information starts at the MCAX
address for MCA_STATUS and ends at any other register within the MCAX register
space.

  [ Yazen: Add Avadhut as co-developer for wrapper changes.]
  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-5-avadhut.naik@amd.com
2024-10-31 10:45:59 +01:00
Avadhut Naik
d4fca1358e x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition to the
existing MCA_SYND register. The data within these registers is considered
valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like rasdaemon gather related hardware error
information through the tracepoints.

Therefore, export these two registers through the mce_record tracepoint so
that tools like rasdaemon can parse them and output the supplemental error
information like FRU text contained in them.

  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
2024-10-31 10:36:07 +01:00
Avadhut Naik
750fd23926 x86/mce: Add wrapper for struct mce to export vendor specific info
Currently, exporting new additional machine check error information
involves adding new fields for the same at the end of the struct mce.
This additional information can then be consumed through mcelog or
tracepoint.

However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.

The problem particularly intensifies since struct mce is exposed to
userspace as part of UAPI. It's bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. The same
will prevent its ballooning since vendor-specifc data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.

  [ bp: Restore reverse x-mas tree order of function vars declarations. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com
2024-10-30 17:18:59 +01:00
Usama Arif
b2473a3597 of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
__pa() is only intended to be used for linear map addresses and using
it for initial_boot_params which is in fixmap for arm64 will give an
incorrect value. Hence save the physical address when it is known at
boot time when calling early_init_dt_scan for arm64 and use it at kexec
time instead of converting the virtual address using __pa().

Note that arm64 doesn't need the FDT region reserved in the DT as the
kernel explicitly reserves the passed in FDT. Therefore, only a debug
warning is fixed with this change.

Reported-by: Breno Leitao <leitao@debian.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Fixes: ac10be5cdb ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20241023171426.452688-1-usamaarif642@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-10-29 15:32:45 -05:00
Ard Biesheuvel
223abe96ac x86/xen: Avoid relocatable quantities in Xen ELF notes
Xen puts virtual and physical addresses into ELF notes that are treated
by the linker as relocatable by default. Doing so is not only pointless,
given that the ELF notes are only intended for consumption by Xen before
the kernel boots. It is also a KASLR leak, given that the kernel's ELF
notes are exposed via the world readable /sys/kernel/notes.

So emit these constants in a way that prevents the linker from marking
them as relocatable. This involves place-relative relocations (which
subtract their own virtual address from the symbol value) and linker
provided absolute symbols that add the address of the place to the
desired value.

Tested-by: Jason Andryuk <jason.andryuk@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Message-ID: <20241009160438.3884381-11-ardb+git@google.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-29 17:23:36 +01:00
Jani Nikula
f719c2a2d1 drm/intel/pciids: rename i915_pciids.h to just pciids.h
In preparation of sharing the PCI ID macros between i915 and xe, rename
i915_pciids.h to pciids.h.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/835143845faa5310e4bb58405a8a0848392bbf06.1729590029.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-10-29 16:14:04 +02:00
Sabyrzhan Tasbolatov
1db272864f x86/traps: move kmsan check after instrumentation_begin
During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:

  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
    kmsan_unpoison_entry_regs() leaves .noinstr.text section
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  MODPOST Module.symvers
  CC      .vmlinux.export.o

Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.

There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.

Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8 ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28 21:40:39 -07:00
Qiuxu Zhuo
754269ccf0 x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff
Use the predefined MCG_BANKCNT_MASK macro instead of the hardcoded
0xff to mask the bank number bits.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241025024602.24318-3-qiuxu.zhuo@intel.com
2024-10-28 14:27:34 +01:00
Qiuxu Zhuo
325c3376af x86/mce/mcelog: Use xchg() to get and clear the flags
Using xchg() to atomically get and clear the MCE log buffer flags,
streamlines the code and reduces the text size by 20 bytes.

  $ size dev-mcelog.o.*

       text	   data	    bss	    dec	    hex	filename
       3013	    360	    160	   3533	    dcd	dev-mcelog.o.old
       2993	    360	    160	   3513	    db9	dev-mcelog.o.new

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241025024602.24318-2-qiuxu.zhuo@intel.com
2024-10-28 14:07:47 +01:00
Borislav Petkov (AMD)
e6e6a303f8 x86/cpu: Fix formatting of cpuid_bits[] in scattered.c
Realign initializers to accomodate for longer X86_FEATURE define names.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-10-28 13:51:05 +01:00
Perry Yuan
0c487010cb x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit
Add a new feature bit that indicates support for workload-based heuristic
feedback to OS for scheduling decisions.

When the bit set, threads are classified during runtime into enumerated
classes. The classes represent thread performance/power characteristics
that may benefit from special scheduling behaviors.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241028020251.8085-4-mario.limonciello@amd.com
2024-10-28 13:44:44 +01:00
Linus Torvalds
ea1fda89f5 - Prevent a certain range of pages which get marked as hypervisor-only, to get
allocated to a CoCo (SNP) guest which cannot use them and thus fail booting
 
 - Fix the microcode loader on AMD to pay attention to the stepping of a patch
   and to handle the case where a BIOS config option splits the machine into
   logical NUMA nodes per L3 cache slice
 
 - Disable LAM from being built by default due to security concerns of
   a various kind
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmceGS0ACgkQEsHwGGHe
 VUqf4w/+JEzle0DbXRTCB1Gu9ID7mTuEGb3tXjK+UOTy8nXRyAf9BqgyzJeCr3gu
 0vCuNzhOTe4sfKb+bNp2yy36/c7RodAGuot1oIWXf8hiMtWCIZ1rcf2zj4GqzSmD
 FUAPexX/FDkySLQ3FOfTmpDwGgDFe3IH6rMn8ETkwAuZsh2aiYNYlUjq5AZNQjIh
 Fa3eyYBSpCppdSeVLxzq1fnbFGIg25AYiXRsWzoulwkeARHadvc0lopPIumkUbUw
 zyYWt1CrcsQTahwSF3Yt2dstve2yWHtbmElH8N4X3dvKsoP2OoVM/glVDWf2yaT+
 3dkh/OrAnqHX8CSZCakPVhpHg5VDZewkzyfMSykge2itu2J5+780Cjq9PJ4A1PV6
 oUx/wfhso16Fkn9VhXaMUcO+GJB2uOKCtktCXt6cIBnRQiSR1ka/X7duuEZbdvFA
 jiVy4KrKYnvqJHlz5GFg3FqfvUWEzFDP8dZNuJb+eqJhHo1C0gWyOYhXd9nZeyF9
 ZA5nYTp/mkf9UXhhEYAHV+qnEQYIi4yOXoICQezc5PxCnnxQrJ4+Z1FV1bO0RLN6
 FCkqGn2aSwoiUfDfWw899juDO+B+aYqwTPy7gcZBex8qnJYsB1BUyPOcARnE2Wqs
 6S3dvm2Uq78mtxBvVSlLpwxV4ZHM9ZbSoAx/sasNuOzdTjMPHS0=
 =P9TV
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Prevent a certain range of pages which get marked as hypervisor-only,
   to get allocated to a CoCo (SNP) guest which cannot use them and thus
   fail booting

 - Fix the microcode loader on AMD to pay attention to the stepping of a
   patch and to handle the case where a BIOS config option splits the
   machine into logical NUMA nodes per L3 cache slice

 - Disable LAM from being built by default due to security concerns

* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Ensure that RMP table fixups are reserved
  x86/microcode/AMD: Split load_microcode_amd()
  x86/microcode/AMD: Pay attention to the stepping dynamically
  x86/lam: Disable ADDRESS_MASKING in most cases
2024-10-27 09:01:36 -10:00
Thorsten Blum
7565caab47 x86/cpu: Use str_yes_no() helper in show_cpuinfo_misc()
Remove hard-coded strings by using the str_yes_no() helper function.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241026110808.78074-1-thorsten.blum@linux.dev
2024-10-26 15:37:15 +02:00
Mario Limonciello
3eef25ab0d x86/amd: Use heterogeneous core topology for identifying boost numerator
AMD heterogeneous designs include two types of cores:

 * Performance
 * Efficiency

Each core type has different highest performance values configured by the
platform.  Drivers such as amd_pstate need to identify the type of core to
correctly set an appropriate boost numerator to calculate the maximum
frequency.

X86_FEATURE_AMD_HETEROGENEOUS_CORES is used to identify whether the SoC
supports heterogeneous core type by reading CPUID leaf Fn_0x80000026.

On performance cores the scaling factor of 196 is used.  On efficiency cores
the scaling factor is the value reported as the highest perf.  Efficiency
cores have the same preferred core rankings.

Suggested-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-6-mario.limonciello@amd.com
2024-10-25 20:51:17 +02:00
Pawan Gupta
45239ba39a x86/cpu: Add CPU type to struct cpuinfo_topology
Sometimes it is required to take actions based on if a CPU is a performance or
efficiency core. As an example, intel_pstate driver uses the Intel core-type
to determine CPU scaling. Also, some CPU vulnerabilities only affect
a specific CPU type, like RFDS only affects Intel Atom. Hybrid systems that
have variants P+E, P-only(Core) and E-only(Atom), it is not straightforward to
identify which variant is affected by a type specific vulnerability.

Such processors do have CPUID field that can uniquely identify them. Like,
P+E, P-only and E-only enumerates CPUID.1A.CORE_TYPE identification, while P+E
additionally enumerates CPUID.7.HYBRID. Based on this information, it is
possible for boot CPU to identify if a system has mixed CPU types.

Add a new field hw_cpu_type to struct cpuinfo_topology that stores the
hardware specific CPU type. This saves the overhead of IPIs to get the CPU
type of a different CPU. CPU type is populated early in the boot process,
before vulnerabilities are enumerated.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20241025171459.1093-5-mario.limonciello@amd.com
2024-10-25 20:44:26 +02:00
Perry Yuan
b0979e5364 x86/cpu: Enable SD_ASYM_PACKING for PKG domain on AMD
Enable the SD_ASYM_PACKING domain flag for the PKG domain on AMD heterogeneous
processors.  This flag is beneficial for processors with one or more CCDs and
relies on x86_sched_itmt_flags().

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241025171459.1093-4-mario.limonciello@amd.com
2024-10-25 20:43:22 +02:00
Perry Yuan
1ad4667066 x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES
CPUID leaf 0x80000026 advertises core types with different efficiency
rankings.

Bit 30 indicates the heterogeneous core topology feature, if the bit
set, it means not all instances at the current hierarchical level have
the same core topology.

This is described in the AMD64 Architecture Programmers Manual Volume
2 and 3, doc ID #25493 and #25494.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-3-mario.limonciello@amd.com
2024-10-25 20:31:16 +02:00
Mario Limonciello
104edc6efc x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
This feature is an AMD unique feature of some processors, so put
AMD into the name.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-2-mario.limonciello@amd.com
2024-10-25 20:09:16 +02:00
Linus Torvalds
86e6b1547b x86: fix user address masking non-canonical speculation issue
It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical
accesses in kernel space.  And so using just the high bit to decide
whether an access is in user space or kernel space ends up with the good
old "leak speculative data" if you have the right gadget using the
result:

  CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“

Now, the kernel surrounds the access with a STAC/CLAC pair, and those
instructions end up serializing execution on older Zen architectures,
which closes the speculation window.

But that was true only up until Zen 5, which renames the AC bit [1].
That improves performance of STAC/CLAC a lot, but also means that the
speculation window is now open.

Note that this affects not just the new address masking, but also the
regular valid_user_address() check used by access_ok(), and the asm
version of the sign bit check in the get_user() helpers.

It does not affect put_user() or clear_user() variants, since there's no
speculative result to be used in a gadget for those operations.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/all/80d94591-1297-4afb-b510-c665efd37f10@citrix.com/
Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/ [1]
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html
Link: https://arxiv.org/pdf/2108.10771
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> # LAM case
Fixes: 2865baf540 ("x86: support user address masking instead of non-speculative conditional")
Fixes: 6014bc2756 ("x86-64: make access_ok() independent of LAM")
Fixes: b19b74bc99 ("x86/mm: Rework address range check in get_user() and put_user()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-25 09:53:03 -07:00
Chang S. Bae
9a819753b0 x86/microcode/intel: Remove unnecessary cache writeback and invalidation
Currently, an unconditional cache flush is performed during every
microcode update. Although the original changelog did not mention
a specific erratum, this measure was primarily intended to address
a specific microcode bug, the load of which has already been blocked by
is_blacklisted(). Therefore, this cache flush is no longer necessary.

Additionally, the side effects of doing this have been overlooked. It
increases CPU rendezvous time during late loading, where the cache flush
takes between 1x to 3.5x longer than the actual microcode update.

Remove native_wbinvd() and update the erratum name to align with the
latest errata documentation, document ID 334163 Version 022US.

  [ bp: Zap the flaky documentation URL. ]

Fixes: 91df9fdf51 ("x86/microcode/intel: Writeback and invalidate caches before updating microcode")
Reported-by: Yan Hua Wu <yanhua1.wu@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Yan Hua Wu <yanhua1.wu@intel.com>
Link: https://lore.kernel.org/r/20241001161042.465584-2-chang.seok.bae@intel.com
2024-10-25 18:12:03 +02:00
Borislav Petkov (AMD)
1d81d85d1a x86/microcode/AMD: Split load_microcode_amd()
This function should've been split a long time ago because it is used in
two paths:

1) On the late loading path, when the microcode is loaded through the
   request_firmware interface

2) In the save_microcode_in_initrd() path which collects all the
   microcode patches which are relevant for the current system before
   the initrd with the microcode container has been jettisoned.

   In that path, it is not really necessary to iterate over the nodes on
   a system and match a patch however it didn't cause any trouble so it
   was left for a later cleanup

However, that later cleanup was expedited by the fact that Jens was
enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
so this causes the NUMA CPU masks used in cpumask_of_node() to be
generated *after* 2) above happened on the first node. Which means, all
those masks were funky, wrong, uninitialized and whatnot, leading to
explosions when dereffing c->microcode in load_microcode_amd().

So split that function and do only the necessary work needed at each
stage.

Fixes: 94838d230a ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
2024-10-22 16:48:00 +02:00
Borislav Petkov (AMD)
d1744a4c97 x86/microcode/AMD: Pay attention to the stepping dynamically
Commit in Fixes changed how a microcode patch is loaded on Zen and newer but
the patch matching needs to happen with different rigidity, depending on what
is being done:

1) When the patch is added to the patches cache, the stepping must be ignored
   because the driver still supports different steppings per system

2) When the patch is matched for loading, then the stepping must be taken into
   account because each CPU needs the patch matching its exact stepping

Take care of that by making the matching smarter.

Fixes: 94838d230a ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
2024-10-22 16:37:13 +02:00
Linus Torvalds
d129377639 ARM64:
* Fix the guest view of the ID registers, making the relevant fields
   writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
 
 * Correcly expose S1PIE to guests, fixing a regression introduced
   in 6.12-rc1 with the S1POE support
 
 * Fix the recycling of stage-2 shadow MMUs by tracking the context
   (are we allowed to block or not) as well as the recycling state
 
 * Address a couple of issues with the vgic when userspace misconfigures
   the emulation, resulting in various splats. Headaches courtesy
   of our Syzkaller friends
 
 * Stop wasting space in the HYP idmap, as we are dangerously close
   to the 4kB limit, and this has already exploded in -next
 
 * Fix another race in vgic_init()
 
 * Fix a UBSAN error when faking the cache topology with MTE
   enabled
 
 RISCV:
 
 * RISCV: KVM: use raw_spinlock for critical section in imsic
 
 x86:
 
 * A bandaid for lack of XCR0 setup in selftests, which causes trouble
   if the compiler is configured to have x86-64-v3 (with AVX) as the
   default ISA.  Proper XCR0 setup will come in the next merge window.
 
 * Fix an issue where KVM would not ignore low bits of the nested CR3
   and potentially leak up to 31 bytes out of the guest memory's bounds
 
 * Fix case in which an out-of-date cached value for the segments could
   by returned by KVM_GET_SREGS.
 
 * More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
 
 * Override MTRR state for KVM confidential guests, making it WB by
   default as is already the case for Hyper-V guests.
 
 Generic:
 
 * Remove a couple of unused functions
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmcVK54UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOfrgf7BRyihd28OGaqVuv2BqGYrxqfOkd6
 ZqpJDOy+X7UE3iG5NhTxw4mghCJFhOwIL7gDSZwPLe6D2k01oqPSP2pLMqXb5oOv
 /EkltRvzG0YIH3sjZY5PROrMMxnvSKkJKxETFxFQQzMKRym2v/T5LAzrium58YIT
 vWZXxo2HTPXOw/U5upAqqMYJMeeJEL3kurVHtOsPytUFjrIOl0BfeKvgjOwonDIh
 Awm4JZwk0+1d8sYfkuzsSrTQmtshDCx1jkFN1juirt90s1EwgmOvVKiHo3gMsVP9
 veDRoLTx2fM/r7TrhoHo46DTA2vbfmCltWcT0cn5x8P24BFGXXe/IDJIHA==
 =IVlI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix the guest view of the ID registers, making the relevant fields
     writable from userspace (affecting ID_AA64DFR0_EL1 and
     ID_AA64PFR1_EL1)

   - Correcly expose S1PIE to guests, fixing a regression introduced in
     6.12-rc1 with the S1POE support

   - Fix the recycling of stage-2 shadow MMUs by tracking the context
     (are we allowed to block or not) as well as the recycling state

   - Address a couple of issues with the vgic when userspace
     misconfigures the emulation, resulting in various splats. Headaches
     courtesy of our Syzkaller friends

   - Stop wasting space in the HYP idmap, as we are dangerously close to
     the 4kB limit, and this has already exploded in -next

   - Fix another race in vgic_init()

   - Fix a UBSAN error when faking the cache topology with MTE enabled

  RISCV:

   - RISCV: KVM: use raw_spinlock for critical section in imsic

  x86:

   - A bandaid for lack of XCR0 setup in selftests, which causes trouble
     if the compiler is configured to have x86-64-v3 (with AVX) as the
     default ISA. Proper XCR0 setup will come in the next merge window.

   - Fix an issue where KVM would not ignore low bits of the nested CR3
     and potentially leak up to 31 bytes out of the guest memory's
     bounds

   - Fix case in which an out-of-date cached value for the segments
     could by returned by KVM_GET_SREGS.

   - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL

   - Override MTRR state for KVM confidential guests, making it WB by
     default as is already the case for Hyper-V guests.

  Generic:

   - Remove a couple of unused functions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  RISCV: KVM: use raw_spinlock for critical section in imsic
  KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
  KVM: selftests: x86: Avoid using SSE/AVX instructions
  KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
  KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
  KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
  KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
  KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
  x86/kvm: Override default caching mode for SEV-SNP and TDX
  KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
  KVM: Remove unused kvm_vcpu_gfn_to_pfn
  KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
  KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
  KVM: arm64: Fix shift-out-of-bounds bug
  KVM: arm64: Shave a few bytes from the EL2 idmap code
  KVM: arm64: Don't eagerly teardown the vgic on init error
  KVM: arm64: Expose S1PIE to guests
  KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
  KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
  KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
  ...
2024-10-21 11:22:04 -07:00
Linus Torvalds
db87114dcf - Explicitly disable the TSC deadline timer when going idle to address
some CPU errata in that area
 
 - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path
 
 - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RDFS mitigation
 
 - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit
 
 - Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature
 
 - Other small cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmcU6aMACgkQEsHwGGHe
 VUqXPxAAjG0m9J11jBNlNsorPKe0dlhkgV6RpEOtCWov0mvxSAPQazT9PE0FTCvx
 Hm/IdEmj5vkkJOC/R7pga8Yz5fRwGtYwIHyS5618Wh+KAfdsXDgTFvCKaBQt0ltB
 9U5+mwmyzzL6rS6jcv/y28qwi0STd4dHKg6K9sWAtga1bQSPCyJMZjeh9op5CxNh
 QOppCJR23jrp9I9c1zFd1LJPM4GY+KTYXTa7076sfcoD2taHbxAwsC/wiMooh5A2
 k0EItyzy2UWWSUxAW8QhZJyuAWav631tHjcz9iETgNZmjgpR0sTGFGkRaYB74qkf
 vS2yyGpTSoKhxXVcBe7Z6cMf5DhUUjMa7itXZnY7kWCenvwfa3/nuSUKtIeqTPyg
 a6BXypPFyYaqRWHtCiN6KjwXaS+fbc385Fh6m8Q/NDrHnXG84oLQ3DK0WKj4Z37V
 YRflsWJ4ZRIwLALGsKJX+qbe9Oh3VDE3Q8MH9pCiJi227YB2OzyImJmCUBRY9bIC
 7Amw4aUBUxX/VUpUOC4CJnx8SOG7cIeM06E6jM7J6LgWHpee++ccbFpZNqFh3VW/
 j67AifRJFljG+JcyPLZxZ4M/bzpsGkpZ7iiW8wI8k0CPoG7lcvbkZ3pQ4eizAHIJ
 0a+WQ9jHj1/64g4bT7Ml8lZRbzfBG/ksLkRwq8Gakt+h7GQbsd4=
 =n0wZ
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Explicitly disable the TSC deadline timer when going idle to address
   some CPU errata in that area

 - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path

 - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RDFS mitigation

 - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit

 - Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature

 - Other small cleanups and improvements

* tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Always explicitly disarm TSC-deadline timer
  x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
  x86/bugs: Use code segment selector for VERW operand
  x86/entry_32: Clear CPU buffers after register restore in NMI return
  x86/entry_32: Do not clobber user EFLAGS.ZF
  x86/resctrl: Annotate get_mem_config() functions as __init
  x86/resctrl: Avoid overflow in MB settings in bw_validate()
  x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
2024-10-20 12:04:32 -07:00
Kirill A. Shutemov
8e690b817e x86/kvm: Override default caching mode for SEV-SNP and TDX
AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).

This results in guests using uncached mappings where it shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().

Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Binbin Wu <binbin.wu@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:07:02 -04:00
Zheng Yejian
3bf19a0fb6 x86/unwind/orc: Fix unwind for newly forked tasks
When arch_stack_walk_reliable() is called to unwind for newly forked
tasks, the return value is negative which means the call stack is
unreliable. This obviously does not meet expectations.

The root cause is that after commit 3aec4ecb3d ("x86: Rewrite
 ret_from_fork() in C"), the 'ret_addr' of newly forked task is changed
to 'ret_from_fork_asm' (see copy_thread()), then at the start of the
unwind, it is incorrectly interprets not as a "signal" one because
'ret_from_fork' is still used to determine the initial "signal" (see
__unwind_start()). Then the address gets incorrectly decremented in the
call to orc_find() (see unwind_next_frame()) and resulting in the
incorrect ORC data.

To fix it, check 'ret_from_fork_asm' rather than 'ret_from_fork' in
__unwind_start().

Fixes: 3aec4ecb3d ("x86: Rewrite ret_from_fork() in C")
Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-10-17 15:13:07 -07:00
Josh Poimboeuf
ed1cb76ebd objtool: Detect non-relocated text references
When kernel IBT is enabled, objtool detects all text references in order
to determine which functions can be indirectly branched to.

In text, such references look like one of the following:

   mov    $0x0,%rax        R_X86_64_32S     .init.text+0x7e0a0
   lea    0x0(%rip),%rax   R_X86_64_PC32    autoremove_wake_function-0x4

Either way the function pointer is denoted by a relocation, so objtool
just reads that.

However there are some "lea xxx(%rip)" cases which don't use relocations
because they're referencing code in the same translation unit.  Objtool
doesn't have visibility to those.

The only currently known instances of that are a few hand-coded asm text
references which don't actually need ENDBR.  So it's not actually a
problem at the moment.

However if we enable -fpie, the compiler would start generating them and
there would definitely be bugs in the IBT sealing.

Detect non-relocated text references and handle them appropriately.

[ Note: I removed the manual static_call_tramp check -- that should
  already be handled by the noendbr check. ]

Reported-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-10-17 15:13:06 -07:00
Bart Van Assche
f642974c0b x86/acpi: Switch to irq_get_nr_irqs() and irq_set_nr_irqs()
Use the irq_get_nr_irqs() and irq_set_nr_irqs() functions instead of the
global variable 'nr_irqs'. Prepare for changing 'nr_irqs' from an
exported global variable into a variable with file scope.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241015190953.1266194-7-bvanassche@acm.org
2024-10-16 21:56:57 +02:00
Zhang Rui
ffd95846c6 x86/apic: Always explicitly disarm TSC-deadline timer
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.

The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.

Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.

Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.

1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revision-guides/41322_10h_Rev_Gd.pdf

Fixes: 279f146143 ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
2024-10-15 05:45:18 -07:00
Christophe JAILLET
29eaa79583 x86/resctrl: Slightly clean-up mbm_config_show()
'mon_info' is already zeroed in the list_for_each_entry() loop below.  There
is no need to explicitly initialize it here. It just wastes some space and
cycles.

Remove this un-needed code.

On a x86_64, with allmodconfig:

  Before:
  ======
     text	   data	    bss	    dec	    hex	filename
    74967	   5103	   1880	  81950	  1401e	arch/x86/kernel/cpu/resctrl/rdtgroup.o

  After:
  =====
     text	   data	    bss	    dec	    hex	filename
    74903	   5103	   1880	  81886	  13fde	arch/x86/kernel/cpu/resctrl/rdtgroup.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/b2ebc809c8b6c6440d17b12ccf7c2d29aaafd488.1720868538.git.christophe.jaillet@wanadoo.fr
2024-10-14 18:58:24 +02:00
John Allen
ee4d4e8d2c x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
Commit

  f69759be25 ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")

causes a bit in the DE_CFG MSR to get set erroneously after a microcode late
load.

The microcode late load path calls into amd_check_microcode() and subsequently
zen2_zenbleed_check(). Since the above commit removes the cpu_has_amd_erratum()
call from zen2_zenbleed_check(), this will cause all non-Zen2 CPUs to go
through the function and set the bit in the DE_CFG MSR.

Call into the Zenbleed fix path on Zen2 CPUs only.

  [ bp: Massage commit message, use cpu_feature_enabled(). ]

Fixes: f69759be25 ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240923164404.27227-1-john.allen@amd.com
2024-10-11 21:26:45 +02:00
Steven Rostedt
7888af4166 ftrace: Make ftrace_regs abstract from direct use
ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!

Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.

Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/

Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul  Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas  Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav  Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-10 20:18:01 -04:00
Johannes Wikner
c62fa117c3 x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
Since X86_FEATURE_ENTRY_IBPB will invalidate all harmful predictions
with IBPB, no software-based untraining of returns is needed anymore.
Currently, this change affects retbleed and SRSO mitigations so if
either of the mitigations is doing IBPB and the other one does the
software sequence, the latter is not needed anymore.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
2024-10-10 10:38:21 +02:00
Johannes Wikner
0fad287864 x86/bugs: Skip RSB fill at VMEXIT
entry_ibpb() is designed to follow Intel's IBPB specification regardless
of CPU. This includes invalidating RSB entries.

Hence, if IBPB on VMEXIT has been selected, entry_ibpb() as part of the
RET untraining in the VMEXIT path will take care of all BTB and RSB
clearing so there's no need to explicitly fill the RSB anymore.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
2024-10-10 10:35:53 +02:00
Johannes Wikner
3ea87dfa31 x86/cpufeatures: Add a IBPB_NO_RET BUG flag
Set this flag if the CPU has an IBPB implementation that does not
invalidate return target predictions. Zen generations < 4 do not flush
the RSB when executing an IBPB and this bug flag denotes that.

  [ bp: Massage. ]

Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
2024-10-10 10:34:29 +02:00
Nathan Chancellor
d5fd042bf4 x86/resctrl: Annotate get_mem_config() functions as __init
After a recent LLVM change [1] that deduces __cold on functions that only call
cold code (such as __init functions), there is a section mismatch warning from
__get_mem_config_intel(), which got moved to .text.unlikely. as a result of
that optimization:

  WARNING: modpost: vmlinux: section mismatch in reference: \
  __get_mem_config_intel+0x77 (section: .text.unlikely.) -> thread_throttle_mode_init (section: .init.text)

Mark __get_mem_config_intel() as __init as well since it is only called
from __init code, which clears up the warning.

While __rdt_get_mem_config_amd() does not exhibit a warning because it
does not call any __init code, it is a similar function that is only
called from __init code like __get_mem_config_intel(), so mark it __init
as well to keep the code symmetrical.

CONFIG_SECTION_MISMATCH_WARN_ONLY=n would turn this into a fatal error.

Fixes: 05b93417ce ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)")
Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: <stable@kernel.org>
Link: 6b11573b8c [1]
Link: https://lore.kernel.org/r/20240917-x86-restctrl-get_mem_config_intel-init-v3-1-10d521256284@kernel.org
2024-10-08 21:05:10 +02:00
Martin Kletzander
2b5648416e x86/resctrl: Avoid overflow in MB settings in bw_validate()
The resctrl schemata file supports specifying memory bandwidth associated with
the Memory Bandwidth Allocation (MBA) feature via a percentage (this is the
default) or bandwidth in MiBps (when resctrl is mounted with the "mba_MBps"
option).

The allowed range for the bandwidth percentage is from
/sys/fs/resctrl/info/MB/min_bandwidth to 100, using a granularity of
/sys/fs/resctrl/info/MB/bandwidth_gran. The supported range for the MiBps
bandwidth is 0 to U32_MAX.

There are two issues with parsing of MiBps memory bandwidth:

* The user provided MiBps is mistakenly rounded up to the granularity
  that is unique to percentage input.

* The user provided MiBps is parsed using unsigned long (thus accepting
  values up to ULONG_MAX), and then assigned to u32 that could result in
  overflow.

Do not round up the MiBps value and parse user provided bandwidth as the u32
it is intended to be. Use the appropriate kstrtou32() that can detect out of
range values.

Fixes: 8205a078ba ("x86/intel_rdt/mba_sc: Add schemata support")
Fixes: 6ce1560d35 ("x86/resctrl: Switch over to the resctrl mbps_val list")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Martin Kletzander <nert.pinx@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
2024-10-08 16:17:38 +02:00
Richard Gong
f8bc84b609 x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
Add new PCI ID for Device 18h and Function 4.

Signed-off-by: Richard Gong <richard.gong@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240913162903.649519-1-richard.gong@amd.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-10-07 21:04:28 +02:00
Dapeng Mi
2eb2802a41 x86/cpu/intel: Define helper to get CPU core native ID
Define helper get_this_hybrid_cpu_native_id() to return the CPU core
native ID. This core native ID combining with core type can be used to
figure out the CPU core uarch uniquely.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-3-dapeng1.mi@linux.intel.com
2024-10-07 09:28:43 +02:00
Paolo Bonzini
c8d430db8e KVM/arm64 fixes for 6.12, take #1
- Fix pKVM error path on init, making sure we do not change critical
   system registers as we're about to fail
 
 - Make sure that the host's vector length is at capped by a value
   common to all CPUs
 
 - Fix kvm_has_feat*() handling of "negative" features, as the current
   code is pretty broken
 
 - Promote Joey to the status of official reviewer, while James steps
   down -- hopefully only temporarly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmb++hkACgkQI9DQutE9
 ekNDyQ/9GwamcXC4KfYFtfQrcNRl/6RtlF/PFC0R6iiD1OoqNFHv2D/zscxtOj5a
 nw3gbof1Y59eND/6dubDzk82/A1Ff6bXpygybSQ6LG6Jba7H+01XxvvB0SMTLJ1S
 7hREe6m1EBHG/4VJk2Mx8iHJ7OjgZiTivojjZ1tY2Ez3nSUecL8prjqBFft3lAhg
 rFb20iJiijoZDgEjFZq/gWDxPq5m3N51tushqPRIMJ6wt8TeLYx3uUd2DTO0MzG/
 1K2vGbc1O6010jiR+PO3szi7uJFZfb58IsKCx7/w2e9AbzpYx4BXHKCax00DlGAP
 0PiuEMqG82UXR5a58UQrLC2aonh5VNj7J1Lk3qLb0NCimu6PdYWyIGNsKzAF/f4s
 tRVTRqcPr0RN/IIoX9vFjK3CKF9FcwAtctoO7IbxLKp+OGbPXk7Fk/gmhXKRubPR
 +4L4DCcARTcBflnWDzdLaz02fr13UfhM80mekJXlS1YHlSArCfbrsvjNrh4iL+G0
 UDamq8+8ereN0kT+ZM2jw3iw+DaF2kg24OEEfEQcBHZTS9HqBNVPplqqNSWRkjTl
 WSB79q1G6iOYzMUQdULP4vFRv1OePgJzg/voqMRZ6fUSuNgkpyXT0fLf5X12weq9
 NBnJ09Eh5bWfRIpdMzI1E1Qjfsm7E6hEa79DOnHmiLgSdVk3M9o=
 =Rtrz
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
  system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
  common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
  code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
  down -- hopefully only temporarly
2024-10-06 03:59:22 -04:00
Paolo Bonzini
2a5fe5a016 x86/reboot: emergency callbacks are now registered by common KVM code
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d8 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222 ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-06 03:55:37 -04:00
Linus Torvalds
3efc57369a x86:
* KVM currently invalidates the entirety of the page tables, not just
   those for the memslot being touched, when a memslot is moved or deleted.
   The former does not have particularly noticeable overhead, but Intel's
   TDX will require the guest to re-accept private pages if they are
   dropped from the secure EPT, which is a non starter.  Actually,
   the only reason why this is not already being done is a bug which
   was never fully investigated and caused VM instability with assigned
   GeForce GPUs, so allow userspace to opt into the new behavior.
 
 * Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
   functionality that is on the horizon).
 
 * Rework common MSR handling code to suppress errors on userspace accesses to
   unsupported-but-advertised MSRs.  This will allow removing (almost?) all of
   KVM's exemptions for userspace access to MSRs that shouldn't exist based on
   the vCPU model (the actual cleanup is non-trivial future work).
 
 * Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
   64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
   stores the entire 64-bit value at the ICR offset.
 
 * Fix a bug where KVM would fail to exit to userspace if one was triggered by
   a fastpath exit handler.
 
 * Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
   there's already a pending wake event at the time of the exit.
 
 * Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
   state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
   (the SHUTDOWN hits the VM altogether, not the nested guest)
 
 * Overhaul the "unprotect and retry" logic to more precisely identify cases
   where retrying is actually helpful, and to harden all retry paths against
   putting the guest into an infinite retry loop.
 
 * Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
   the shadow MMU.
 
 * Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for
   adding multi generation LRU support in KVM.
 
 * Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
   i.e. when the CPU has already flushed the RSB.
 
 * Trace the per-CPU host save area as a VMCB pointer to improve readability
   and cleanup the retrieval of the SEV-ES host save area.
 
 * Remove unnecessary accounting of temporary nested VMCB related allocations.
 
 * Set FINAL/PAGE in the page fault error code for EPT violations if and only
   if the GVA is valid.  If the GVA is NOT valid, there is no guest-side page
   table walk and so stuffing paging related metadata is nonsensical.
 
 * Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
   emulating posted interrupt delivery to L2.
 
 * Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
 
 * Harden eVMCS loading against an impossible NULL pointer deref (really truly
   should be impossible).
 
 * Minor SGX fix and a cleanup.
 
 * Misc cleanups
 
 Generic:
 
 * Register KVM's cpuhp and syscore callbacks when enabling virtualization in
   hardware, as the sole purpose of said callbacks is to disable and re-enable
   virtualization as needed.
 
 * Enable virtualization when KVM is loaded, not right before the first VM
   is created.  Together with the previous change, this simplifies a
   lot the logic of the callbacks, because their very existence implies
   virtualization is enabled.
 
 * Fix a bug that results in KVM prematurely exiting to userspace for coalesced
   MMIO/PIO in many cases, clean up the related code, and add a testcase.
 
 * Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
   the gpa+len crosses a page boundary, which thankfully is guaranteed to not
   happen in the current code base.  Add WARNs in more helpers that read/write
   guest memory to detect similar bugs.
 
 Selftests:
 
 * Fix a goof that caused some Hyper-V tests to be skipped when run on bare
   metal, i.e. NOT in a VM.
 
 * Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
 
 * Explicitly include one-off assets in .gitignore.  Past Sean was completely
   wrong about not being able to detect missing .gitignore entries.
 
 * Verify userspace single-stepping works when KVM happens to handle a VM-Exit
   in its fastpath.
 
 * Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb201AUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOM1gf+Ij7dpCh0KwoNYlHfW2aCHAv3PqQd
 cKMDSGxoCernbJEyPO/3qXNUK+p4zKedk3d92snW3mKa+cwxMdfthJ3i9d7uoNiw
 7hAgcfKNHDZGqAQXhx8QcVF3wgp+diXSyirR+h1IKrGtCCmjMdNC8ftSYe6voEkw
 VTVbLL+tER5H0Xo5UKaXbnXKDbQvWLXkdIqM8dtLGFGLQ2PnF/DdMP0p6HYrKf1w
 B7LBu0rvqYDL8/pS82mtR3brHJXxAr9m72fOezRLEUbfUdzkTUi/b1vEe6nDCl0Q
 i/PuFlARDLWuetlR0VVWKNbop/C/l4EmwCcKzFHa+gfNH3L9361Oz+NzBw==
 =Q7kz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
 "x86:

   - KVM currently invalidates the entirety of the page tables, not just
     those for the memslot being touched, when a memslot is moved or
     deleted.

     This does not traditionally have particularly noticeable overhead,
     but Intel's TDX will require the guest to re-accept private pages
     if they are dropped from the secure EPT, which is a non starter.

     Actually, the only reason why this is not already being done is a
     bug which was never fully investigated and caused VM instability
     with assigned GeForce GPUs, so allow userspace to opt into the new
     behavior.

   - Advertise AVX10.1 to userspace (effectively prep work for the
     "real" AVX10 functionality that is on the horizon)

   - Rework common MSR handling code to suppress errors on userspace
     accesses to unsupported-but-advertised MSRs

     This will allow removing (almost?) all of KVM's exemptions for
     userspace access to MSRs that shouldn't exist based on the vCPU
     model (the actual cleanup is non-trivial future work)

   - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
     splits the 64-bit value into the legacy ICR and ICR2 storage,
     whereas Intel (APICv) stores the entire 64-bit value at the ICR
     offset

   - Fix a bug where KVM would fail to exit to userspace if one was
     triggered by a fastpath exit handler

   - Add fastpath handling of HLT VM-Exit to expedite re-entering the
     guest when there's already a pending wake event at the time of the
     exit

   - Fix a WARN caused by RSM entering a nested guest from SMM with
     invalid guest state, by forcing the vCPU out of guest mode prior to
     signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
     nested guest)

   - Overhaul the "unprotect and retry" logic to more precisely identify
     cases where retrying is actually helpful, and to harden all retry
     paths against putting the guest into an infinite retry loop

   - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
     rmaps in the shadow MMU

   - Refactor pieces of the shadow MMU related to aging SPTEs in
     prepartion for adding multi generation LRU support in KVM

   - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
     enabled, i.e. when the CPU has already flushed the RSB

   - Trace the per-CPU host save area as a VMCB pointer to improve
     readability and cleanup the retrieval of the SEV-ES host save area

   - Remove unnecessary accounting of temporary nested VMCB related
     allocations

   - Set FINAL/PAGE in the page fault error code for EPT violations if
     and only if the GVA is valid. If the GVA is NOT valid, there is no
     guest-side page table walk and so stuffing paging related metadata
     is nonsensical

   - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
     instead of emulating posted interrupt delivery to L2

   - Add a lockdep assertion to detect unsafe accesses of vmcs12
     structures

   - Harden eVMCS loading against an impossible NULL pointer deref
     (really truly should be impossible)

   - Minor SGX fix and a cleanup

   - Misc cleanups

  Generic:

   - Register KVM's cpuhp and syscore callbacks when enabling
     virtualization in hardware, as the sole purpose of said callbacks
     is to disable and re-enable virtualization as needed

   - Enable virtualization when KVM is loaded, not right before the
     first VM is created

     Together with the previous change, this simplifies a lot the logic
     of the callbacks, because their very existence implies
     virtualization is enabled

   - Fix a bug that results in KVM prematurely exiting to userspace for
     coalesced MMIO/PIO in many cases, clean up the related code, and
     add a testcase

   - Fix a bug in kvm_clear_guest() where it would trigger a buffer
     overflow _if_ the gpa+len crosses a page boundary, which thankfully
     is guaranteed to not happen in the current code base. Add WARNs in
     more helpers that read/write guest memory to detect similar bugs

  Selftests:

   - Fix a goof that caused some Hyper-V tests to be skipped when run on
     bare metal, i.e. NOT in a VM

   - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
     guest

   - Explicitly include one-off assets in .gitignore. Past Sean was
     completely wrong about not being able to detect missing .gitignore
     entries

   - Verify userspace single-stepping works when KVM happens to handle a
     VM-Exit in its fastpath

   - Misc cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  Documentation: KVM: fix warning in "make htmldocs"
  s390: Enable KVM_S390_UCONTROL config in debug_defconfig
  selftests: kvm: s390: Add VM run test case
  KVM: SVM: let alternatives handle the cases when RSB filling is required
  KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
  KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  KVM: x86: Update retry protection fields when forcing retry on emulation failure
  KVM: x86: Apply retry protection to "unprotect on failure" path
  KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  ...
2024-09-28 09:20:14 -07:00