2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

6376 Commits

Author SHA1 Message Date
Russell Senior
bebe35bb73 x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time.  I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.

A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.

If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
  [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
  [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
  [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
  [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
  [...]

In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).

Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.

  07aaa7e3d6

Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5

So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.

Presumably, the warm boot hang is due to some register not getting
cleared in the same way that a loss of power does. That is
approximately as much as I understand about the problem.

More poking/prodding and coaching from Jonas Gorski, it looks
like this test patch fixes the problem on my board: Tested against
v6.6.67 and v4.14.113.

Fixes: 18fb053f9b ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
2025-02-25 22:44:01 +01:00
Linus Torvalds
5cf80612d3 Miscellaneous x86 fixes:
- Fix AVX-VNNI CPU feature dependency bug triggered via
    the 'noxsave' boot option
 
  - Fix typos in the SVA documentation
 
  - Add Tony Luck as RDT co-maintainer and remove Fenghua Yu
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAme52akRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hP5g/9He4e+kBsObHbnr4vUcgHi4LR3xcFdBTM
 9rHRNeLDbPZl85n1rrbKNp+z35bLqs02ULGIbM4y40UwgbRco6Ax/6VCWxbQyFMO
 CMQRUQhlmnT7k3sSJ5/AH6V/uInjv2vzH2CnBQMm756GFOqKe0Erk/rGn7/xHbUv
 Ji0K6Hm35dgJpYpyhxAPUM6047lJ2DivjFm8h/Gd646sMmjUQRr+DcHCRb4Dda1g
 goT/dr7OhTE3jBThD8TPzEZeJzyR66YoknOEQPHx4SLzvYXKrDjnrPmf9VzI8VZX
 MAHM2Cs/lYhwC1lC03p1FT0ExCAS1NWDyhPQFza0dP4zYxdTMmXwOPkgWu+BBesn
 R7mSO9LdjDIAaxXPHhs8tdpQImfZzRr/S7T5vq63eWW/yT16TMdfhhvol54X6dgT
 67wxKXRxInwWQVnA3y+YtmwxxR7l1xmqSZ5fDK5VGz4TSRTg8rZxvv4pWydcJ9Or
 sEWi88qNGrGnevXYVjowyyOrAG4KyfmC7Idgytkjezh1m0HyF+LY/0f7rlHc0eXD
 1HAMu0cGun4WkOyoAuCmgdV0WmKDycihQO/U3o2mvtSvc+kAjGGss/b4PXzJTVmk
 8FQsItveOBffvWKxdaDz9R5oOZ0Vzzo3NnIkrvEWfBSBP2XcH3Qy0Xr4prXUA3cl
 p+/KdMjNohs=
 =CR8P
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix AVX-VNNI CPU feature dependency bug triggered via the 'noxsave'
   boot option

 - Fix typos in the SVA documentation

 - Add Tony Luck as RDT co-maintainer and remove Fenghua Yu

* tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  docs: arch/x86/sva: Fix two grammar errors under Background and FAQ
  x86/cpufeatures: Make AVX-VNNI depend on AVX
  MAINTAINERS: Change maintainer for RDT
2025-02-22 10:45:02 -08:00
Eric Biggers
5171207284 x86/cpufeatures: Make AVX-VNNI depend on AVX
The 'noxsave' boot option disables support for AVX, but support for the
AVX-VNNI feature was still declared on CPUs that support it.  Fix this.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250220060124.89622-1-ebiggers@kernel.org
2025-02-21 14:19:16 +01:00
Patrick Bellasi
318e8c339c x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit
In [1] the meaning of the synthetic IBPB flags has been redefined for a
better separation of concerns:
 - ENTRY_IBPB     -- issue IBPB on entry only
 - IBPB_ON_VMEXIT -- issue IBPB on VM-Exit only
and the Retbleed mitigations have been updated to match this new
semantics.

Commit [2] was merged shortly before [1], and their interaction was not
handled properly. This resulted in IBPB not being triggered on VM-Exit
in all SRSO mitigation configs requesting an IBPB there.

Specifically, an IBPB on VM-Exit is triggered only when
X86_FEATURE_IBPB_ON_VMEXIT is set. However:

 - X86_FEATURE_IBPB_ON_VMEXIT is not set for "spec_rstack_overflow=ibpb",
   because before [1] having X86_FEATURE_ENTRY_IBPB was enough. Hence,
   an IBPB is triggered on entry but the expected IBPB on VM-exit is
   not.

 - X86_FEATURE_IBPB_ON_VMEXIT is not set also when
   "spec_rstack_overflow=ibpb-vmexit" if X86_FEATURE_ENTRY_IBPB is
   already set.

   That's because before [1] this was effectively redundant. Hence, e.g.
   a "retbleed=ibpb spec_rstack_overflow=bpb-vmexit" config mistakenly
   reports the machine still vulnerable to SRSO, despite an IBPB being
   triggered both on entry and VM-Exit, because of the Retbleed selected
   mitigation config.

 - UNTRAIN_RET_VM won't still actually do anything unless
   CONFIG_MITIGATION_IBPB_ENTRY is set.

For "spec_rstack_overflow=ibpb", enable IBPB on both entry and VM-Exit
and clear X86_FEATURE_RSB_VMEXIT which is made superfluous by
X86_FEATURE_IBPB_ON_VMEXIT. This effectively makes this mitigation
option similar to the one for 'retbleed=ibpb', thus re-order the code
for the RETBLEED_MITIGATION_IBPB option to be less confusing by having
all features enabling before the disabling of the not needed ones.

For "spec_rstack_overflow=ibpb-vmexit", guard this mitigation setting
with CONFIG_MITIGATION_IBPB_ENTRY to ensure UNTRAIN_RET_VM sequence is
effectively compiled in. Drop instead the CONFIG_MITIGATION_SRSO guard,
since none of the SRSO compile cruft is required in this configuration.
Also, check only that the required microcode is present to effectively
enabled the IBPB on VM-Exit.

Finally, update the KConfig description for CONFIG_MITIGATION_IBPB_ENTRY
to list also all SRSO config settings enabled by this guard.

Fixes: 864bcaa38e ("x86/cpu/kvm: Provide UNTRAIN_RET_VM") [1]
Fixes: d893832d0e ("x86/srso: Add IBPB on VMEXIT") [2]
Reported-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Patrick Bellasi <derkling@google.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-11 10:07:52 -08:00
Joel Granados
1751f872cc treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-28 13:48:37 +01:00
Linus Torvalds
382e391365 hyperv-next for v6.14
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmeTFQ4THHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXqMWB/4uHjnu50u+m00OwXAKQr6i92zh50BZ
 RQragd9s9C8tuUNwPDmS/ct2BNAhoy43KJ0ClegdZjKxT1Ys8cLv4Wr5CaGckqWq
 +WCHqTgt+cPe0vUofqahB5wiAZMsnBgzFkV/OfFwBx0wkub9y5T3qVq5KapYlaDI
 7Gftb+wg1AAsrdZ/HuLRy5ZVvkM/73rU2uoi8WXjr/T14E1krCFR/qirLd1OXo6Q
 Jb97qhnCt/N9JPwIq5/VnYWde5Mpqz6UgtA2rFLDXgNGz+h9/ND6ecWFHjZWNVdc
 AKWZTO5t+fRVBOSyahoyRoYSntPw3wlxyL7A2/54h6j4Dex7wLt6NQBj
 =empO
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Introduce a new set of Hyper-V headers in include/hyperv and replace
   the old hyperv-tlfs.h with the new headers (Nuno Das Neves)

 - Fixes for the Hyper-V VTL mode (Roman Kisel)

 - Fixes for cpu mask usage in Hyper-V code (Michael Kelley)

 - Document the guest VM hibernation behaviour (Michael Kelley)

 - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain)

* tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Documentation: hyperv: Add overview of guest VM hibernation
  hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()
  hyperv: Do not overlap the hvcall IO areas in get_vtl()
  hyperv: Enable the hypercall output page for the VTL mode
  hv_balloon: Fallback to generic_online_page() for non-HV hot added mem
  Drivers: hv: vmbus: Log on missing offers if any
  Drivers: hv: vmbus: Wait for boot-time offers during boot and resume
  uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup
  iommu/hyper-v: Don't assume cpu_possible_mask is dense
  Drivers: hv: Don't assume cpu_possible_mask is dense
  x86/hyperv: Don't assume cpu_possible_mask is dense
  hyperv: Remove the now unused hyperv-tlfs.h files
  hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h
  hyperv: Add new Hyper-V headers in include/hyperv
  hyperv: Clean up unnecessary #includes
  hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-25 09:22:55 -08:00
Linus Torvalds
858df1de21 Miscellaneous x86 cleanups and typo fixes, and also the removal
of the "disablelapic" boot parameter.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmePTD8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jf5g//Wo1WKUXukRrBANr2nIlx9B7xJliRmUxv
 mJ0VKo49YPl6C34fjSHhBs3+nPbYD+CyWVKAz5PqkfkFRGBgpQi26EnyKaIhLVFW
 HWhW5vQm/FJfzBIrfFg7g/H1PK+rEYa4mv8JF9vhwp7BOfuqx4ABGKWQnrvOGg2B
 VivE5k7/kxWRPTg45Kgb1iwlS2gcfWCRi9qdCzdJgY/4XYE6k6hKeV0PgTT3Vojf
 pZKsgZRq8tzMaX75obtyyrX3TWj0nkRec0XbgyXBFvlFh/l3e0RswxzGGAjrC1XP
 R+qmscdCkczUwRGc1mGj9MoCqMRRffU6/hTNsjqu8o7Q2gzZzXWHcUc+X7UwOeKZ
 2guxOj4iagdn7+mIso6uAjY+OOdFVw7/C8ysbCmwo3MiaDsfaK2NkdBoT2xDWuIw
 NP/45RMpTIsgL0wG6upzXXApKgYxfWhNSq+oHDF4/TjWY4i779hjMghvtX1BI7yb
 LXIh2SsRcnmEPl42UGaz6xmdmkulWZPPxI5rghixU48Eazkngfp7ZTHYpm5NFoRP
 Qc3JNcKo7rGmkoo/sA7uwawjnaTz/H77SDNjfAufzjVAKidvUqW6xaK/8JM1fq0n
 du+9sQN5MrAqdKx5Lu624s/7ektwkDeUdQFGazqS9y0GBT25T9Rw+LQDuec7BG3p
 v8sok4IaPA0=
 =Hzj3
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Ingo Molnar:
 "Miscellaneous x86 cleanups and typo fixes, and also the removal of
  the 'disablelapic' boot parameter"

* tag 'x86-cleanups-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Remove a stray tab in the IO-APIC type string
  x86/cpufeatures: Remove "AMD" from the comments to the AMD-specific leaf
  Documentation/kernel-parameters: Fix a typo in kvm.enable_virt_at_load text
  x86/cpu: Fix typo in x86_match_cpu()'s doc
  x86/apic: Remove "disablelapic" cmdline option
  Documentation: Merge x86-specific boot options doc into kernel-parameters.txt
  x86/ioremap: Remove unused size parameter in remapping functions
  x86/ioremap: Simplify setup_data mapping variants
  x86/boot/compressed: Remove unused header includes from kaslr.c
2025-01-21 11:15:29 -08:00
Linus Torvalds
6c4aa896eb Performance events changes for v6.14:
- Seqlock optimizations that arose in a perf context and were
    merged into the perf tree:
 
    - seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan)
    - mm: Convert mm_lock_seq to a proper seqcount ((Suren Baghdasaryan)
    - mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren Baghdasaryan)
    - mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra)
 
  - Core perf enhancements:
 
    - Reduce 'struct page' footprint of perf by mapping pages
      in advance (Lorenzo Stoakes)
    - Save raw sample data conditionally based on sample type (Yabin Cui)
    - Reduce sampling overhead by checking sample_type in
      perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin Cui)
    - Export perf_exclude_event() (Namhyung Kim)
 
  - Uprobes scalability enhancements: (Andrii Nakryiko)
 
    - Simplify find_active_uprobe_rcu() VMA checks
    - Add speculative lockless VMA-to-inode-to-uprobe resolution
    - Simplify session consumer tracking
    - Decouple return_instance list traversal and freeing
    - Ensure return_instance is detached from the list before freeing
    - Reuse return_instances between multiple uretprobes within task
    - Guard against kmemdup() failing in dup_return_instance()
 
  - AMD core PMU driver enhancements:
 
    - Relax privilege filter restriction on AMD IBS (Namhyung Kim)
 
  - AMD RAPL energy counters support: (Dhananjay Ugwekar)
 
    - Introduce topology_logical_core_id() (K Prateek Nayak)
 
    - Remove the unused get_rapl_pmu_cpumask() function
    - Remove the cpu_to_rapl_pmu() function
    - Rename rapl_pmu variables
    - Make rapl_model struct global
    - Add arguments to the init and cleanup functions
    - Modify the generic variable names to *_pkg*
    - Remove the global variable rapl_msrs
    - Move the cntr_mask to rapl_pmus struct
    - Add core energy counter support for AMD CPUs
 
  - Intel core PMU driver enhancements:
 
    - Support RDPMC 'metrics clear mode' feature (Kan Liang)
    - Clarify adaptive PEBS processing (Kan Liang)
    - Factor out functions for PEBS records processing (Kan Liang)
    - Simplify the PEBS records processing for adaptive PEBS (Kan Liang)
 
  - Intel uncore driver enhancements: (Kan Liang)
 
    - Convert buggy pmu->func_id use to pmu->registered
    - Support more units on Granite Rapids
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmeOJdQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i2yQ/+MXl7yfJOgdbwjBpgGGzH4burEO7ppak+
 ktzz+YjpNgjODe/xMAJGjjblouuYArCnRolc1UPvPm6M7jSY76wi42Y6c4dRtFoB
 2ReSrRqnreLOcrRS9nsTjvWRHfJHqJDVSd9TfHX6ILfzbaizCZOGYk558ZxAKRqu
 Lw7FOvLEe/Y3tg4z8dDg083jsasalKySP9wIPc0BkSqQTOfusd3KXju/Fux/9wkn
 hZcUgF4ds+0bH7xtO1/G9ILqGyeq97X1McIR9bAjln5Mxykclen4hSjRaWWHHo9O
 mzBKmd/blIATisfuuW+QLDQow3M1k3688cz7e9QOeWHHd/dJiMb9RLV90jdND/T/
 uLINC5vNemzyWEfnNiYQ31LjhG3SeuDiKWzRp36MbQcCh6EBdRXWLBgtmxq1L/3o
 ZCaCdtFu5+6epycdyOVZEpWDnjdx4GmLXMZi5WJfZ7fZ/IFjNkjk4OdzI1iRQ+i3
 Sbi75ep59ayTUhm5AB7gCJsP3R7EsZsiPHUenQdA2n9Sj6xE+IuhlS/QDQ9g5mdY
 Ijs0jHeVCGmhYoOD1xWnCZSzlnkEVU3zwfypAK+MC7pgtFMwDy5/Bu1USGxXXDy+
 aKsrJRSgHbtZ1gwoHstqkV+DeCTfElCLYkvigzI5Nmyib5Zp4vkwy2ZLWQjaNjm7
 mqRI7PugUkU=
 =c8XB
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Seqlock optimizations that arose in a perf context and were merged
  into the perf tree:

   - seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan)
   - mm: Convert mm_lock_seq to a proper seqcount (Suren Baghdasaryan)
   - mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren
     Baghdasaryan)
   - mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra)

  Core perf enhancements:

   - Reduce 'struct page' footprint of perf by mapping pages in advance
     (Lorenzo Stoakes)
   - Save raw sample data conditionally based on sample type (Yabin Cui)
   - Reduce sampling overhead by checking sample_type in
     perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin
     Cui)
   - Export perf_exclude_event() (Namhyung Kim)

  Uprobes scalability enhancements: (Andrii Nakryiko)

   - Simplify find_active_uprobe_rcu() VMA checks
   - Add speculative lockless VMA-to-inode-to-uprobe resolution
   - Simplify session consumer tracking
   - Decouple return_instance list traversal and freeing
   - Ensure return_instance is detached from the list before freeing
   - Reuse return_instances between multiple uretprobes within task
   - Guard against kmemdup() failing in dup_return_instance()

  AMD core PMU driver enhancements:

   - Relax privilege filter restriction on AMD IBS (Namhyung Kim)

  AMD RAPL energy counters support: (Dhananjay Ugwekar)

   - Introduce topology_logical_core_id() (K Prateek Nayak)
   - Remove the unused get_rapl_pmu_cpumask() function
   - Remove the cpu_to_rapl_pmu() function
   - Rename rapl_pmu variables
   - Make rapl_model struct global
   - Add arguments to the init and cleanup functions
   - Modify the generic variable names to *_pkg*
   - Remove the global variable rapl_msrs
   - Move the cntr_mask to rapl_pmus struct
   - Add core energy counter support for AMD CPUs

  Intel core PMU driver enhancements:

   - Support RDPMC 'metrics clear mode' feature (Kan Liang)
   - Clarify adaptive PEBS processing (Kan Liang)
   - Factor out functions for PEBS records processing (Kan Liang)
   - Simplify the PEBS records processing for adaptive PEBS (Kan Liang)

  Intel uncore driver enhancements: (Kan Liang)

   - Convert buggy pmu->func_id use to pmu->registered
   - Support more units on Granite Rapids"

* tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  perf: map pages in advance
  perf/x86/intel/uncore: Support more units on Granite Rapids
  perf/x86/intel/uncore: Clean up func_id
  perf/x86/intel: Support RDPMC metrics clear mode
  uprobes: Guard against kmemdup() failing in dup_return_instance()
  perf/x86: Relax privilege filter restriction on AMD IBS
  perf/core: Export perf_exclude_event()
  uprobes: Reuse return_instances between multiple uretprobes within task
  uprobes: Ensure return_instance is detached from the list before freeing
  uprobes: Decouple return_instance list traversal and freeing
  uprobes: Simplify session consumer tracking
  uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
  uprobes: simplify find_active_uprobe_rcu() VMA checks
  mm: introduce mmap_lock_speculate_{try_begin|retry}
  mm: convert mm_lock_seq to a proper seqcount
  mm/gup: Use raw_seqcount_try_begin()
  seqlock: add raw_seqcount_try_begin
  perf/x86/rapl: Add core energy counter support for AMD CPUs
  perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
  perf/x86/rapl: Remove the global variable rapl_msrs
  ...
2025-01-21 10:52:03 -08:00
Linus Torvalds
b9d8a295ed - The first part of a restructuring of AMD's representation of a northbridge
which is legacy now, and the creation of the new AMD node concept which
   represents the Zen architecture of having a collection of I/O devices within
   an SoC. Those nodes comprise the so-called data fabric on Zen. This has
   at least one practical advantage of not having to add a PCI ID each time
   a new data fabric PCI device releases. Eventually, the lot more uniform
   provider of data fabric functionality amd_node.c will be used by all the
   drivers which need it
 
 - Smaller cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmePuPIACgkQEsHwGGHe
 VUpU6Q//S9j9+YC9EpredFoJ5W0BfERR5XOum7YjlLxq2mVTStrf9Q1ecrwmS4Q6
 4mAydIDfhqNlouUjMBgNNFJcvm8lat+/pjY78oT8ZdjumslMbMxo81VmQ3fX+6fE
 izMrL81DG4j8zeleUyz5ecJEK/KPw1s3SkY736511PeJSalOU4hLYmU819imfAk/
 5c9os2GNhszIROE1YUYZQ3zXne1t2PNXKvctzVrJYjyKpIDgFNzTj6gXhePzXBNO
 iFdApqSgKdnnsD6VsfxYVnOKP+cSIl27Tbge6dm7DHQbSs00aVL64JPcX8/hWtp6
 ExrwBYiFk6yafwsNUu7/PmqbZNKYxDgvXFq8jSOFfioh6Km/QZYs8y1/qXN3qmSU
 78Ah5jyO+U+++FsSa2o9eRpU2l84UIQqvp84PeSLylzh7iLFyFCWsMfreNeIsF9v
 Jsost58JQOCufRK3qfMiDO88QUZRKyCfFymDAVcvPoBwp5nK9R1ohlbxgXrCPsE7
 Bd7J6jrlpcoRyYc8vhshkrnK2Sk6pP77OZOh5AZ9AybnALH0afUNLzk6sBtaObkZ
 xIJcSIBkKz3P4zWFKsXmqGYHWp1IsKsYRsNjCt5FExWOF+uKKKBjynHmlKeS0l/b
 J6bwDUPVW/gfkBqDV8bILultj9Gm8L5Z8SwvD1ww69OYN+c7oVk=
 =ZAjD
 -----END PGP SIGNATURE-----

Merge tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

 - The first part of a restructuring of AMD's representation of a
   northbridge which is legacy now, and the creation of the new AMD node
   concept which represents the Zen architecture of having a collection
   of I/O devices within an SoC. Those nodes comprise the so-called data
   fabric on Zen.

   This has at least one practical advantage of not having to add a PCI
   ID each time a new data fabric PCI device releases. Eventually, the
   lot more uniform provider of data fabric functionality amd_node.c
   will be used by all the drivers which need it

 - Smaller cleanups

* tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_node: Use defines for SMN register offsets
  x86/amd_node: Remove dependency on AMD_NB
  x86/amd_node: Update __amd_smn_rw() error paths
  x86/amd_nb: Move SMN access code to a new amd_node driver
  x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id()
  x86/amd_nb: Simplify function 3 search
  x86/amd_nb: Use topology info to get AMD node count
  x86/amd_nb: Simplify root device search
  x86/amd_nb: Simplify function 4 search
  x86: Start moving AMD node functionality out of AMD_NB
  x86/amd_nb: Clean up early_is_amd_nb()
  x86/amd_nb: Restrict init function to AMD-based systems
  x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
2025-01-21 09:38:52 -08:00
Linus Torvalds
48795f90cb - Remove the less generic CPU matching infra around struct x86_cpu_desc and
use the generic struct x86_cpu_id thing
 
 - Remove magic naked numbers for CPUID functions and use proper defines of the
   prefix CPUID_LEAF_*. Consolidate some of the crazy use around the tree
 
 - Smaller cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmePjeIACgkQEsHwGGHe
 VUqRBA//TinKFcWagaQB3lsnoBRwqyg6JJZIBNMF9sBMDD9HnvEZ/JduC+3+g1rx
 iztuCmRSgQsi/QvRaEFNuDMOgk6gACyXxi7Uf6eXsQkSlsZFViaqbXsy9kqslRbl
 7QP1NS1sfdSd42JPp2UZT/lg9kluuVnn5b40zZIwy2AAzwrNFfZAS4Yg7Qe4XQDF
 xBcHi8MAF+LTm5Tv0hLmx2UcfZLhi7hXy8mTAIFS0Liww+Y5qaam33xw9KxNU5lZ
 tVepzY5my43pRs4MB1CvaQCiZ84GxvAVqz3JYsg5YhVp45xh7P2WtjBeeOqLljaW
 MkWnDLOmlaD4Y0kL4QA3ReyBVux54RbDGKC0E/t5fwYlk3dQ7gYwSEvh5358R+0z
 kwxw3NdnNngoLRXAX45EonSxj36jb6KCBHAGqXSfL73OOt30RWCqknEnixcOp/BP
 chNxCiIx7qko+rAYOD62QkguEEPFdb8roeayhIKtiKL5zUwQAr+jt/pKVx2htWLi
 xxqSaVoCFu4edWpsEJnanqhS0Es0v7YiBU3jDC37rZJ+dtzf0C2ewD7Nb1g+wUTn
 NzDkmt58hQW4jBxoxHBIclLfhEETISTEGAAObTa5I5r8IDb7Dv+ZnSv7RfjoR9fL
 RWMz1bJ1Scem+Fx7fc/IRJFSElC41giSwFlhThHdAzI1m95zJN8=
 =9Hdg
 -----END PGP SIGNATURE-----

Merge tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Remove the less generic CPU matching infra around struct x86_cpu_desc
   and use the generic struct x86_cpu_id thing

 - Remove magic naked numbers for CPUID functions and use proper defines
   of the prefix CPUID_LEAF_*. Consolidate some of the crazy use around
   the tree

 - Smaller cleanups and improvements

* tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Make all all CPUID leaf names consistent
  x86/fpu: Remove unnecessary CPUID level check
  x86/fpu: Move CPUID leaf definitions to common code
  x86/tsc: Remove CPUID "frequency" leaf magic numbers.
  x86/tsc: Move away from TSC leaf magic numbers
  x86/cpu: Move TSC CPUID leaf definition
  x86/cpu: Refresh DCA leaf reading code
  x86/cpu: Remove unnecessary MwAIT leaf checks
  x86/cpu: Use MWAIT leaf definition
  x86/cpu: Move MWAIT leaf definition to common header
  x86/cpu: Remove 'x86_cpu_desc' infrastructure
  x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'
  x86/cpu: Replace PEBS use of 'x86_cpu_desc' use with 'x86_cpu_id'
  x86/cpu: Expose only stepping min/max interface
  x86/cpu: Introduce new microcode matching helper
  x86/cpufeature: Document cpu_feature_enabled() as the default to use
  x86/paravirt: Remove the WBINVD callback
  x86/cpufeatures: Free up unused feature bits
2025-01-21 09:30:59 -08:00
Linus Torvalds
13b6931c44 - A segmented Reverse Map table (RMP) is a across-nodes distributed
table of sorts which contains per-node descriptors of each node-local
   4K page, denoting its ownership (hypervisor, guest, etc) in the realm
   of confidential computing.  Add support for such a table in order to
   improve referential locality when accessing or modifying RMP table
   entries
 
 - Add support for reading the TSC in SNP guests by removing any
   interference or influence the hypervisor might have, with the goal of
   making a confidential guest even more independent from the hypervisor
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmeOYLsACgkQEsHwGGHe
 VUrywg//WBuywe3+TNPwF0Iw8becqtD7lKMftmUoqpcf20JhiHSCexb+3/r7U2Kb
 WL1/T5cxX1rA45HzkwovUljlvin8B9bdpY40dUqrKFPMnWLfs4ru0HPA6UxPBsAq
 r/8XrXuRrI22MLbrAeQ2xSt8dqw3DpbJyUcyr0qOb6OsbtAy05uElYCzMSyzT06F
 QsTmenosuJqSo1gIGTxfU4nKyd1o8EJ5b1ThK11hvZaIOffgLjEU6g39cG9AeF4X
 TOkh9CdIlQc3ot14rJeWMy15YEW+xBdXdMEv0ZPOSZiKzTHA7wwdl0VmPm1EK57f
 BQkZikuoJezJA0r5wSwVgslTaYO0GTXNewwL5jxK1mqRgoK06IgC6xAkX8N7NTYL
 K6DX+tfaKjSJGY1z9TYOzs+wGV4MBAXmbLwnuhcPumkTYXPFbRFZqx6ec2BLIU+Y
 bZfwhlr3q+bfFeBYMzyWPHJ87JinOjwu4Ah0uLVmkoRtgb0S3pIdlyRYZAcEl6fn
 Tgfu0/RNLGGsH/a3BF7AQdt+hOv1ms5hEMYXg++30uC59LR8XbuKnLdUPRi0nVeD
 e9xyxFybu5ySesnnXabtaO9bSUF+8HV4nkclKglFvuHpLMQ5GlPxTnBj1V1podYR
 l12G2htXKsSV5JJK4x+WfYBe6Nn3tbcpgZD8M8g0lso8kejqMjs=
 =hh1m
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - A segmented Reverse Map table (RMP) is a across-nodes distributed
   table of sorts which contains per-node descriptors of each node-local
   4K page, denoting its ownership (hypervisor, guest, etc) in the realm
   of confidential computing. Add support for such a table in order to
   improve referential locality when accessing or modifying RMP table
   entries

 - Add support for reading the TSC in SNP guests by removing any
   interference or influence the hypervisor might have, with the goal of
   making a confidential guest even more independent from the hypervisor

* tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Add the Secure TSC feature for SNP guests
  x86/tsc: Init the TSC for Secure TSC guests
  x86/sev: Mark the TSC in a secure TSC guest as reliable
  x86/sev: Prevent RDTSC/RDTSCP interception for Secure TSC enabled guests
  x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests
  x86/sev: Change TSC MSR behavior for Secure TSC enabled guests
  x86/sev: Add Secure TSC support for SNP guests
  x86/sev: Relocate SNP guest messaging routines to common code
  x86/sev: Carve out and export SNP guest messaging init routines
  virt: sev-guest: Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL
  virt: sev-guest: Remove is_vmpck_empty() helper
  x86/sev/docs: Document the SNP Reverse Map Table (RMP)
  x86/sev: Add full support for a segmented RMP table
  x86/sev: Treat the contiguous RMP table as a single RMP segment
  x86/sev: Map only the RMP table entries instead of the full RMP range
  x86/sev: Move the SNP probe routine out of the way
  x86/sev: Require the RMPREAD instruction after Zen4
  x86/sev: Add support for the RMPREAD instruction
  x86/sev: Prepare for using the RMPREAD instruction to access the RMP
2025-01-21 09:00:31 -08:00
Linus Torvalds
254d763310 - A bunch of minor cleanups
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmeOWnYACgkQEsHwGGHe
 VUpEfQ/+MaFbHwEyewHWxGqhe9bOcoFsKUxQvK1jO99KbcPYjg1sEZ1DDeLydtea
 vwFoNNMnZDmKYA/+yAeshUIkcxyKYa3Iy6Tc8MXw1Yx1Z+AduKwDm47rvxcn47ig
 oQt1Zj0zMzON6aF7Xx7xlF5wDDBPMuRLKdKBSuggiio/IJz2G+m2CNI4KTUGAN9S
 aHE+TIE64MHftcabuoKdRtZ44FUZyb3BRvtpj3+G1BpRR6e/HFl9ghM3emvOv4BE
 e9QUvMFdVHnXQyib+JV+lOyu/2MpTVuhZqS51e8vSaNToJ0KfOc+j/Sl8uRCSciL
 qXALGIB5uEHlotKIVk9p9frGeUruc8H5YvdOUt7VEmSofuqbfFaEMSdhBfSxfq+S
 1LGBl3hXtVaspgQfzmPpEN5pMQAIMF6xlMiFeZSwBnUhb6n4QVlj38lodBbjct2j
 mzjBaYtu9NiVNi1H6dr1s/16Vl69tqAeRncGFdzvbJV0ZzY6KN/yFWy7g6PCPwHx
 gXdXFuOiwsoXQSsQn1l3nsxuXq1zp+TjUFrREPnmYv7t/GyEdieL5LfPnAwXB9x5
 vBAPDLa0c5XZRn8QyaeH1gZVX3HMi88cjXBEDI9auejkGr3e1VfdDSZkQpYMIK9b
 oIWwD2Wr2oU6uz5ye6M0cZpoZBVsbuluE5/kLJ70aj8QEJT3OsM=
 =xrfO
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loader updates from Borislav Petkov:

 - A bunch of minor cleanups

* tag 'x86_microcode_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Remove ret local var in early_apply_microcode()
  x86/microcode/AMD: Have __apply_microcode_amd() return bool
  x86/microcode/AMD: Make __verify_patch_size() return bool
  x86/microcode/AMD: Remove bogus comment from parse_container()
  x86/microcode/AMD: Return bool from find_blobs_in_containers()
2025-01-21 08:33:10 -08:00
Linus Torvalds
3357d1d1f9 - Extend resctrl with the capability of total memory bandwidth monitoring,
thus accomodating systems which support only total but not local memory
   bandwidth monitoring. Add the respective new mount options
 
 - The usual cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmeOU2QACgkQEsHwGGHe
 VUoqTw/+LOt/36tCMbqUEIhUIvPciqdhAk+gBzP9XzGEWRJDcYLmgmBvlrFcYgIL
 lZVZ/tBsYqljycMFaJ4K4vishpkJr9s8GtDCOItkMA4cYOJcXSaGSpNojctC2Qs6
 42J+qABENRZYWFmWWcCkf8jTG0QWebmVRJwV9Q/4uYRicLFW/B+Em0sOTFn9n7OX
 39ZyCXKlmu2wsTW/DrQ8wX0KkW5hevLkJk/NFN3CDuWg7LOs09IotALs/U7ayj+C
 BIU5ZEAvan8LAPLX4Nrhb5HArsP6eCmB0Kbr+Mm+X9lRBLSbMmThXy58WlRDT0v6
 LEx+IbKUnEVoYJnC2QiqxPB4FghhZs9RE6cDnCQivwkigFhIau9krPZUc2klnVtv
 AmjUx8BporU3rcvrtKycInQQCVdzi9Q8ai7WRrpCqNEItHSOtNk9BipnFGqcnFDr
 va0SCZ3N1vIiVnZkTZ0CObA2F8RRLMiBCAB9XNfD0TzY+K0p2KCNqS6UnxmrOtR+
 8Fbk+C624u7LhmjbrPr7Hj30wIu2b/CqncIDHrh8O+jK2awvTEl6T0Tryd+jXxR1
 ERy0OSUhHEcnW73ckdfdn3/6fsL/9bnBZrTufRWxPzaM+3E9YR9KGmyc7hf5fkSX
 WE/ZRMse4QwELmSajdv9NK7oO48VDOije+cgUzPgme0i4Lsg4c0=
 =i4Yw
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Extend resctrl with the capability of total memory bandwidth
   monitoring, thus accomodating systems which support only total but
   not local memory bandwidth monitoring. Add the respective new mount
   options

 - The usual cleanups

* tag 'x86_cache_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Document the new "mba_MBps_event" file
  x86/resctrl: Add write option to "mba_MBps_event" file
  x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories
  x86/resctrl: Make mba_sc use total bandwidth if local is not supported
  x86/resctrl: Compute memory bandwidth for all supported events
  x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
  x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control
  x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
  x86/resctrl: Use kthread_run_on_cpu()
2025-01-21 08:31:04 -08:00
Linus Torvalds
d80825ee4a - Add support for AMD hardware which is not affected by SRSO on the
user/kernel attack vector and advertise it to guest userspace
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmeOTMwACgkQEsHwGGHe
 VUoMKhAAjMp7tYNmh8687oz8A7ujXDYvbaIh8d3zRnOKq2cEpsGKSOgkw50tbs/I
 LE5o5k2NJ6evIYEkqZZH0WvksealwzoTY1LWGqHj2zotbyP6ypZn+GKORH+MsNNL
 fUaoj6DLELqPbLrr48GJG2uabtwmPOgiElZ6bqKrFnGDPI2LSLkrY7fugM3aU4h7
 VXDUAz2N2kIRKXFedVTArZtYiVO+O4/fM1VxjIRv/KrQt0lTatsjUYc6jei/7Rqa
 xPCmw6WsYfPPY8FjsgR3oaGfUQPzs8nv96Vh9lnIFw5/ajkDbwtvRuPEwSYe9MBZ
 mE+oOqdPz4of12Mv++/BkQL/tKuVPG/e38aeZUQPo/hj2LOWdUdwdAuZuslfrqaA
 9xKZgslhPBKr0yRAku60hRpbqnp07cEHuM6JMpmFoDqN1ESnWlDapWKQj+jOpGyz
 /w0Gp00R03TVhF9QTV7KUyj/U1ykhWG+4q843G5acrgh0geWzy+fYL+jPHgtBbWp
 E+NFKmnCg9YNbTiB6y9xIcEU9siq6iMXyhp3iv0qlpwhF5WueCvc3BiUwavgpoM6
 IpVqrrJspLy6/K7tMKNVKDCIkbHvJ6vKxSM9o3yzqMTL7B3ISlG9o3MSTKQVjytR
 qEnIQAwwfsWfmeWGEDun+hh83b+HsZ+tyLyrFNleGoe4yJosZtc=
 =bWI/
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CPU speculation update from Borislav Petkov:

 - Add support for AMD hardware which is not affected by SRSO on the
   user/kernel attack vector and advertise it to guest userspace

* tag 'x86_bugs_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  KVM: x86: Advertise SRSO_USER_KERNEL_NO to userspace
  x86/bugs: Add SRSO_USER_KERNEL_NO support
2025-01-21 08:22:40 -08:00
Nuno Das Neves
ef5a3c92a8 hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h
Switch to using hvhdk.h everywhere in the kernel. This header
includes all the new Hyper-V headers in include/hyperv, which form a
superset of the definitions found in hyperv-tlfs.h.

This makes it easier to add new Hyper-V interfaces without being
restricted to those in the TLFS doc (reflected in hyperv-tlfs.h).

To be more consistent with the original Hyper-V code, the names of
some definitions are changed slightly. Update those where needed.

Update comments in mshyperv.h files to point to include/hyperv for
adding new definitions.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com
Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-01-10 00:54:21 +00:00
Yazen Ghannam
d35fb3121a x86/mce/amd: Remove shared threshold bank plumbing
Legacy AMD systems include an integrated Northbridge that is represented
by MCA bank 4. This is the only non-core MCA bank in legacy systems. The
Northbridge is physically shared by all the CPUs within an AMD "Node".

However, in practice the "shared" MCA bank can only by managed by a
single CPU within that AMD Node. This is known as the "Node Base Core"
(NBC). For example, only the NBC will be able to read the MCA bank 4
registers; they will be Read-as-Zero for other CPUs. Also, the MCA
Thresholding interrupt will only signal the NBC; the other CPUs will not
receive it. This is enforced by hardware, and it should not be managed by
software.

The current AMD Thresholding code attempts to deal with the "shared" MCA
bank by micromanaging the bank's sysfs kobjects. However, this does not
follow the intended kobject use cases. It is also fragile, and it has
caused bugs in the past.

Modern AMD systems do not need this shared MCA bank support, and it
should not be needed on legacy systems either.

Remove the shared threshold bank code. Also, move the threshold struct
definitions to mce/amd.c, since they are no longer needed in amd_nb.c.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241206161210.163701-2-yazen.ghannam@amd.com
2025-01-03 19:05:35 +01:00
Borislav Petkov (AMD)
ead0db14c7 x86/microcode/AMD: Remove ret local var in early_apply_microcode()
No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-12-31 14:03:41 +01:00
Borislav Petkov (AMD)
78e0aadbd4 x86/microcode/AMD: Have __apply_microcode_amd() return bool
This is the natural thing to do anyway.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-12-31 14:03:39 +01:00
Nikolay Borisov
d8317f3d8e x86/microcode/AMD: Make __verify_patch_size() return bool
The result of that function is in essence boolean, so simplify to return the
result of the relevant expression. It also makes it follow the convention used
by __verify_patch_section().

No functional changes.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241018155151.702350-3-nik.borisov@suse.com
2024-12-31 14:03:37 +01:00
Nikolay Borisov
db80b2efa0 x86/microcode/AMD: Remove bogus comment from parse_container()
The function doesn't return an equivalence ID, remove the false comment.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241018155151.702350-4-nik.borisov@suse.com
2024-12-31 14:03:33 +01:00
Nikolay Borisov
a85c08aaa6 x86/microcode/AMD: Return bool from find_blobs_in_containers()
Instead of open-coding the check for size/data move it inside the
function and make it return a boolean indicating whether data was found
or not.

No functional changes.

  [ bp: Write @ret in find_blobs_in_containers() only on success. ]

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241018155151.702350-2-nik.borisov@suse.com
2024-12-31 14:03:30 +01:00
Qiuxu Zhuo
053d18057e x86/mce: Remove the redundant mce_hygon_feature_init()
Get HYGON to directly call mce_amd_feature_init() and remove the redundant
mce_hygon_feature_init().

Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20241212140103.66964-7-qiuxu.zhuo@intel.com
2024-12-31 11:12:45 +01:00
Qiuxu Zhuo
359d7a98e3 x86/mce: Convert family/model mixed checks to VFM-based checks
Convert family/model mixed checks to VFM-based checks to make the code
more compact. Simplify.

  [ bp: Drop the "what" from the commit message - it should be visible from
    the diff alone. ]

Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20241212140103.66964-6-qiuxu.zhuo@intel.com
2024-12-31 11:11:08 +01:00
Tony Luck
51a12c28bb x86/mce: Break up __mcheck_cpu_apply_quirks()
Split each vendor specific part into its own helper function.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241212140103.66964-5-qiuxu.zhuo@intel.com
2024-12-31 11:07:05 +01:00
Qiuxu Zhuo
c46945c9ca x86/mce: Make four functions return bool
Make those functions whose callers only care about success or failure return
a boolean value for better readability. Also, update the call sites
accordingly as the polarities of all the return values have been flipped.

No functional changes.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20241212140103.66964-4-qiuxu.zhuo@intel.com
2024-12-30 22:06:36 +01:00
Qiuxu Zhuo
64a668fbea x86/mce/threshold: Remove the redundant this_cpu_dec_return()
The 'storm' variable points to this_cpu_ptr(&storm_desc). Access the
'stormy_bank_count' field through the 'storm' to avoid calling
this_cpu_*() on the same per-CPU variable twice.

This minor optimization reduces the text size by 16 bytes.

  $ size threshold.o.*
     text	   data	    bss	    dec	    hex	filename
     1395	   1664	      0	   3059	    bf3	threshold.o.old
     1379	   1664	      0	   3043	    be3	threshold.o.new

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20241212140103.66964-3-qiuxu.zhuo@intel.com
2024-12-30 19:45:03 +01:00
Qiuxu Zhuo
c845cb8dbd x86/mce: Make several functions return bool
Make several functions that return 0 or 1 return a boolean value for
better readability.

No functional changes are intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20241212140103.66964-2-qiuxu.zhuo@intel.com
2024-12-30 19:05:50 +01:00
Borislav Petkov (AMD)
877818802c x86/bugs: Add SRSO_USER_KERNEL_NO support
If the machine has:

  CPUID Fn8000_0021_EAX[30] (SRSO_USER_KERNEL_NO) -- If this bit is 1,
  it indicates the CPU is not subject to the SRSO vulnerability across
  user/kernel boundaries.

have it fall back to IBPB on VMEXIT only, in the case it is going to run
VMs:

  Speculative Return Stack Overflow: Mitigation: IBPB on VMEXIT only

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20241202120416.6054-2-bp@kernel.org
2024-12-30 17:48:33 +01:00
Linus Torvalds
37cb0c76ac hyperv-fixes for v6.13-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmdhxFsTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXmGuB/0ZsXKBh9eCZnmVSZwdVkcSeE/HEHHc
 K5qeIxjqirAzajvLbAtgnTQpi5w0kLYYOT/X/8b6L1nnefcfk1cz5cbyQKxXx7J6
 Zuob5KpjX7FtJbMc7QQtzrLyApw2k2OYe1QPCRxuWkYzQjaus/6kM27ivjjYFqDs
 Qv6IKCcebomAbJTN8IwF38KsFQ2JS3moWr+kcVyna7+Kg1ymEto2QlnI01Z+M1n7
 BUYvLgYUD/5HzbNjqC0HysUuX5PsVQ45OsmTgjkvs/XzLcAGGEHJGrJHJkEccx7H
 AsiNzHoB6wUrFhCAX7IPCSE0g9vjtwI21ozqYllzOTa/Q5KnUE+q9fhJ
 =Fw3U
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf
   Hering, Vitaly Kuznetsov)

 - Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain)

 - Two bug fixes in the Hyper-V utility functions (Michael Kelley)

 - Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers
   (Easwar Hariharan)

* tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  tools/hv: reduce resource usage in hv_kvp_daemon
  tools/hv: add a .gitignore file
  tools/hv: reduce resouce usage in hv_get_dns_info helper
  hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well
  Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet
  Drivers: hv: util: Don't force error code to ENODEV in util_probe()
  tools/hv: terminate fcopy daemon if read from uio fails
  drivers: hv: Convert open-coded timeouts to secs_to_jiffies()
  tools: hv: change permissions of NetworkManager configuration file
  x86/hyperv: Fix hv tsc page based sched_clock for hibernation
  tools: hv: Fix a complier warning in the fcopy uio daemon
2024-12-18 09:55:55 -08:00
Dave Hansen
e5d3a57891 x86/cpu: Make all all CPUID leaf names consistent
The leaf names are not consistent.  Give them all a CPUID_LEAF_ prefix
for consistency and vertical alignment.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com> # for ioatdma bits
Link: https://lore.kernel.org/all/20241213205040.7B0C3241%40davehans-spike.ostc.intel.com
2024-12-18 06:17:46 -08:00
Dave Hansen
754aaac3bb x86/fpu: Move CPUID leaf definitions to common code
Move the XSAVE-related CPUID leaf definitions to common code.  Then,
use the new definition to remove the last magic number from the CPUID
level dependency table.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205037.43C57CDE%40davehans-spike.ostc.intel.com
2024-12-18 06:17:42 -08:00
Dave Hansen
5d82d8e0a9 x86/cpu: Refresh DCA leaf reading code
The DCA leaf number is also hard-coded in the CPUID level dependency
table. Move its definition to common code and use it.

While at it, fix up the naming and types in the probe code.  All
CPUID data is provided in 32-bit registers, not 'unsigned long'.
Also stop referring to "level_9".  Move away from test_bit()
because the type is no longer an 'unsigned long'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205032.476A30FE%40davehans-spike.ostc.intel.com
2024-12-18 06:17:34 -08:00
Dave Hansen
8bd6821c9c x86/cpu: Use MWAIT leaf definition
The leaf-to-feature dependency array uses hard-coded leaf numbers.
Use the new common header definition for the MWAIT leaf.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205029.5B055D6E%40davehans-spike.ostc.intel.com
2024-12-18 06:17:28 -08:00
Dave Hansen
5366d8965d x86/cpu: Remove 'x86_cpu_desc' infrastructure
All the users of 'x86_cpu_desc' are gone.  Zap it from the tree.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185133.AF0BF2BC%40davehans-spike.ostc.intel.com
2024-12-18 06:09:12 -08:00
Dave Hansen
f3f3251526 x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'
The AMD erratum 1386 detection code uses and old style 'x86_cpu_desc'
table. Replace it with 'x86_cpu_id' so the old style can be removed.

I did not create a new helper macro here. The new table is certainly
more noisy than the old and it can be improved on. But I was hesitant
to create a new macro just for a single site that is only two ugly
lines in the end.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185132.07555E1D%40davehans-spike.ostc.intel.com
2024-12-18 06:09:04 -08:00
Dave Hansen
85b08180df x86/cpu: Expose only stepping min/max interface
The x86_match_cpu() infrastructure can match CPU steppings. Since
there are only 16 possible steppings, the matching infrastructure goes
all out and stores the stepping match as a bitmap. That means it can
match any possible steppings in a single list entry. Fun.

But it exposes this bitmap to each of the X86_MATCH_*() helpers when
none of them really need a bitmap. It makes up for this by exporting a
helper (X86_STEPPINGS()) which converts a contiguous stepping range
into the bitmap which every single user leverages.

Instead of a bitmap, have the main helper for this sort of thing
(X86_MATCH_VFM_STEPS()) just take a stepping range. This ends up
actually being even more compact than before.

Leave the helper in place (renamed to __X86_STEPPINGS()) to make it
more clear what is going on instead of just having a random GENMASK()
in the middle of an already complicated macro.

One oddity that I hit was this macro:

       X86_MATCH_VFM_STEPS(vfm, X86_STEPPING_MIN, max_stepping, issues)

It *could* have been converted over to take a min/max stepping value
for each entry. But that would have been a bit too verbose and would
prevent the one oddball in the list (INTEL_COMETLAKE_L stepping 0)
from sticking out.

Instead, just have it take a *maximum* stepping and imply that the match
is from 0=>max_stepping. This is functional for all the cases now and
also retains the nice property of having INTEL_COMETLAKE_L stepping 0
stick out like a sore thumb.

skx_cpuids[] is goofy. It uses the stepping match but encodes all
possible steppings. Just use a normal, non-stepping match helper.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185129.65527B2A%40davehans-spike.ostc.intel.com
2024-12-17 16:14:49 -08:00
Dave Hansen
b8e10c86e6 x86/cpu: Introduce new microcode matching helper
The 'x86_cpu_id' and 'x86_cpu_desc' structures are very similar and
need to be consolidated.  There is a microcode version matching
function for 'x86_cpu_desc' but not 'x86_cpu_id'.

Create one for 'x86_cpu_id'.

This essentially just leverages the x86_cpu_id->driver_data field
to replace the less generic x86_cpu_desc->x86_microcode_rev field.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213185128.8F24EEFC%40davehans-spike.ostc.intel.com
2024-12-17 16:14:39 -08:00
Tom Lendacky
4972808d6f x86/sev: Require the RMPREAD instruction after Zen4
Limit usage of the non-architectural RMP format to Zen3/Zen4 processors.
The RMPREAD instruction, with architectural defined output, is available
and should be used for RMP access beyond Zen4.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/r/5be0093e091778a151266ea853352f62f838eb99.1733172653.git.thomas.lendacky@amd.com
2024-12-14 10:55:28 +01:00
Juergen Gross
efbcd61d9b x86: make get_cpu_vendor() accessible from Xen code
In order to be able to differentiate between AMD and Intel based
systems for very early hypercalls without having to rely on the Xen
hypercall page, make get_cpu_vendor() non-static.

Refactor early_cpu_init() for the same reason by splitting out the
loop initializing cpu_devs() into an externally callable function.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-12-13 09:28:10 +01:00
Tony Luck
8e931105ac x86/resctrl: Add write option to "mba_MBps_event" file
The "mba_MBps" mount option provides an alternate method to control memory
bandwidth. Instead of specifying allowable bandwidth as a percentage of
maximum possible, the user provides a MiB/s limit value.

There is a file in each CTRL_MON group directory that shows the event
currently in use.

Allow writing that file to choose a different event.

A user can choose any of the memory bandwidth monitoring events listed in
/sys/fs/resctrl/info/L3_mon/mon_features independently for each CTRL_MON group
by writing to each of the "mba_MBps_event" files.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-8-tony.luck@intel.com
2024-12-12 11:27:37 +01:00
Tony Luck
f5cd0e316f x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories
The "mba_MBps" mount option provides an alternate method to control memory
bandwidth. Instead of specifying allowable bandwidth as a percentage of
maximum possible, the user provides a MiB/s limit value.

In preparation to allow the user to pick the memory bandwidth monitoring event
used as input to the feedback loop, provide a file in each CTRL_MON group
directory that shows the event currently in use. Note that this file is only
visible when the "mba_MBps" mount option is in use.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-7-tony.luck@intel.com
2024-12-12 11:27:28 +01:00
Raag Jadav
3560a023a9 x86/cpu: Fix typo in x86_match_cpu()'s doc
Fix typo in x86_match_cpu()'s description.

  [ bp: Massage commit message. ]

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241030065804.407793-1-raag.jadav@intel.com
2024-12-10 21:40:02 +01:00
Ingo Molnar
05453d36a2 Merge branch 'linus' into x86/cleanups, to resolve conflict
These two commits interact:

 upstream:     73da582a47 ("x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC")
 x86/cleanups: 13148e22c1 ("x86/apic: Remove "disablelapic" cmdline option")

Resolve it.

 Conflicts:
	arch/x86/kernel/cpu/topology.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-10 19:33:03 +01:00
Borislav Petkov (AMD)
13148e22c1 x86/apic: Remove "disablelapic" cmdline option
The convention is "no<something>" and there already is "nolapic". Drop
the disable one.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241202190011.11979-2-bp@kernel.org
2024-12-10 18:55:08 +01:00
Tony Luck
141cb5c482 x86/resctrl: Make mba_sc use total bandwidth if local is not supported
The default input measurement to the mba_sc feedback loop for memory bandwidth
control when the user mounts with the "mba_MBps" option is the local bandwidth
event. But some systems may not support a local bandwidth event.

When local bandwidth event is not supported, check for support of total
bandwidth and use that instead.

Relax the mount option check to allow use of the "mba_MBps" option for systems
when only total bandwidth monitoring is supported. Also update the error
message.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-6-tony.luck@intel.com
2024-12-10 16:15:14 +01:00
Tony Luck
2c272fadb5 x86/resctrl: Compute memory bandwidth for all supported events
Switching between local and total memory bandwidth events as the input
to the mba_sc feedback loop would be cumbersome and take effect slowly
in the current implementation as the bandwidth is only known after two
consecutive readings of the same event.

Compute the bandwidth for all supported events. This doesn't add
significant overhead and will make changing which event is used
simple.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-5-tony.luck@intel.com
2024-12-10 16:13:48 +01:00
Tony Luck
481d363748 x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
update_mba_bw() hard codes use of the memory bandwidth local event which
prevents more flexible options from being deployed.

Change this function to use the event specified in the rdtgroup that is
being processed.

Mount time checks for the "mba_MBps" option ensure that local memory
bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-4-tony.luck@intel.com
2024-12-10 11:15:19 +01:00
Tony Luck
3b49c37a2f x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control
Resctrl uses local memory bandwidth event as input to the feedback loop when
the mba_MBps mount option is used. This means that this mount option cannot be
used on systems that only support monitoring of total bandwidth.

Prepare to allow users to choose the input event independently for each
CTRL_MON group by adding a global variable "mba_mbps_default_event" used to
set the default event for each CTRL_MON group, and a new field
"mba_mbps_event" in struct rdtgroup to track which event is used for each
CTRL_MON group.

Notes:

1) Both of these are only used when the user mounts the filesystem with the
   "mba_MBps" option.
2) Only check for support of local bandwidth event when initializing
   mba_mbps_default_event. Support for total bandwidth event can be added
   after other routines in resctrl have been updated to handle total bandwidth
   event.

  [ bp: Move mba_mbps_default_event extern into the arch header. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20241206163148.83828-3-tony.luck@intel.com
2024-12-10 11:13:48 +01:00
Babu Moger
2937f9c361 x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().

  [ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at
    run time. ]

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20241206163148.83828-2-tony.luck@intel.com
2024-12-09 21:37:01 +01:00
Frederic Weisbecker
135eef38d7 x86/resctrl: Use kthread_run_on_cpu()
Use the proper API instead of open coding it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20240807160228.26206-3-frederic@kernel.org
2024-12-09 20:19:48 +01:00