2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

86 Commits

Author SHA1 Message Date
Dan Williams
9d948b8804 Merge branch 'for-6.16/tsm-mr' into tsm-next
Pick up a couple fixes for issues noticed in linux-next (constification
of bin_attrs and missing 'static').
2025-05-13 11:28:25 -07:00
Cedric Xing
b0ca403a9e tsm-mr: Fix init breakage after bin_attrs constification by scoping non-const pointers to init phase
Commit 9bec944506 ("sysfs: constify attribute_group::bin_attrs") enforced
the ro-after-init principle by making elements of bin_attrs_new pointing to
const.

To align with this change, introduce a temporary variable `bap` within the
initialization loop. This improves code clarity by explicitly marking the
initialization scope and eliminates the need for type casts when assigning
to bin_attrs_new.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Link: https://patch.msgid.link/20250513164154.10109-1-cedric.xing@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-13 11:02:33 -07:00
Dan Williams
15ff5d0e90 Merge branch 'for-6.16/tsm-mr' into tsm-next
Merge measurement-register infrastructure for v6.16. Resolve conflicts
with the establishment of drivers/virt/coco/guest/ for cross-vendor
common TSM functionality.

Address a mis-merge with a fixup from Lukas:

Link: http://lore.kernel.org/20250509134031.70559-1-lukas.bulwahn@redhat.com
2025-05-12 22:12:44 -07:00
Cedric Xing
7c3f259dfe virt: tdx-guest: Transition to scoped_cond_guard for mutex operations
Replace mutex_lock_interruptible()/mutex_unlock() with scoped_cond_guard to
enhance code readability and maintainability.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
Link: https://patch.msgid.link/20250506-tdx-rtmr-v6-7-ac6ff5e9d58a@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-08 19:17:43 -07:00
Cedric Xing
850972bc61 virt: tdx-guest: Refactor and streamline TDREPORT generation
Consolidate instances (code segments) of TDREPORT generation to improve
readability and maintainability, by refactoring each instance into invoking
a unified subroutine throughout the TDX guest driver. Implement proper
locking around TDG.MR.REPORT and TDG.MR.RTMR.EXTEND to avoid race inside
the TDX module. Preallocate TDREPORT buffer to reduce overhead in
subsequent TDREPORT generation.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
Link: https://patch.msgid.link/20250506-tdx-rtmr-v6-6-ac6ff5e9d58a@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-08 19:17:43 -07:00
Cedric Xing
4d2a7bfad5 virt: tdx-guest: Expose TDX MRs as sysfs attributes
Expose the most commonly used TDX MRs (Measurement Registers) as sysfs
attributes. Use the ioctl() interface of /dev/tdx_guest to request a full
TDREPORT for access to other TD measurements.

Directory structure of TDX MRs inside a TDVM is as follows:

/sys/class/misc/tdx_guest
└── measurements
    ├── mrconfigid
    ├── mrowner
    ├── mrownerconfig
    ├── mrtd:sha384
    ├── rtmr0:sha384
    ├── rtmr1:sha384
    ├── rtmr2:sha384
    └── rtmr3:sha384

Read the file/attribute to retrieve the current value of an MR. Write to
the file/attribute (if writable) to extend the corresponding RTMR. Refer to
Documentation/ABI/testing/sysfs-devices-virtual-misc-tdx_guest for more
information.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
[djbw: fixup exit order]
Link: https://patch.msgid.link/20250508010606.4129953-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-08 19:17:43 -07:00
Cedric Xing
b9e22b35d4 tsm-mr: Add TVM Measurement Register support
Introduce new TSM Measurement helper library (tsm-mr) for TVM guest drivers
to expose MRs (Measurement Registers) as sysfs attributes, with Crypto
Agility support.

Add the following new APIs (see include/linux/tsm-mr.h for details):

- tsm_mr_create_attribute_group(): Take on input a `struct
  tsm_measurements` instance, which includes one `struct
  tsm_measurement_register` per MR with properties like `TSM_MR_F_READABLE`
  and `TSM_MR_F_WRITABLE`, to determine the supported operations and create
  the sysfs attributes accordingly. On success, return a `struct
  attribute_group` instance that will typically be included by the guest
  driver into `miscdevice.groups` before calling misc_register().

- tsm_mr_free_attribute_group(): Free the memory allocated to the attrubute
  group returned by tsm_mr_create_attribute_group().

tsm_mr_create_attribute_group() creates one attribute for each MR, with
names following this pattern:

        MRNAME[:HASH]

- MRNAME - Placeholder for the MR name, as specified by
  `tsm_measurement_register.mr_name`.
- :HASH - Optional suffix indicating the hash algorithm associated with
  this MR, as specified by `tsm_measurement_register.mr_hash`.

Support Crypto Agility by allowing multiple definitions of the same MR
(i.e., with the same `mr_name`) with distinct HASH algorithms.

NOTE: Crypto Agility, introduced in TPM 2.0, allows new hash algorithms to
be introduced without breaking compatibility with applications using older
algorithms. CC architectures may face the same challenge in the future,
needing new hashes for security while retaining compatibility with older
hashes, hence the need for Crypto Agility.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
[djbw: fixup bin_attr const conflict]
Link: https://patch.msgid.link/20250509020739.882913-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-08 19:17:33 -07:00
Dan Williams
a0372b3831 Merge branch 'for-6.16/tsm' into tsm-next
Pick up the drivers/virt/coco/guest/ split in preparation for TSM host
drivers.
2025-05-08 18:12:06 -07:00
Dan Williams
fba4ceaa24 configfs-tsm-report: Fix NULL dereference of tsm_ops
Unlike sysfs, the lifetime of configfs objects is controlled by
userspace. There is no mechanism for the kernel to find and delete all
created config-items. Instead, the configfs-tsm-report mechanism has an
expectation that tsm_unregister() can happen at any time and cause
established config-item access to start failing.

That expectation is not fully satisfied. While tsm_report_read(),
tsm_report_{is,is_bin}_visible(), and tsm_report_make_item() safely fail
if tsm_ops have been unregistered, tsm_report_privlevel_store()
tsm_report_provider_show() fail to check for ops registration. Add the
missing checks for tsm_ops having been removed.

Now, in supporting the ability for tsm_unregister() to always succeed,
it leaves the problem of what to do with lingering config-items. The
expectation is that the admin that arranges for the ->remove() (unbind)
of the ${tsm_arch}-guest driver is also responsible for deletion of all
open config-items. Until that deletion happens, ->probe() (reload /
bind) of the ${tsm_arch}-guest driver fails.

This allows for emergency shutdown / revocation of attestation
interfaces, and requires coordinated restart.

Fixes: 70e6f7e2b9 ("configfs-tsm: Introduce a shared ABI for attestation reports")
Cc: stable@vger.kernel.org
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reported-by: Cedric Xing <cedric.xing@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20250430203331.1177062-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-07 16:48:37 -07:00
Dan Williams
7515f45c16 coco/guest: Move shared guest CC infrastructure to drivers/virt/coco/guest/
In preparation for creating a new drivers/virt/coco/host/ directory to
house shared host driver infrastructure for confidential computing, move
configfs-tsm to a guest/ sub-directory. The tsm.ko module is renamed to
tsm_reports.ko. The old tsm.ko module was only ever demand loaded by
kernel internal dependencies, so it should not affect existing userspace
module install scripts.

The new drivers/virt/coco/guest/ is also a preparatory landing spot for
new / optional TSM Report mechanics like a TCB stability enumeration /
watchdog mechanism. To be added later.

Cc: Wu Hao <hao.wu@intel.com>
Cc: Yilun Xu <yilun.xu@intel.com>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/174107246641.1288555.208426916259466774.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-02 12:52:16 -07:00
Dan Williams
71ded61bee configfs-tsm: Namespace TSM report symbols
In preparation for new + common TSM (TEE Security Manager)
infrastructure, namespace the TSM report symbols in tsm.h with an
_REPORT suffix to differentiate them from other incoming tsm work.

Cc: Yilun Xu <yilun.xu@intel.com>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patch.msgid.link/174107246021.1288555.7203769833791489618.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-05-02 12:52:16 -07:00
Ingo Molnar
89771319e0 Linux 6.14-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfXVtUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN/sH/i5423Gt/z51gDjA
 s4v5Z7GaBJ9zOGBahn2RWFe72zytTqKrEJmMnGfguirs0atD1DtQj4WAP7iFKP+e
 WyO663X6HF7i5y37ja0Yd4PZc31hwtqzKH8LjBf8f8tTy8UsEVqumdi5A4sS9KTM
 qm4kTyyVEY9D/s7oRY8ywjDlRJtO6nT0aKMp4kAqNEbrNUYbilT/a0hgXcgSmPyB
 uIjmjL2fZfutxGI5LgfbaSHCa1ElmhvTvivOMpaAmZSGCRVHCKGgT0CTNnHyn/7C
 dB145JkRO4ZOUqirCdO4PE/23id3ajq9fcixJGBzAv7c45y+B3JZ1r2kAfKalE8/
 qrOKLys=
 =8r7a
 -----END PGP SIGNATURE-----

Merge tag 'v6.14-rc7' into x86/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-19 11:03:06 +01:00
Alexey Kardashevskiy
3e385c0d6c virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex
Compared to the SNP Guest Request, the "Extended" version adds data pages for
receiving certificates. If not enough pages provided, the HV can report to the
VM how much is needed so the VM can reallocate and repeat.

Commit

  ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")

moved handling of the allocated/desired pages number out of scope of said
mutex and create a possibility for a race (multiple instances trying to
trigger Extended request in a VM) as there is just one instance of
snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex.

Fix the issue by moving the data blob/size and the GHCB input struct
(snp_req_data) into snp_guest_req which is allocated on stack now and accessed
by the GHCB caller under that mutex.

Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four
callers needs it. Free the received blob in get_ext_report() right after it is
copied to the userspace. Possible future users of snp_send_guest_request() are
likely to have different ideas about the buffer size anyways.

Fixes: ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
2025-03-07 14:09:33 +01:00
Nikunj A Dadhania
ac7c06acaa virt: sev-guest: Allocate request data dynamically
Commit

  ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")

narrowed the command mutex scope to snp_send_guest_request().  However,
GET_REPORT, GET_DERIVED_KEY, and GET_EXT_REPORT share the req structure in
snp_guest_dev. Without the mutex protection, concurrent requests can overwrite
each other's data. Fix it by dynamically allocating the request structure.

Fixes: ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")
Closes: https://github.com/AMDESE/AMDSEV/issues/265
Reported-by: andreas.stuehrk@yaxi.tech
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-2-aik@amd.com
2025-03-07 13:34:25 +01:00
Kevin Brodsky
95c4cc5a58 x86/mm: Reduce header dependencies in <asm/set_memory.h>
Commit:

  03b122da74 ("x86/sgx: Hook arch_memory_failure() into mainline code")

... added <linux/mm.h> to <asm/set_memory.h> to provide some helpers.

However the following commit:

  b3fdf9398a ("x86/mce: relocate set{clear}_mce_nospec() functions")

... moved the inline definitions someplace else, and now <asm/set_memory.h>
just declares a bunch of mostly self-contained functions.

No need for the whole <linux/mm.h> inclusion to declare functions; just
remove that include. This helps avoid circular dependency headaches
(e.g. if <linux/mm.h> ends up including <linux/set_memory.h>).

This change requires a couple of include fixups not to break the
build:

* <asm/smp.h>: including <asm/thread_info.h> directly relies on
  <linux/thread_info.h> having already been included, because the
  former needs the BAD_STACK/NOT_STACK constants defined in the
  latter. This is no longer the case when <asm/smp.h> is included from
  some driver file - just include <linux/thread_info.h> to stay out
  of trouble.

* sev-guest.c relies on <asm/set_memory.h> including <linux/mm.h>,
  so we just need to make that include explicit.

[ mingo: Cleaned up the changelog ]

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20241212080904.2089632-3-kevin.brodsky@arm.com
2025-02-28 17:35:22 +01:00
Linus Torvalds
13b6931c44 - A segmented Reverse Map table (RMP) is a across-nodes distributed
table of sorts which contains per-node descriptors of each node-local
   4K page, denoting its ownership (hypervisor, guest, etc) in the realm
   of confidential computing.  Add support for such a table in order to
   improve referential locality when accessing or modifying RMP table
   entries
 
 - Add support for reading the TSC in SNP guests by removing any
   interference or influence the hypervisor might have, with the goal of
   making a confidential guest even more independent from the hypervisor
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmeOYLsACgkQEsHwGGHe
 VUrywg//WBuywe3+TNPwF0Iw8becqtD7lKMftmUoqpcf20JhiHSCexb+3/r7U2Kb
 WL1/T5cxX1rA45HzkwovUljlvin8B9bdpY40dUqrKFPMnWLfs4ru0HPA6UxPBsAq
 r/8XrXuRrI22MLbrAeQ2xSt8dqw3DpbJyUcyr0qOb6OsbtAy05uElYCzMSyzT06F
 QsTmenosuJqSo1gIGTxfU4nKyd1o8EJ5b1ThK11hvZaIOffgLjEU6g39cG9AeF4X
 TOkh9CdIlQc3ot14rJeWMy15YEW+xBdXdMEv0ZPOSZiKzTHA7wwdl0VmPm1EK57f
 BQkZikuoJezJA0r5wSwVgslTaYO0GTXNewwL5jxK1mqRgoK06IgC6xAkX8N7NTYL
 K6DX+tfaKjSJGY1z9TYOzs+wGV4MBAXmbLwnuhcPumkTYXPFbRFZqx6ec2BLIU+Y
 bZfwhlr3q+bfFeBYMzyWPHJ87JinOjwu4Ah0uLVmkoRtgb0S3pIdlyRYZAcEl6fn
 Tgfu0/RNLGGsH/a3BF7AQdt+hOv1ms5hEMYXg++30uC59LR8XbuKnLdUPRi0nVeD
 e9xyxFybu5ySesnnXabtaO9bSUF+8HV4nkclKglFvuHpLMQ5GlPxTnBj1V1podYR
 l12G2htXKsSV5JJK4x+WfYBe6Nn3tbcpgZD8M8g0lso8kejqMjs=
 =hh1m
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - A segmented Reverse Map table (RMP) is a across-nodes distributed
   table of sorts which contains per-node descriptors of each node-local
   4K page, denoting its ownership (hypervisor, guest, etc) in the realm
   of confidential computing. Add support for such a table in order to
   improve referential locality when accessing or modifying RMP table
   entries

 - Add support for reading the TSC in SNP guests by removing any
   interference or influence the hypervisor might have, with the goal of
   making a confidential guest even more independent from the hypervisor

* tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Add the Secure TSC feature for SNP guests
  x86/tsc: Init the TSC for Secure TSC guests
  x86/sev: Mark the TSC in a secure TSC guest as reliable
  x86/sev: Prevent RDTSC/RDTSCP interception for Secure TSC enabled guests
  x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests
  x86/sev: Change TSC MSR behavior for Secure TSC enabled guests
  x86/sev: Add Secure TSC support for SNP guests
  x86/sev: Relocate SNP guest messaging routines to common code
  x86/sev: Carve out and export SNP guest messaging init routines
  virt: sev-guest: Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL
  virt: sev-guest: Remove is_vmpck_empty() helper
  x86/sev/docs: Document the SNP Reverse Map Table (RMP)
  x86/sev: Add full support for a segmented RMP table
  x86/sev: Treat the contiguous RMP table as a single RMP segment
  x86/sev: Map only the RMP table entries instead of the full RMP range
  x86/sev: Move the SNP probe routine out of the way
  x86/sev: Require the RMPREAD instruction after Zen4
  x86/sev: Add support for the RMPREAD instruction
  x86/sev: Prepare for using the RMPREAD instruction to access the RMP
2025-01-21 09:00:31 -08:00
Linus Torvalds
9ad09c4f28 arm64 updates for 6.14
Confidential Computing:
 * Register a platform device when running in CCA realm mode to enable
   automatic loading of dependent modules.
 
 CPU Features:
 * Update a bunch of system register definitions to pick up new field
   encodings from the architectural documentation.
 
 * Add hwcaps and selftests for the new (2024) dpISA extensions.
 
 Documentation:
 * Update EL3 (firmware) requirements for booting Linux on modern arm64
   designs.
 
 * Remove stale information about the kernel virtual memory map.
 
 Miscellaneous:
 * Minor cleanups and typo fixes.
 
 Memory management:
 * Fix vmemmap_check_pmd() to look at the PMD type bits
 
 * LPA2 (52-bit physical addressing) cleanups and minor fixes.
 
 * Adjust physical address space depending upon whether or not LPA2 is
   enabled.
 
 Perf and PMUs:
 * Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU
 
 * Extend AXI filtering support for the DDR PMU on NXP IMX SoCs
 
 * Fix Designware PCIe PMU event numbering.
 
 * Add generic branch events for the Apple M1 CPU PMU.
 
 * Add support for Marvell Odyssey DDR and LLC-TAD PMUs.
 
 * Cleanups to the Hisilicon DDRC and Uncore PMU code.
 
 * Advertise discard mode for the SPE PMU.
 
 * Add the perf users mailing list to our MAINTAINERS entry.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeKZLcQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNEQzB/0X2U89ZiqxIkTPQvfFrjN/uUGybkq59rEL
 DfeoGukTgJIwc3GHWXXtQ//wuuYKdTeCXaIz5NFK3+7/wmKSLvjkexmue8pta6EY
 5rx9bAPr/D8lAUvhKIN2l3pF/ygoRwDz+nT2yVQ1xlZxYJWX7ZIsMj7W7ceb5kdx
 HRrTSQuhEEPREAWWO4oCMWl5SQZSrIflSE3Be/PsP0OhW6k//ZmWbcJTgUcHbKam
 o2WtNjITyGzxMpRCcrGEZKoe9YcwSxiut/PoD7JuoB4C/rbsf1cdJ6uLmtvGJcZj
 qsdRHhVfBzP1+ahONrDbiT3C2+s1UZySKdCDIxiYy6lB39wpP0dd
 =E7Mf
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "We've got a little less than normal thanks to the holidays in
  December, but there's the usual summary below. The highlight is
  probably the 52-bit physical addressing (LPA2) clean-up from Ard.

  Confidential Computing:

   - Register a platform device when running in CCA realm mode to enable
     automatic loading of dependent modules

  CPU Features:

   - Update a bunch of system register definitions to pick up new field
     encodings from the architectural documentation

   - Add hwcaps and selftests for the new (2024) dpISA extensions

  Documentation:

   - Update EL3 (firmware) requirements for booting Linux on modern
     arm64 designs

   - Remove stale information about the kernel virtual memory map

  Miscellaneous:

   - Minor cleanups and typo fixes

  Memory management:

   - Fix vmemmap_check_pmd() to look at the PMD type bits

   - LPA2 (52-bit physical addressing) cleanups and minor fixes

   - Adjust physical address space depending upon whether or not LPA2 is
     enabled

  Perf and PMUs:

   - Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU

   - Extend AXI filtering support for the DDR PMU on NXP IMX SoCs

   - Fix Designware PCIe PMU event numbering

   - Add generic branch events for the Apple M1 CPU PMU

   - Add support for Marvell Odyssey DDR and LLC-TAD PMUs

   - Cleanups to the Hisilicon DDRC and Uncore PMU code

   - Advertise discard mode for the SPE PMU

   - Add the perf users mailing list to our MAINTAINERS entry"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
  Documentation: arm64: Remove stale and redundant virtual memory diagrams
  perf docs: arm_spe: Document new discard mode
  perf: arm_spe: Add format option for discard mode
  MAINTAINERS: Add perf list for drivers/perf/
  arm64: Remove duplicate included header
  drivers/perf: apple_m1: Map generic branch events
  arm64: rsi: Add automatic arm-cca-guest module loading
  kselftest/arm64: Add 2024 dpISA extensions to hwcap test
  KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
  arm64/hwcap: Describe 2024 dpISA extensions to userspace
  arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
  arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
  drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
  arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
  arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
  arm64/mm: Replace open encodings with PXD_TABLE_BIT
  arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
  arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
  ...
2025-01-20 21:21:49 -08:00
Jeremy Linton
a1edec2245 arm64: rsi: Add automatic arm-cca-guest module loading
The TSM module provides guest identification and attestation when a
guest runs in CCA realm mode. By creating a dummy platform device,
let's ensure the module is automatically loaded. The udev daemon loads
the TSM module after it receives a device addition event. Once that
happens, it can be used earlier in the boot process to decrypt the
rootfs.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20241220181236.172060-2-jeremy.linton@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-01-08 13:58:49 +00:00
Nikunj A Dadhania
1e0b23b5d2 x86/sev: Relocate SNP guest messaging routines to common code
At present, the SEV guest driver exclusively handles SNP guest messaging. All
routines for sending guest messages are embedded within it.

To support Secure TSC, SEV-SNP guests must communicate with the AMD Security
Processor during early boot. However, these guest messaging functions are not
accessible during early boot since they are currently part of the guest
driver.

Hence, relocate the core SNP guest messaging functions to SEV common code and
provide an API for sending SNP guest messages.

No functional change, but just an export symbol added for
snp_send_guest_request() and dropped the export symbol on
snp_issue_guest_request() and made it static.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250106124633.1418972-5-nikunj@amd.com
2025-01-07 11:16:46 +01:00
Nikunj A Dadhania
c5529418d0 x86/sev: Carve out and export SNP guest messaging init routines
Currently, the sev-guest driver is the only user of SNP guest messaging. All
routines for initializing SNP guest messaging are implemented within the
sev-guest driver and are not available during early boot.

In preparation for adding Secure TSC guest support, carve out APIs to allocate
and initialize the guest messaging descriptor context and make it part of
coco/sev/core.c. As there is no user of sev_guest_platform_data anymore,
remove the structure.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250106124633.1418972-4-nikunj@amd.com
2025-01-07 10:33:20 +01:00
Nikunj A Dadhania
864884a0c2 virt: sev-guest: Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL
Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL in the sev-guest driver code.
GFP_KERNEL_ACCOUNT is typically used for accounting untrusted userspace
allocations. After auditing the sev-guest code, the following changes are
necessary:

  * snp_init_crypto(): Use GFP_KERNEL as this is a trusted device probe
    path.

Retain GFP_KERNEL_ACCOUNT in the following cases for robustness and
specific path requirements:

  * alloc_shared_pages(): Although all allocations are limited, retain
    GFP_KERNEL_ACCOUNT for future robustness.

  * get_report() and get_ext_report(): These functions are on the unlocked
    ioctl path and should continue using GFP_KERNEL_ACCOUNT.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250106124633.1418972-3-nikunj@amd.com
2025-01-07 10:26:20 +01:00
Nikunj A Dadhania
8234177d20 virt: sev-guest: Remove is_vmpck_empty() helper
Remove is_vmpck_empty() which uses a local array allocation to check if the
VMPCK is empty and replace it with memchr_inv() to directly determine if the
VMPCK is empty without additional memory allocation.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250106124633.1418972-2-nikunj@amd.com
2025-01-07 08:57:19 +01:00
Li RongQing
27834971f6 virt: tdx-guest: Just leak decrypted memory on unrecoverable errors
In CoCo VMs it is possible for the untrusted host to cause
set_memory_decrypted() to fail such that an error is returned
and the resulting memory is shared. Callers need to take care
to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional
or security issues.

Leak the decrypted memory when set_memory_decrypted() fails,
and don't need to print an error since set_memory_decrypted()
will call WARN_ONCE().

Fixes: f4738f56d1 ("virt: tdx-guest: Add Quote generation support using TSM_REPORTS")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240619111801.25630-1-lirongqing%40baidu.com
2024-12-29 10:18:44 +01:00
Linus Torvalds
f3ddc438a2 arm64 fixes for 6.13-rc2:
- MTE/hugetlbfs:
 
   - Set VM_MTE_ALLOWED in the arch code and remove it from the core code
     for hugetlbfs mappings
 
   - Fix copy_highpage() warning when the source is a huge page but not
     MTE tagged, taking the wrong small page path
 
 - drivers/virt/coco:
 
   - Add the pKVM and Arm CCA drivers under the arm64 maintainership
 
   - Fix the pkvm driver to fall back to ioremap() (and warn) if the
     MMIO_GUARD hypercall fails
 
   - Keep the Arm CCA driver default 'n' rather than 'm'
 
 - A series of fixes for the arm64 ptrace() implementation, potentially
   leading to the kernel consuming uninitialised stack variables when
   PTRACE_SETREGSET is invoked with a length of 0
 
 - Fix zone_dma_limit calculation when RAM starts below 4GB and ZONE_DMA
   is capped to this limit
 
 - Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a call
   to page_to_phys() (from patch_map()) which checks pfn_valid() before
   vmemmap has been set up
 
 - Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI ops
   when the kernel assumes 8-bit ASIDs but running under a hypervisor on
   a system that implements 16-bit ASIDs (found running Linux under
   Parallels on Apple M4)
 
 - ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it is
   using the same SMMU PMCG as HIP09 and suffers from the same errata
 
 - Add GCS to cpucap_is_possible(), missed in the recent merge
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmdTQW4ACgkQa9axLQDI
 XvGLUQ/+MEiCFytDsSIQsGMaCpRCcrNX3dzhgekjTSiS+iPRTGjhHPMxAgnKgtim
 U6MIdxItS5bvFKWQC/VmA3V+EtMy+9uwfQOy7MbG+wIzwlg48Pn2MjgmheSxhftO
 0x+lUB+5ELU9KxL0KV+WNCE5l/iBpzcSG+Uj3iqc5rPuYHxa8npekd/KVba42zGY
 QqZ75yCW5EQwyuSZve8SSMqyHNgZHNgwzhs0aRr3ZwccqE9eMKpcEv5wxbl6raGB
 Qr4HG+c3w4rQFBsj+9Zs/f5G45uZ+pM55aAVLSihhCdq51/oXXPajOWMP3tV6ke+
 hHXm4buxgIR2CWeCXp8n/H7S3OQIj4uFqmaFIGxv0+0OTemUBIEg8kAtqVcnxSXY
 hk00J5yMurDik1hhud21ZHaJaELwWAwpisVCjYBblUGOoH9uH062gb02CGWv3lSe
 hrzYohhi7IAPzDzK339Q3HVr5PZOGagoBS2B1ptX2f6rrPITIuB2rW+lzNDuuBSX
 twHcdZzmSgl2zmFu4D3ql5Oa2ewLMiOn0Z96Esz5y9f74jbLh9ynU7QyRZM0MioS
 V6te7HanJ17zMK6S2thj7qsewqV6N4lcWd7M5ZclK29F8qcW5OWuKn5njFQT7K4s
 QDI0+1uYaSMcWoDAXNVXZf3oKMJDy1LeG+UXGyP5b0AQJrqYrWQ=
 =zZ4I
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "Nothing major, some left-overs from the recent merging window (MTE,
  coco) and some newly found issues like the ptrace() ones.

   - MTE/hugetlbfs:

      - Set VM_MTE_ALLOWED in the arch code and remove it from the core
        code for hugetlbfs mappings

      - Fix copy_highpage() warning when the source is a huge page but
        not MTE tagged, taking the wrong small page path

   - drivers/virt/coco:

      - Add the pKVM and Arm CCA drivers under the arm64 maintainership

      - Fix the pkvm driver to fall back to ioremap() (and warn) if the
        MMIO_GUARD hypercall fails

      - Keep the Arm CCA driver default 'n' rather than 'm'

   - A series of fixes for the arm64 ptrace() implementation,
     potentially leading to the kernel consuming uninitialised stack
     variables when PTRACE_SETREGSET is invoked with a length of 0

   - Fix zone_dma_limit calculation when RAM starts below 4GB and
     ZONE_DMA is capped to this limit

   - Fix early boot warning with CONFIG_DEBUG_VIRTUAL=y triggered by a
     call to page_to_phys() (from patch_map()) which checks pfn_valid()
     before vmemmap has been set up

   - Do not clobber bits 15:8 of the ASID used for TTBR1_EL1 and TLBI
     ops when the kernel assumes 8-bit ASIDs but running under a
     hypervisor on a system that implements 16-bit ASIDs (found running
     Linux under Parallels on Apple M4)

   - ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A as it
     is using the same SMMU PMCG as HIP09 and suffers from the same
     errata

   - Add GCS to cpucap_is_possible(), missed in the recent merge"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ptrace: fix partial SETREGSET for NT_ARM_GCS
  arm64: ptrace: fix partial SETREGSET for NT_ARM_POE
  arm64: ptrace: fix partial SETREGSET for NT_ARM_FPMR
  arm64: ptrace: fix partial SETREGSET for NT_ARM_TAGGED_ADDR_CTRL
  arm64: cpufeature: Add GCS to cpucap_is_possible()
  coco: virt: arm64: Do not enable cca guest driver by default
  arm64: mte: Fix copy_highpage() warning on hugetlb folios
  arm64: Ensure bits ASID[15:8] are masked out when the kernel uses 8-bit ASIDs
  ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
  MAINTAINERS: Add CCA and pKVM CoCO guest support to the ARM64 entry
  drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
  arm64: patching: avoid early page_to_phys()
  arm64: mm: Fix zone_dma_limit calculation
  arm64: mte: set VM_MTE_ALLOWED for hugetlbfs at correct place
2024-12-06 13:47:55 -08:00
Suzuki K Poulose
16d5306629 coco: virt: arm64: Do not enable cca guest driver by default
As per the guidelines, new drivers may not be set to default on.
An expert user can always select it.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Sami Mujawar <sami.mujawar@arm.com>
Link: https://lore.kernel.org/r/6750c695194cd_2508129427@dwillia2-xfh.jf.intel.com.notmuch
Link: https://lore.kernel.org/r/20241205143634.306114-1-suzuki.poulose@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-05 14:50:20 +00:00
Will Deacon
d44679fb95 drivers/virt: pkvm: Don't fail ioremap() call if MMIO_GUARD fails
Calling the MMIO_GUARD hypercall from guests which have not been
enrolled (e.g. because they are running without pvmfw) results in
-EINVAL being returned. In this case, MMIO_GUARD is not active
and so we can simply proceed with the normal ioremap() routine.

Don't fail ioremap() if MMIO_GUARD fails; instead WARN_ON_ONCE()
to highlight that the pvm environment is slightly wonky.

Fixes: 0f12694958 ("drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercall")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241202145731.6422-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-03 18:10:20 +00:00
Linus Torvalds
e70140ba0d Get rid of 'remove_new' relic from platform driver struct
The continual trickle of small conversion patches is grating on me, and
is really not helping.  Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:

  /*
   * .remove_new() is a relic from a prototype conversion of .remove().
   * New drivers are supposed to implement .remove(). Once all drivers are
   * converted to not use .remove_new any more, it will be dropped.
   */

This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.

I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.

Then I just removed the old (sic) .remove_new member function, and this
is the end result.  No more unnecessary conversion noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01 15:12:43 -08:00
Linus Torvalds
55db8eb456 - Do the proper memory conversion of guest memory in order to be able to kexec
kernels in SNP guests along with other adjustments and cleanups to that
   effect
 
 - Start converting and moving functionality from the sev-guest driver into
   core code with the purpose of supporting the secure TSC SNP feature where
   the hypervisor cannot influence the TSC exposed to the guest anymore
 
 - Add a "nosnp" cmdline option in order to be able to disable SNP support in
   the hypervisor and thus free-up resources which are not going to be used
 
 - Cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmc7ZToACgkQEsHwGGHe
 VUp61hAArA8taJaGUSdoe3sN60yRWCTe30QiDLvUrDGqmPHbBnDpdYsoaZujkQMI
 334piSWWu/pB6meO93uwv8X/ZO0ryOw46RK3szTz/RhBB5pTO3NbAj1zMF5q2KUy
 a+SYbZffV+qBUEpGujGrqrwT7X3U70yCKJFaZQOGvyYFzo+kyx6euqlYP+StOD+D
 ph7SDrXv0N0uU/2OiwCzF0cKvAuNHG2Cfn3kqSKvcZ+NWF3BKmw1IkgFA9f05P+j
 mOkc+1jCbi26b94MSJHSL33iRtbD0NgUzT9F2tw9Qszw1BQ5Er30Y45ywoudAhsn
 VrpMhBwWRCUdakQ2PsI7O8WB4gnBdWpEuzS2Ssqa1akB+pggH2xQzVb5EznmbzlS
 gz/SqUP75ijTT/oGh+C/hKAES3pmO4pH48J7llOKzb8YpoxxzjSEVb2pVbLzNdIV
 +it12Cap0lW+CTNGF4p2TbuKXKkE1LiGya1JMymQiZL8quCBYJIQUttiBvBg8Ac1
 oCw2DXQZsjDw55Hwwhr95J4FuY4+iQd+o1GgRDQ4MEqaYFEfdcFRA1YCbMHgiAzu
 NOGwjrQ2PB5xGST34qobGtk7Xt2nIilDvl5K5Co2E4s14NLrlBHo2uq33d0unlIZ
 BJMrHG/IWNjuHbKl/vM05fuiKEIvpL5qTKz7oVL6tX8Zphf6ljU=
 =C431
 -----END PGP SIGNATURE-----

Merge tag 'x86_sev_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Do the proper memory conversion of guest memory in order to be able
   to kexec kernels in SNP guests along with other adjustments and
   cleanups to that effect

 - Start converting and moving functionality from the sev-guest driver
   into core code with the purpose of supporting the secure TSC SNP
   feature where the hypervisor cannot influence the TSC exposed to the
   guest anymore

 - Add a "nosnp" cmdline option in order to be able to disable SNP
   support in the hypervisor and thus free-up resources which are not
   going to be used

 - Cleanups

[ Reminding myself about the endless TLA's again: SEV is the AMD Secure
  Encrypted Virtualization    - Linus ]

* tag 'x86_sev_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Cleanup vc_handle_msr()
  x86/sev: Convert shared memory back to private on kexec
  x86/mm: Refactor __set_clr_pte_enc()
  x86/boot: Skip video memory access in the decompressor for SEV-ES/SNP
  virt: sev-guest: Carve out SNP message context structure
  virt: sev-guest: Reduce the scope of SNP command mutex
  virt: sev-guest: Consolidate SNP guest messaging parameters to a struct
  x86/sev: Cache the secrets page address
  x86/sev: Handle failures from snp_init()
  virt: sev-guest: Use AES GCM crypto library
  x86/virt: Provide "nosnp" boot option for sev kernel command line
  x86/virt: Move SEV-specific parsing into arch/x86/virt/svm
2024-11-19 12:21:35 -08:00
Sami Mujawar
7999edc484 virt: arm-cca-guest: TSM_REPORT support for realms
Introduce an arm-cca-guest driver that registers with
the configfs-tsm module to provide user interfaces for
retrieving an attestation token.

When a new report is requested the arm-cca-guest driver
invokes the appropriate RSI interfaces to query an
attestation token.

The steps to retrieve an attestation token are as follows:
  1. Mount the configfs filesystem if not already mounted
     mount -t configfs none /sys/kernel/config
  2. Generate an attestation token
     report=/sys/kernel/config/tsm/report/report0
     mkdir $report
     dd if=/dev/urandom bs=64 count=1 > $report/inblob
     hexdump -C $report/outblob
     rmdir $report

Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20241017131434.40935-11-steven.price@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-23 10:19:33 +01:00
Nikunj A Dadhania
0a895c0d9b virt: sev-guest: Carve out SNP message context structure
Currently, the sev-guest driver is the only user of SNP guest messaging.
The snp_guest_dev structure holds all the allocated buffers, secrets page
and VMPCK details. In preparation for adding messaging allocation and
initialization APIs, decouple snp_guest_dev from messaging-related
information by carving out the guest message context
structure(snp_msg_desc).

Incorporate this newly added context into snp_send_guest_request() and all
related functions, replacing the use of the snp_guest_dev.

No functional change.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20241009092850.197575-7-nikunj@amd.com
2024-10-16 18:41:40 +02:00
Nikunj A Dadhania
ae596615d9 virt: sev-guest: Reduce the scope of SNP command mutex
The SNP command mutex is used to serialize access to the shared buffer,
command handling, and message sequence number.

All shared buffer, command handling, and message sequence updates are done
within snp_send_guest_request(), so moving the mutex to this function is
appropriate and maintains the critical section.

Since the mutex is now taken at a later point in time, remove the lockdep
checks that occur before taking the mutex.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20241009092850.197575-6-nikunj@amd.com
2024-10-16 18:35:28 +02:00
Nikunj A Dadhania
999d73686b virt: sev-guest: Consolidate SNP guest messaging parameters to a struct
Add a snp_guest_req structure to eliminate the need to pass a long list of
parameters. This structure will be used to call the SNP Guest message
request API, simplifying the function arguments.

Update the snp_issue_guest_request() prototype to include the new guest
request structure.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20241009092850.197575-5-nikunj@amd.com
2024-10-16 18:30:40 +02:00
Nikunj A Dadhania
f3476bc770 virt: sev-guest: Use AES GCM crypto library
The sev-guest driver encryption code uses the crypto API for SNP guest
messaging with the AMD Security processor. In order to enable secure TSC,
SEV-SNP guests need to send such a TSC_INFO message before the APs are
booted. Details from the TSC_INFO response will then be used to program the
VMSA before the APs are brought up.

However, the crypto API is not available this early in the boot process.

In preparation for moving the encryption code out of sev-guest to support
secure TSC and to ease review, switch to using the AES GCM library
implementation instead.

Drop __enc_payload() and dec_payload() helpers as both are small and can be
moved to the respective callers.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Peter Gonda <pgonda@google.com>
Link: https://lore.kernel.org/r/20241009092850.197575-2-nikunj@amd.com
2024-10-16 18:08:17 +02:00
Al Viro
cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
Linus Torvalds
114143a595 arm64 updates for 6.12
ACPI:
 * Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms.
 * Ensure arm64-specific IORT header is covered by MAINTAINERS.
 
 CPU Errata:
 * Enable workaround for hardware access/dirty issue on Ampere-1A cores.
 
 Memory management:
 * Define PHYSMEM_END to fix a crash in the amdgpu driver.
 * Avoid tripping over invalid kernel mappings on the kexec() path.
 * Userspace support for the Permission Overlay Extension (POE) using
   protection keys.
 
 Perf and PMUs:
 * Add support for the "fixed instruction counter" extension in the CPU
   PMU architecture.
 * Extend and fix the event encodings for Apple's M1 CPU PMU.
 * Allow LSM hooks to decide on SPE permissions for physical profiling.
 * Add support for the CMN S3 and NI-700 PMUs.
 
 Confidential Computing:
 * Add support for booting an arm64 kernel as a protected guest under
   Android's "Protected KVM" (pKVM) hypervisor.
 
 Selftests:
 * Fix vector length issues in the SVE/SME sigreturn tests
 * Fix build warning in the ptrace tests.
 
 Timers:
 * Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
   non-determinism arising from the architected counter.
 
 Miscellaneous:
 * Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
   don't succeed.
 * Minor fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmbkVNEQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNKeIB/9YtbN7JMgsXktM94GP03r3tlFF36Y1S51S
 +zdDZclAVZCTCZN+PaFeAZ/+ah2EQYrY6rtDoHUSEMQdF9kH+ycuIPDTwaJ4Qkam
 QKXMpAgtY/4yf2rX4lhDF8rEvkhLDsu7oGDhqUZQsA33GrMBHfgA3oqpYwlVjvGq
 gkm7olTo9LdWAxkPpnjGrjB6Mv5Dq8dJRhW+0Q5AntI5zx3RdYGJZA9GUSzyYCCt
 FIYOtMmWPkQ0kKxIVxOxAOm/ubhfyCs2sjSfkaa3vtvtt+Yjye1Xd81rFciIbPgP
 QlK/Mes2kBZmjhkeus8guLI5Vi7tx3DQMkNqLXkHAAzOoC4oConE
 =6osL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The highlights are support for Arm's "Permission Overlay Extension"
  using memory protection keys, support for running as a protected guest
  on Android as well as perf support for a bunch of new interconnect
  PMUs.

  Summary:

  ACPI:
   - Enable PMCG erratum workaround for HiSilicon HIP10 and 11
     platforms.
   - Ensure arm64-specific IORT header is covered by MAINTAINERS.

  CPU Errata:
   - Enable workaround for hardware access/dirty issue on Ampere-1A
     cores.

  Memory management:
   - Define PHYSMEM_END to fix a crash in the amdgpu driver.
   - Avoid tripping over invalid kernel mappings on the kexec() path.
   - Userspace support for the Permission Overlay Extension (POE) using
     protection keys.

  Perf and PMUs:
   - Add support for the "fixed instruction counter" extension in the
     CPU PMU architecture.
   - Extend and fix the event encodings for Apple's M1 CPU PMU.
   - Allow LSM hooks to decide on SPE permissions for physical
     profiling.
   - Add support for the CMN S3 and NI-700 PMUs.

  Confidential Computing:
   - Add support for booting an arm64 kernel as a protected guest under
     Android's "Protected KVM" (pKVM) hypervisor.

  Selftests:
   - Fix vector length issues in the SVE/SME sigreturn tests
   - Fix build warning in the ptrace tests.

  Timers:
   - Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
     non-determinism arising from the architected counter.

  Miscellaneous:
   - Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
     don't succeed.
   - Minor fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  perf: arm-ni: Fix an NULL vs IS_ERR() bug
  arm64: hibernate: Fix warning for cast from restricted gfp_t
  arm64: esr: Define ESR_ELx_EC_* constants as UL
  arm64: pkeys: remove redundant WARN
  perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
  MAINTAINERS: List Arm interconnect PMUs as supported
  perf: Add driver for Arm NI-700 interconnect PMU
  dt-bindings/perf: Add Arm NI-700 PMU
  perf/arm-cmn: Improve format attr printing
  perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
  arm64/mm: use lm_alias() with addresses passed to memblock_free()
  mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
  arm64: Expose the end of the linear map in PHYSMEM_END
  arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
  arm64/mm: Delete __init region from memblock.reserved
  perf/arm-cmn: Support CMN S3
  dt-bindings: perf: arm-cmn: Add CMN S3
  perf/arm-cmn: Refactor DTC PMU register access
  perf/arm-cmn: Make cycle counts less surprising
  perf/arm-cmn: Improve build-time assertion
  ...
2024-09-16 06:55:07 +02:00
Will Deacon
0f12694958 drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercall
Hook up pKVM's MMIO_GUARD hypercall so that ioremap() and friends will
register the target physical address as MMIO with the hypervisor,
allowing guest exits to that page to be emulated by the host with full
syndrome information.

Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240830130150.8568-7-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 16:30:41 +01:00
Will Deacon
ebc59b120c drivers/virt: pkvm: Hook up mem_encrypt API using pKVM hypercalls
If we detect the presence of pKVM's SHARE and UNSHARE hypercalls, then
register a backend implementation of the mem_encrypt API so that things
like DMA buffers can be shared appropriately with the host.

Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240830130150.8568-5-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 16:30:41 +01:00
Will Deacon
a06c3fad49 drivers/virt: pkvm: Add initial support for running as a protected guest
Implement a pKVM protected guest driver to probe the presence of pKVM
and determine the memory protection granule using the HYP_MEMINFO
hypercall.

Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240830130150.8568-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 16:30:41 +01:00
Nikunj A Dadhania
2b9ac0b84c virt: sev-guest: Ensure the SNP guest messages do not exceed a page
Currently, struct snp_guest_msg includes a message header (96 bytes) and
a payload (4000 bytes). There is an implicit assumption here that the
SNP message header will always be 96 bytes, and with that assumption the
payload array size has been set to 4000 bytes - a magic number. If any
new member is added to the SNP message header, the SNP guest message
will span more than a page.

Instead of using a magic number for the payload, declare struct
snp_guest_msg in a way that payload plus the message header do not
exceed a page.

  [ bp: Massage. ]

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240731150811.156771-5-nikunj@amd.com
2024-08-27 10:35:38 +02:00
Nikunj A Dadhania
5f7c38f81d virt: sev-guest: Fix user-visible strings
User-visible abbreviations should be in capitals, ensure messages are
readable and clear.

No functional change.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240731150811.156771-4-nikunj@amd.com
2024-08-27 10:35:06 +02:00
Nikunj A Dadhania
a1bbb2236b virt: sev-guest: Rename local guest message variables
Rename local guest message variables for more clarity.

No functional change.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240731150811.156771-3-nikunj@amd.com
2024-08-27 10:34:41 +02:00
Nikunj A Dadhania
dc6d20b900 virt: sev-guest: Replace dev_dbg() with pr_debug()
In preparation for moving code to arch/x86/coco/sev/core.c, replace
dev_dbg with pr_debug.

No functional change.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Peter Gonda <pgonda@google.com>
Link: https://lore.kernel.org/r/20240731150811.156771-2-nikunj@amd.com
2024-08-27 10:22:20 +02:00
Linus Torvalds
2c9b351240 ARM:
* Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement
 
 * Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogenous hardware
 
 * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol
 
 * FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing
 
 * New command-line parameter to control the WFx trap behavior under KVM
 
 * Introduce kCFI hardening in the EL2 hypervisor
 
 * Fixes + cleanups for handling presence/absence of FEAT_TCRX
 
 * Miscellaneous fixes + documentation updates
 
 LoongArch:
 
 * Add paravirt steal time support.
 
 * Add support for KVM_DIRTY_LOG_INITIALLY_SET.
 
 * Add perf kvm-stat support for loongarch.
 
 RISC-V:
 
 * Redirect AMO load/store access fault traps to guest
 
 * perf kvm stat support
 
 * Use guest files for IMSIC virtualization, when available
 
 ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA
 extensions is coming through the RISC-V tree.
 
 s390:
 
 * Assortment of tiny fixes which are not time critical
 
 x86:
 
 * Fixes for Xen emulation.
 
 * Add a global struct to consolidate tracking of host values, e.g. EFER
 
 * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.
 
 * Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
 
 * Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.
 
 * Drop MTRR virtualization, and instead always honor guest PAT on CPUs
   that support self-snoop.
 
 * Update to the newfangled Intel CPU FMS infrastructure.
 
 * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
   '0' and writes from userspace are ignored.
 
 * Misc cleanups
 
 x86 - MMU:
 
 * Small cleanups, renames and refactoring extracted from the upcoming
   Intel TDX support.
 
 * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
   hold leafs SPTEs.
 
 * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
   page splitting, to avoid stalling vCPUs when splitting huge pages.
 
 * Bug the VM instead of simply warning if KVM tries to split a SPTE that is
   non-present or not-huge.  KVM is guaranteed to end up in a broken state
   because the callers fully expect a valid SPTE, it's all but dangerous
   to let more MMU changes happen afterwards.
 
 x86 - AMD:
 
 * Make per-CPU save_area allocations NUMA-aware.
 
 * Force sev_es_host_save_area() to be inlined to avoid calling into an
   instrumentable function from noinstr code.
 
 * Base support for running SEV-SNP guests.  API-wise, this includes
   a new KVM_X86_SNP_VM type, encrypting/measure the initial image into
   guest memory, and finalizing it before launching it.  Internally,
   there are some gmem/mmu hooks needed to prepare gmem-allocated pages
   before mapping them into guest private memory ranges.
 
   This includes basic support for attestation guest requests, enough to
   say that KVM supports the GHCB 2.0 specification.
 
   There is no support yet for loading into the firmware those signing
   keys to be used for attestation requests, and therefore no need yet
   for the host to provide certificate data for those keys.  To support
   fetching certificate data from userspace, a new KVM exit type will be
   needed to handle fetching the certificate from userspace. An attempt to
   define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle
   this was introduced in v1 of this patchset, but is still being discussed
   by community, so for now this patchset only implements a stub version
   of SNP Extended Guest Requests that does not provide certificate data.
 
 x86 - Intel:
 
 * Remove an unnecessary EPT TLB flush when enabling hardware.
 
 * Fix a series of bugs that cause KVM to fail to detect nested pending posted
   interrupts as valid wake eents for a vCPU executing HLT in L2 (with
   HLT-exiting disable by L1).
 
 * KVM: x86: Suppress MMIO that is triggered during task switch emulation
 
   Explicitly suppress userspace emulated MMIO exits that are triggered when
   emulating a task switch as KVM doesn't support userspace MMIO during
   complex (multi-step) emulation.  Silently ignoring the exit request can
   result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
   userspace for some other reason prior to purging mmio_needed.
 
   See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write exits if
   emulator detects exception") for more details on KVM's limitations with
   respect to emulated MMIO during complex emulator flows.
 
 Generic:
 
 * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE,
   because the special casing needed by these pages is not due to just
   unmovability (and in fact they are only unmovable because the CPU cannot
   access them).
 
 * New ioctl to populate the KVM page tables in advance, which is useful to
   mitigate KVM page faults during guest boot or after live migration.
   The code will also be used by TDX, but (probably) not through the ioctl.
 
 * Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
 * Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.
 
 * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().
 
 * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
 * Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.
 
 Selftests:
 
 * Remove dead code in the memslot modification stress test.
 
 * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
 
 * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
   log for tests that create lots of VMs.
 
 * Make the PMU counters test less flaky when counting LLC cache misses by
   doing CLFLUSH{OPT} in every loop iteration.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG
 +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv
 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3
 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR
 KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S
 iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ==
 =zepX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Initial infrastructure for shadow stage-2 MMUs, as part of nested
     virtualization enablement

   - Support for userspace changes to the guest CTR_EL0 value, enabling
     (in part) migration of VMs between heterogenous hardware

   - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
     of the protocol

   - FPSIMD/SVE support for nested, including merged trap configuration
     and exception routing

   - New command-line parameter to control the WFx trap behavior under
     KVM

   - Introduce kCFI hardening in the EL2 hypervisor

   - Fixes + cleanups for handling presence/absence of FEAT_TCRX

   - Miscellaneous fixes + documentation updates

  LoongArch:

   - Add paravirt steal time support

   - Add support for KVM_DIRTY_LOG_INITIALLY_SET

   - Add perf kvm-stat support for loongarch

  RISC-V:

   - Redirect AMO load/store access fault traps to guest

   - perf kvm stat support

   - Use guest files for IMSIC virtualization, when available

  s390:

   - Assortment of tiny fixes which are not time critical

  x86:

   - Fixes for Xen emulation

   - Add a global struct to consolidate tracking of host values, e.g.
     EFER

   - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
     effective APIC bus frequency, because TDX

   - Print the name of the APICv/AVIC inhibits in the relevant
     tracepoint

   - Clean up KVM's handling of vendor specific emulation to
     consistently act on "compatible with Intel/AMD", versus checking
     for a specific vendor

   - Drop MTRR virtualization, and instead always honor guest PAT on
     CPUs that support self-snoop

   - Update to the newfangled Intel CPU FMS infrastructure

   - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
     it reads '0' and writes from userspace are ignored

   - Misc cleanups

  x86 - MMU:

   - Small cleanups, renames and refactoring extracted from the upcoming
     Intel TDX support

   - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
     that can't hold leafs SPTEs

   - Unconditionally drop mmu_lock when allocating TDP MMU page tables
     for eager page splitting, to avoid stalling vCPUs when splitting
     huge pages

   - Bug the VM instead of simply warning if KVM tries to split a SPTE
     that is non-present or not-huge. KVM is guaranteed to end up in a
     broken state because the callers fully expect a valid SPTE, it's
     all but dangerous to let more MMU changes happen afterwards

  x86 - AMD:

   - Make per-CPU save_area allocations NUMA-aware

   - Force sev_es_host_save_area() to be inlined to avoid calling into
     an instrumentable function from noinstr code

   - Base support for running SEV-SNP guests. API-wise, this includes a
     new KVM_X86_SNP_VM type, encrypting/measure the initial image into
     guest memory, and finalizing it before launching it. Internally,
     there are some gmem/mmu hooks needed to prepare gmem-allocated
     pages before mapping them into guest private memory ranges

     This includes basic support for attestation guest requests, enough
     to say that KVM supports the GHCB 2.0 specification

     There is no support yet for loading into the firmware those signing
     keys to be used for attestation requests, and therefore no need yet
     for the host to provide certificate data for those keys.

     To support fetching certificate data from userspace, a new KVM exit
     type will be needed to handle fetching the certificate from
     userspace.

     An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
     exit type to handle this was introduced in v1 of this patchset, but
     is still being discussed by community, so for now this patchset
     only implements a stub version of SNP Extended Guest Requests that
     does not provide certificate data

  x86 - Intel:

   - Remove an unnecessary EPT TLB flush when enabling hardware

   - Fix a series of bugs that cause KVM to fail to detect nested
     pending posted interrupts as valid wake eents for a vCPU executing
     HLT in L2 (with HLT-exiting disable by L1)

   - KVM: x86: Suppress MMIO that is triggered during task switch
     emulation

     Explicitly suppress userspace emulated MMIO exits that are
     triggered when emulating a task switch as KVM doesn't support
     userspace MMIO during complex (multi-step) emulation

     Silently ignoring the exit request can result in the
     WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
     for some other reason prior to purging mmio_needed

     See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write
     exits if emulator detects exception") for more details on KVM's
     limitations with respect to emulated MMIO during complex emulator
     flows

  Generic:

   - Rename the AS_UNMOVABLE flag that was introduced for KVM to
     AS_INACCESSIBLE, because the special casing needed by these pages
     is not due to just unmovability (and in fact they are only
     unmovable because the CPU cannot access them)

   - New ioctl to populate the KVM page tables in advance, which is
     useful to mitigate KVM page faults during guest boot or after live
     migration. The code will also be used by TDX, but (probably) not
     through the ioctl

   - Enable halt poll shrinking by default, as Intel found it to be a
     clear win

   - Setup empty IRQ routing when creating a VM to avoid having to
     synchronize SRCU when creating a split IRQCHIP on x86

   - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
     a flag that arch code can use for hooking both sched_in() and
     sched_out()

   - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
     truncating a bogus value from userspace, e.g. to help userspace
     detect bugs

   - Mark a vCPU as preempted if and only if it's scheduled out while in
     the KVM_RUN loop, e.g. to avoid marking it preempted and thus
     writing guest memory when retrieving guest state during live
     migration blackout

  Selftests:

   - Remove dead code in the memslot modification stress test

   - Treat "branch instructions retired" as supported on all AMD Family
     17h+ CPUs

   - Print the guest pseudo-RNG seed only when it changes, to avoid
     spamming the log for tests that create lots of VMs

   - Make the PMU counters test less flaky when counting LLC cache
     misses by doing CLFLUSH{OPT} in every loop iteration"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  crypto: ccp: Add the SNP_VLEK_LOAD command
  KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
  KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
  KVM: x86: Replace static_call_cond() with static_call()
  KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  x86/sev: Move sev_guest.h into common SEV header
  KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: x86: Suppress MMIO that is triggered during task switch emulation
  KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
  KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
  KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
  KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
  KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
  KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
  KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: Document KVM_PRE_FAULT_MEMORY ioctl
  mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
  perf kvm: Add kvm-stat for loongarch64
  LoongArch: KVM: Add PV steal time support in guest side
  ...
2024-07-20 12:41:03 -07:00
Paolo Bonzini
bc9cd5a219 Merge branch 'kvm-6.11-sev-attestation' into HEAD
The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests
to send encrypted messages/requests to firmware: SNP Guest Requests and SNP
Extended Guest Requests. These encrypted messages are used for things like
servicing attestation requests issued by the guest. Implementing support for
these is required to be fully GHCB-compliant.

For the most part, KVM only needs to handle forwarding these requests to
firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined
in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to
the guest.

However, in the case of SNP Extended Guest Requests, the host is also
able to provide the certificate data corresponding to the endorsement key
used by firmware to sign attestation report requests. This certificate data
is provided by userspace because:

  1) It allows for different keys/key types to be used for each particular
     guest with requiring any sort of KVM API to configure the certificate
     table in advance on a per-guest basis.

  2) It provides additional flexibility with how attestation requests might
     be handled during live migration where the certificate data for
     source/dest might be different.

  3) It allows all synchronization between certificates and firmware/signing
     key updates to be handled purely by userspace rather than requiring
     some in-kernel mechanism to facilitate it. [1]

To support fetching certificate data from userspace, a new KVM exit type will
be needed to handle fetching the certificate from userspace. An attempt to
define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this
was introduced in v1 of this patchset, but is still being discussed by
community, so for now this patchset only implements a stub version of SNP
Extended Guest Requests that does not provide certificate data, but is still
enough to provide compliance with the GHCB 2.0 spec.
2024-07-16 11:44:23 -04:00
Michael Roth
f55f3c3ac6 x86/sev: Move sev_guest.h into common SEV header
sev_guest.h currently contains various definitions relating to the
format of SNP_GUEST_REQUEST commands to SNP firmware. Currently only the
sev-guest driver makes use of them, but when the KVM side of this is
implemented there's a need to parse the SNP_GUEST_REQUEST header to
determine whether additional information needs to be provided to the
guest. Prepare for this by moving those definitions to a common header
that's shared by host/guest code so that KVM can also make use of them.

Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240701223148.3798365-3-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16 11:44:00 -04:00
Uwe Kleine-König
3991b04d48 virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch
As described in the added code comment, a reference to .exit.text is ok for
drivers registered via module_platform_driver_probe(). Make this explicit to
prevent the following section mismatch warning:

  WARNING: modpost: drivers/virt/coco/sev-guest/sev-guest: section mismatch in reference: \
    sev_guest_driver+0x10 (section: .data) -> sev_guest_remove (section: .exit.text)

that triggers on an allmodconfig W=1 build.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/4a81b0e87728a58904283e2d1f18f73abc69c2a1.1711748999.git.u.kleine-koenig@pengutronix.de
2024-06-20 20:28:50 +02:00
Tom Lendacky
627dc67151 x86/sev: Extend the config-fs attestation support for an SVSM
When an SVSM is present, the guest can also request attestation reports
from it. These SVSM attestation reports can be used to attest the SVSM
and any services running within the SVSM.

Extend the config-fs attestation support to provide such. This involves
creating four new config-fs attributes:

  - 'service-provider' (input)
    This attribute is used to determine whether the attestation request
    should be sent to the specified service provider or to the SEV
    firmware. The SVSM service provider is represented by the value
    'svsm'.

  - 'service_guid' (input)
    Used for requesting the attestation of a single service within the
    service provider. A null GUID implies that the SVSM_ATTEST_SERVICES
    call should be used to request the attestation report. A non-null
    GUID implies that the SVSM_ATTEST_SINGLE_SERVICE call should be used.

  - 'service_manifest_version' (input)
    Used with the SVSM_ATTEST_SINGLE_SERVICE call, the service version
    represents a specific service manifest version be used for the
    attestation report.

  - 'manifestblob' (output)
    Used to return the service manifest associated with the attestation
    report.

Only display these new attributes when running under an SVSM.

  [ bp: Massage.
   - s/svsm_attestation_call/svsm_attest_call/g ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/965015dce3c76bb8724839d50c5dea4e4b5d598f.1717600736.git.thomas.lendacky@amd.com
2024-06-17 20:42:57 +02:00
Tom Lendacky
20dfee9593 x86/sev: Take advantage of configfs visibility support in TSM
The TSM attestation report support provides multiple configfs attribute
types (both for standard and binary attributes) to allow for additional
attributes to be displayed for SNP as compared to TDX. With the ability
to hide attributes via configfs, consolidate the multiple attribute groups
into a single standard attribute group and a single binary attribute
group. Modify the TDX support to hide the attributes that were previously
"hidden" as a result of registering the selective attribute groups.

Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/r/8873c45d0c8abc35aaf01d7833a55788a6905727.1717600736.git.thomas.lendacky@amd.com
2024-06-17 20:42:57 +02:00
Tom Lendacky
614dc0fb76 sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated
With the introduction of an SVSM, Linux will be running at a non-zero
VMPL. Any request for an attestation report at a higher privilege VMPL
than what Linux is currently running will result in an error. Allow for
the privlevel_floor attribute to be updated dynamically.

  [ bp: Trim commit message. ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/5a736be9384aebd98a0b7c929660f8a97cbdc366.1717600736.git.thomas.lendacky@amd.com
2024-06-17 20:42:57 +02:00
Tom Lendacky
eb65f96cb3 virt: sev-guest: Choose the VMPCK key based on executing VMPL
Currently, the sev-guest driver uses the vmpck-0 key by default. When an
SVSM is present, the kernel is running at a VMPL other than 0 and the
vmpck-0 key is no longer available. If a specific vmpck key has not be
requested by the user via the vmpck_id module parameter, choose the
vmpck key based on the active VMPL level.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/b88081c5d88263176849df8ea93e90a404619cab.1717600736.git.thomas.lendacky@amd.com
2024-06-17 20:42:57 +02:00