From 7ac6612d6b7994491ac410401ed2fbac2bdefc18 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 3 Jun 2025 11:52:54 -0700 Subject: [PATCH 01/28] Documentation/driver-api/cxl: Introduce conventions.rst There exists shipping platforms that bend, break, or otherwise lean on ambiguities in the CXL specification. Without driver changes to accommodate these deviations, end users are left without CXL subsystem RAS features. Specifically, provisioning, error translation, and other flows require the CXL subsystem to understand the platforms CXL topology beyond undecorated memory address ranges. Those isolated compatibility problems risk growing into deeper upstream maintenance burden if different platform vendors arrive at diverging solutions. For example, there are multiple options for resolving low-memory-mmio intersecting large-interleave-ways CXL windows. Linux should only entertain one solution to that problem. Now, with the ACPI Specification Working Group, situations like this would be resolved with the "Code First ECN" process to codify Linux expectations in a specification. In the absence of such a process for the CXL specification, create a file in Linux documentation to detail the motivations, assumptions, tradeoffs, and proposals for amending specification language. The goal is to capture the issues such that platform vendors arrive at compatible solutions for these problems and serve as a repository for potential specification updates. The expectation is to update conventions.rst along with CXL subsystem code changes to accommodate the platform topology. [ dj: Rebased against v6.16-rc1 ] Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Robert Richter Link: https://patch.msgid.link/20250603185254.3730099-1-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/conventions.rst | 47 ++++++++++++++++++++ Documentation/driver-api/cxl/index.rst | 1 + 2 files changed, 48 insertions(+) create mode 100644 Documentation/driver-api/cxl/conventions.rst diff --git a/Documentation/driver-api/cxl/conventions.rst b/Documentation/driver-api/cxl/conventions.rst new file mode 100644 index 000000000000..da347a81a237 --- /dev/null +++ b/Documentation/driver-api/cxl/conventions.rst @@ -0,0 +1,47 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: + +======================================= +Compute Express Link: Linux Conventions +======================================= + +There exists shipping platforms that bend or break CXL specification +expectations. Record the details and the rationale for those deviations. +Borrow the ACPI Code First template format to capture the assumptions +and tradeoffs such that multiple platform implementations can follow the +same convention. + +<(template) Title> +================== + +Document +-------- +CXL Revision , Version + +License +------- +SPDX-License Identifier: CC-BY-4.0 + +Creator/Contributors +-------------------- + +Summary of the Change +--------------------- + + + + +Benefits of the Change +---------------------- + + + +References +---------- + +Detailed Description of the Change +---------------------------------- + + diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index 9e1414ad3357..c1106a68b67c 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -14,6 +14,7 @@ that have impacts on each other. The docs here break up configurations steps. 
theory-of-operation maturity-map + conventions .. toctree:: :maxdepth: 2 From 38b502e0a65215ddefaf84b672ec3908af97bacf Mon Sep 17 00:00:00 2001 From: Alison Schofield Date: Thu, 29 May 2025 13:51:13 -0700 Subject: [PATCH 02/28] cxl/pci: Replace mutex_lock_io() w mutex_lock() for mailbox access mutex_lock_io() differs from mutex_lock() in that it may call io_schedule() when a task must sleep waiting for the lock. This distinction only makes sense in block I/O or memory reclaim paths, where giving I/O a chance to make progress is useful. At this call site, cxl_pci_mbox_send(), the mutex protects an MMIO mailbox. The task holding the lock is not blocking I/O progress, so calling io_schedule(), as mutex_lock_io() may do, has no practical effect. Although there is no functional change, using the correct locking primitive, that more accurately reflects the semantics and intended use of the lock, improves code clarity and avoids misleading readers and tools. [ dj: Dropped fixes tag, no need to backport ] Reported-by: Alok Tiwari Closes: https://lore.kernel.org/linux-cxl/0d2af1e8-7f1b-438c-a090-fd366c8c63e0@oracle.com/ Suggested-by: Dan Williams Signed-off-by: Alison Schofield Reviewed-by: Davidlohr Bueso Reviewed-by: Dan Williams Link: https://patch.msgid.link/20250529205117.1990465-1-alison.schofield@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 785aa2af5eaa..bd100ac31672 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -379,7 +379,7 @@ static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox, { int rc; - mutex_lock_io(&cxl_mbox->mbox_mutex); + mutex_lock(&cxl_mbox->mbox_mutex); rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd); mutex_unlock(&cxl_mbox->mbox_mutex); From 60da1f685a94bc9bd94caf46d953cfa43468c29e Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Tue, 27 May 2025 16:34:51 +0100 Subject: [PATCH 03/28] cxl_test: Limit location for fake CFMWS to mappable range Some architectures (e.g. arm64) only support memory hotplug operations on a restricted set of physical addresses. This applies even when we are faking some CXL fixed memory windows for the purposes of cxl_test. That range can be queried with mhp_get_pluggable_range(true). Use the minimum of that the top of that range and iomem_resource.end to establish the 64GiB region used by cxl_test. From thread #2 which was related to the issue in #1. 
[ dj: Add CONFIG_MEMORY_HOTPLUG config check, from Alison ] Link: https://lore.kernel.org/linux-cxl/20250522145622.00002633@huawei.com/ #2 Reported-by: Itaru Kitayama Closes: https://github.com/pmem/ndctl/issues/278 #1 Reviewed-by: Dan Williams Tested-by: Itaru Kitayama Tested-by: Marc Herbert Signed-off-by: Jonathan Cameron Link: https://patch.msgid.link/20250527153451.82858-1-Jonathan.Cameron@huawei.com Signed-off-by: Dave Jiang --- tools/testing/cxl/config_check.c | 1 + tools/testing/cxl/test/cxl.c | 7 ++++++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/tools/testing/cxl/config_check.c b/tools/testing/cxl/config_check.c index 0902c5d6e410..a80bc2c062fe 100644 --- a/tools/testing/cxl/config_check.c +++ b/tools/testing/cxl/config_check.c @@ -14,4 +14,5 @@ void check(void) BUILD_BUG_ON(!IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)); BUILD_BUG_ON(!IS_ENABLED(CONFIG_NVDIMM_SECURITY_TEST)); BUILD_BUG_ON(!IS_ENABLED(CONFIG_DEBUG_FS)); + BUILD_BUG_ON(!IS_ENABLED(CONFIG_MEMORY_HOTPLUG)); } diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 8a5815ca870d..6a25cca5636f 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -2,6 +2,7 @@ // Copyright(c) 2021 Intel Corporation. All rights reserved. #include +#include #include #include #include @@ -1328,6 +1329,7 @@ err_mem: static __init int cxl_test_init(void) { int rc, i; + struct range mappable; cxl_acpi_test(); cxl_core_test(); @@ -1342,8 +1344,11 @@ static __init int cxl_test_init(void) rc = -ENOMEM; goto err_gen_pool_create; } + mappable = mhp_get_pluggable_range(true); - rc = gen_pool_add(cxl_mock_pool, iomem_resource.end + 1 - SZ_64G, + rc = gen_pool_add(cxl_mock_pool, + min(iomem_resource.end + 1 - SZ_64G, + mappable.end + 1 - SZ_64G), SZ_64G, NUMA_NO_NODE); if (rc) goto err_gen_pool_add; From 5af29a583a17f9699b2a6de5e8148e8349d99a46 Mon Sep 17 00:00:00 2001 From: Alok Tiwari Date: Mon, 9 Jun 2025 10:10:48 -0700 Subject: [PATCH 04/28] Documentation: cxl: fix typos and improve clarity in memory-devices.rst This patch corrects several typographical issues and improves phrasing in memory-devices.rst: - Fixes duplicate word ("1 one") and adjusts phrasing for clarity. - Adds missing hyphen in "on-device". - Corrects "a give memory device" to "a given memory device". - fix singular/plural "decoder resource" -> "decoder resources". - Clarifies "spans to Host Bridges" -> "spans two Host Bridges". - change "at a" -> "a" These changes improve readability and accuracy of the documentation. Signed-off-by: Alok Tiwari Reviewed-by: Randy Dunlap Reviewed-by: Gregory Price Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20250609171130.2375901-1-alok.a.tiwari@oracle.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/theory-of-operation.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/driver-api/cxl/theory-of-operation.rst b/Documentation/driver-api/cxl/theory-of-operation.rst index 40793dad3630..257f513e320c 100644 --- a/Documentation/driver-api/cxl/theory-of-operation.rst +++ b/Documentation/driver-api/cxl/theory-of-operation.rst @@ -29,8 +29,8 @@ Platform firmware enumerates a menu of interleave options at the "CXL root port" (Linux term for the top of the CXL decode topology). From there, PCIe topology dictates which endpoints can participate in which Host Bridge decode regimes. Each PCIe Switch in the path between the root and an endpoint introduces a point -at which the interleave can be split. 
For example platform firmware may say at a -given range only decodes to 1 one Host Bridge, but that Host Bridge may in turn +at which the interleave can be split. For example, platform firmware may say a +given range only decodes to one Host Bridge, but that Host Bridge may in turn interleave cycles across multiple Root Ports. An intervening Switch between a port and an endpoint may interleave cycles across multiple Downstream Switch Ports, etc. @@ -187,7 +187,7 @@ decodes them to "ports", "ports" decode to "endpoints", and "endpoints" represent the decode from SPA (System Physical Address) to DPA (Device Physical Address). -Continuing the RAID analogy, disks have both topology metadata and on device +Continuing the RAID analogy, disks have both topology metadata and on-device metadata that determine RAID set assembly. CXL Port topology and CXL Port link status is metadata for CXL.mem set assembly. The CXL Port topology is enumerated by the arrival of a CXL.mem device. I.e. unless and until the PCIe core attaches @@ -197,7 +197,7 @@ the Linux PCI core to tear down switch-level CXL resources because the endpoint ->remove() event cleans up the port data that was established to support that Memory Expander. -The port metadata and potential decode schemes that a give memory device may +The port metadata and potential decode schemes that a given memory device may participate can be determined via a command like:: # cxl list -BDMu -d root -m mem3 @@ -249,8 +249,8 @@ participate can be determined via a command like:: ...which queries the CXL topology to ask "given CXL Memory Expander with a kernel device name of 'mem3' which platform level decode ranges may this device participate". A given expander can participate in multiple CXL.mem interleave -sets simultaneously depending on how many decoder resource it has. In this -example mem3 can participate in one or more of a PMEM interleave that spans to +sets simultaneously depending on how many decoder resources it has. In this +example mem3 can participate in one or more of a PMEM interleave that spans two Host Bridges, a PMEM interleave that targets a single Host Bridge, a Volatile memory interleave that spans 2 Host Bridges, and a Volatile memory interleave that only targets a single Host Bridge. From 7d14230db8a76c776985d510b9f27f66aedc7b14 Mon Sep 17 00:00:00 2001 From: Nai-Chen Cheng Date: Wed, 11 Jun 2025 01:31:52 +0800 Subject: [PATCH 05/28] Documentation: fix typo in CXL driver documentation Fix typo 'enumates' to 'enumerate' in CXL driver operation documentation to improve readability. Signed-off-by: Nai-Chen Cheng Reviewed-by: Jonathan Cameron Reviewed-by: Li Ming Reviewed-by: Dave Jiang Link: https://patch.msgid.link/20250610173152.33566-1-bleach1827@gmail.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/linux/cxl-driver.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/driver-api/cxl/linux/cxl-driver.rst b/Documentation/driver-api/cxl/linux/cxl-driver.rst index 9759e90c3cf1..dd6dd17dc536 100644 --- a/Documentation/driver-api/cxl/linux/cxl-driver.rst +++ b/Documentation/driver-api/cxl/linux/cxl-driver.rst @@ -20,7 +20,7 @@ The CXL driver is split into a number of drivers. * cxl_port - initializes root and provides port enumeration interface. * cxl_acpi - initializes root decoders and interacts with ACPI data. * cxl_p/mem - initializes memory devices -* cxl_pci - uses cxl_port to enumates the actual fabric hierarchy. +* cxl_pci - uses cxl_port to enumerate the actual fabric hierarchy. 
Driver Devices ============== From 8ad85794be61e046697df8305de34a49791d2ed1 Mon Sep 17 00:00:00 2001 From: Alok Tiwari Date: Sun, 15 Jun 2025 23:07:32 -0700 Subject: [PATCH 06/28] cxl: docs/devices Fix typos and clarify wording in device-types.rst Fix several typos and improve comment clarity in the CXL device types docs: "w/" replaced with "with" "sill" -> "still" "The allows" -> "This allows" "capacity" corrected to "capable" "more devices" corrected to "more upstream devices" in MLD description These changes improve readability and enhance the documentation quality. [ dj: Fix up "one or more hosts" to "one or more upstream devices" from Gregory ] Signed-off-by: Alok Tiwari Reviewed-by: Gregory Price Link: https://patch.msgid.link/20250616060737.1645393-1-alok.a.tiwari@oracle.com Signed-off-by: Dave Jiang --- Documentation/driver-api/cxl/devices/device-types.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/driver-api/cxl/devices/device-types.rst b/Documentation/driver-api/cxl/devices/device-types.rst index f5e4330c1cfe..923f5d89bc04 100644 --- a/Documentation/driver-api/cxl/devices/device-types.rst +++ b/Documentation/driver-api/cxl/devices/device-types.rst @@ -63,13 +63,13 @@ A Type-2 CXL Device: * Supports cxl.io, cxl.cache, and cxl.mem protocols * Optionally implements coherent cache and Host-Managed Device Memory -* Is typically an accelerator device w/ high bandwidth memory. +* Is typically an accelerator device with high bandwidth memory. The primary difference between a type-1 and type-2 device is the presence of host-managed device memory, which allows the device to operate on a -local memory bank - while the CPU sill has coherent DMA to the same memory. +local memory bank - while the CPU still has coherent DMA to the same memory. -The allows things like GPUs to expose their memory via DAX devices or file +This allows things like GPUs to expose their memory via DAX devices or file descriptors, allows drivers and programs direct access to device memory rather than use block-transfer semantics. @@ -89,7 +89,7 @@ basic coherent DMA. Switch ------ -A CXL switch is a device capacity of routing any CXL (and by extension, PCIe) +A CXL switch is a device capable of routing any CXL (and by extension, PCIe) protocol between an upstream, downstream, or peer devices. Many devices, such as Multi-Logical Devices, imply the presence of switching in some manner. @@ -103,7 +103,7 @@ A Single-Logical Device (SLD) is a device which presents a single device to one or more heads. A Multi-Logical Device (MLD) is a device which may present multiple devices -to one or more devices. +to one or more upstream devices. A Single-Headed Device exposes only a single physical connection. From d7b9056c3a6c58d41074b7ba19ab7fd34ce9f63e Mon Sep 17 00:00:00 2001 From: Alok Tiwari Date: Sun, 22 Jun 2025 11:39:16 -0700 Subject: [PATCH 07/28] cxl/edac: Use correct format specifier for u32 val The dev_dbg() message in cxl_set_ecs_threshold() used %d for an unsigned value, which could lead to incorrect logging. Update the format specifier to %u to match variable type. 
Signed-off-by: Alok Tiwari Reviewed-by: Shiju Jose Reviewed-by: Alison Schofield Reviewed-by: Ira Weiny Link: https://patch.msgid.link/20250622183919.4156343-1-alok.a.tiwari@oracle.com Signed-off-by: Dave Jiang --- drivers/cxl/core/edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index 623aaa4439c4..cd3873750e78 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -697,7 +697,7 @@ static int cxl_set_ecs_threshold(struct device *dev, u8 *log_cap, u16 *config, ECS_THRESHOLD_IDX_4096); break; default: - dev_dbg(dev, "Invalid CXL ECS threshold count(%d) to set\n", + dev_dbg(dev, "Invalid CXL ECS threshold count(%u) to set\n", val); dev_dbg(dev, "Supported ECS threshold counts: %u, %u, %u\n", ECS_THRESHOLD_256, ECS_THRESHOLD_1024, From ac0fe6a5731700bcea6fecfd5d0b76c0454b3a20 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 1 Jul 2025 14:07:39 +0200 Subject: [PATCH 08/28] cxl: make cxl_bus_type constant Now that the driver core can properly handle constant struct bus_type, move the cxl_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Cc: Dan Williams Cc: linux-cxl@vger.kernel.org Signed-off-by: Greg Kroah-Hartman Reviewed-by: Ira Weiny Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Greg Kroah-Hartman Link: https://patch.msgid.link/2025070138-vigorous-negative-eae7@gregkh Signed-off-by: Dave Jiang --- drivers/cxl/core/port.c | 2 +- drivers/cxl/cxl.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index eb46c6764d20..0696f7fcef56 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2293,7 +2293,7 @@ static const struct attribute_group *cxl_bus_attribute_groups[] = { NULL, }; -struct bus_type cxl_bus_type = { +const struct bus_type cxl_bus_type = { .name = "cxl", .uevent = cxl_bus_uevent, .match = cxl_bus_match, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 3f1695c96abc..e7b66ca1d423 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -815,7 +815,7 @@ int cxl_dvsec_rr_decode(struct cxl_dev_state *cxlds, bool is_cxl_region(struct device *dev); -extern struct bus_type cxl_bus_type; +extern const struct bus_type cxl_bus_type; struct cxl_driver { const char *name; From 5b6031c832c2747d58d3f0130098d965ef050b9a Mon Sep 17 00:00:00 2001 From: Li Ming Date: Fri, 11 Jul 2025 11:23:55 +0800 Subject: [PATCH 09/28] cxl/core: Introduce a new helper cxl_resource_contains_addr() In CXL subsystem, many functions need to check an address availability by checking if the resource range contains the address. Providing a new helper function cxl_resource_contains_addr() to check if the resource range contains the input address. 
Suggested-by: Alison Schofield Signed-off-by: Li Ming Tested-by: Shiju Jose Reviewed-by: Andy Shevchenko Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Link: https://patch.msgid.link/20250711032357.127355-2-ming.li@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/core.h | 1 + drivers/cxl/core/hdm.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 29b61828a847..6b78b10da3e1 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -80,6 +80,7 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size); int cxl_dpa_free(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled); resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled); +bool cxl_resource_contains_addr(const struct resource *res, const resource_size_t addr); enum cxl_rcrb { CXL_RCRB_DOWNSTREAM, diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index ab1007495f6b..088caa6b6f74 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -547,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) return base; } +bool cxl_resource_contains_addr(const struct resource *res, const resource_size_t addr) +{ + struct resource _addr = DEFINE_RES_MEM(addr, 1); + + return resource_contains(res, &_addr); +} + int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) { struct cxl_port *port = cxled_to_port(cxled); From 03ff65c02559e8da32be231d7f10fe899233ceae Mon Sep 17 00:00:00 2001 From: Li Ming Date: Fri, 11 Jul 2025 11:23:56 +0800 Subject: [PATCH 10/28] cxl/edac: Fix wrong dpa checking for PPR operation Per Table 8-143. "Get Partition Info Output Payload" in CXL r3.2 section 8.2.10.9.2.1 "Get Partition Info(Opcode 4100h)", DPA 0 is a valid address of a CXL device. However, cxl_do_ppr() considers it as an invalid address, so that user will get an -EINVAL when user calls the sysfs interface of the edac driver to trigger a Post Package Repair(PPR) operation for DPA 0 on a CXL device. The correct implementation should be checking if the input DPA is in the DPA range of the CXL device. 
Fixes: be9b359e056a ("cxl/edac: Add CXL memory device soft PPR control feature") Signed-off-by: Li Ming Tested-by: Shiju Jose Reviewed-by: Shiju Jose Reviewed-by: Dave Jiang Reviewed-by: Alison Schofield Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20250711032357.127355-3-ming.li@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/edac.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index cd3873750e78..1526d388740c 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -1923,8 +1923,11 @@ static int cxl_ppr_set_nibble_mask(struct device *dev, void *drv_data, static int cxl_do_ppr(struct device *dev, void *drv_data, u32 val) { struct cxl_ppr_context *cxl_ppr_ctx = drv_data; + struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; + struct cxl_dev_state *cxlds = cxlmd->cxlds; - if (!cxl_ppr_ctx->dpa || val != EDAC_DO_MEM_REPAIR) + if (val != EDAC_DO_MEM_REPAIR || + !cxl_resource_contains_addr(&cxlds->dpa_res, cxl_ppr_ctx->dpa)) return -EINVAL; return cxl_mem_perform_ppr(cxl_ppr_ctx); From bdf2d9fd3a86538b8c7368989248b857b5f1bcf1 Mon Sep 17 00:00:00 2001 From: Li Ming Date: Fri, 11 Jul 2025 11:23:57 +0800 Subject: [PATCH 11/28] cxl/core: Using cxl_resource_contains_addr() to check address availability Helper function cxl_resource_contains_addr() can be used to check if a resource range contains an input address. Use it to replace all code that checks whether a resource range contains a DPA/HPA/SPA. Signed-off-by: Li Ming Reviewed-by: Dave Jiang Reviewed-by: Alison Schofield Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20250711032357.127355-4-ming.li@zohomail.com Signed-off-by: Dave Jiang --- drivers/cxl/core/edac.c | 4 ++-- drivers/cxl/core/memdev.c | 2 +- drivers/cxl/core/region.c | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index 1526d388740c..520121901353 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -1523,7 +1523,7 @@ static int cxl_mem_sparing_set_dpa(struct device *dev, void *drv_data, u64 dpa) struct cxl_memdev *cxlmd = ctx->cxlmd; struct cxl_dev_state *cxlds = cxlmd->cxlds; - if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) + if (!cxl_resource_contains_addr(&cxlds->dpa_res, dpa)) return -EINVAL; ctx->dpa = dpa; @@ -1892,7 +1892,7 @@ static int cxl_ppr_set_dpa(struct device *dev, void *drv_data, u64 dpa) struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; struct cxl_dev_state *cxlds = cxlmd->cxlds; - if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) + if (!cxl_resource_contains_addr(&cxlds->dpa_res, dpa)) return -EINVAL; cxl_ppr_ctx->dpa = dpa; diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index f88a13adf7fa..769bd9be8b94 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -267,7 +267,7 @@ static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa) dev_dbg(cxlds->dev, "device has no dpa resource\n"); return -EINVAL; } - if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) { + if (!cxl_resource_contains_addr(&cxlds->dpa_res, dpa)) { dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n", dpa, &cxlds->dpa_res); return -EINVAL; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 6e5e1460068d..91ff3a495fbd 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2847,7 +2847,7 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg) if (!cxled || 
!cxled->dpa_res || !resource_size(cxled->dpa_res)) return 0; - if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start) + if (!cxl_resource_contains_addr(cxled->dpa_res, dpa)) return 0; /* @@ -2959,7 +2959,7 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd, if (cxlrd->hpa_to_spa) hpa = cxlrd->hpa_to_spa(cxlrd, hpa); - if (hpa < p->res->start || hpa > p->res->end) { + if (!cxl_resource_contains_addr(p->res, hpa)) { dev_dbg(&cxlr->dev, "Addr trans fail: hpa 0x%llx not in region\n", hpa); return ULLONG_MAX; @@ -3499,7 +3499,7 @@ u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa) xa_for_each(&endpoint->regions, index, iter) { struct cxl_region_params *p = &iter->region->params; - if (p->res->start <= spa && spa <= p->res->end) { + if (cxl_resource_contains_addr(p->res, spa)) { if (!p->cache_size) return ~0ULL; From 12b3d697c812aaf356e82d9e1f351fbb2ea97500 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Fri, 11 Jul 2025 17:15:27 +0200 Subject: [PATCH 12/28] cxl: Remove core/acpi.c and cxl core dependency on ACPI From Dave [1]: """ It was a mistake to introduce core/acpi.c and putting ACPI dependency on cxl_core when adding the extended linear cache support. """ Current implementation calls hmat_get_extended_linear_cache_size() of the ACPI subsystem. That external reference causes issue running cxl_test as there is no way to "mock" that function and ignore it when using cxl test. Instead of working around that using cxlrd ops and extensively expanding cxl_test code [1], just move HMAT calls out of the core module to cxl_acpi. Implement this by adding a @cache_size member to struct cxl_root_decoder. During initialization the cache size is determined and added to the root decoder object in cxl_acpi. Later on in cxl_core the cache_size parameter is used to setup extended linear caching. [1] https://patch.msgid.link/20250610172938.139428-1-dave.jiang@intel.com [ dj: Remove core/acpi.o from tools/testing/cxl/Kbuild ] [ dj: Add kdoc for cxlrd->cache_size ] Cc: Dave Jiang Signed-off-by: Robert Richter Reviewed-by: Alison Schofield Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Link: https://patch.msgid.link/20250711151529.787470-1-rrichter@amd.com Signed-off-by: Dave Jiang --- drivers/cxl/acpi.c | 59 +++++++++++++++++++++++++++++++++++++++ drivers/cxl/core/Makefile | 1 - drivers/cxl/core/acpi.c | 11 -------- drivers/cxl/core/core.h | 2 -- drivers/cxl/core/region.c | 7 +---- drivers/cxl/cxl.h | 2 ++ tools/testing/cxl/Kbuild | 1 - 7 files changed, 62 insertions(+), 21 deletions(-) delete mode 100644 drivers/cxl/core/acpi.c diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index a1a99ec3f12c..712624cba2b6 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -335,6 +335,63 @@ static int add_or_reset_cxl_resource(struct resource *parent, struct resource *r return rc; } +static int cxl_acpi_set_cache_size(struct cxl_root_decoder *cxlrd) +{ + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld; + struct range *hpa = &cxld->hpa_range; + resource_size_t size = range_len(hpa); + resource_size_t start = hpa->start; + resource_size_t cache_size; + struct resource res; + int nid, rc; + + res = DEFINE_RES(start, size, 0); + nid = phys_to_target_node(start); + + rc = hmat_get_extended_linear_cache_size(&res, nid, &cache_size); + if (rc) + return rc; + + /* + * The cache range is expected to be within the CFMWS. + * Currently there is only support cache_size == cxl_size. CXL + * size is then half of the total CFMWS window size. 
+ */ + size = size >> 1; + if (cache_size && size != cache_size) { + dev_warn(&cxld->dev, + "Extended Linear Cache size %pa != CXL size %pa. No Support!", + &cache_size, &size); + return -ENXIO; + } + + cxlrd->cache_size = cache_size; + + return 0; +} + +static void cxl_setup_extended_linear_cache(struct cxl_root_decoder *cxlrd) +{ + int rc; + + rc = cxl_acpi_set_cache_size(cxlrd); + if (!rc) + return; + + if (rc != -EOPNOTSUPP) { + /* + * Failing to support extended linear cache region resize does not + * prevent the region from functioning. Only causes cxl list showing + * incorrect region size. + */ + dev_warn(cxlrd->cxlsd.cxld.dev.parent, + "Extended linear cache calculation failed rc:%d\n", rc); + } + + /* Ignoring return code */ + cxlrd->cache_size = 0; +} + DEFINE_FREE(put_cxlrd, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev)) DEFINE_FREE(del_cxl_resource, struct resource *, if (_T) del_cxl_resource(_T)) @@ -394,6 +451,8 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws, ig = CXL_DECODER_MIN_GRANULARITY; cxld->interleave_granularity = ig; + cxl_setup_extended_linear_cache(cxlrd); + if (cfmws->interleave_arithmetic == ACPI_CEDT_CFMWS_ARITHMETIC_XOR) { if (ways != 1 && ways != 3) { cxims_ctx = (struct cxl_cxims_context) { diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile index 79e2ef81fde8..5ad8fef210b5 100644 --- a/drivers/cxl/core/Makefile +++ b/drivers/cxl/core/Makefile @@ -15,7 +15,6 @@ cxl_core-y += hdm.o cxl_core-y += pmu.o cxl_core-y += cdat.o cxl_core-y += ras.o -cxl_core-y += acpi.o cxl_core-$(CONFIG_TRACING) += trace.o cxl_core-$(CONFIG_CXL_REGION) += region.o cxl_core-$(CONFIG_CXL_MCE) += mce.o diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c deleted file mode 100644 index f13b4dae6ac5..000000000000 --- a/drivers/cxl/core/acpi.c +++ /dev/null @@ -1,11 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* Copyright(c) 2024 Intel Corporation. All rights reserved. 
*/ -#include -#include "cxl.h" -#include "core.h" - -int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res, - int nid, resource_size_t *size) -{ - return hmat_get_extended_linear_cache_size(backing_res, nid, size); -} diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 6b78b10da3e1..2250c05cecc3 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -121,8 +121,6 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port, int cxl_ras_init(void); void cxl_ras_exit(void); int cxl_gpf_port_setup(struct cxl_dport *dport); -int cxl_acpi_get_extended_linear_cache_size(struct resource *backing_res, - int nid, resource_size_t *size); #ifdef CONFIG_CXL_FEATURES struct cxl_feat_entry * diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 91ff3a495fbd..08ac7f483562 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3282,15 +3282,10 @@ static int cxl_extended_linear_cache_resize(struct cxl_region *cxlr, { struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); struct cxl_region_params *p = &cxlr->params; - int nid = phys_to_target_node(res->start); resource_size_t size = resource_size(res); resource_size_t cache_size, start; - int rc; - - rc = cxl_acpi_get_extended_linear_cache_size(res, nid, &cache_size); - if (rc) - return rc; + cache_size = cxlrd->cache_size; if (!cache_size) return 0; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index e7b66ca1d423..0730f92df038 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -423,6 +423,7 @@ typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa); /** * struct cxl_root_decoder - Static platform CXL address decoder * @res: host / parent resource for region allocations + * @cache_size: extended linear cache size if exists, otherwise zero. * @region_id: region id for next region provisioning event * @hpa_to_spa: translate CXL host-physical-address to Platform system-physical-address * @platform_data: platform specific configuration data @@ -432,6 +433,7 @@ typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa); */ struct cxl_root_decoder { struct resource *res; + resource_size_t cache_size; atomic_t region_id; cxl_hpa_to_spa_fn hpa_to_spa; void *platform_data; diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild index 31a2d73c963f..d07f14cb7aa4 100644 --- a/tools/testing/cxl/Kbuild +++ b/tools/testing/cxl/Kbuild @@ -62,7 +62,6 @@ cxl_core-y += $(CXL_CORE_SRC)/hdm.o cxl_core-y += $(CXL_CORE_SRC)/pmu.o cxl_core-y += $(CXL_CORE_SRC)/cdat.o cxl_core-y += $(CXL_CORE_SRC)/ras.o -cxl_core-y += $(CXL_CORE_SRC)/acpi.o cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o From 857d18f23ab17284d1b6de6f61f4e74958596376 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 11 Jul 2025 16:49:25 -0700 Subject: [PATCH 13/28] cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks scoped_cond_guard(), automatic cleanup for conditional locks, has a couple pain points: * It causes existing straight-line code to be re-indented into a new bracketed scope. While this can be mitigated by a new helper function to contain the scope, that is not always a comfortable conversion. * The return code from the conditional lock is tossed in favor of a scheme to pass a 'return err;' statement to the macro. 
Other attempts to clean this up, to behave more like guard() [1], got hung up trying to both establish and evaluate the conditional lock in one statement. ACQUIRE() solves this by reflecting the result of the condition in the automatic variable established by the lock CLASS(). The result is separately retrieved with the ACQUIRE_ERR() helper, effectively a PTR_ERR() operation. Link: http://lore.kernel.org/all/Z1LBnX9TpZLR5Dkf@gmail.com [1] Link: http://patch.msgid.link/20250512105026.GP4439@noisy.programming.kicks-ass.net Link: http://patch.msgid.link/20250512185817.GA1808@noisy.programming.kicks-ass.net Cc: Ingo Molnar Cc: Linus Torvalds Cc: David Lechner Cc: Fabio M. De Francesco Signed-off-by: Peter Zijlstra (Intel) [djbw: wrap Peter's proposal with changelog and comments] Co-developed-by: Dan Williams Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Link: https://patch.msgid.link/20250711234932.671292-2-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- include/linux/cleanup.h | 95 +++++++++++++++++++++++++++++++++++------ include/linux/mutex.h | 2 +- include/linux/rwsem.h | 2 +- 3 files changed, 83 insertions(+), 16 deletions(-) diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h index 7093e1d08af0..4eb83dd71cfe 100644 --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -3,6 +3,8 @@ #define _LINUX_CLEANUP_H #include +#include +#include /** * DOC: scope-based cleanup helpers @@ -61,9 +63,21 @@ * Observe the lock is held for the remainder of the "if ()" block not * the remainder of "func()". * - * Now, when a function uses both __free() and guard(), or multiple - * instances of __free(), the LIFO order of variable definition order - * matters. GCC documentation says: + * The ACQUIRE() macro can be used in all places that guard() can be + * used and additionally support conditional locks + * + * + * DEFINE_GUARD_COND(pci_dev, _try, pci_dev_trylock(_T)) + * ... + * ACQUIRE(pci_dev_try, lock)(dev); + * rc = ACQUIRE_ERR(pci_dev_try, &lock); + * if (rc) + * return rc; + * // @lock is held + * + * Now, when a function uses both __free() and guard()/ACQUIRE(), or + * multiple instances of __free(), the LIFO order of variable definition + * order matters. GCC documentation says: * * "When multiple variables in the same scope have cleanup attributes, * at exit from the scope their associated cleanup functions are run in @@ -305,14 +319,46 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \ * acquire fails. * * Only for conditional locks. + * + * ACQUIRE(name, var): + * a named instance of the (guard) class, suitable for conditional + * locks when paired with ACQUIRE_ERR(). + * + * ACQUIRE_ERR(name, &var): + * a helper that is effectively a PTR_ERR() conversion of the guard + * pointer. Returns 0 when the lock was acquired and a negative + * error code otherwise. 
*/ #define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond -#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \ - static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \ - { return (void *)(__force unsigned long)*(_exp); } +#define __GUARD_IS_ERR(_ptr) \ + ({ \ + unsigned long _rc = (__force unsigned long)(_ptr); \ + unlikely((_rc - 1) >= -MAX_ERRNO - 1); \ + }) + +#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \ + static inline void *class_##_name##_lock_ptr(class_##_name##_t *_T) \ + { \ + void *_ptr = (void *)(__force unsigned long)*(_exp); \ + if (IS_ERR(_ptr)) { \ + _ptr = NULL; \ + } \ + return _ptr; \ + } \ + static inline int class_##_name##_lock_err(class_##_name##_t *_T) \ + { \ + long _rc = (__force unsigned long)*(_exp); \ + if (!_rc) { \ + _rc = -EBUSY; \ + } \ + if (!IS_ERR_VALUE(_rc)) { \ + _rc = 0; \ + } \ + return _rc; \ + } #define DEFINE_CLASS_IS_GUARD(_name) \ __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \ @@ -323,23 +369,37 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond __DEFINE_GUARD_LOCK_PTR(_name, _T) #define DEFINE_GUARD(_name, _type, _lock, _unlock) \ - DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ + DEFINE_CLASS(_name, _type, if (!__GUARD_IS_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \ DEFINE_CLASS_IS_GUARD(_name) -#define DEFINE_GUARD_COND(_name, _ext, _condlock) \ +#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \ __DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \ EXTEND_CLASS(_name, _ext, \ - ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \ + ({ void *_t = _T; int _RET = (_lock); if (_T && !(_cond)) _t = ERR_PTR(_RET); _t; }), \ class_##_name##_t _T) \ static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \ - { return class_##_name##_lock_ptr(_T); } + { return class_##_name##_lock_ptr(_T); } \ + static inline int class_##_name##_ext##_lock_err(class_##_name##_t *_T) \ + { return class_##_name##_lock_err(_T); } + +/* + * Default binary condition; success on 'true'. + */ +#define DEFINE_GUARD_COND_3(_name, _ext, _lock) \ + DEFINE_GUARD_COND_4(_name, _ext, _lock, _RET) + +#define DEFINE_GUARD_COND(X...) CONCATENATE(DEFINE_GUARD_COND_, COUNT_ARGS(X))(X) #define guard(_name) \ CLASS(_name, __UNIQUE_ID(guard)) #define __guard_ptr(_name) class_##_name##_lock_ptr +#define __guard_err(_name) class_##_name##_lock_err #define __is_cond_ptr(_name) class_##_name##_is_conditional +#define ACQUIRE(_name, _var) CLASS(_name, _var) +#define ACQUIRE_ERR(_name, _var) __guard_err(_name)(_var) + /* * Helper macro for scoped_guard(). 
* @@ -401,7 +461,7 @@ typedef struct { \ \ static inline void class_##_name##_destructor(class_##_name##_t *_T) \ { \ - if (_T->lock) { _unlock; } \ + if (!__GUARD_IS_ERR(_T->lock)) { _unlock; } \ } \ \ __DEFINE_GUARD_LOCK_PTR(_name, &_T->lock) @@ -433,15 +493,22 @@ __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \ __DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \ __DEFINE_LOCK_GUARD_0(_name, _lock) -#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \ +#define DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _cond) \ __DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \ EXTEND_CLASS(_name, _ext, \ ({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\ - if (_T->lock && !(_condlock)) _T->lock = NULL; \ + int _RET = (_lock); \ + if (_T->lock && !(_cond)) _T->lock = ERR_PTR(_RET);\ _t; }), \ typeof_member(class_##_name##_t, lock) l) \ static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \ - { return class_##_name##_lock_ptr(_T); } + { return class_##_name##_lock_ptr(_T); } \ + static inline int class_##_name##_ext##_lock_err(class_##_name##_t *_T) \ + { return class_##_name##_lock_err(_T); } +#define DEFINE_LOCK_GUARD_1_COND_3(_name, _ext, _lock) \ + DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _RET) + +#define DEFINE_LOCK_GUARD_1_COND(X...) CONCATENATE(DEFINE_LOCK_GUARD_1_COND_, COUNT_ARGS(X))(X) #endif /* _LINUX_CLEANUP_H */ diff --git a/include/linux/mutex.h b/include/linux/mutex.h index a039fa8c1780..9d5d7ed5c101 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -224,7 +224,7 @@ extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T)) DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T)) -DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0) +DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T), _RET == 0) extern unsigned long mutex_get_owner(struct mutex *lock); diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index c8b543d428b0..c810deb88d13 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -240,7 +240,7 @@ extern void up_write(struct rw_semaphore *sem); DEFINE_GUARD(rwsem_read, struct rw_semaphore *, down_read(_T), up_read(_T)) DEFINE_GUARD_COND(rwsem_read, _try, down_read_trylock(_T)) -DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T) == 0) +DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T), _RET == 0) DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T)) DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T)) From 683513084acb978fb7f401b9e4dce7e3866af172 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:26 -0700 Subject: [PATCH 14/28] cxl/mbox: Convert poison list mutex to ACQUIRE() Towards removing all explicit unlock calls in the CXL subsystem, convert the conditional poison list mutex to use a conditional lock guard. Rename the lock to have the compiler validate that all existing call sites are converted. 
Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Acked-by: Peter Zijlstra (Intel) Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Signed-off-by: Dan Williams Link: https://patch.msgid.link/20250711234932.671292-3-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/mbox.c | 7 +++---- drivers/cxl/cxlmem.h | 4 ++-- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 2689e6453c5a..81b21effe8cf 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1401,8 +1401,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len, int nr_records = 0; int rc; - rc = mutex_lock_interruptible(&mds->poison.lock); - if (rc) + ACQUIRE(mutex_intr, lock)(&mds->poison.mutex); + if ((rc = ACQUIRE_ERR(mutex_intr, &lock))) return rc; po = mds->poison.list_out; @@ -1437,7 +1437,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len, } } while (po->flags & CXL_POISON_FLAG_MORE); - mutex_unlock(&mds->poison.lock); return rc; } EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, "CXL"); @@ -1473,7 +1472,7 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds) return rc; } - mutex_init(&mds->poison.lock); + mutex_init(&mds->poison.mutex); return 0; } EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, "CXL"); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 551b0ba2caa1..f5b20641e57c 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -254,7 +254,7 @@ enum security_cmd_enabled_bits { * @max_errors: Maximum media error records held in device cache * @enabled_cmds: All poison commands enabled in the CEL * @list_out: The poison list payload returned by device - * @lock: Protect reads of the poison list + * @mutex: Protect reads of the poison list * * Reads of the poison list are synchronized to ensure that a reader * does not get an incomplete list because their request overlapped @@ -265,7 +265,7 @@ struct cxl_poison_state { u32 max_errors; DECLARE_BITMAP(enabled_cmds, CXL_POISON_ENABLED_MAX); struct cxl_mbox_poison_out *list_out; - struct mutex lock; /* Protect reads of poison list */ + struct mutex mutex; /* Protect reads of poison list */ }; /* From 7cb3b42a6bce4e604ca948e6ede543542b49fb54 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:27 -0700 Subject: [PATCH 15/28] cxl/decoder: Move decoder register programming to a helper In preparation for converting to rw_semaphore_acquire semantics move the contents of an open-coded {down,up}_read(&cxl_dpa_rwsem) section to a helper function. 
Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Acked-by: Peter Zijlstra (Intel) Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Signed-off-by: Dan Williams Link: https://patch.msgid.link/20250711234932.671292-4-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/hdm.c | 79 +++++++++++++++++++++++------------------- 1 file changed, 43 insertions(+), 36 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index ab1007495f6b..81556d12e9b8 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -764,46 +764,12 @@ static int cxld_await_commit(void __iomem *hdm, int id) return -ETIMEDOUT; } -static int cxl_decoder_commit(struct cxl_decoder *cxld) +static void setup_hw_decoder(struct cxl_decoder *cxld, void __iomem *hdm) { - struct cxl_port *port = to_cxl_port(cxld->dev.parent); - struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); - void __iomem *hdm = cxlhdm->regs.hdm_decoder; - int id = cxld->id, rc; + int id = cxld->id; u64 base, size; u32 ctrl; - if (cxld->flags & CXL_DECODER_F_ENABLE) - return 0; - - if (cxl_num_decoders_committed(port) != id) { - dev_dbg(&port->dev, - "%s: out of order commit, expected decoder%d.%d\n", - dev_name(&cxld->dev), port->id, - cxl_num_decoders_committed(port)); - return -EBUSY; - } - - /* - * For endpoint decoders hosted on CXL memory devices that - * support the sanitize operation, make sure sanitize is not in-flight. - */ - if (is_endpoint_decoder(&cxld->dev)) { - struct cxl_endpoint_decoder *cxled = - to_cxl_endpoint_decoder(&cxld->dev); - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct cxl_memdev_state *mds = - to_cxl_memdev_state(cxlmd->cxlds); - - if (mds && mds->security.sanitize_active) { - dev_dbg(&cxlmd->dev, - "attempted to commit %s during sanitize\n", - dev_name(&cxld->dev)); - return -EBUSY; - } - } - - down_read(&cxl_dpa_rwsem); /* common decoder settings */ ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(cxld->id)); cxld_set_interleave(cxld, &ctrl); @@ -837,6 +803,47 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) } writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); +} + +static int cxl_decoder_commit(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); + void __iomem *hdm = cxlhdm->regs.hdm_decoder; + int id = cxld->id, rc; + + if (cxld->flags & CXL_DECODER_F_ENABLE) + return 0; + + if (cxl_num_decoders_committed(port) != id) { + dev_dbg(&port->dev, + "%s: out of order commit, expected decoder%d.%d\n", + dev_name(&cxld->dev), port->id, + cxl_num_decoders_committed(port)); + return -EBUSY; + } + + /* + * For endpoint decoders hosted on CXL memory devices that + * support the sanitize operation, make sure sanitize is not in-flight. 
+ */ + if (is_endpoint_decoder(&cxld->dev)) { + struct cxl_endpoint_decoder *cxled = + to_cxl_endpoint_decoder(&cxld->dev); + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_memdev_state *mds = + to_cxl_memdev_state(cxlmd->cxlds); + + if (mds && mds->security.sanitize_active) { + dev_dbg(&cxlmd->dev, + "attempted to commit %s during sanitize\n", + dev_name(&cxld->dev)); + return -EBUSY; + } + } + + down_read(&cxl_dpa_rwsem); + setup_hw_decoder(cxld, hdm); up_read(&cxl_dpa_rwsem); port->commit_end++; From 55a89d9c99a9a79a7c2c7cb88c2ae9e86868a60b Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:28 -0700 Subject: [PATCH 16/28] cxl/decoder: Drop pointless locking cxl_dpa_rwsem coordinates changes to dpa allocation settings for a given decoder. cxl_decoder_reset() has no need for a consistent snapshot of the dpa settings since it is merely clearing out whatever was there previously. Otherwise, cxl_region_rwsem protects against 'reset' racing 'setup'. In preparation for converting to rw_semaphore_acquire semantics, drop this locking. Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Acked-by: Peter Zijlstra (Intel) Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Reviewed-by: Davidlohr Bueso Signed-off-by: Dan Williams Link: https://patch.msgid.link/20250711234932.671292-5-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/hdm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 81556d12e9b8..e9cb34e30248 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -914,7 +914,6 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld) "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - down_read(&cxl_dpa_rwsem); ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); ctrl &= ~CXL_HDM_DECODER0_CTRL_COMMIT; writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); @@ -923,7 +922,6 @@ static void cxl_decoder_reset(struct cxl_decoder *cxld) writel(0, hdm + CXL_HDM_DECODER0_SIZE_LOW_OFFSET(id)); writel(0, hdm + CXL_HDM_DECODER0_BASE_HIGH_OFFSET(id)); writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); - up_read(&cxl_dpa_rwsem); cxld->flags &= ~CXL_DECODER_F_ENABLE; From a235d7d963e82ac026eca968b71da376534dc9b9 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:29 -0700 Subject: [PATCH 17/28] cxl/region: Split commit_store() into __commit() and queue_reset() helpers The complexity of dropping the lock is removed in favor of splitting commit operations to a helper, and leaving all the complexities of "decommit" for commit_store() to coordinate the different locking contexts. The CPU cache-invalidation in the decommit path is solely handled now by cxl_region_decode_reset(). Previously the CPU caches were being needlessly flushed twice in the decommit path where the first flush had no guarantee that the memory would not be immediately re-dirtied. Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Acked-by: Peter Zijlstra (Intel) Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Signed-off-by: Dan Williams Reviewed-by: Fabio M. 
De Francesco Link: https://patch.msgid.link/20250711234932.671292-6-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 103 ++++++++++++++++++++++++++------------ 1 file changed, 72 insertions(+), 31 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 6e5e1460068d..3a77aec2c447 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -349,30 +349,42 @@ err: return rc; } -static ssize_t commit_store(struct device *dev, struct device_attribute *attr, - const char *buf, size_t len) +static int queue_reset(struct cxl_region *cxlr) { - struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; - bool commit; - ssize_t rc; - - rc = kstrtobool(buf, &commit); - if (rc) - return rc; + int rc; rc = down_write_killable(&cxl_region_rwsem); if (rc) return rc; /* Already in the requested state? */ - if (commit && p->state >= CXL_CONFIG_COMMIT) + if (p->state < CXL_CONFIG_COMMIT) goto out; - if (!commit && p->state < CXL_CONFIG_COMMIT) + + p->state = CXL_CONFIG_RESET_PENDING; + +out: + up_write(&cxl_region_rwsem); + + return rc; +} + +static int __commit(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + int rc; + + rc = down_write_killable(&cxl_region_rwsem); + if (rc) + return rc; + + /* Already in the requested state? */ + if (p->state >= CXL_CONFIG_COMMIT) goto out; /* Not ready to commit? */ - if (commit && p->state < CXL_CONFIG_ACTIVE) { + if (p->state < CXL_CONFIG_ACTIVE) { rc = -ENXIO; goto out; } @@ -385,31 +397,60 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, if (rc) goto out; - if (commit) { - rc = cxl_region_decode_commit(cxlr); - if (rc == 0) - p->state = CXL_CONFIG_COMMIT; - } else { - p->state = CXL_CONFIG_RESET_PENDING; - up_write(&cxl_region_rwsem); - device_release_driver(&cxlr->dev); - down_write(&cxl_region_rwsem); - - /* - * The lock was dropped, so need to revalidate that the reset is - * still pending. - */ - if (p->state == CXL_CONFIG_RESET_PENDING) { - cxl_region_decode_reset(cxlr, p->interleave_ways); - p->state = CXL_CONFIG_ACTIVE; - } - } + rc = cxl_region_decode_commit(cxlr); + if (rc == 0) + p->state = CXL_CONFIG_COMMIT; out: up_write(&cxl_region_rwsem); + return rc; +} + +static ssize_t commit_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + bool commit; + ssize_t rc; + + rc = kstrtobool(buf, &commit); if (rc) return rc; + + if (commit) { + rc = __commit(cxlr); + if (rc) + return rc; + return len; + } + + rc = queue_reset(cxlr); + if (rc) + return rc; + + /* + * Unmap the region and depend the reset-pending state to ensure + * it does not go active again until post reset + */ + device_release_driver(&cxlr->dev); + + /* + * With the reset pending take cxl_region_rwsem unconditionally + * to ensure the reset gets handled before returning. + */ + guard(rwsem_write)(&cxl_region_rwsem); + + /* + * Revalidate that the reset is still pending in case another + * thread already handled this reset. 
+ */ + if (p->state == CXL_CONFIG_RESET_PENDING) { + cxl_region_decode_reset(cxlr, p->interleave_ways); + p->state = CXL_CONFIG_ACTIVE; + } + return len; } From 695d9455af282056b53baf9782da5bcec3409a57 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:30 -0700 Subject: [PATCH 18/28] cxl/region: Move ready-to-probe state check to a helper Rather than unlocking the region rwsem in the middle of cxl_region_probe() create a helper for determining when the region is ready-to-probe. Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Acked-by: Peter Zijlstra (Intel) Reviewed-by: Dave Jiang Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Reviewed-by: Fabio M. De Francesco Link: https://patch.msgid.link/20250711234932.671292-7-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/region.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 3a77aec2c447..2a97fa9a394f 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3572,9 +3572,8 @@ static void shutdown_notifiers(void *_cxlr) unregister_mt_adistance_algorithm(&cxlr->adist_notifier); } -static int cxl_region_probe(struct device *dev) +static int cxl_region_can_probe(struct cxl_region *cxlr) { - struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; int rc; @@ -3597,15 +3596,28 @@ static int cxl_region_probe(struct device *dev) goto out; } - /* - * From this point on any path that changes the region's state away from - * CXL_CONFIG_COMMIT is also responsible for releasing the driver. - */ out: up_read(&cxl_region_rwsem); if (rc) return rc; + return 0; +} + +static int cxl_region_probe(struct device *dev) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int rc; + + rc = cxl_region_can_probe(cxlr); + if (rc) + return rc; + + /* + * From this point on any path that changes the region's state away from + * CXL_CONFIG_COMMIT is also responsible for releasing the driver. + */ cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback; cxlr->memory_notifier.priority = CXL_CALLBACK_PRI; From b3a88225519cfd05d71b99946d37476c941145b8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:31 -0700 Subject: [PATCH 19/28] cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach() Both detach_target() and cxld_unregister() want to tear down a cxl_region when an endpoint decoder is either detached or destroyed. When a region is to be destroyed cxl_region_detach() releases cxl_region_rwsem unbinds the cxl_region driver and re-acquires the rwsem. This "reverse" locking pattern is difficult to reason about, not amenable to scope-based cleanup, and the minor differences in the calling context of detach_target() and cxld_unregister() currently results in the cxl_decoder_kill_region() wrapper. Introduce cxl_decoder_detach() to wrap a core __cxl_decoder_detach() that serves both cases. I.e. either detaching a known position in a region (interruptible), or detaching an endpoint decoder if it is found to be a member of a region (uninterruptible). Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Acked-by: Peter Zijlstra (Intel) Signed-off-by: Dan Williams Reviewed-by: Fabio M. 
De Francesco Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Reviewed-by: Alison Schofield Reviewed-by: Davidlohr Bueso Link: https://patch.msgid.link/20250711234932.671292-8-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/core.h | 15 +++++- drivers/cxl/core/port.c | 9 ++-- drivers/cxl/core/region.c | 103 ++++++++++++++++++++++---------------- 3 files changed, 75 insertions(+), 52 deletions(-) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 29b61828a847..2be37084409f 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -12,6 +12,11 @@ extern const struct device_type cxl_pmu_type; extern struct attribute_group cxl_base_attribute_group; +enum cxl_detach_mode { + DETACH_ONLY, + DETACH_INVALIDATE, +}; + #ifdef CONFIG_CXL_REGION extern struct device_attribute dev_attr_create_pmem_region; extern struct device_attribute dev_attr_create_ram_region; @@ -20,7 +25,11 @@ extern struct device_attribute dev_attr_region; extern const struct device_type cxl_pmem_region_type; extern const struct device_type cxl_dax_region_type; extern const struct device_type cxl_region_type; -void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled); + +int cxl_decoder_detach(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos, + enum cxl_detach_mode mode); + #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr) #define CXL_REGION_TYPE(x) (&cxl_region_type) #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr), @@ -48,7 +57,9 @@ static inline int cxl_get_poison_by_endpoint(struct cxl_port *port) { return 0; } -static inline void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) +static inline int cxl_decoder_detach(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, + int pos, enum cxl_detach_mode mode) { } static inline int cxl_region_init(void) diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index eb46c6764d20..087a20a9ee1c 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2001,12 +2001,9 @@ EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, "CXL"); static void cxld_unregister(void *dev) { - struct cxl_endpoint_decoder *cxled; - - if (is_endpoint_decoder(dev)) { - cxled = to_cxl_endpoint_decoder(dev); - cxl_decoder_kill_region(cxled); - } + if (is_endpoint_decoder(dev)) + cxl_decoder_detach(NULL, to_cxl_endpoint_decoder(dev), -1, + DETACH_INVALIDATE); device_unregister(dev); } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 2a97fa9a394f..4314aaed8ad8 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2135,27 +2135,43 @@ static int cxl_region_attach(struct cxl_region *cxlr, return 0; } -static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) +static struct cxl_region * +__cxl_decoder_detach(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos, + enum cxl_detach_mode mode) { - struct cxl_port *iter, *ep_port = cxled_to_port(cxled); - struct cxl_region *cxlr = cxled->cxld.region; struct cxl_region_params *p; - int rc = 0; lockdep_assert_held_write(&cxl_region_rwsem); - if (!cxlr) - return 0; + if (!cxled) { + p = &cxlr->params; - p = &cxlr->params; - get_device(&cxlr->dev); + if (pos >= p->interleave_ways) { + dev_dbg(&cxlr->dev, "position %d out of range %d\n", + pos, p->interleave_ways); + return ERR_PTR(-ENXIO); + } + + if (!p->targets[pos]) + return NULL; + cxled = p->targets[pos]; + } else { + cxlr = cxled->cxld.region; + if (!cxlr) + return NULL; + p = &cxlr->params; + } + + if (mode == DETACH_INVALIDATE) + 
cxled->part = -1; if (p->state > CXL_CONFIG_ACTIVE) { cxl_region_decode_reset(cxlr, p->interleave_ways); p->state = CXL_CONFIG_ACTIVE; } - for (iter = ep_port; !is_cxl_root(iter); + for (struct cxl_port *iter = cxled_to_port(cxled); !is_cxl_root(iter); iter = to_cxl_port(iter->dev.parent)) cxl_port_detach_region(iter, cxlr, cxled); @@ -2166,7 +2182,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) dev_WARN_ONCE(&cxlr->dev, 1, "expected %s:%s at position %d\n", dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), cxled->pos); - goto out; + return NULL; } if (p->state == CXL_CONFIG_ACTIVE) { @@ -2180,21 +2196,42 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) .end = -1, }; - /* notify the region driver that one of its targets has departed */ - up_write(&cxl_region_rwsem); - device_release_driver(&cxlr->dev); - down_write(&cxl_region_rwsem); -out: - put_device(&cxlr->dev); - return rc; + get_device(&cxlr->dev); + return cxlr; } -void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) +/* + * Cleanup a decoder's interest in a region. There are 2 cases to + * handle, removing an unknown @cxled from a known position in a region + * (detach_target()) or removing a known @cxled from an unknown @cxlr + * (cxld_unregister()) + * + * When the detachment finds a region release the region driver. + */ +int cxl_decoder_detach(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos, + enum cxl_detach_mode mode) { - down_write(&cxl_region_rwsem); - cxled->part = -1; - cxl_region_detach(cxled); + struct cxl_region *detach; + + /* when the decoder is being destroyed lock unconditionally */ + if (mode == DETACH_INVALIDATE) + down_write(&cxl_region_rwsem); + else { + int rc = down_write_killable(&cxl_region_rwsem); + + if (rc) + return rc; + } + + detach = __cxl_decoder_detach(cxlr, cxled, pos, mode); up_write(&cxl_region_rwsem); + + if (detach) { + device_release_driver(&detach->dev); + put_device(&detach->dev); + } + return 0; } static int attach_target(struct cxl_region *cxlr, @@ -2225,29 +2262,7 @@ static int attach_target(struct cxl_region *cxlr, static int detach_target(struct cxl_region *cxlr, int pos) { - struct cxl_region_params *p = &cxlr->params; - int rc; - - rc = down_write_killable(&cxl_region_rwsem); - if (rc) - return rc; - - if (pos >= p->interleave_ways) { - dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, - p->interleave_ways); - rc = -ENXIO; - goto out; - } - - if (!p->targets[pos]) { - rc = 0; - goto out; - } - - rc = cxl_region_detach(p->targets[pos]); -out: - up_write(&cxl_region_rwsem); - return rc; + return cxl_decoder_detach(cxlr, NULL, pos, DETACH_ONLY); } static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos, From d03fcf50ba56f4479685b951506422eeca230853 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 11 Jul 2025 16:49:32 -0700 Subject: [PATCH 20/28] cxl: Convert to ACQUIRE() for conditional rwsem locking Use ACQUIRE() to cleanup conditional locking paths in the CXL driver The ACQUIRE() macro and its associated ACQUIRE_ERR() helpers, like scoped_cond_guard(), arrange for scoped-based conditional locking. Unlike scoped_cond_guard(), these macros arrange for an ERR_PTR() to be retrieved representing the state of the conditional lock. The goal of this conversion is to complete the removal of all explicit unlock calls in the subsystem. I.e. the methods to acquire a lock are solely via guard(), scoped_guard() (for limited cases), or ACQUIRE(). All unlock is implicit / scope-based. 
In order to make sure all lock sites are converted, the existing rwsem's are consolidated and renamed in 'struct cxl_rwsem'. While that makes the patch noisier it gives a clean cut-off between old-world (explicit unlock allowed), and new world (explicit unlock deleted). Cc: David Lechner Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Ingo Molnar Cc: Fabio M. De Francesco Cc: Davidlohr Bueso Cc: Jonathan Cameron Cc: Dave Jiang Cc: Alison Schofield Cc: Vishal Verma Cc: Ira Weiny Cc: Shiju Jose Acked-by: Peter Zijlstra (Intel) Signed-off-by: Dan Williams Reviewed-by: Jonathan Cameron Reviewed-by: Fabio M. De Francesco Reviewed-by: Dave Jiang Tested-by: Shiju Jose Link: https://patch.msgid.link/20250711234932.671292-9-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- drivers/cxl/core/cdat.c | 6 +- drivers/cxl/core/core.h | 17 ++- drivers/cxl/core/edac.c | 44 +++--- drivers/cxl/core/hdm.c | 41 +++--- drivers/cxl/core/mbox.c | 6 +- drivers/cxl/core/memdev.c | 50 +++---- drivers/cxl/core/port.c | 18 +-- drivers/cxl/core/region.c | 295 ++++++++++++++++---------------------- drivers/cxl/cxl.h | 13 +- include/linux/rwsem.h | 1 + 10 files changed, 212 insertions(+), 279 deletions(-) diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index 0ccef2f2a26a..c0af645425f4 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -336,7 +336,7 @@ static int match_cxlrd_hb(struct device *dev, void *data) cxlrd = to_cxl_root_decoder(dev); cxlsd = &cxlrd->cxlsd; - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); for (int i = 0; i < cxlsd->nr_targets; i++) { if (host_bridge == cxlsd->target[i]->dport_dev) return 1; @@ -987,7 +987,7 @@ void cxl_region_shared_upstream_bandwidth_update(struct cxl_region *cxlr) bool is_root; int rc; - lockdep_assert_held(&cxl_dpa_rwsem); + lockdep_assert_held(&cxl_rwsem.dpa); struct xarray *usp_xa __free(free_perf_xa) = kzalloc(sizeof(*usp_xa), GFP_KERNEL); @@ -1057,7 +1057,7 @@ void cxl_region_perf_data_calculate(struct cxl_region *cxlr, { struct cxl_dpa_perf *perf; - lockdep_assert_held(&cxl_dpa_rwsem); + lockdep_assert_held(&cxl_rwsem.dpa); perf = cxled_get_dpa_perf(cxled); if (IS_ERR(perf)) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 2be37084409f..f796731deedf 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -5,6 +5,7 @@ #define __CXL_CORE_H__ #include +#include extern const struct device_type cxl_nvdimm_bridge_type; extern const struct device_type cxl_nvdimm_type; @@ -107,8 +108,20 @@ u16 cxl_rcrb_to_aer(struct device *dev, resource_size_t rcrb); #define PCI_RCRB_CAP_HDR_NEXT_MASK GENMASK(15, 8) #define PCI_CAP_EXP_SIZEOF 0x3c -extern struct rw_semaphore cxl_dpa_rwsem; -extern struct rw_semaphore cxl_region_rwsem; +struct cxl_rwsem { + /* + * All changes to HPA (interleave configuration) occur with this + * lock held for write. + */ + struct rw_semaphore region; + /* + * All changes to a device DPA space occur with this lock held + * for write. 
+ */ + struct rw_semaphore dpa; +}; + +extern struct cxl_rwsem cxl_rwsem; int cxl_memdev_init(void); void cxl_memdev_exit(void); diff --git a/drivers/cxl/core/edac.c b/drivers/cxl/core/edac.c index 623aaa4439c4..9ed1b670efb8 100644 --- a/drivers/cxl/core/edac.c +++ b/drivers/cxl/core/edac.c @@ -115,10 +115,9 @@ static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx, flags, min_cycle); } - struct rw_semaphore *region_lock __free(rwsem_read_release) = - rwsem_read_intr_acquire(&cxl_region_rwsem); - if (!region_lock) - return -EINTR; + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((ret = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) + return ret; cxlr = cxl_ps_ctx->cxlr; p = &cxlr->params; @@ -158,10 +157,9 @@ static int cxl_scrub_set_attrbs_region(struct device *dev, struct cxl_region *cxlr; int ret, i; - struct rw_semaphore *region_lock __free(rwsem_read_release) = - rwsem_read_intr_acquire(&cxl_region_rwsem); - if (!region_lock) - return -EINTR; + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((ret = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) + return ret; cxlr = cxl_ps_ctx->cxlr; p = &cxlr->params; @@ -1340,16 +1338,15 @@ cxl_mem_perform_sparing(struct device *dev, struct cxl_memdev_sparing_in_payload sparing_pi; struct cxl_event_dram *rec = NULL; u16 validity_flags = 0; + int ret; - struct rw_semaphore *region_lock __free(rwsem_read_release) = - rwsem_read_intr_acquire(&cxl_region_rwsem); - if (!region_lock) - return -EINTR; + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region); + if ((ret = ACQUIRE_ERR(rwsem_read_intr, ®ion_rwsem))) + return ret; - struct rw_semaphore *dpa_lock __free(rwsem_read_release) = - rwsem_read_intr_acquire(&cxl_dpa_rwsem); - if (!dpa_lock) - return -EINTR; + ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa); + if ((ret = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem))) + return ret; if (!cxl_sparing_ctx->cap_safe_when_in_use) { /* Memory to repair must be offline */ @@ -1787,16 +1784,15 @@ static int cxl_mem_perform_ppr(struct cxl_ppr_context *cxl_ppr_ctx) struct cxl_memdev_ppr_maintenance_attrbs maintenance_attrbs; struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; struct cxl_mem_repair_attrbs attrbs = { 0 }; + int ret; - struct rw_semaphore *region_lock __free(rwsem_read_release) = - rwsem_read_intr_acquire(&cxl_region_rwsem); - if (!region_lock) - return -EINTR; + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region); + if ((ret = ACQUIRE_ERR(rwsem_read_intr, ®ion_rwsem))) + return ret; - struct rw_semaphore *dpa_lock __free(rwsem_read_release) = - rwsem_read_intr_acquire(&cxl_dpa_rwsem); - if (!dpa_lock) - return -EINTR; + ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa); + if ((ret = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem))) + return ret; if (!cxl_ppr_ctx->media_accessible || !cxl_ppr_ctx->data_retained) { /* Memory to repair must be offline */ diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index e9cb34e30248..865a71bce251 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -16,7 +16,10 @@ * for enumerating these registers and capabilities. 
*/ -DECLARE_RWSEM(cxl_dpa_rwsem); +struct cxl_rwsem cxl_rwsem = { + .region = __RWSEM_INITIALIZER(cxl_rwsem.region), + .dpa = __RWSEM_INITIALIZER(cxl_rwsem.dpa), +}; static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, int *target_map) @@ -214,7 +217,7 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds) { struct resource *p1, *p2; - guard(rwsem_read)(&cxl_dpa_rwsem); + guard(rwsem_read)(&cxl_rwsem.dpa); for (p1 = cxlds->dpa_res.child; p1; p1 = p1->sibling) { __cxl_dpa_debug(file, p1, 0); for (p2 = p1->child; p2; p2 = p2->sibling) @@ -266,7 +269,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) struct resource *res = cxled->dpa_res; resource_size_t skip_start; - lockdep_assert_held_write(&cxl_dpa_rwsem); + lockdep_assert_held_write(&cxl_rwsem.dpa); /* save @skip_start, before @res is released */ skip_start = res->start - cxled->skip; @@ -281,7 +284,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) static void cxl_dpa_release(void *cxled) { - guard(rwsem_write)(&cxl_dpa_rwsem); + guard(rwsem_write)(&cxl_rwsem.dpa); __cxl_dpa_release(cxled); } @@ -293,7 +296,7 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled) { struct cxl_port *port = cxled_to_port(cxled); - lockdep_assert_held_write(&cxl_dpa_rwsem); + lockdep_assert_held_write(&cxl_rwsem.dpa); devm_remove_action(&port->dev, cxl_dpa_release, cxled); __cxl_dpa_release(cxled); } @@ -361,7 +364,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, struct resource *res; int rc; - lockdep_assert_held_write(&cxl_dpa_rwsem); + lockdep_assert_held_write(&cxl_rwsem.dpa); if (!len) { dev_warn(dev, "decoder%d.%d: empty reservation attempted\n", @@ -470,7 +473,7 @@ int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) { struct device *dev = cxlds->dev; - guard(rwsem_write)(&cxl_dpa_rwsem); + guard(rwsem_write)(&cxl_rwsem.dpa); if (cxlds->nr_partitions) return -EBUSY; @@ -516,9 +519,8 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, struct cxl_port *port = cxled_to_port(cxled); int rc; - down_write(&cxl_dpa_rwsem); - rc = __cxl_dpa_reserve(cxled, base, len, skipped); - up_write(&cxl_dpa_rwsem); + scoped_guard(rwsem_write, &cxl_rwsem.dpa) + rc = __cxl_dpa_reserve(cxled, base, len, skipped); if (rc) return rc; @@ -529,7 +531,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_dpa_reserve, "CXL"); resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled) { - guard(rwsem_read)(&cxl_dpa_rwsem); + guard(rwsem_read)(&cxl_rwsem.dpa); if (cxled->dpa_res) return resource_size(cxled->dpa_res); @@ -540,7 +542,7 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled) { resource_size_t base = -1; - lockdep_assert_held(&cxl_dpa_rwsem); + lockdep_assert_held(&cxl_rwsem.dpa); if (cxled->dpa_res) base = cxled->dpa_res->start; @@ -552,7 +554,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled) struct cxl_port *port = cxled_to_port(cxled); struct device *dev = &cxled->cxld.dev; - guard(rwsem_write)(&cxl_dpa_rwsem); + guard(rwsem_write)(&cxl_rwsem.dpa); if (!cxled->dpa_res) return 0; if (cxled->cxld.region) { @@ -582,7 +584,7 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled, struct device *dev = &cxled->cxld.dev; int part; - guard(rwsem_write)(&cxl_dpa_rwsem); + guard(rwsem_write)(&cxl_rwsem.dpa); if (cxled->cxld.flags & CXL_DECODER_F_ENABLE) return -EBUSY; @@ -614,7 +616,7 @@ static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size) struct resource *p, *last; int part; - 
guard(rwsem_write)(&cxl_dpa_rwsem); + guard(rwsem_write)(&cxl_rwsem.dpa); if (cxled->cxld.region) { dev_dbg(dev, "decoder attached to %s\n", dev_name(&cxled->cxld.region->dev)); @@ -842,9 +844,8 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) } } - down_read(&cxl_dpa_rwsem); - setup_hw_decoder(cxld, hdm); - up_read(&cxl_dpa_rwsem); + scoped_guard(rwsem_read, &cxl_rwsem.dpa) + setup_hw_decoder(cxld, hdm); port->commit_end++; rc = cxld_await_commit(hdm, cxld->id); @@ -882,7 +883,7 @@ void cxl_port_commit_reap(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); /* * Once the highest committed decoder is disabled, free any other @@ -1030,7 +1031,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, else cxld->target_type = CXL_DECODER_DEVMEM; - guard(rwsem_write)(&cxl_region_rwsem); + guard(rwsem_write)(&cxl_rwsem.region); if (cxld->id != cxl_num_decoders_committed(port)) { dev_warn(&port->dev, "decoder%d.%d: Committed out of order\n", diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 81b21effe8cf..92cd3cbdd8ec 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -909,8 +909,8 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd, * translations. Take topology mutation locks and lookup * { HPA, REGION } from { DPA, MEMDEV } in the event record. */ - guard(rwsem_read)(&cxl_region_rwsem); - guard(rwsem_read)(&cxl_dpa_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); + guard(rwsem_read)(&cxl_rwsem.dpa); dpa = le64_to_cpu(evt->media_hdr.phys_addr) & CXL_DPA_MASK; cxlr = cxl_dpa_to_region(cxlmd, dpa); @@ -1265,7 +1265,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) /* synchronize with cxl_mem_probe() and decoder write operations */ guard(device)(&cxlmd->dev); endpoint = cxlmd->endpoint; - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); /* * Require an endpoint to be safe otherwise the driver can not * be sure that the device is unmapped. 
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index f88a13adf7fa..f5fbd34310fd 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -232,15 +232,13 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd) if (!port || !is_cxl_endpoint(port)) return -EINVAL; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, ®ion_rwsem))) return rc; - rc = down_read_interruptible(&cxl_dpa_rwsem); - if (rc) { - up_read(&cxl_region_rwsem); + ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem))) return rc; - } if (cxl_num_decoders_committed(port) == 0) { /* No regions mapped to this memdev */ @@ -249,8 +247,6 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd) /* Regions mapped, collect poison by endpoint */ rc = cxl_get_poison_by_endpoint(port); } - up_read(&cxl_dpa_rwsem); - up_read(&cxl_region_rwsem); return rc; } @@ -292,19 +288,17 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa) if (!IS_ENABLED(CONFIG_DEBUG_FS)) return 0; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, ®ion_rwsem))) return rc; - rc = down_read_interruptible(&cxl_dpa_rwsem); - if (rc) { - up_read(&cxl_region_rwsem); + ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem))) return rc; - } rc = cxl_validate_poison_dpa(cxlmd, dpa); if (rc) - goto out; + return rc; inject.address = cpu_to_le64(dpa); mbox_cmd = (struct cxl_mbox_cmd) { @@ -314,7 +308,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa) }; rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd); if (rc) - goto out; + return rc; cxlr = cxl_dpa_to_region(cxlmd, dpa); if (cxlr) @@ -327,11 +321,8 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa) .length = cpu_to_le32(1), }; trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_INJECT); -out: - up_read(&cxl_dpa_rwsem); - up_read(&cxl_region_rwsem); - return rc; + return 0; } EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, "CXL"); @@ -347,19 +338,17 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa) if (!IS_ENABLED(CONFIG_DEBUG_FS)) return 0; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, ®ion_rwsem))) return rc; - rc = down_read_interruptible(&cxl_dpa_rwsem); - if (rc) { - up_read(&cxl_region_rwsem); + ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem))) return rc; - } rc = cxl_validate_poison_dpa(cxlmd, dpa); if (rc) - goto out; + return rc; /* * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command @@ -378,7 +367,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa) rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd); if (rc) - goto out; + return rc; cxlr = cxl_dpa_to_region(cxlmd, dpa); if (cxlr) @@ -391,11 +380,8 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa) .length = cpu_to_le32(1), }; trace_cxl_poison(cxlmd, cxlr, &record, 0, 0, CXL_POISON_TRACE_CLEAR); -out: - up_read(&cxl_dpa_rwsem); - up_read(&cxl_region_rwsem); - return rc; + return 0; } EXPORT_SYMBOL_NS_GPL(cxl_clear_poison, "CXL"); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 087a20a9ee1c..bacf1380dc4d 100644 --- a/drivers/cxl/core/port.c +++ 
b/drivers/cxl/core/port.c @@ -30,18 +30,12 @@ * instantiated by the core. */ -/* - * All changes to the interleave configuration occur with this lock held - * for write. - */ -DECLARE_RWSEM(cxl_region_rwsem); - static DEFINE_IDA(cxl_port_ida); static DEFINE_XARRAY(cxl_root_buses); int cxl_num_decoders_committed(struct cxl_port *port) { - lockdep_assert_held(&cxl_region_rwsem); + lockdep_assert_held(&cxl_rwsem.region); return port->commit_end + 1; } @@ -176,7 +170,7 @@ static ssize_t target_list_show(struct device *dev, ssize_t offset; int rc; - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); rc = emit_target_list(cxlsd, buf); if (rc < 0) return rc; @@ -196,7 +190,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; - /* without @cxl_dpa_rwsem, make sure @part is not reloaded */ + /* without @cxl_rwsem.dpa, make sure @part is not reloaded */ int part = READ_ONCE(cxled->part); const char *desc; @@ -235,7 +229,7 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - guard(rwsem_read)(&cxl_dpa_rwsem); + guard(rwsem_read)(&cxl_rwsem.dpa); return sysfs_emit(buf, "%#llx\n", (u64)cxl_dpa_resource_start(cxled)); } static DEVICE_ATTR_RO(dpa_resource); @@ -560,7 +554,7 @@ static ssize_t decoders_committed_show(struct device *dev, { struct cxl_port *port = to_cxl_port(dev); - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); return sysfs_emit(buf, "%d\n", cxl_num_decoders_committed(port)); } @@ -1722,7 +1716,7 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd, if (xa_empty(&port->dports)) return -EINVAL; - guard(rwsem_write)(&cxl_region_rwsem); + guard(rwsem_write)(&cxl_rwsem.region); for (i = 0; i < cxlsd->cxld.interleave_ways; i++) { struct cxl_dport *dport = find_dport(port, target_map[i]); diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 4314aaed8ad8..ad60c93be803 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -141,16 +141,12 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr, struct cxl_region_params *p = &cxlr->params; ssize_t rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, ®ion_rwsem))) return rc; if (cxlr->mode != CXL_PARTMODE_PMEM) - rc = sysfs_emit(buf, "\n"); - else - rc = sysfs_emit(buf, "%pUb\n", &p->uuid); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "\n"); + return sysfs_emit(buf, "%pUb\n", &p->uuid); } static int is_dup(struct device *match, void *data) @@ -162,7 +158,7 @@ static int is_dup(struct device *match, void *data) if (!is_cxl_region(match)) return 0; - lockdep_assert_held(&cxl_region_rwsem); + lockdep_assert_held(&cxl_rwsem.region); cxlr = to_cxl_region(match); p = &cxlr->params; @@ -192,27 +188,22 @@ static ssize_t uuid_store(struct device *dev, struct device_attribute *attr, if (uuid_is_null(&temp)) return -EINVAL; - rc = down_write_killable(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_write_kill, region_rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, ®ion_rwsem))) return rc; if (uuid_equal(&p->uuid, &temp)) - goto out; + return len; - rc = -EBUSY; if (p->state >= 
CXL_CONFIG_ACTIVE) - goto out; + return -EBUSY; rc = bus_for_each_dev(&cxl_bus_type, NULL, &temp, is_dup); if (rc < 0) - goto out; + return rc; uuid_copy(&p->uuid, &temp); -out: - up_write(&cxl_region_rwsem); - if (rc) - return rc; return len; } static DEVICE_ATTR_RW(uuid); @@ -354,20 +345,17 @@ static int queue_reset(struct cxl_region *cxlr) struct cxl_region_params *p = &cxlr->params; int rc; - rc = down_write_killable(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) return rc; /* Already in the requested state? */ if (p->state < CXL_CONFIG_COMMIT) - goto out; + return 0; p->state = CXL_CONFIG_RESET_PENDING; -out: - up_write(&cxl_region_rwsem); - - return rc; + return 0; } static int __commit(struct cxl_region *cxlr) @@ -375,19 +363,17 @@ static int __commit(struct cxl_region *cxlr) struct cxl_region_params *p = &cxlr->params; int rc; - rc = down_write_killable(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) return rc; /* Already in the requested state? */ if (p->state >= CXL_CONFIG_COMMIT) - goto out; + return 0; /* Not ready to commit? */ - if (p->state < CXL_CONFIG_ACTIVE) { - rc = -ENXIO; - goto out; - } + if (p->state < CXL_CONFIG_ACTIVE) + return -ENXIO; /* * Invalidate caches before region setup to drop any speculative @@ -395,16 +381,15 @@ static int __commit(struct cxl_region *cxlr) */ rc = cxl_region_invalidate_memregion(cxlr); if (rc) - goto out; + return rc; rc = cxl_region_decode_commit(cxlr); - if (rc == 0) - p->state = CXL_CONFIG_COMMIT; + if (rc) + return rc; -out: - up_write(&cxl_region_rwsem); + p->state = CXL_CONFIG_COMMIT; - return rc; + return 0; } static ssize_t commit_store(struct device *dev, struct device_attribute *attr, @@ -437,10 +422,10 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, device_release_driver(&cxlr->dev); /* - * With the reset pending take cxl_region_rwsem unconditionally + * With the reset pending take cxl_rwsem.region unconditionally * to ensure the reset gets handled before returning. 
*/ - guard(rwsem_write)(&cxl_region_rwsem); + guard(rwsem_write)(&cxl_rwsem.region); /* * Revalidate that the reset is still pending in case another @@ -461,13 +446,10 @@ static ssize_t commit_show(struct device *dev, struct device_attribute *attr, struct cxl_region_params *p = &cxlr->params; ssize_t rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; - rc = sysfs_emit(buf, "%d\n", p->state >= CXL_CONFIG_COMMIT); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "%d\n", p->state >= CXL_CONFIG_COMMIT); } static DEVICE_ATTR_RW(commit); @@ -491,15 +473,12 @@ static ssize_t interleave_ways_show(struct device *dev, { struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; - ssize_t rc; + int rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; - rc = sysfs_emit(buf, "%d\n", p->interleave_ways); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "%d\n", p->interleave_ways); } static const struct attribute_group *get_cxl_region_target_group(void); @@ -534,23 +513,21 @@ static ssize_t interleave_ways_store(struct device *dev, return -EINVAL; } - rc = down_write_killable(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) return rc; - if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { - rc = -EBUSY; - goto out; - } + + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) + return -EBUSY; save = p->interleave_ways; p->interleave_ways = val; rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group()); - if (rc) + if (rc) { p->interleave_ways = save; -out: - up_write(&cxl_region_rwsem); - if (rc) return rc; + } + return len; } static DEVICE_ATTR_RW(interleave_ways); @@ -561,15 +538,12 @@ static ssize_t interleave_granularity_show(struct device *dev, { struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; - ssize_t rc; + int rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; - rc = sysfs_emit(buf, "%d\n", p->interleave_granularity); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "%d\n", p->interleave_granularity); } static ssize_t interleave_granularity_store(struct device *dev, @@ -602,19 +576,15 @@ static ssize_t interleave_granularity_store(struct device *dev, if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity) return -EINVAL; - rc = down_write_killable(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) return rc; - if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { - rc = -EBUSY; - goto out; - } + + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) + return -EBUSY; p->interleave_granularity = val; -out: - up_write(&cxl_region_rwsem); - if (rc) - return rc; + return len; } static DEVICE_ATTR_RW(interleave_granularity); @@ -625,17 +595,15 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr, struct cxl_region *cxlr = to_cxl_region(dev); struct cxl_region_params *p = &cxlr->params; u64 resource = -1ULL; - ssize_t rc; + int rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if 
(rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; + if (p->res) resource = p->res->start; - rc = sysfs_emit(buf, "%#llx\n", resource); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "%#llx\n", resource); } static DEVICE_ATTR_RO(resource); @@ -663,7 +631,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) struct resource *res; u64 remainder = 0; - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); /* Nothing to do... */ if (p->res && resource_size(p->res) == size) @@ -705,7 +673,7 @@ static void cxl_region_iomem_release(struct cxl_region *cxlr) struct cxl_region_params *p = &cxlr->params; if (device_is_registered(&cxlr->dev)) - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); if (p->res) { /* * Autodiscovered regions may not have been able to insert their @@ -722,7 +690,7 @@ static int free_hpa(struct cxl_region *cxlr) { struct cxl_region_params *p = &cxlr->params; - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); if (!p->res) return 0; @@ -746,15 +714,14 @@ static ssize_t size_store(struct device *dev, struct device_attribute *attr, if (rc) return rc; - rc = down_write_killable(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) return rc; if (val) rc = alloc_hpa(cxlr, val); else rc = free_hpa(cxlr); - up_write(&cxl_region_rwsem); if (rc) return rc; @@ -770,15 +737,12 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr, u64 size = 0; ssize_t rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; if (p->res) size = resource_size(p->res); - rc = sysfs_emit(buf, "%#llx\n", size); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "%#llx\n", size); } static DEVICE_ATTR_RW(size); @@ -804,26 +768,20 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) struct cxl_endpoint_decoder *cxled; int rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; if (pos >= p->interleave_ways) { dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, p->interleave_ways); - rc = -ENXIO; - goto out; + return -ENXIO; } cxled = p->targets[pos]; if (!cxled) - rc = sysfs_emit(buf, "\n"); - else - rc = sysfs_emit(buf, "%s\n", dev_name(&cxled->cxld.dev)); -out: - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "\n"); + return sysfs_emit(buf, "%s\n", dev_name(&cxled->cxld.dev)); } static int check_commit_order(struct device *dev, void *data) @@ -938,7 +896,7 @@ cxl_port_pick_region_decoder(struct cxl_port *port, /* * This decoder is pinned registered as long as the endpoint decoder is * registered, and endpoint decoder unregistration holds the - * cxl_region_rwsem over unregister events, so no need to hold on to + * cxl_rwsem.region over unregister events, so no need to hold on to * this extra reference. 
*/ put_device(dev); @@ -1129,7 +1087,7 @@ static int cxl_port_attach_region(struct cxl_port *port, unsigned long index; int rc = -EBUSY; - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); cxl_rr = cxl_rr_load(port, cxlr); if (cxl_rr) { @@ -1239,7 +1197,7 @@ static void cxl_port_detach_region(struct cxl_port *port, struct cxl_region_ref *cxl_rr; struct cxl_ep *ep = NULL; - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); cxl_rr = cxl_rr_load(port, cxlr); if (!cxl_rr) @@ -2142,7 +2100,7 @@ __cxl_decoder_detach(struct cxl_region *cxlr, { struct cxl_region_params *p; - lockdep_assert_held_write(&cxl_region_rwsem); + lockdep_assert_held_write(&cxl_rwsem.region); if (!cxled) { p = &cxlr->params; @@ -2215,18 +2173,18 @@ int cxl_decoder_detach(struct cxl_region *cxlr, struct cxl_region *detach; /* when the decoder is being destroyed lock unconditionally */ - if (mode == DETACH_INVALIDATE) - down_write(&cxl_region_rwsem); - else { - int rc = down_write_killable(&cxl_region_rwsem); + if (mode == DETACH_INVALIDATE) { + guard(rwsem_write)(&cxl_rwsem.region); + detach = __cxl_decoder_detach(cxlr, cxled, pos, mode); + } else { + int rc; - if (rc) + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) return rc; + detach = __cxl_decoder_detach(cxlr, cxled, pos, mode); } - detach = __cxl_decoder_detach(cxlr, cxled, pos, mode); - up_write(&cxl_region_rwsem); - if (detach) { device_release_driver(&detach->dev); put_device(&detach->dev); @@ -2234,29 +2192,35 @@ int cxl_decoder_detach(struct cxl_region *cxlr, return 0; } +static int __attach_target(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos, + unsigned int state) +{ + int rc; + + if (state == TASK_INTERRUPTIBLE) { + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem))) + return rc; + guard(rwsem_read)(&cxl_rwsem.dpa); + return cxl_region_attach(cxlr, cxled, pos); + } + guard(rwsem_write)(&cxl_rwsem.region); + guard(rwsem_read)(&cxl_rwsem.dpa); + return cxl_region_attach(cxlr, cxled, pos); +} + static int attach_target(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos, unsigned int state) { - int rc = 0; + int rc = __attach_target(cxlr, cxled, pos, state); - if (state == TASK_INTERRUPTIBLE) - rc = down_write_killable(&cxl_region_rwsem); - else - down_write(&cxl_region_rwsem); - if (rc) - return rc; - - down_read(&cxl_dpa_rwsem); - rc = cxl_region_attach(cxlr, cxled, pos); - up_read(&cxl_dpa_rwsem); - up_write(&cxl_region_rwsem); - - if (rc) - dev_warn(cxled->cxld.dev.parent, - "failed to attach %s to %s: %d\n", - dev_name(&cxled->cxld.dev), dev_name(&cxlr->dev), rc); + if (rc == 0) + return 0; + dev_warn(cxled->cxld.dev.parent, "failed to attach %s to %s: %d\n", + dev_name(&cxled->cxld.dev), dev_name(&cxlr->dev), rc); return rc; } @@ -2516,7 +2480,7 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb, return NOTIFY_DONE; /* - * No need to hold cxl_region_rwsem; region parameters are stable + * No need to hold cxl_rwsem.region; region parameters are stable * within the cxl_region driver. 
*/ region_nid = phys_to_target_node(cxlr->params.res->start); @@ -2539,7 +2503,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb, int region_nid; /* - * No need to hold cxl_region_rwsem; region parameters are stable + * No need to hold cxl_rwsem.region; region parameters are stable * within the cxl_region driver. */ region_nid = phys_to_target_node(cxlr->params.res->start); @@ -2688,17 +2652,13 @@ static ssize_t region_show(struct device *dev, struct device_attribute *attr, struct cxl_decoder *cxld = to_cxl_decoder(dev); ssize_t rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &rwsem))) return rc; if (cxld->region) - rc = sysfs_emit(buf, "%s\n", dev_name(&cxld->region->dev)); - else - rc = sysfs_emit(buf, "\n"); - up_read(&cxl_region_rwsem); - - return rc; + return sysfs_emit(buf, "%s\n", dev_name(&cxld->region->dev)); + return sysfs_emit(buf, "\n"); } DEVICE_ATTR_RO(region); @@ -3037,7 +2997,7 @@ static int cxl_pmem_region_alloc(struct cxl_region *cxlr) struct device *dev; int i; - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); if (p->state != CXL_CONFIG_COMMIT) return -ENXIO; @@ -3049,7 +3009,7 @@ static int cxl_pmem_region_alloc(struct cxl_region *cxlr) cxlr_pmem->hpa_range.start = p->res->start; cxlr_pmem->hpa_range.end = p->res->end; - /* Snapshot the region configuration underneath the cxl_region_rwsem */ + /* Snapshot the region configuration underneath the cxl_rwsem.region */ cxlr_pmem->nr_mappings = p->nr_targets; for (i = 0; i < p->nr_targets; i++) { struct cxl_endpoint_decoder *cxled = p->targets[i]; @@ -3126,7 +3086,7 @@ static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr) struct cxl_dax_region *cxlr_dax; struct device *dev; - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); if (p->state != CXL_CONFIG_COMMIT) return ERR_PTR(-ENXIO); @@ -3326,7 +3286,7 @@ static int match_region_by_range(struct device *dev, const void *data) cxlr = to_cxl_region(dev); p = &cxlr->params; - guard(rwsem_read)(&cxl_region_rwsem); + guard(rwsem_read)(&cxl_rwsem.region); if (p->res && p->res->start == r->start && p->res->end == r->end) return 1; @@ -3386,7 +3346,7 @@ static int __construct_region(struct cxl_region *cxlr, struct resource *res; int rc; - guard(rwsem_write)(&cxl_region_rwsem); + guard(rwsem_write)(&cxl_rwsem.region); p = &cxlr->params; if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { dev_err(cxlmd->dev.parent, @@ -3522,10 +3482,10 @@ int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE); - down_read(&cxl_region_rwsem); - p = &cxlr->params; - attach = p->state == CXL_CONFIG_COMMIT; - up_read(&cxl_region_rwsem); + scoped_guard(rwsem_read, &cxl_rwsem.region) { + p = &cxlr->params; + attach = p->state == CXL_CONFIG_COMMIT; + } if (attach) { /* @@ -3550,7 +3510,7 @@ u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa) if (!endpoint) return ~0ULL; - guard(rwsem_write)(&cxl_region_rwsem); + guard(rwsem_write)(&cxl_rwsem.region); xa_for_each(&endpoint->regions, index, iter) { struct cxl_region_params *p = &iter->region->params; @@ -3592,30 +3552,23 @@ static int cxl_region_can_probe(struct cxl_region *cxlr) struct cxl_region_params *p = &cxlr->params; int rc; - rc = down_read_interruptible(&cxl_region_rwsem); - if (rc) { + ACQUIRE(rwsem_read_intr, rwsem)(&cxl_rwsem.region); + if ((rc = 
ACQUIRE_ERR(rwsem_read_intr, &rwsem))) { dev_dbg(&cxlr->dev, "probe interrupted\n"); return rc; } if (p->state < CXL_CONFIG_COMMIT) { dev_dbg(&cxlr->dev, "config state: %d\n", p->state); - rc = -ENXIO; - goto out; + return -ENXIO; } if (test_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags)) { dev_err(&cxlr->dev, "failed to activate, re-commit region and retry\n"); - rc = -ENXIO; - goto out; + return -ENXIO; } -out: - up_read(&cxl_region_rwsem); - - if (rc) - return rc; return 0; } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 3f1695c96abc..50799a681231 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -469,7 +469,7 @@ enum cxl_config_state { * @nr_targets: number of targets * @cache_size: extended linear cache size if exists, otherwise zero. * - * State transitions are protected by the cxl_region_rwsem + * State transitions are protected by cxl_rwsem.region */ struct cxl_region_params { enum cxl_config_state state; @@ -912,15 +912,4 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port); #endif u16 cxl_gpf_get_dvsec(struct device *dev); - -static inline struct rw_semaphore *rwsem_read_intr_acquire(struct rw_semaphore *rwsem) -{ - if (down_read_interruptible(rwsem)) - return NULL; - - return rwsem; -} - -DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T)) - #endif /* __CXL_H__ */ diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index c810deb88d13..cbafdc12e743 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -244,6 +244,7 @@ DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T), _RET == 0) DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T)) DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T)) +DEFINE_GUARD_COND(rwsem_write, _kill, down_write_killable(_T), _RET == 0) /* * downgrade write lock to read lock From b2df55a98672f4be076ff69d0f0d0b1fc81f2044 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 17 Jul 2025 09:30:36 -0700 Subject: [PATCH 21/28] cleanup: Fix documentation build error for ACQUIRE updates Stephen reports: Documentation/core-api/cleanup:7: include/linux/cleanup.h:73: ERROR: Unexpected indentation. [docutils] Documentation/core-api/cleanup:7: include/linux/cleanup.h:74: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils] Which points out that the ACQUIRE() example in cleanup.h missed the "::" suffix to mark the following text as a code-block. Fixes: 857d18f23ab1 ("cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks") Reported-by: Stephen Rothwell Closes: http://lore.kernel.org/20250717173354.34375751@canb.auug.org.au Signed-off-by: Dan Williams Acked-by: Randy Dunlap Tested-by: Randy Dunlap Link: https://patch.msgid.link/20250717163036.1275791-1-dan.j.williams@intel.com Signed-off-by: Dave Jiang --- include/linux/cleanup.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h index 4eb83dd71cfe..0fb796db4811 100644 --- a/include/linux/cleanup.h +++ b/include/linux/cleanup.h @@ -64,8 +64,7 @@ * the remainder of "func()". * * The ACQUIRE() macro can be used in all places that guard() can be - * used and additionally support conditional locks - * + * used and additionally support conditional locks:: * * DEFINE_GUARD_COND(pci_dev, _try, pci_dev_trylock(_T)) * ... 
From 3796f2985c267b90052613cf0b379e51c61e9367 Mon Sep 17 00:00:00 2001 From: Li Zhijian Date: Thu, 17 Jul 2025 11:12:51 +0800 Subject: [PATCH 22/28] cxl: Fix -Werror=return-type in cxl_decoder_detach() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix following compiling errors: In file included from ../drivers/cxl/core/pmu.c:10: ../drivers/cxl/core/core.h: In function ‘cxl_decoder_detach’: ../drivers/cxl/core/core.h:65:1: error: no return statement in function returning non-void [-Werror=return-type] } ^ cc1: some warnings being treated as errors CC [M] drivers/nvdimm/claim.o make[6]: *** [../scripts/Makefile.build:287: drivers/cxl/core/pmu.o] Error 1 make[6]: *** Waiting for unfinished jobs.... CC [M] drivers/infiniband/core/verbs.o Fixes: b3a88225519c ("cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach()") Signed-off-by: Li Zhijian Link: https://patch.msgid.link/20250717031251.1043825-1-lizhijian@fujitsu.com Signed-off-by: Dave Jiang --- drivers/cxl/core/core.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index 705a5f09aa78..2669f251d677 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -62,6 +62,7 @@ static inline int cxl_decoder_detach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos, enum cxl_detach_mode mode) { + return 0; } static inline int cxl_region_init(void) { From 1f4f8166110f037f15a89c2203ff887b98a8393a Mon Sep 17 00:00:00 2001 From: Shiju Jose Date: Thu, 17 Jul 2025 11:18:14 +0100 Subject: [PATCH 23/28] cxl/events: Update Common Event Record to CXL spec rev 3.2 CXL spec 3.2 section 8.2.10.2.1 Table 8-55, Common Event Record format defined new fields LD-ID and Head ID. LD-ID: ID of logical device from where the event originated, which is valid only if LD-ID valid flag is set to 1. CXL spec 3.2 Section 2.4 describes, a Type 3 Multi-Logical Device (MLD) can partition its resources into up to 16 isolated Logical Devices. Each Logical Device is identified by a Logical Device Identifier (LD-ID) in CXL.mem and CXL.io protocols. LD-ID is a 16-bit Logical Device identifier applicable for CXL.io and CXL.mem requests and responses. CXL.mem supports only the lower 4 bits of LD-ID and therefore can support up to 16 unique LD-ID values over the link. Requests and responses forwarded over an MLD Port are tagged with LD-ID. Head ID: ID of the device head, from where the event originated, which is valid only if head valid flag is set to 1. Add updates for the above spec changes in the CXL events record and CXL common trace event implementation. 
Reviewed-by: Dave Jiang Reviewed-by: Jonathan Cameron Signed-off-by: Shiju Jose Link: https://patch.msgid.link/20250717101817.2104-2-shiju.jose@huawei.com Signed-off-by: Dave Jiang --- drivers/cxl/core/trace.h | 18 ++++++++++++++---- include/cxl/event.h | 4 +++- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index 25ebfbc1616c..a77487a257b3 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -214,12 +214,16 @@ TRACE_EVENT(cxl_overflow, #define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4) #define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5) #define CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID BIT(6) +#define CXL_EVENT_RECORD_FLAG_LD_ID_VALID BIT(7) +#define CXL_EVENT_RECORD_FLAG_HEAD_ID_VALID BIT(8) #define show_hdr_flags(flags) __print_flags(flags, " | ", \ { CXL_EVENT_RECORD_FLAG_PERMANENT, "PERMANENT_CONDITION" }, \ { CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "MAINTENANCE_NEEDED" }, \ { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "PERFORMANCE_DEGRADED" }, \ { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "HARDWARE_REPLACEMENT_NEEDED" }, \ - { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID, "MAINT_OP_SUB_CLASS_VALID" } \ + { CXL_EVENT_RECORD_FLAG_MAINT_OP_SUB_CLASS_VALID, "MAINT_OP_SUB_CLASS_VALID" }, \ + { CXL_EVENT_RECORD_FLAG_LD_ID_VALID, "LD_ID_VALID" }, \ + { CXL_EVENT_RECORD_FLAG_HEAD_ID_VALID, "HEAD_ID_VALID" } \ ) /* @@ -247,7 +251,9 @@ TRACE_EVENT(cxl_overflow, __field(u64, hdr_timestamp) \ __field(u8, hdr_length) \ __field(u8, hdr_maint_op_class) \ - __field(u8, hdr_maint_op_sub_class) + __field(u8, hdr_maint_op_sub_class) \ + __field(u16, hdr_ld_id) \ + __field(u8, hdr_head_id) #define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \ __assign_str(memdev); \ @@ -260,18 +266,22 @@ TRACE_EVENT(cxl_overflow, __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \ __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \ __entry->hdr_maint_op_class = (hdr).maint_op_class; \ - __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class + __entry->hdr_maint_op_sub_class = (hdr).maint_op_sub_class; \ + __entry->hdr_ld_id = le16_to_cpu((hdr).ld_id); \ + __entry->hdr_head_id = (hdr).head_id #define CXL_EVT_TP_printk(fmt, ...) 
\ TP_printk("memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb " \ "len=%d flags='%s' handle=%x related_handle=%x " \ - "maint_op_class=%u maint_op_sub_class=%u : " fmt, \ + "maint_op_class=%u maint_op_sub_class=%u " \ + "ld_id=%x head_id=%x : " fmt, \ __get_str(memdev), __get_str(host), __entry->serial, \ cxl_event_log_type_str(__entry->log), \ __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\ show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \ __entry->hdr_related_handle, __entry->hdr_maint_op_class, \ __entry->hdr_maint_op_sub_class, \ + __entry->hdr_ld_id, __entry->hdr_head_id, \ ##__VA_ARGS__) TRACE_EVENT(cxl_generic_event, diff --git a/include/cxl/event.h b/include/cxl/event.h index f9ae1796da85..f4cb8568566b 100644 --- a/include/cxl/event.h +++ b/include/cxl/event.h @@ -19,7 +19,9 @@ struct cxl_event_record_hdr { __le64 timestamp; u8 maint_op_class; u8 maint_op_sub_class; - u8 reserved[14]; + __le16 ld_id; + u8 head_id; + u8 reserved[11]; } __packed; struct cxl_event_media_hdr { From cd3b36cfc659306456d3cf3714c8856307693c01 Mon Sep 17 00:00:00 2001 From: Shiju Jose Date: Thu, 17 Jul 2025 11:18:15 +0100 Subject: [PATCH 24/28] cxl/events: Add extra validity checks for corrected memory error count in General Media Event Record According to the CXL Specification Revision 3.2, Section 8.2.10.2.1.1, Table 8-57 (General Media Event Record), the Corrected Memory Error Count field is valid under the following conditions: 1. The Threshold Event bit is set in the Memory Event Descriptor field, and 2. The Corrected Memory Error Count must be greater than 0 for events where the Advanced Programmable Threshold Counter has expired. Additionally, if the Advanced Programmable Corrected Memory Error Counter Expire bit in the Memory Event Type field is set, then the Threshold Event bit in the Memory Event Descriptor field shall also be set. Add validity checks for the above conditions while reporting the event to the userspace. 
Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Signed-off-by: Shiju Jose Link: https://patch.msgid.link/20250717101817.2104-3-shiju.jose@huawei.com Signed-off-by: Dave Jiang --- drivers/cxl/core/mbox.c | 9 +++++++++ drivers/cxl/core/trace.h | 5 ++++- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 2689e6453c5a..ba4a29afd3aa 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -926,6 +926,15 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd, if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt)) dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n"); + if (evt->gen_media.media_hdr.descriptor & + CXL_GMER_EVT_DESC_THRESHOLD_EVENT) + WARN_ON_ONCE((evt->gen_media.media_hdr.type & + CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE) && + !get_unaligned_le24(evt->gen_media.cme_count)); + else + WARN_ON_ONCE(evt->gen_media.media_hdr.type & + CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE); + trace_cxl_general_media(cxlmd, type, cxlr, hpa, hpa_alias, &evt->gen_media); } else if (event_type == CXL_CPER_EVENT_DRAM) { diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index a77487a257b3..c38f94ca0ca1 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -506,7 +506,10 @@ TRACE_EVENT(cxl_general_media, uuid_copy(&__entry->region_uuid, &uuid_null); } __entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags; - __entry->cme_count = get_unaligned_le24(rec->cme_count); + if (rec->media_hdr.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT) + __entry->cme_count = get_unaligned_le24(rec->cme_count); + else + __entry->cme_count = 0; ), CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \ From d8145bb8af5c09d27c4dde4f4030d589771594d1 Mon Sep 17 00:00:00 2001 From: Shiju Jose Date: Thu, 17 Jul 2025 11:18:16 +0100 Subject: [PATCH 25/28] cxl/events: Add extra validity checks for CVME count in DRAM Event Record According to the CXL Specification Revision 3.2, Section 8.2.10.2.1.2, Table 8-58 (DRAM Event Record), the CVME (Corrected Volatile Memory Error) Count field is valid under the following conditions: 1. The Threshold Event bit is set in the Memory Event Descriptor field, and 2. The CVME Count must be greater than 0 for events where the Advanced Programmable Threshold Counter has expired. Additionally, if the Advanced Programmable Corrected Memory Error Counter Expire bit in the Memory Event Type field is set, then the Threshold Event bit in the Memory Event Descriptor field shall also be set. Add validity checks for the above conditions while reporting the event to the userspace. 
Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Signed-off-by: Shiju Jose Link: https://patch.msgid.link/20250717101817.2104-4-shiju.jose@huawei.com Signed-off-by: Dave Jiang --- drivers/cxl/core/mbox.c | 9 +++++++++ drivers/cxl/core/trace.h | 5 ++++- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index ba4a29afd3aa..445889b128cd 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -941,6 +941,15 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd, if (cxl_store_rec_dram((struct cxl_memdev *)cxlmd, evt)) dev_dbg(&cxlmd->dev, "CXL store rec_dram failed\n"); + if (evt->dram.media_hdr.descriptor & + CXL_GMER_EVT_DESC_THRESHOLD_EVENT) + WARN_ON_ONCE((evt->dram.media_hdr.type & + CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE) && + !get_unaligned_le24(evt->dram.cvme_count)); + else + WARN_ON_ONCE(evt->dram.media_hdr.type & + CXL_DER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE); + trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias, &evt->dram); } diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index c38f94ca0ca1..462c2e892ba2 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -661,7 +661,10 @@ TRACE_EVENT(cxl_dram, CXL_EVENT_GEN_MED_COMP_ID_SIZE); __entry->sub_channel = rec->sub_channel; __entry->cme_threshold_ev_flags = rec->cme_threshold_ev_flags; - __entry->cvme_count = get_unaligned_le24(rec->cvme_count); + if (rec->media_hdr.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT) + __entry->cvme_count = get_unaligned_le24(rec->cvme_count); + else + __entry->cvme_count = 0; ), CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' sub_type='%s' " \ From f10f46a0ee53420f707195fe33b7c235a1c0e48a Mon Sep 17 00:00:00 2001 From: Shiju Jose Date: Thu, 17 Jul 2025 11:18:17 +0100 Subject: [PATCH 26/28] cxl/events: Trace Memory Sparing Event Record CXL rev 3.2 section 8.2.10.2.1.4 Table 8-60 defines the Memory Sparing Event Record. Determine if the event read is memory sparing record and if so trace the record. Memory device shall produce a memory sparing event record 1. After completion of a PPR maintenance operation if the memory sparing event record enable bit is set (Field: sPPR/hPPR Operation Mode in Table 8-128/Table 8-131). 2. In response to a query request by the host (see section 8.2.10.7.1.4) to determine the availability of sparing resources. The device shall report the resource availability by producing the Memory Sparing Event Record (see Table 8-60) in which the channel, rank, nibble mask, bank group, bank, row, column, sub-channel fields are a copy of the values specified in the request. If the controller does not support reporting whether a resource is available, and a perform maintenance operation for memory sparing is issued with query resources set to 1, the controller shall return invalid input. 
Example trace log for a memory sparing event record produced on
completion of a soft PPR operation:

  cxl_memory_sparing: memdev=mem1 host=0000:0f:00.0 serial=3
  log=Informational : time=55045163029
  uuid=e71f3a40-2d29-4092-8a39-4d1c966c7c65 len=128 flags='0x1'
  handle=1 related_handle=0 maint_op_class=2 maint_op_sub_class=1
  ld_id=0 head_id=0 : flags='' result=0
  validity_flags='CHANNEL|RANK|NIBBLE|BANK GROUP|BANK|ROW|COLUMN'
  spare resource avail=1 channel=2 rank=5 nibble_mask=a59c
  bank_group=2 bank=4 row=13 column=23 sub_channel=0
  comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  comp_id_pldm_valid_flags='' pldm_entity_id=0x00 pldm_resource_id=0x00

Note: For the memory sparing event record, the fields 'maintenance
operation class' and 'maintenance operation subclass' are defined
twice: first in the common event record (Table 8-55) and again in the
memory sparing event record (Table 8-60). Thus those in the sparing
event record are coded as reserved, to be removed when the spec is
updated.

Reviewed-by: Jonathan Cameron
Reviewed-by: Dave Jiang
Signed-off-by: Shiju Jose
Link: https://patch.msgid.link/20250717101817.2104-5-shiju.jose@huawei.com
Signed-off-by: Dave Jiang
---
 drivers/cxl/core/mbox.c  |   6 +++
 drivers/cxl/core/trace.h | 105 +++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h     |   8 +++
 include/cxl/event.h      |  33 ++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 445889b128cd..f7e081c00c49 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -899,6 +899,10 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 		trace_cxl_generic_event(cxlmd, type, uuid, &evt->generic);
 		return;
 	}
+	if (event_type == CXL_CPER_EVENT_MEM_SPARING) {
+		trace_cxl_memory_sparing(cxlmd, type, &evt->mem_sparing);
+		return;
+	}
 
 	if (trace_cxl_general_media_enabled() || trace_cxl_dram_enabled()) {
 		u64 dpa, hpa = ULLONG_MAX, hpa_alias = ULLONG_MAX;
@@ -970,6 +974,8 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 		ev_type = CXL_CPER_EVENT_DRAM;
 	else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
 		ev_type = CXL_CPER_EVENT_MEM_MODULE;
+	else if (uuid_equal(uuid, &CXL_EVENT_MEM_SPARING_UUID))
+		ev_type = CXL_CPER_EVENT_MEM_SPARING;
 
 	cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
 }
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 462c2e892ba2..a53ec4798b12 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -887,6 +887,111 @@ TRACE_EVENT(cxl_memory_module,
 	)
 );
 
+/*
+ * Memory Sparing Event Record - MSER
+ *
+ * CXL rev 3.2 section 8.2.10.2.1.4; Table 8-60
+ */
+#define CXL_MSER_QUERY_RESOURCE_FLAG	BIT(0)
+#define CXL_MSER_HARD_SPARING_FLAG	BIT(1)
+#define CXL_MSER_DEV_INITED_FLAG	BIT(2)
+#define show_mem_sparing_flags(flags)	__print_flags(flags, "|",	\
+	{ CXL_MSER_QUERY_RESOURCE_FLAG,	"Query Resources" },		\
+	{ CXL_MSER_HARD_SPARING_FLAG,	"Hard Sparing" },		\
+	{ CXL_MSER_DEV_INITED_FLAG,	"Device Initiated Sparing" }	\
+)
+
+#define CXL_MSER_VALID_CHANNEL			BIT(0)
+#define CXL_MSER_VALID_RANK			BIT(1)
+#define CXL_MSER_VALID_NIBBLE			BIT(2)
+#define CXL_MSER_VALID_BANK_GROUP		BIT(3)
+#define CXL_MSER_VALID_BANK			BIT(4)
+#define CXL_MSER_VALID_ROW			BIT(5)
+#define CXL_MSER_VALID_COLUMN			BIT(6)
+#define CXL_MSER_VALID_COMPONENT_ID		BIT(7)
+#define CXL_MSER_VALID_COMPONENT_ID_FORMAT	BIT(8)
+#define CXL_MSER_VALID_SUB_CHANNEL		BIT(9)
+#define show_mem_sparing_valid_flags(flags)	__print_flags(flags, "|", \
+	{ CXL_MSER_VALID_CHANNEL,		"CHANNEL" },		\
+	{ CXL_MSER_VALID_RANK,			"RANK" },		\
+	{ CXL_MSER_VALID_NIBBLE,		"NIBBLE" },		\
+	{ CXL_MSER_VALID_BANK_GROUP,		"BANK GROUP" },		\
+	{ CXL_MSER_VALID_BANK,			"BANK" },		\
+	{ CXL_MSER_VALID_ROW,			"ROW" },		\
+	{ CXL_MSER_VALID_COLUMN,		"COLUMN" },		\
+	{ CXL_MSER_VALID_COMPONENT_ID,		"COMPONENT ID" },	\
+	{ CXL_MSER_VALID_COMPONENT_ID_FORMAT,	"COMPONENT ID PLDM FORMAT" }, \
+	{ CXL_MSER_VALID_SUB_CHANNEL,		"SUB CHANNEL" }		\
+)
+
+TRACE_EVENT(cxl_memory_sparing,
+
+	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
+		 struct cxl_event_mem_sparing *rec),
+
+	TP_ARGS(cxlmd, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+
+		/* Memory Sparing Event */
+		__field(u8, flags)
+		__field(u8, result)
+		__field(u16, validity_flags)
+		__field(u16, res_avail)
+		__field(u8, channel)
+		__field(u8, rank)
+		__field(u32, nibble_mask)
+		__field(u8, bank_group)
+		__field(u8, bank)
+		__field(u32, row)
+		__field(u16, column)
+		__field(u8, sub_channel)
+		__array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+		__entry->hdr_uuid = CXL_EVENT_MEM_SPARING_UUID;
+
+		/* Memory Sparing Event */
+		__entry->flags = rec->flags;
+		__entry->result = rec->result;
+		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
+		__entry->res_avail = le16_to_cpu(rec->res_avail);
+		__entry->channel = rec->channel;
+		__entry->rank = rec->rank;
+		__entry->nibble_mask = get_unaligned_le24(rec->nibble_mask);
+		__entry->bank_group = rec->bank_group;
+		__entry->bank = rec->bank;
+		__entry->row = get_unaligned_le24(rec->row);
+		__entry->column = le16_to_cpu(rec->column);
+		__entry->sub_channel = rec->sub_channel;
+		memcpy(__entry->comp_id, &rec->component_id,
+		       CXL_EVENT_GEN_MED_COMP_ID_SIZE);
+	),
+
+	CXL_EVT_TP_printk("flags='%s' result=%u validity_flags='%s' " \
+		"spare resource avail=%u channel=%u rank=%u " \
+		"nibble_mask=%x bank_group=%u bank=%u " \
+		"row=%u column=%u sub_channel=%u " \
+		"comp_id=%s comp_id_pldm_valid_flags='%s' " \
+		"pldm_entity_id=%s pldm_resource_id=%s",
+		show_mem_sparing_flags(__entry->flags),
+		__entry->result,
+		show_mem_sparing_valid_flags(__entry->validity_flags),
+		__entry->res_avail, __entry->channel, __entry->rank,
+		__entry->nibble_mask, __entry->bank_group, __entry->bank,
+		__entry->row, __entry->column, __entry->sub_channel,
+		__print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
+		show_comp_id_pldm_flags(__entry->comp_id[0]),
+		show_pldm_entity_id(__entry->validity_flags, CXL_MSER_VALID_COMPONENT_ID,
+				    CXL_MSER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
+		show_pldm_resource_id(__entry->validity_flags, CXL_MSER_VALID_COMPONENT_ID,
+				      CXL_MSER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id)
+	)
+);
+
 #define show_poison_trace_type(type)		\
 	__print_symbolic(type,			\
 	{ CXL_POISON_TRACE_LIST,	"List" },	\
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 551b0ba2caa1..f98311f357b7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -633,6 +633,14 @@ struct cxl_mbox_identify {
 	UUID_INIT(0xfe927475, 0xdd59, 0x4339, 0xa5, 0x86, 0x79, 0xba, 0xb1, \
 		  0x13, 0xb7, 0x74)
 
+/*
+ * Memory Sparing Event Record UUID
+ * CXL rev 3.2 section 8.2.10.2.1.4: Table 8-60
+ */
+#define CXL_EVENT_MEM_SPARING_UUID \
+	UUID_INIT(0xe71f3a40, 0x2d29, 0x4092, 0x8a, 0x39, 0x4d, 0x1c, 0x96, \
+		  0x6c, 0x7c, 0x65)
+
 /*
  * Get Event Records output payload
  * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
diff --git a/include/cxl/event.h b/include/cxl/event.h
index f4cb8568566b..6fd90f9cc203 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -110,11 +110,43 @@ struct cxl_event_mem_module {
 	u8 reserved[0x2a];
 } __packed;
 
+/*
+ * Memory Sparing Event Record - MSER
+ * CXL rev 3.2 section 8.2.10.2.1.4; Table 8-60
+ */
+struct cxl_event_mem_sparing {
+	struct cxl_event_record_hdr hdr;
+	/*
+	 * The maintenance operation class and maintenance operation
+	 * subclass fields in the Memory Sparing Event Record duplicate
+	 * the same fields in the common event record. Thus they are
+	 * defined here as reserved, to be removed after the spec
+	 * correction.
+	 */
+	u8 rsv1;
+	u8 rsv2;
+	u8 flags;
+	u8 result;
+	__le16 validity_flags;
+	u8 reserved1[6];
+	__le16 res_avail;
+	u8 channel;
+	u8 rank;
+	u8 nibble_mask[3];
+	u8 bank_group;
+	u8 bank;
+	u8 row[3];
+	__le16 column;
+	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+	u8 sub_channel;
+	u8 reserved2[0x25];
+} __packed;
+
 union cxl_event {
 	struct cxl_event_generic generic;
 	struct cxl_event_gen_media gen_media;
 	struct cxl_event_dram dram;
 	struct cxl_event_mem_module mem_module;
+	struct cxl_event_mem_sparing mem_sparing;
 	/* dram & gen_media event header */
 	struct cxl_event_media_hdr media_hdr;
 } __packed;
@@ -133,6 +165,7 @@ enum cxl_event_type {
 	CXL_CPER_EVENT_GEN_MEDIA,
 	CXL_CPER_EVENT_DRAM,
 	CXL_CPER_EVENT_MEM_MODULE,
+	CXL_CPER_EVENT_MEM_SPARING,
 };
 
 #define CPER_CXL_DEVICE_ID_VALID	BIT(0)

From 49d6e658e758e42aaff8ae5ecdd2d06b29abf53e Mon Sep 17 00:00:00 2001
From: Dan Carpenter
Date: Fri, 18 Jul 2025 16:22:40 -0500
Subject: [PATCH 27/28] cxl/region: Fix an ERR_PTR() vs NULL bug

The __cxl_decoder_detach() function is expected to return NULL on
error, but this error path accidentally returns an error pointer. That
could lead to an error pointer dereference in the caller. Change it to
return NULL; a short sketch of the pitfall appears after the series
below.

Fixes: b3a88225519c ("cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach()")
Signed-off-by: Dan Carpenter
Link: https://patch.msgid.link/7def7da0-326a-410d-8c92-718c8963c0a2@sabinyo.mountain
Signed-off-by: Dave Jiang
---
 drivers/cxl/core/region.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e2e9cce13cd2..e9bf42d91689 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2108,7 +2108,7 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
 		if (pos >= p->interleave_ways) {
 			dev_dbg(&cxlr->dev, "position %d out of range %d\n",
 				pos, p->interleave_ways);
-			return ERR_PTR(-ENXIO);
+			return NULL;
 		}
 
 		if (!p->targets[pos])

From f11a5f89910a7ae970fbce4fdc02d86a8ba8570f Mon Sep 17 00:00:00 2001
From: Alison Schofield
Date: Thu, 24 Jul 2025 15:43:06 -0700
Subject: [PATCH 28/28] Documentation/ABI/testing/debugfs-cxl: Add 'cxl' to
 clear_poison path

'cxl' is missing from the path to the clear_poison attribute. Add it.

Signed-off-by: Alison Schofield
Reviewed-by: Jonathan Cameron
Link: https://patch.msgid.link/20250724224308.2101255-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang
---
 Documentation/ABI/testing/debugfs-cxl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
index 12488c14be64..e95e21f131e9 100644
--- a/Documentation/ABI/testing/debugfs-cxl
+++ b/Documentation/ABI/testing/debugfs-cxl
@@ -20,7 +20,7 @@ Description:
 		visible for devices supporting the capability.
 
-What:		/sys/kernel/debug/memX/clear_poison
+What:		/sys/kernel/debug/cxl/memX/clear_poison
 Date:		April, 2023
 KernelVersion:	v6.4
 Contact:	linux-cxl@vger.kernel.org
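For illustration of the ERR_PTR() vs NULL pitfall fixed in patch 27
above, here is a minimal sketch with hypothetical names, not the
actual region code:

  #include <linux/err.h>
  #include <linux/slab.h>

  struct foo { int bar; };

  /* Documented to return NULL on error ... */
  static struct foo *find_foo(int pos)
  {
          if (pos < 0)
                  return ERR_PTR(-ENXIO); /* BUG: breaks the NULL convention */
          return kzalloc(sizeof(struct foo), GFP_KERNEL);
  }

  static void caller(void)
  {
          struct foo *f = find_foo(-1);

          if (!f)                 /* ERR_PTR(-ENXIO) is non-NULL ... */
                  return;
          f->bar = 1;             /* ... so this dereferences a bad pointer */
  }

The fix is to honor one convention on every error path of a function,
either NULL or ERR_PTR(), as __cxl_decoder_detach() now does.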