- Add sysfs 'reset_subordinate' to reset hierarchy below bridge (Keith
Busch)
- Warn if we reset a running device where driver didn't register
pci_error_handlers notification callbacks (Keith Busch)
* pci/reset:
PCI: Warn if a running device is unaware of reset
PCI: Add 'reset_subordinate' to reset hierarchy below bridge
Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples.
Fix the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys address
of the DTB as __pa() is not always valid to use. This fixes a warning
for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports and
endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever possible
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmc7qaoACgkQ+vtdtY28
YcN54g/+Ifz4hQTSWV+VBhihovMMPiQUdxZ+MfJfPnPcZ7NJzaTf+zqhZyS4wQou
v0pdtyR0B1fCM/EvKaYD+1aTTAQFEIT5Dqac+9ePwqaYqSk+yCTxyzW9m+P3rTPV
THo8SGRss7T+Rs+2WaUGxphTJItMGIRdbBvoqK+82EdKFXXKw2BSD8tlJTWwbTam
9xkrpUzw7f4FvVY8vVhRyOd5i8/v+FH8D65DMIT6ME9zRn4MzKVzCg6udgYeCBld
C2XbV+wnyewtjrN2IX+2uQ2mheb7yJu3AEI3iFR5x/sRrsSLpisxrUl38xOOpxrM
XxYtHgE3omjagQ+y+L2PMthlKvhFrXVXIvhUH8xxje5z1Vyq3VMfiABkHlMpAnys
5LY4xEhvqDkPNo65UmjMiHxGW/xtcKsmAZBOp+HLerZfCJIFvl380fi8mNg/Sjvz
7ExCSpzCPsHASZg7QCTplU3BUtg+067Ch/k8Hsn/Og73Pqm3xH4IezQZKwweN9ZT
LC6OQBI7C3Yt1hom9qgUcA4H4/aaPxTVV7i0DGuAKh8Lon6SaoX2yFpweUBgbsL/
c9DIW4vbYBIGASxxUbHlNMKvPCKACKmpFXhsnH5Waj+VWSOwsJ8bjGpH8PfMKdFW
dyJB/r94GqCGpCW7+FC1qGmXiQJGkCo89pKBVjSf4Kj45ht/76o=
=NCYS
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"Bindings:
- Enable dtc "interrupt_provider" warnings for binding examples. Fix
the warnings in fsl,mu-msi and ti,sci-inta due to this.
- Convert zii,rave-sp-wdt, zii,rave-sp-pwrbutton, and
altr,fpga-passive-serial to DT schema format
- Add some documentation on the different forms of YAML text blocks
which are a constant source of review comments
- Fix some schema errors in constraints for arrays
- Add compatibles for qcom,sar2130p-pdc and onnn,adt7462
DT core:
- Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
- Add some warnings on deprecated address handling
- Rework early_init_dt_scan() so the arch can pass in the phys
address of the DTB as __pa() is not always valid to use. This fixes
a warning for arm64 with kexec.
- Add and use some new DT graph iterators for iterating over ports
and endpoints
- Rework reserved-memory handling to be sized dynamically for fixed
regions
- Optimize of_modalias() to avoid a strlen() call
- Constify struct device_node and property pointers where ever
possible"
* tag 'devicetree-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (36 commits)
of: Allow overlay kunit tests to run CONFIG_OF_OVERLAY=n
dt-bindings: interrupt-controller: qcom,pdc: Add SAR2130P compatible
of/address: Rework bus matching to avoid warnings
of: WARN on deprecated #address-cells/#size-cells handling
of/fdt: Don't use default address cell sizes for address translation
dt-bindings: Enable dtc "interrupt_provider" warnings
of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
dt-bindings: cache: qcom,llcc: Fix X1E80100 reg entries
dt-bindings: watchdog: convert zii,rave-sp-wdt.txt to yaml format
dt-bindings: input: convert zii,rave-sp-pwrbutton.txt to yaml
media: xilinx-tpg: use new of_graph functions
fbdev: omapfb: use new of_graph functions
gpu: drm: omapdrm: use new of_graph functions
ASoC: audio-graph-card2: use new of_graph functions
ASoC: audio-graph-card: use new of_graph functions
ASoC: test-component: use new of_graph functions
of: property: use new of_graph functions
of: property: add of_graph_get_next_port_endpoint()
of: property: add of_graph_get_next_port()
of: module: remove strlen() call in of_modalias()
...
This mostly reverts the commit b4c7d2076b ("PCI/LINK: Remove bandwidth
notification"). An upcoming commit extends this driver building PCIe
bandwidth controller on top of it.
PCIe bandwidth notifications were first added in the commit e8303bb7a7
("PCI/LINK: Report degraded links via link bandwidth notification") but
later had to be removed. The significant changes compared with the old
bandwidth notification driver include:
1) Don't print the notifications into kernel log, just keep the Link
Speed cached in struct pci_bus updated. While somewhat unfortunate,
the log spam was the source of complaints that eventually lead to
the removal of the bandwidth notifications driver (see the links
below for further information).
2) Besides the Link Bandwidth Management Interrupt, also enable Link
Autonomous Bandwidth Interrupt to cover the other source of bandwidth
changes.
3) Handle Link Speed updates robustly. Refresh the cached Link Speed
when enabling Bandwidth Notification Interrupts, and solve the race
between Link Speed read and LBMS/LABS update in
pcie_bwnotif_irq_thread().
4) Use concurrency safe LNKCTL RMW operations.
5) The driver is now called PCIe bwctrl (bandwidth controller) instead
of just bandwidth notifications because of increased scope and
functionality within the driver.
6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain().
Provide LBMS counting API for it.
7) Tweaks to variable/functions names for consistency and length reasons.
Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to
keep track PCIe Link Speed changes.
[bhelgaas: This is based on previous work by Alexandru Gagniuc
<mr.nuke.me@gmail.com>; see e8303bb7a7 ("PCI/LINK: Report degraded links
via link bandwidth notification")]
Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com
Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler:
https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
If a reset is issued to a running device with a driver that didn't register
the notification callbacks, the driver may be unaware of this event and
have an inconsistent view of the device's state. Log a warning of this
event because there's nothing else indicating the event occured, which
could be confusing when debugging such situations.
Link: https://lore.kernel.org/r/20241025222755.3756162-2-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary
Bus Reset on the bridge leading to the device. These only work if the
device is the only device below the bridge.
Add a sysfs 'reset_subordinate' attribute on bridges that can assert
Secondary Bus Reset regardless of how many devices are below the bridge.
This resets all the devices below a bridge in a single command, including
the locking and config space save/restore that reset methods normally do.
This may be the only way to reset devices that don't support other reset
methods (ACPI, FLR, PM reset, etc).
Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>
The PCIe bandwidth controller added by a subsequent commit will require
selecting PCIe Link Speeds that are lower than the Maximum Link Speed.
The struct pci_bus only stores max_bus_speed. Even if PCIe r6.1 sec 8.2.1
currently disallows gaps in supported Link Speeds, the Implementation Note
in PCIe r6.1 sec 7.5.3.18, recommends determining supported Link Speeds
using the Supported Link Speeds Vector in the Link Capabilities 2 Register
(when available) to "avoid software being confused if a future
specification defines Links that do not require support for all slower
speeds."
Reuse code in pcie_get_speed_cap() to add pcie_get_supported_speeds() to
query the Supported Link Speeds Vector of a PCIe device. The value is taken
directly from the Supported Link Speeds Vector or synthesized from the Max
Link Speed in the Link Capabilities Register when the Link Capabilities 2
Register is not available.
The Supported Link Speeds Vector in the Link Capabilities Register 2
corresponds to the bus below on Root Ports and Downstream Ports, whereas it
corresponds to the bus above on Upstream Ports and Endpoints (PCIe r6.1 sec
7.5.3.18):
Supported Link Speeds Vector - This field indicates the supported Link
speed(s) of the associated Port.
Add supported_speeds into the struct pci_dev that caches the
Supported Link Speeds Vector.
supported_speeds contains a set of Link Speeds only in the case where PCIe
Link Speed can be determined. Root Complex Integrated Endpoints do not have
a well-defined Link Speed because they do not implement either of the Link
Capabilities Registers, which is allowed by PCIe r6.1 sec 7.5.3 (the same
limitation applies to determining cur_bus_speed and max_bus_speed that are
PCI_SPEED_UNKNOWN in such case). This is of no concern from PCIe bandwidth
controller point of view because such devices are not attached into a PCIe
Root Port that could be controlled.
The supported_speeds field keeps the extra reserved zero at the least
significant bit to match the Link Capabilities 2 Register layout.
An attempt was made to store supported_speeds field into the struct pci_bus
as an intersection of both ends of the Link, however, the subordinate
struct pci_bus is not available early enough. The Target Speed quirk (in
pcie_failed_link_retrain()) can run either during initial scan or later,
requiring it to use the API provided by the PCIe bandwidth controller to
set the Target Link Speed in order to co-exist with the bandwidth
controller. When the Target Speed quirk is calling the bandwidth controller
during initial scan, the struct pci_bus is not yet initialized. As such,
storing supported_speeds into the struct pci_bus is not viable.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/20241018144755.7875-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: move pcie_get_supported_speeds() decl to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
There are ACS quirks that hijack the normal ACS processing and deliver to
to special quirk code. The enable path needs to call
pci_dev_specific_enable_acs() and then pci_dev_specific_acs_enabled() will
report the hidden ACS state controlled by the quirk.
The recent rework got this out of order and we should try to call
pci_dev_specific_enable_acs() regardless of any actual ACS support in the
device.
As before command line parameters that effect standard PCI ACS don't
interact with the quirk versions, including the new config_acs= option.
Link: https://lore.kernel.org/r/0-v1-f96b686c625b+124-pci_acs_quirk_fix_jgg@nvidia.com
Fixes: 47c8846a49 ("PCI: Extend ACS configurability")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/e89107da-ac99-4d3a-9527-a4df9986e120@kernel.org
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229019
Tested-by: Steffen Dirkwinkel <me@steffen.cc>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_register_io_range() does not modify the passed in fwnode_handle, so
make it const.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20241010-dt-const-v1-1-87a51f558425@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Convert open-coded resource size calculations to use
resource_set_{range,size}() helpers.
While at it, use SZ_* for size parameter where appropriate which makes the
intent of code more obvious.
Also, cast sizes to resource_size_t, not u64.
Link: https://lore.kernel.org/r/20240614100606.15830-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
In reset_method_store(), a string is allocated via kstrndup() and assigned
to the local "options". options is then used in with strsep() to find
spaces:
while ((name = strsep(&options, " ")) != NULL) {
If there are no remaining spaces, then options is set to NULL by strsep(),
so the subsequent kfree(options) doesn't free the memory allocated via
kstrndup().
Fix by using a separate tmp_options to iterate with strsep() so options is
preserved.
Link: https://lore.kernel.org/r/20241001231147.3583649-1-tkjos@google.com
Fixes: d88f521da3 ("PCI: Allow userspace to query and set device reset mechanism")
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add support for PCIe TLP Processing Hints (TPH) support (see PCIe r6.2,
sec 6.17).
Add TPH register definitions in pci_regs.h, including the TPH Requester
capability register, TPH Requester control register, TPH Completer
capability, and the ST fields of MSI-X entry.
Introduce pcie_enable_tph() and pcie_disable_tph(), enabling drivers to
toggle TPH support and configure specific ST mode as needed. Also add a new
kernel parameter, "pci=notph", allowing users to disable TPH support across
the entire system.
Link: https://lore.kernel.org/r/20241002165954.128085-2-wei.huang2@amd.com
Co-developed-by: Jing Liu <jing2.liu@intel.com>
Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com>
Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com>
Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmbseugUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxdwxAAvdvDyTuiPo2R8pQtvKg4YL2IUnK5
UR28mBxZDK5DFhLtD/QzmVVG/eaLY6bJHthHgJgTApzekkqU0h9dcRI0eegXrvcz
I3HRsZK2yatUky9l8O148OLzF897r7vXL3QtGe6qjKU+9D83IEeooLKgBca+GoBC
bRLvG/fYRzdjOe8UHFqCoeMIg3IOY7CNifvFOihAGpJpxfZQktj6hSKu6q7BL1Rx
NRgYlxh0eLcb7vAJqz6RZpQ8PRCwhAjlDuu0BOkES8/6EwisD1xUh3qdDxfVgNA6
FpcAb/53yr46cs4tM9ZTwluka86AskuXj3jwSKf7nE3zqr4nM9OD3sGOSYzK8UdE
EDBKj+9iEpYRC6rJMk5gNH2AZkR1OEpNUisR6+kEn81A9yNNoTmkHdHUOWo8TuxD
btc0sTM+eWApvTiZwgL4VjMZulQllV51K8tcfvODRhlMkbOPNWGWdmpWqEbUS2HU
i7+zzQC3DC5iPlAKgRSeYB0aad6la6brqPW16sGhGovNhgwbzakDLCUJJGn/LNuO
wd0UNpJTnHlfChbvNh2bBxiMOo0cab1tJ5Jp97STQYhLg2nW93s/dAfdpSAsYO4S
5YzjSADWeyeuDsHE1RdUdDvYAPMb1VZBUd2OSHis5zw7kmh25c9KYXEkDJ25q/ju
sVXK4oMNW/Gnd5M=
=L3s9
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Wait for device readiness after reset by polling Vendor ID and
looking for Configuration RRS instead of polling the Command
register and looking for non-error completions, to avoid hardware
retries done for RRS on non-Vendor ID reads (Bjorn Helgaas)
- Rename CRS Completion Status to RRS ('Request Retry Status') to
match PCIe r6.0 spec usage (Bjorn Helgaas)
- Clear LBMS bit after a manual link retrain so we don't try to
retrain a link when there's no downstream device anymore (Maciej W.
Rozycki)
- Revert to the original link speed after retraining fails instead of
leaving it restricted to 2.5GT/s, so a future device has a chance
to use higher speeds (Maciej W. Rozycki)
- Wait for each level of downstream bus, not just the first, to
become accessible before restoring devices on that bus (Ilpo
Järvinen)
- Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups
without having to stomp on the core's pdev->dev.groups (Lukas
Wunner)
Driver binding:
- Export pcim_request_region(), a managed counterpart of
pci_request_region(), for use by drivers (Philipp Stanner)
- Export pcim_iomap_region() and deprecate pcim_iomap_regions()
(Philipp Stanner)
- Request the PCI BAR used by xboxvideo (Philipp Stanner)
- Request and map drm/ast BARs with pcim_iomap_region() (Philipp
Stanner)
MSI:
- Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a
single IRQ line and cannot set the affinity of each MSI to a
specific CPU core (Marek Vasut)
- Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity()
implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3,
mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl,
xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed'
warnings (Marek Vasut)
Power management:
- Add pwrctl support for ATH11K inside the WCN6855 package (Konrad
Dybcio)
PCI device hotplug:
- Remove unnecessary hpc_ops struct from shpchp (ngn)
- Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp
(weiyufeng)
Virtualization:
- Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson)
- Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS
but does provide ACS-like features (Subramanian Ananthanarayanan)
IOMMU:
- Add function 0 DMA alias quirk for Glenfly Arise audio function,
which uses the function 0 Requester ID (WangYuli)
NPEM:
- Add Native PCIe Enclosure Management (NPEM) support for sysfs
control of NVMe RAID storage indicators (ok/fail/locate/
rebuild/etc) (Mariusz Tkaczyk)
- Add support for the ACPI _DSM PCIe SSD status LED management, which
is functionally similar to NPEM but mediated by platform firmware
(Mariusz Tkaczyk)
Device trees:
- Drop minItems and maxItems from ranges in PCI generic host binding
since host bridges may have several MMIO and I/O port apertures
(Frank Li)
- Add kirin, rcar-gen2, uniphier DT binding top-level constraints for
clocks (Krzysztof Kozlowski)
Altera PCIe controller driver:
- Convert altera DT bindings from text to YAML (Matthew Gerlach)
- Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same
thing and is what other drivers use (Jinjie Ruan)
Broadcom STB PCIe controller driver:
- Add DT binding maxItems for reset controllers (Jim Quinlan)
- Use the 'bridge' reset method if described in the DT (Jim Quinlan)
- Use the 'swinit' reset method if described in the DT (Jim Quinlan)
- Add 'has_phy' so the existence of a 'rescal' reset controller
doesn't imply software control of it (Jim Quinlan)
- Add support for many inbound DMA windows (Jim Quinlan)
- Rename SoC 'type' to 'soc_base' express the fact that SoCs come in
families of multiple similar devices (Jim Quinlan)
- Add Broadcom 7712 DT description and driver support (Jim Quinlan)
- Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings for
maintainability (Bjorn Helgaas)
Freescale i.MX6 PCIe controller driver:
- Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints
(Richard Zhu)
- Fix a code restructuring error that caused i.MX8MM and i.MX8MP
Endpoints to fail to establish link (Richard Zhu)
- Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing
outbound alignment requirement (Richard Zhu)
- Call phy_power_off() in the .probe() error path (Frank Li)
- Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also
supported (Frank Li)
- Manage Refclk by using SoC-specific callbacks instead of switch
statements (Frank Li)
- Manage core reset by using SoC-specific callbacks instead of switch
statements (Frank Li)
- Expand comments for erratum ERR010728 workaround (Frank Li)
- Use generic PHY APIs to configure mode, speed, and submode, which
is harmless for devices that implement their own internal PHY
management and don't set the generic imx_pcie->phy (Frank Li)
- Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver
Root Complex support (Richard Zhu)
Freescale Layerscape PCIe controller driver:
- Replace layerscape-pcie DT binding compatible fsl,lx2160a-pcie with
fsl,lx2160ar2-pcie (Frank Li)
- Add layerscape-pcie DT binding deprecated 'num-viewport' property
to address a DT checker warning (Frank Li)
- Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array
(Frank Li)
Loongson PCIe controller driver:
- Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets
(Huacai Chen)
Marvell Aardvark PCIe controller driver:
- Fix issue with emulating Configuration RRS for two-byte reads of
Vendor ID; previously it only worked for four-byte reads (Bjorn
Helgaas)
MediaTek PCIe Gen3 controller driver:
- Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC
types (Lorenzo Bianconi)
- Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi)
- Add DT and driver support for Airoha EN7581 PCIe controller
(Lorenzo Bianconi)
Qualcomm PCIe controller driver:
- Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan
Ansari)
- Add back DT 'vddpe-3v3-supply', which was incorrectly removed
earlier (Johan Hovold)
- Drop endpoint redundant masking of global IRQ events (Manivannan
Sadhasivam)
- Clarify unknown global IRQ message and only log it once to avoid a
flood (Manivannan Sadhasivam)
- Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
Sadhasivam)
- Assign PCI domain number for endpoint controllers (Manivannan
Sadhasivam)
- Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for
endpoint controller (Manivannan Sadhasivam)
- Add global SPI interrupt for PCIe link events to DT binding
(Manivannan Sadhasivam)
- Add global RC interrupt handler to handle 'Link up' events and
automatically enumerate hot-added devices (Manivannan Sadhasivam)
- Avoid mirroring of DBI and iATU register space so it doesn't
overlap BAR MMIO space (Prudhvi Yarlagadda)
- Enable controller resources like PHY only after PERST# is
deasserted to partially avoid the problem that the endpoint SoC
crashes when accessing things when Refclk is absent (Manivannan
Sadhasivam)
- Add 16.0 GT/s equalization and RX lane margining settings (Shashank
Babu Chinta Venkata)
- Pass domain number to pci_bus_release_domain_nr() explicitly to
avoid a NULL pointer dereference (Manivannan Sadhasivam)
Renesas R-Car PCIe controller driver:
- Make the read-only const array 'check_addr' static (Colin Ian King)
- Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding
(Yoshihiro Shimoda)
TI DRA7xx PCIe controller driver:
- Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary
handler is NULL (Siddharth Vadapalli)
- Handle IRQ request errors during root port and endpoint probe
(Siddharth Vadapalli)
TI J721E PCIe driver:
- Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable
the ACSPCIE module to drive Refclk for the Endpoint (Siddharth
Vadapalli)
- Extract the cadence link setup from cdns_pcie_host_setup() so link
setup can be done separately during resume (Thomas Richard)
- Add T_PERST_CLK_US definition for the mandatory delay between
Refclk becoming stable and PERST# being deasserted (Thomas Richard)
- Add j721e suspend and resume support (Théo Lebrun)
TI Keystone PCIe controller driver:
- Fix NULL pointer checking when applying MRRS limitation quirk for
AM65x SR 1.0 Errata #i2037 (Dan Carpenter)
Xilinx NWL PCIe controller driver:
- Fix off-by-one error in INTx IRQ handler that caused INTx
interrupts to be lost or delivered as the wrong interrupt (Sean
Anderson)
- Rate-limit misc interrupt messages (Sean Anderson)
- Turn off the clock on probe failure and device removal (Sean
Anderson)
- Add DT binding and driver support for enabling/disabling PHYs (Sean
Anderson)
- Add PCIe phy bindings for the ZCU102 (Sean Anderson)
Xilinx XDMA PCIe controller driver:
- Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT
binding and xilinx-dma-pl driver (Thippeswamy Havalige)
Miscellaneous:
- Fix buffer overflow in kirin_pcie_parse_port() (Alexandra Diupina)
- Fix minor kerneldoc issues and typos (Bjorn Helgaas)
- Use PCI_DEVID() macro in aer_inject() instead of open-coding it
(Jinjie Ruan)
- Check pcie_find_root_port() return in x86 fixups to avoid NULL
pointer dereferences (Samasth Norway Ananda)
- Make pci_bus_type constant (Kunwu Chan)
- Remove unused declarations of __pci_pme_wakeup() and
pci_vpd_release() (Yue Haibing)
- Remove any leftover .*.cmd files with make clean (zhang jiao)
- Remove unused BILLION macro (zhang jiao)"
* tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (132 commits)
PCI: Fix typos
dt-bindings: PCI: qcom: Allow 'vddpe-3v3-supply' again
tools: PCI: Remove unused BILLION macro
tools: PCI: Remove .*.cmd files with make clean
PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
PCI: dra7xx: Fix error handling when IRQ request fails in probe
PCI: dra7xx: Fix threaded IRQ request for "dra7xx-pcie-main" IRQ
PCI: qcom: Add RX lane margining settings for 16.0 GT/s
PCI: qcom: Add equalization settings for 16.0 GT/s
PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed
PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed'
PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
PCI: Mark Creative Labs EMU20k2 INTx masking as broken
dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint
dt-bindings: PCI: altera: msi: Convert to YAML
PCI: imx6: Add i.MX8Q PCIe Root Complex (RC) support
PCI: Rename CRS Completion Status to RRS
PCI: aardvark: Correct Configuration RRS checking
PCI: Wait for device readiness with Configuration RRS
PCI: brcmstb: Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings
...
- Drop endpoint redundant masking of global IRQ events (Manivannan
Sadhasivam)
- Clarify unknown global IRQ message and only log it once to avoid a flood
(Manivannan Sadhasivam)
- Add Manivannan Sadhasivam as maintainer of qcom endpoint driver
(Manivannan Sadhasivam)
- Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
Sadhasivam)
- Assign PCI domain number for endpoint controllers (Manivannan Sadhasivam)
- Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for endpoint
controller (Manivannan Sadhasivam)
- Add global SPI interrupt for PCIe link events to DT binding (Manivannan
Sadhasivam)
- Add global RC interrupt handler to handle 'Link up' events and
automatically enumerate hot-added devices (Manivannan Sadhasivam)
- Avoid mirroring of DBI and iATU register space so it doesn't overlap BAR
MMIO space (Prudhvi Yarlagadda)
- Enable controller resources like PHY only after PERST# is deasserted to
partially avoid the problem that the endpoint SoC crashes when accessing
things when Refclk is absent (Manivannan Sadhasivam)
- Rename dw_pcie.link_gen to max_link_speed to avoid ambiguity (Manivannan
Sadhasivam)
- Cache maximum link speed value in dw_pcie.max_link_speed for use by
vendor drivers (Manivannan Sadhasivam)
- Add 16.0 GT/s equalization and RX lane margining settings (Shashank Babu
Chinta Venkata)
- Pass domain number to pci_bus_release_domain_nr() explicitly to avoid a
NULL pointer dereference (Manivannan Sadhasivam)
* pci/controller/qcom:
PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
PCI: qcom: Add RX lane margining settings for 16.0 GT/s
PCI: qcom: Add equalization settings for 16.0 GT/s
PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed
PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed'
PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
PCI: qcom: Disable mirroring of DBI and iATU register space in BAR region
PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt
dt-bindings: PCI: qcom,pcie-sm8450: Add 'global' interrupt
PCI: qcom-ep: Modify 'global_irq' and 'perst_irq' IRQ device names
PCI: endpoint: Assign PCI domain number for endpoint controllers
dt-bindings: PCI: pci-ep: Document 'linux,pci-domain' property
dt-bindings: PCI: pci-ep: Update Maintainers
PCI: qcom-ep: Reword the error message for receiving unknown global IRQ event
PCI: qcom-ep: Drop the redundant masking of global IRQ events
- Wait for each level of downstream bus, not just the first, to become
accessible before restoring devices on that bus (Ilpo Järvinen)
* pci/reset:
PCI: Wait for Link before restoring Downstream Buses
- Clear LBMS bit after a manual link retrain so we don't try to retrain a
link when there's no downstream device anymore (Maciej W. Rozycki)
- Revert to the original link speed after retraining fails instead of
leaving it restricted to 2.5GT/s, so a future device has a chance to use
higher speeds (Maciej W. Rozycki)
- Correct interpretation of pcie_retrain_link() return status and update it
to return 0/errno instead of true/false (Maciej W. Rozycki)
* pci/enumeration:
PCI: Use an error code with PCIe failed link retraining
PCI: Correct error reporting with PCIe failed link retraining
PCI: Revert to the original speed after PCIe failed link retraining
PCI: Clear the LBMS bit after a link retrain
The pci_bus_release_domain_nr() API is supposed to free the domain
number allocated by pci_bus_find_domain_nr(). Most of the callers of
pci_bus_find_domain_nr(), store the domain number in pci_bus::domain_nr.
As such, the pci_bus_release_domain_nr() implicitly frees the domain
number by dereferencing 'struct pci_bus'. However, one of the callers
of this API, the PCI endpoint subsystem, doesn't have 'struct pci_bus',
so it only passes NULL. Due to this, the API will end up dereferencing
the NULL pointer.
To fix this issue, pass the domain number to this API explicitly. Since
'struct pci_bus' is not used for anything else other than extracting the
domain number, it makes sense to pass the domain number directly.
Fixes: 0328947c50 ("PCI: endpoint: Assign PCI domain number for endpoint controllers")
Closes: https://lore.kernel.org/linux-pci/c0c40ddb-bf64-4b22-9dd1-8dbb18aa2813@stanley.mountain
Link: https://lore.kernel.org/linux-pci/20240912053025.25314-1-manivannan.sadhasivam@linaro.org
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
PCIe r6.0 changed the abbreviation for "Configuration Request Retry Status"
Completion Status from "CRS" to "RRS" and uses the terminology of
"Configuration RRS Software Visibility" instead of "CRS Software
Visibility".
Align the Linux usage with the r6.0 spec language. No functional change
intended.
It's confusing to make this change, but I think "RRS" *is* a better
abbreviation because it was easy to interpret "CRS" as "Completion Retry
Status", which really didn't make any sense.
Link: https://lore.kernel.org/r/20240827234848.4429-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
After a device reset, delays are required before the device can
successfully complete config accesses. PCIe r6.0, sec 6.6, specifies some
delays required before software can perform config accesses. Devices that
require more time after those delays may respond to config accesses with
Configuration Request Retry Status (RRS) completions.
Callers of pci_dev_wait() are responsible for delays until the device can
respond to config accesses. pci_dev_wait() waits any additional time until
the device can successfully complete config accesses.
Reading config space of devices that are not present or not ready typically
returns ~0 (PCI_ERROR_RESPONSE). Previously we polled the Command register
until we got a value other than ~0. This is sometimes a problem because
Root Complex handling of RRS completions may include several retries and
implementation-specific behavior that is invisible to software (see sec
2.3.2), so the exponential backoff in pci_dev_wait() may not work as
intended.
Linux enables Configuration RRS Software Visibility on all Root Ports that
support it. If it is enabled, read the Vendor ID instead of the Command
register. RRS completions cause immediate return of the 0x0001 reserved
Vendor ID value, so the pci_dev_wait() backoff works correctly.
When a read of Vendor ID eventually completes successfully by returning a
non-0x0001 value (the Vendor ID or 0xffff for VFs), the device should be
initialized and ready to respond to config requests.
For conventional PCI devices or devices below Root Ports that don't support
Configuration RRS Software Visibility, poll the Command register as before.
This was developed independently, but is very similar to Stanislav
Spassov's previous work at
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com
Link: https://lore.kernel.org/r/20240827234848.4429-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Duc Dang <ducdang@google.com>
Given how the call place in pcie_wait_for_link_delay() got structured now,
and that pcie_retrain_link() returns a potentially useful error code,
convert pcie_failed_link_retrain() to return an error code rather than a
boolean status, fixing handling at the call site mentioned. Update the
other call site accordingly.
Fixes: 1abb473903 ("Merge branch 'pci/enumeration'")
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091156530.61955@angie.orcam.me.uk
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.5+
The LBMS bit, where implemented, is set by hardware either in response
to the completion of retraining caused by writing 1 to the Retrain Link
bit or whenever hardware has changed the link speed or width in attempt
to correct unreliable link operation. It is never cleared by hardware
other than by software writing 1 to the bit position in the Link Status
register and we never do such a write.
We currently have two places, namely apply_bad_link_workaround() and
pcie_failed_link_retrain() in drivers/pci/controller/dwc/pcie-tegra194.c
and drivers/pci/quirks.c respectively where we check the state of the LBMS
bit and neither is interested in the state of the bit resulting from the
completion of retraining, both check for a link fault.
And in particular pcie_failed_link_retrain() causes issues consequently, by
trying to retrain a link where there's no downstream device anymore and the
state of 1 in the LBMS bit has been retained from when there was a device
downstream that has since been removed.
Clear the LBMS bit then at the conclusion of pcie_retrain_link(), so that
we have a single place that controls it and that our code can track link
speed or width changes resulting from unreliable link operation.
Fixes: a89c82249c ("PCI: Work around PCIe link training failures")
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091133140.61955@angie.orcam.me.uk
Reported-by: Matthew W Carlis <mattc@purestorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Cc: <stable@vger.kernel.org> # v6.5+
__pci_reset_bus() calls pci_bridge_secondary_bus_reset() to perform the
reset and also waits for the Secondary Bus to become again accessible.
__pci_reset_bus() then calls pci_bus_restore_locked() that restores the PCI
devices connected to the bus, and if necessary, recursively restores also
the subordinate buses and their devices.
The logic in pci_bus_restore_locked() does not take into account that after
restoring a device on one level, there might be another Link Downstream
that can only start to come up after restore has been performed for its
Downstream Port device. That is, the Link may require additional wait until
it becomes accessible.
Similarly, pci_slot_restore_locked() lacks wait.
Amend pci_bus_restore_locked() and pci_slot_restore_locked() to wait for
the Secondary Bus before recursively performing the restore of that bus.
Fixes: 090a3c5322 ("PCI: Add pci_reset_slot() and pci_reset_bus()")
Link: https://lore.kernel.org/r/20240808121708.2523-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
pci_intx() becomes managed if pcim_enable_device() has been called in
advance. Commit 25216afc9d ("PCI: Add managed pcim_intx()") changed this
behavior so that pci_intx() always leads to creation of a separate device
resource for itself, whereas earlier, a shared resource was used for all
PCI devres operations.
Unfortunately, pci_intx() seems to be used in some drivers' remove() paths;
in the managed case this causes a device resource to be created on driver
detach, which causes .probe() to fail if the driver is reloaded:
pci 0000:00:1f.2: Resources present before probing
Fix the regression by only redirecting pci_intx() to its managed twin
pcim_intx() if the pci_command changes.
Link: https://lore.kernel.org/r/20240725120729.59788-2-pstanner@redhat.com
Fixes: 25216afc9d ("PCI: Add managed pcim_intx()")
Reported-by: Damien Le Moal <dlemoal@kernel.org>
Closes: https://lore.kernel.org/all/b8f4ba97-84fc-4b7e-ba1a-99de2d9f0118@kernel.org/
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
[bhelgaas: add error message to commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmaahiEUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwypg/+LSzrx0CyyXruwwkjuoMIzqXoEpxV
SSdJv47E9rnJymQvd0RAeNyc1BPbtRcP1FdEvV/G1ovb8qJSOJgU22PSSiMQsQ0h
2WGBl1ShubQDDLBdy1AggAsRJhIH4P4tWZ4k5Ftz6WZPWA1UcrDqmjN4d02UIYZb
A3YYcBEIm6bvrixxy+xq/Ii7S9A2idikabDLLGXOMSliFHx0ehWDNXyQEBONlrDh
rEHih21rPtOltVEdJl7yF+SIA467HI09NuXfTviHWnJ1hinFoSlEHIhz4j+i+r//
xOj7iDqtk/UAIToVsxtwgOnElNwY6ab/h/t1AmSSxX4FUEV2TiS1YEpUfX7pByt+
dytgvepjQyycC/ZHUtRZFZ6+1M0z+Vgb5c3+jXyPh8pQEPqmXt8+KYVIi/wychmJ
Opo4xniiDoKHSZ4E0bg/wMbe9yVCjTpX0i0S7BbNa/TRjud6vAhXvgx/y092jsdg
h4lU0ywNCgea/rZFHZYomPjncx9xJ+rtOaH+/dVQhCm/wuRHnj7tJGZnl5LfCWVw
+yNOcExQaE+lRvKqp6mQvUva3+4UArAL2tnFC00tGd0emRLIvXrxY2lF1sqp9wCZ
AJu65El4nnpFNU7vJR7x4X31BvcdquFEvfofPxPXbPz09N8hPRhkunKzgd5ftKZS
mcxMfStvIFXiMEM=
=vw2i
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Define PCIE_RESET_CONFIG_DEVICE_WAIT_MS for the generic 100ms
required after reset before config access (Kevin Xie)
- Define PCIE_T_RRS_READY_MS for the generic 100ms required after
reset before config access (probably should be unified with
PCIE_RESET_CONFIG_DEVICE_WAIT_MS) (Damien Le Moal)
Resource management:
- Rename find_resource() to find_resource_space() to be more
descriptive (Ilpo Järvinen)
- Export find_resource_space() for use by PCI core, which needs to
learn whether there is available space for a bridge window (Ilpo
Järvinen)
- Prevent double counting of resources so window size doesn't grow on
each remove/rescan cycle (Ilpo Järvinen)
- Relax bridge window sizing algorithm so a device doesn't break
simply because it was removed and rescanned (Ilpo Järvinen)
- Evaluate the ACPI PRESERVE_BOOT_CONFIG _DSM in
pci_register_host_bridge() (not acpi_pci_root_create()) so we can
unify it with similar DT functionality (Vidya Sagar)
- Extend use of DT "linux,pci-probe-only" property so it works
per-host bridge as well as globally (Vidya Sagar)
- Unify support for ACPI PRESERVE_BOOT_CONFIG _DSM and the DT
"linux,pci-probe-only" property in pci_preserve_config() (Vidya
Sagar)
Driver binding:
- Add devres infrastructure for managed request and map of partial
BAR resources (Philipp Stanner)
- Deprecate pcim_iomap_table() because uses like
"pcim_iomap_table()[0]" have no good way to return errors (Philipp
Stanner)
- Add an always-managed pcim_request_region() for use instead of
pci_request_region() and similar, which are sometimes managed
depending on whether pcim_enable_device() has been called
previously (Philipp Stanner)
- Reimplement pcim_set_mwi() so it doesn't need to keep store MWI
state (Philipp Stanner)
- Add pcim_intx() for use instead of pci_intx(), which is sometimes
managed depending on whether pcim_enable_device() has been called
previously (Philipp Stanner)
- Add managed pcim_iomap_range() to allow mapping of a partial BAR
(Philipp Stanner)
- Fix a devres mapping leak in drm/vboxvideo (Philipp Stanner)
Error handling:
- Add missing bridge locking in device reset path and add a warning
for other possible lock issues (Dan Williams)
- Fix use-after-free on concurrent DPC and hot-removal (Lukas Wunner)
Power management:
- Disable AER and DPC during suspend to avoid spurious wakeups if
they share an interrupt with PME (Kai-Heng Feng)
PCIe native device hotplug:
- Detect if a device was removed or replaced during system sleep so
we don't assume a new device is the one that used to be there
(Lukas Wunner)
Virtualization:
- Add an ACS quirk for Broadcom BCM5760X multi-function NIC; it
prevents transactions between functions even though it doesn't
advertise ACS, so the functions can be attached individually via
VFIO (Ajit Khaparde)
Peer-to-peer DMA:
- Add a "pci=config_acs=" kernel command-line parameter to relax
default ACS settings to enable additional peer-to-peer
configurations. Requires expert knowledge of topology and ACS
operation (Vidya Sagar)
Endpoint framework:
- Remove unused struct pci_epf_group.type_group (Christophe JAILLET)
- Fix error handling in vpci_scan_bus() and epf_ntb_epc_cleanup()
(Dan Carpenter)
- Make struct pci_epc_class constant (Greg Kroah-Hartman)
- Remove unused pci_endpoint_test_bar_{readl,writel} functions
(Jiapeng Chong)
- Rename "BME" to "Bus Master Enable" (Manivannan Sadhasivam)
- Rename struct pci_epc_event_ops.core_init() callback to epc_init()
(Manivannan Sadhasivam)
- Move DMA init to MHI .epc_init() callback for uniformity
(Manivannan Sadhasivam)
- Cancel EPF test delayed work when link goes down (Manivannan
Sadhasivam)
- Add struct pci_epc_event_ops.epc_deinit() callback for cleanup
needed on fundamental reset (Manivannan Sadhasivam)
- Add 64KB alignment to endpoint test to support Rockchip rk3588
(Niklas Cassel)
- Optimize endpoint test by using memcpy() instead of readl() (Niklas
Cassel)
Device tree bindings:
- Add generic "ats-supported" property to advertise that a PCIe Root
Complex supports ATS (Jean-Philippe Brucker)
Amazon Annapurna Labs PCIe controller driver:
- Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
(Aleksandr Mishin)
Axis ARTPEC-6 PCIe controller driver:
- Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
address, not a CPU address (Niklas Cassel)
Freescale i.MX6 PCIe controller driver:
- Convert to agnostic GPIO API (Andy Shevchenko)
Freescale Layerscape PCIe controller driver:
- Make struct mobiveil_rp_ops constant (Christophe JAILLET)
- Use new generic dw_pcie_ep_linkdown() to handle link-down events
(Manivannan Sadhasivam)
HiSilicon Kirin PCIe controller driver:
- Convert to agnostic GPIO API (Andy Shevchenko)
- Use _scoped() iterator for OF children to ensure refcounts are
decremented at loop exit (Javier Carrasco)
Intel VMD host bridge driver:
- Create sysfs "domain" symlink before downstream devices are exposed
to userspace by pci_bus_add_devices() (Jiwei Sun)
Loongson PCIe controller driver:
- Enable MSI when LS7A is used with new CPUs that have integrated
PCIe Root Complex, e.g., Loongson-3C6000, so downstream devices can
use MSI (Huacai Chen)
Microchip AXI PolarFlare PCIe controller driver:
- Move pcie-microchip-host.c to a new PLDA directory (Minda Chen)
- Factor PLDA generic items out to a common
plda,xpressrich3-axi-common.yaml binding (Minda Chen)
- Factor PLDA generic data structures and code out to shared
pcie-plda.h, pcie-plda-host.c (Minda Chen)
- Add PLDA generic interrupt handling with a .request_event_irq()
callback for vendor-specific events (Minda Chen)
- Add PLDA generic host init/deinit and map bus functions for use by
vendor-specific drivers (Minda Chen)
- Rework to use PLDA core (Minda Chen)
Microsoft Hyper-V host bridge driver:
- Return zero, not garbage, when reading PCI_INTERRUPT_PIN (Wei Liu)
NVIDIA Tegra194 PCIe controller driver:
- Remove unused struct tegra_pcie_soc (Dr. David Alan Gilbert)
- Set 64KB inbound ATU alignment restriction (Jon Hunter)
Qualcomm PCIe controller driver:
- Make the MHI reg region mandatory for X1E80100, since all PCIe
controllers have it (Abel Vesa)
- Prevent use of uninitialized data and possible error pointer
dereference (Dan Carpenter)
- Return error, not success, if dev_pm_opp_find_freq_floor() fails
(Dan Carpenter)
- Add Operating Performance Points (OPP) support to scale performance
state based on aggregate link bandwidth to improve SoC power
efficiency (Krishna chaitanya chundru)
- Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays
active even if other drivers don't vote for it (Krishna chaitanya
chundru)
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
writing out all the clock names (Manivannan Sadhasivam)
- Add DT binding and driver support for the SA8775P SoC (Mrinmay
Sarkar)
- Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)
- Override the SA8775P NO_SNOOP default to avoid possible memory
corruption (Mrinmay Sarkar)
- Make sure resources are disabled during PERST# assertion, even if
the link is already disabled (Manivannan Sadhasivam)
- Use new generic dw_pcie_ep_linkdown() to handle link-down events
(Manivannan Sadhasivam)
- Add DT and endpoint driver support for the SA8775P SoC (Mrinmay
Sarkar)
- Add Hyper DMA (HDMA) support for the SA8775P SoC and enable it in
the EPF MHI driver (Mrinmay Sarkar)
- Set PCIE_PARF_NO_SNOOP_OVERIDE to override the default NO_SNOOP
attribute on the SA8775P SoC (both Root Complex and Endpoint mode)
to avoid possible memory corruption (Mrinmay Sarkar)
Renesas R-Car PCIe controller driver:
- Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to
avoid unnecessary backtrace (Marek Vasut)
- Add DT and driver support for R-Car V4H (R8A779G0) host and
endpoint. This requires separate proprietary firmware (Yoshihiro
Shimoda)
Rockchip PCIe controller driver:
- Assert PERST# for 100ms after power is stable (Damien Le Moal)
- Wait PCIE_T_RRS_READY_MS (100ms) after reset before starting
configuration (Damien Le Moal)
- Use GPIOD_OUT_LOW flag while requesting ep_gpio to fix a firmware
crash on Qcom-based modems with Rockpro64 board (Manivannan
Sadhasivam)
Rockchip DesignWare PCIe controller driver:
- Factor common parts of rockchip-dw-pcie DT binding to be shared by
Root Complex and Endpoint mode (Niklas Cassel)
- Add missing INTx signals to common DT binding (Niklas Cassel)
- Add eDMA items to DT binding for Endpoint controller (Niklas
Cassel)
- Fix initial dw-rockchip PERST# GPIO value to prevent unnecessary
short assert/deassert that causes issues with some WLAN controllers
(Niklas Cassel)
- Refactor dw-rockchip and add support for Endpoint mode (Niklas
Cassel)
- Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
wrapper (Niklas Cassel)
- Add error messages in .probe() error paths to improve user
experience (Uwe Kleine-König)
Samsung Exynos PCIe controller driver:
- Use bulk clock APIs to simplify clock setup (Shradha Todi)
StarFive PCIe controller driver:
- Add DT binding and driver support for the StarFive JH7110
PLDA-based PCIe controller (Minda Chen)
Synopsys DesignWare PCIe controller driver:
- Add generic support for sending PME_Turn_Off when system suspends
(Frank Li)
- Fix incorrect interpretation of iATU slot 0 after PERST#
assert/deassert (Frank Li)
- Use msleep() instead of usleep_range() while waiting for link
(Konrad Dybcio)
- Refactor dw_pcie_edma_find_chip() to enable adding support for
Hyper DMA (HDMA) (Manivannan Sadhasivam)
- Enable drivers to supply the eDMA channel count since some can't
auto detect this (Manivannan Sadhasivam)
- Call pci_epc_init_notify() and drop dw_pcie_ep_init_notify()
wrapper (Manivannan Sadhasivam)
- Pass the eDMA mapping format directly from drivers instead of
maintaining a capability for it (Manivannan Sadhasivam)
- Add generic dw_pcie_ep_linkdown() to notify EPF drivers about
link-down events and restore non-sticky DWC registers lost on link
down (Manivannan Sadhasivam)
- Add vendor-specific "apb" reg name, interrupt names, INTx names to
generic binding (Niklas Cassel)
- Enforce DWC restriction that 64-bit BARs must start with an
even-numbered BAR (Niklas Cassel)
- Consolidate args of dw_pcie_prog_outbound_atu() into a structure
(Yoshihiro Shimoda)
- Add support for endpoints to send Message TLPs, e.g., for INTx
emulation (Yoshihiro Shimoda)
TI DRA7xx PCIe controller driver:
- Rename .cpu_addr_fixup() parameter to reflect that it is a PCI
address, not a CPU address (Niklas Cassel)
TI Keystone PCIe controller driver:
- Validate IORESOURCE_BUS presence to avoid NULL pointer dereference
(Aleksandr Mishin)
- Work around AM65x/DRA80xM Errata #i2037 that corrupts TLPs and
causes processor hangs by limiting Max_Read_Request_Size (MRRS) and
Max_Payload_Size (MPS) (Kishon Vijay Abraham I)
- Leave BAR 0 disabled for AM654x to fix a regression caused by
6ab15b5e70 ("PCI: dwc: keystone: Convert .scan_bus() callback to
use add_bus"), which caused a 45-second boot delay (Siddharth
Vadapalli)
Xilinx Versal CPM PCIe controller driver:
- Fix overlapping bridge registers and 32-bit BAR addresses in DT
binding (Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Make struct switchtec_class constant (Greg Kroah-Hartman)
Miscellaneous:
- Remove unused struct acpi_handle_node (Dr. David Alan Gilbert)
- Add missing MODULE_DESCRIPTION() macros (Jeff Johnson)"
* tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (154 commits)
PCI: loongson: Enable MSI in LS7A Root Complex
PCI: Extend ACS configurability
PCI: Add missing bridge lock to pci_bus_lock()
drm/vboxvideo: fix mapping leaks
PCI: Add managed pcim_iomap_range()
PCI: Remove legacy pcim_release()
PCI: Add managed pcim_intx()
PCI: vmd: Create domain symlink before pci_bus_add_devices()
PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
PCI: qcom: Prevent potential error pointer dereference
PCI: qcom: Fix missing error code in qcom_pcie_probe()
PCI: Give pcim_set_mwi() its own devres cleanup callback
PCI: Move struct pci_devres.pinned bit to struct pci_dev
PCI: Remove struct pci_devres.enabled status bit
PCI: Document hybrid devres hazards
PCI: Add managed pcim_request_region()
PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
PCI: Add managed partial-BAR request and map infrastructure
PCI: Add devres helpers for iomap table
PCI: Add and use devres helper for bit masks
...
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid
writing out all the clock names (Manivannan Sadhasivam)
- Add DT binding and driver support for the SA8775P SoC (Mrinmay Sarkar)
- Refactor dw_pcie_edma_find_chip() to enable adding support for Hyper DMA
(HDMA) (Manivannan Sadhasivam)
- Enable drivers to supply the eDMA channel count since some can't auto
detect this (Manivannan Sadhasivam)
- Add HDMA support for the SA8775P SoC (Mrinmay Sarkar)
- Override the SA8775P NO_SNOOP default to avoid possible memory corruption
(Mrinmay Sarkar)
- Make sure resources are disabled during PERST# assertion, even if the
link is already disabled (Manivannan Sadhasivam)
- Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays active
even if other drivers don't vote for it (Krishna chaitanya chundru)
- Add Operating Performance Points (OPP) to scale performance state based
on aggregate link bandwidth to improve SoC power efficiency (Krishna
chaitanya chundru)
- Return failure instead of success if dev_pm_opp_find_freq_floor() fails
(Dan Carpenter)
- Avoid an error pointer dereference if dev_pm_opp_find_freq_exact() fails
(Dan Carpenter)
- Prevent use of uninitialized data in qcom_pcie_suspend_noirq() (Dan
Carpenter)
* pci/controller/qcom:
PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq()
PCI: qcom: Prevent potential error pointer dereference
PCI: qcom: Fix missing error code in qcom_pcie_probe()
PCI: qcom: Add OPP support to scale performance
PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps()
PCI: qcom: Add ICC bandwidth vote for CPU to PCIe path
PCI: qcom-ep: Disable resources unconditionally during PERST# assert
PCI: qcom-ep: Override NO_SNOOP attribute for SA8775P EP
PCI: qcom: Override NO_SNOOP attribute for SA8775P RC
PCI: epf-mhi: Enable HDMA for SA8775P SoC
PCI: qcom-ep: Add HDMA support for SA8775P SoC
PCI: dwc: Pass the eDMA mapping format flag directly from glue drivers
PCI: dwc: Skip finding eDMA channels count for HDMA platforms
PCI: dwc: Refactor dw_pcie_edma_find_chip() API
PCI: qcom-ep: Add support for SA8775P SOC
dt-bindings: PCI: qcom-ep: Add support for SA8775P SoC
PCI: qcom: Use devm_clk_bulk_get_all() API
- Warn about doing a Secondary Bus Reset without holding the device lock
(Dan Williams)
- Lock bridge in addition to downstream hierarchy before doing a Secondary
Bus Reset (Dan Williams)
* pci/reset:
PCI: Add missing bridge lock to pci_bus_lock()
PCI: Warn on missing cfg_access_lock during secondary bus reset
- If there's a device below a bridge, prevent a use-after-free by holding a
reference to the device while waiting for the secondary bus to be ready
in case the device is concurrently removed, e.g., by DPC (Lukas Wunner)
* pci/dpc:
PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
- Add pcim_add_mapping_to_legacy_table() and
pcim_remove_mapping_from_legacy_table() helper functions to simplify
devres iomap table (Philipp Stanner)
- Reimplement devres that take a bit mask of BARs in a way that can be used
to map partial BARs as well as entire BARs (Philipp Stanner)
- Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in
favor of pcim_* request plus pcim_* mapping (Philipp Stanner)
- Add pcim_request_region(), a managed interface to request a single BAR
(Philipp Stanner)
- Use the existing pci_is_enabled() interface to replace the struct
devres.enabled bit (Philipp Stanner)
- Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner)
- Reimplement pcim_set_mwi() so it uses its own devres cleanup callback
instead of a special-purpose bit in struct pci_devres (Philipp Stanner)
- Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which
is managed if pcim_enable_device() has been called but unmanaged
otherwise (Philipp Stanner)
- Remove pcim_release(), which is no longer needed after previous cleanups
of pcim_set_mwi() and pci_intx() (Philipp Stanner)
- Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp
Stanner)
- Fix vboxvideo leak by using the new pcim_iomap_range() instead of the
unmanaged pci_iomap_range() (Philipp Stanner)
* pci/devres:
drm/vboxvideo: fix mapping leaks
PCI: Add managed pcim_iomap_range()
PCI: Remove legacy pcim_release()
PCI: Add managed pcim_intx()
PCI: Give pcim_set_mwi() its own devres cleanup callback
PCI: Move struct pci_devres.pinned bit to struct pci_dev
PCI: Remove struct pci_devres.enabled status bit
PCI: Document hybrid devres hazards
PCI: Add managed pcim_request_region()
PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
PCI: Add managed partial-BAR request and map infrastructure
PCI: Add devres helpers for iomap table
PCI: Add and use devres helper for bit masks
PCIe ACS settings control the level of isolation and the possible P2P paths
between devices. With greater isolation the kernel will create smaller
iommu_groups and with less isolation there is more HW that can achieve P2P
transfers. From a virtualization perspective all devices in the same
iommu_group must be assigned to the same VM as they lack security
isolation.
There is no way for the kernel to automatically know the correct ACS
settings for any given system and workload. Existing command line options
(e.g., disable_acs_redir) allow only for large scale change, disabling all
isolation, but this is not sufficient for more complex cases.
Add a kernel command-line option 'config_acs' to directly control all the
ACS bits for specific devices, which allows the operator to setup the right
level of isolation to achieve the desired P2P configuration. The
definition is future proof; when new ACS bits are added to the spec the
open syntax can be extended.
ACS needs to be setup early in the kernel boot as the ACS settings affect
how iommu_groups are formed. iommu_group formation is a one time event
during initial device discovery, so changing ACS bits after kernel boot can
result in an inaccurate view of the iommu_groups compared to the current
isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices. The
default ACS settings are strict and deny any direct traffic between two
functions. This results in the smallest iommu_group the HW can support.
Frequently these values result in slow or non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymmetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
The intention is that this option would be used with expert knowledge of
the HW capability and workload to achieve the desired configuration.
Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
[bhelgaas: add example, tidy printk formats]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
One of the true positives that the cfg_access_lock lockdep effort
identified is this sequence:
WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70
RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70
Call Trace:
<TASK>
? __warn+0x8c/0x190
? pci_bridge_secondary_bus_reset+0x5d/0x70
? report_bug+0x1f8/0x200
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? pci_bridge_secondary_bus_reset+0x5d/0x70
pci_reset_bus+0x1d8/0x270
vmd_probe+0x778/0xa10
pci_device_probe+0x95/0x120
Where pci_reset_bus() users are triggering unlocked secondary bus resets.
Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses
pci_bus_lock() before issuing the reset which locks everything *but* the
bridge itself.
For the same motivation as adding:
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_dev_lock(bridge);
to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
pci_dev_lock() for @bus->self to pci_bus_lock().
Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com
Reported-by: Imre Deak <imre.deak@intel.com>
Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
The struct pci_devres has a separate boolean to track whether a device is
enabled. That, however, can easily be tracked in an agnostic manner through
the function pci_is_enabled().
Using it allows for simplifying the PCI devres implementation.
Replace the separate 'enabled' status bit from struct pci_devres with
calls to pci_is_enabled() at the appropriate places.
Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These functions:
pci_request_region()
pci_request_regions()
pci_request_regions_exclusive()
pci_request_selected_regions()
pci_request_selected_regions_exclusive()
pci_intx()
are "hybrid" functions that are managed if pcim_enable_device() has been
called, but unmanaged otherwise.
This is confusing and has already caused a bug (in 8558de401b
("drm/vboxvideo: use managed pci functions")) because users believe all PCI
functions, such as pci_iomap_range(), can become managed that way, which is
not the case.
Add comments to the relevant functions' docstrings that warn users about
this behavior.
Link: https://lore.kernel.org/r/20240613115032.29098-7-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
These existing functions:
pci_request_region()
pci_request_selected_regions()
pci_request_selected_regions_exclusive()
are "hybrid" functions built on __pci_request_region() and are managed if
pcim_enable_device() has been called, but unmanaged otherwise.
Add these new functions:
pcim_request_region()
pcim_request_region_exclusive()
These are *always* managed and use the new pcim_addr_devres tracking
infrastructure instead of find_pci_dr() and struct pci_devres.region_mask.
Implement the hybrid functions using the new "pure" functions and remove
struct pci_devres.region_mask, which is no longer needed.
Link: https://lore.kernel.org/r/20240613115032.29098-6-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it
to build a managed version of pci_iomap_range(), which maps partial BARs.
Add struct pcim_addr_devres, which can track request and mapping of both
entire BARs and partial BARs.
Add the following internal devres functions based on struct
pcim_addr_devres:
pcim_iomap_region() # request & map entire BAR
pcim_iounmap_region() # unmap & release entire BAR
pcim_request_region() # request entire BAR
pcim_release_region() # release entire BAR
pcim_request_all_regions() # request all entire BARs
pcim_release_all_regions() # release all entire BARs
Rework the following public interfaces using the new infrastructure
listed above:
pcim_iomap() # map partial BAR
pcim_iounmap() # unmap partial BAR
pcim_iomap_regions() # request & map specified BARs
pcim_iomap_regions_request_all() # request all BARs, map specified BARs
pcim_iounmap_regions() # unmap & release specified BARs
Link: https://lore.kernel.org/r/20240613115032.29098-4-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bring the switch case in pcie_link_speed_mbps() to new function to
the header file so that it can be used in other places like
in controller driver.
Link: https://lore.kernel.org/linux-pci/20240619-opp_support-v15-3-aa769a2173a3@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074 ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
RIP: pci_bus_read_config_dword+0x17/0x50
pci_dev_wait()
pci_bridge_wait_for_secondary_bus()
dpc_reset_link()
pcie_do_recovery()
dpc_handler()
Fixes: 53b54ad074 ("PCI/DPC: Await readiness of secondary bus after reset")
Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.de
Reported-by: Keith Busch <kbusch@kernel.org>
Tested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org # v5.10+
While the experiment did reveal that there are additional places that are
missing the lock during secondary bus reset, one of the places that needs
to take cfg_access_lock (pci_bus_lock()) is not prepared for lockdep
annotation.
Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is
currently dependent on the fact that the device_lock() is marked
lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that
annotation, pci_bus_lock() would need to use something like a new
pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in the
topology, and a hope that the depth of a PCI tree never exceeds the max
value for a lockdep subclass.
The alternative to ripping out the lockdep coverage would be to deploy a
dynamic lock key for every PCI device. Unfortunately, there is evidence
that increasing the number of keys that lockdep needs to track to be
per-PCI-device is prohibitively expensive for something like the
cfg_access_lock.
The main motivation for adding the annotation in the first place was to
catch unlocked secondary bus resets, not necessarily catch lock ordering
problems between cfg_access_lock and other locks. Solve that narrower
problem with follow-on patches, and just due to targeted revert for now.
Link: https://lore.kernel.org/r/171711746402.1628941.14575335981264103013.stgit@dwillia2-xfh.jf.intel.com
Fixes: 7e89efc6e9 ("PCI: Lock upstream bridge for pci_reset_function()")
Reported-by: Imre Deak <imre.deak@intel.com>
Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Kalle Valo <kvalo@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: Jani Saarinen <jani.saarinen@intel.com>
- Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports because we can't get
them back out of D3cold (Mario Limonciello)
* pci/pm:
PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports
- Clear bridge Secondary Status errors after enumeration since enumeration
causes many errors (Vidya Sagar)
- Wait for Link Training==0 before starting Link retrain to avoid a race;
this was done previously but broken by a faulty merge (Ilpo Järvinen)
- Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific about what
"LEGACY" means (Damien Le Moal)
- Update return types of pci_find_capability() stubs to match the extern
declarations for the actual implementations (Bjorn Helgaas)
- Drop unnecessary pci_enable_device_io() from pata_cs5520 (Heiner
Kallweit)
- Drop unused pci_enable_device_io() (Heiner Kallweit)
- On 2016 and newer BIOSes, skip early E820 check for ECAM regions
described in ACPI MCFG; there's no spec requirement for E820
reservations, and some machines don't provide them (Bjorn Helgaas)
- If devices were disconnected while suspended, don't wait for them when
resuming (Ilpo Järvinen)
* pci/enumeration:
PCI: Do not wait for disconnected devices when resuming
x86/pci: Skip early E820 check for ECAM region
PCI: Remove unused pci_enable_device_io()
ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
PCI: Update pci_find_capability() stub return types
PCI: Remove PCI_IRQ_LEGACY
scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: rtw88: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: ath10k: Refer to INTX instead of LEGACY
net: wangxun: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
r8169: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
net: alx: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
net: atlantic: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
net: amd-xgbe: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
VMCI: Use PCI_IRQ_ALL_TYPES to remove PCI_IRQ_LEGACY use
RDMA/vmw_pvrdma: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
IB/qib: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
drm/amdgpu: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
mfd: intel-lpss: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
ntb: idt: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
platform/x86: intel_ips: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
tty: 8250_pci: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
usb: hcd-pci: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
ASoC: Intel: avs: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
Documentation: PCI: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
PCI/portdrv: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
PCI/MSI: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
PCI: Clarify intent of LT wait
PCI: Wait for Link Training==0 before starting Link retrain
PCI: Clear Secondary Status errors after enumeration
On runtime resume, pci_dev_wait() is called:
pci_pm_runtime_resume()
pci_pm_bridge_power_up_actions()
pci_bridge_wait_for_secondary_bus()
pci_dev_wait()
While a device is runtime suspended along with its PCI hierarchy, the
device could get disconnected. In such case, the link will not come up no
matter how long pci_dev_wait() waits for it.
Besides the above mentioned case, there could be other ways to get the
device disconnected while pci_dev_wait() is waiting for the link to come
up.
Make pci_dev_wait() exit if the device is already disconnected to avoid
unnecessary delay.
The use cases of pci_dev_wait() boil down to two:
1. Waiting for the device after reset
2. pci_bridge_wait_for_secondary_bus()
The callers in both cases seem to benefit from propagating the
disconnection as error even if device disconnection would be more
analoguous to the case where there is no device in the first place which
return 0 from pci_dev_wait(). In the case 2, it results in unnecessary
marking of the devices disconnected again but that is just harmless extra
work.
Also make sure compiler does not become too clever with dev->error_state
and use READ_ONCE() to force a fetch for the up-to-date value.
Link: https://lore.kernel.org/r/20240208132322.4811-1-ilpo.jarvinen@linux.intel.com
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
After the last user was removed, remove this PCI core function. It's very
unlikely that we'll see a new device requiring io space access, even though
memory space access is supported.
Link: https://lore.kernel.org/r/213ebf62-53a3-42b7-8518-ecd5cd6d6b08@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
By default Secondary Bus Reset (SBR) is masked for CXL Ports (see CXL r3.1,
sec 8.1.5.2).
Add cxl_reset_bus_function() (method "cxl_bus") to set the "Unmask SBR" bit
in the upstream CXL Port before performing the bus reset and restore the
original value afterwards.
This method allows the user to perform a bus reset on a CXL device without
needing to set the "Unmask SBR" bit via a user tool.
Link: https://lore.kernel.org/r/20240502165851.1948523-5-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[bhelgaas: simplify commit log, invert condition to avoid negation]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Per CXL spec r3.1, sec 8.1.5.2, the Secondary Bus Reset (SBR) bit in the
Bridge Control register of a CXL port has no effect unless the "Unmask SBR"
bit is set.
Return -ENOTTY if we attempt a bus reset on a device below a CXL Port where
"Unmask SBR" is 0. Otherwise, the bus reset would appear to have succeeded
even though setting the bridge SBR bit had no effect.
Link: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/
Link: https://lore.kernel.org/r/20240502165851.1948523-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[bhelgaas: simplify commit log and comments]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Fix a long-standing locking gap for missing pci_cfg_access_lock() while
manipulating bridge reset registers and configuration during
pci_reset_bus_function().
If there is an upstream bridge, lock it before locking the device itself.
pci_dev_lock() calls pci_cfg_access_lock(), which blocks the writing of PCI
config space by user space.
Add lockdep assertion via pci_dev->cfg_access_lock to verify
pci_dev->block_cfg_access is set.
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240502165851.1948523-3-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Clarify the comment relating to the LT wait and the purpose of the check
that implements the implementation note in PCIe r6.1 sec 7.5.3.7.
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/20240423130820.43824-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Two changes were made in link retraining logic independent of each other.
The commit e7e3975636 ("PCI/ASPM: Avoid link retraining race") added a
check to pcie_retrain_link() to ensure no Link Training is currently active
to address the Implementation Note in PCIe r6.1 sec 7.5.3.7. At that time
pcie_wait_for_retrain() only checked for the Link Training (LT) bit being
cleared.
The commit 680e9c47a2 ("PCI: Add support for polling DLLLA to
pcie_retrain_link()") generalized pcie_wait_for_retrain() into
pcie_wait_for_link_status() which can wait either for LT or the Data Link
Layer Link Active (DLLLA) bit with 'use_lt' argument and supporting waiting
for either cleared or set using 'active' argument.
In the merge commit 1abb473903 ("Merge branch 'pci/enumeration'"), those
two divergent branches converged. The merge changed LT bit checking added
in the commit e7e3975636 ("PCI/ASPM: Avoid link retraining race") to now
wait for completion of any ongoing Link Training using DLLLA bit being set
if 'use_lt' is false.
When 'use_lt' is false, the pseudo-code steps of what occurs in
pcie_retrain_link():
1. Wait for DLLLA==1
2. Trigger link to retrain
3. Wait for DLLLA==1
Step 3 waits for the link to come up from the retraining triggered by Step
2. As Step 1 is supposed to wait for any ongoing retraining to end, using
DLLLA also for it does not make sense because link training being active is
still indicated using LT bit, not with DLLLA.
Correct the pcie_wait_for_link_status() parameters in Step 1 to only wait
for LT==0 to ensure there is no ongoing Link Training.
This only impacts the Target Speed quirk, which is the only case where
waiting for DLLLA bit is used. It currently works in the problematic case
by means of link training getting initiated by hardware repeatedly and
respecting the new link parameters set by the caller, which then make
training succeed and bring the link up, setting DLLLA and causing
pcie_wait_for_link_status() to return success. We are not supposed to rely
on luck and need to make sure that LT transitioned through the inactive
state though before we initiate link training by hand via RL (Retrain Link)
bit.
Fixes: 1abb473903 ("Merge branch 'pci/enumeration'")
Link: https://lore.kernel.org/r/20240423130820.43824-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Hewlett-Packard HP Pavilion 17 Notebook PC/1972 is an Intel Ivy Bridge
system with a muxless AMD Radeon dGPU. Attempting to use the dGPU fails
with the following sequence:
ACPI Error: Aborting method \AMD3._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
radeon 0000:01:00.0: not ready 1023ms after resume; waiting
radeon 0000:01:00.0: not ready 2047ms after resume; waiting
radeon 0000:01:00.0: not ready 4095ms after resume; waiting
radeon 0000:01:00.0: not ready 8191ms after resume; waiting
radeon 0000:01:00.0: not ready 16383ms after resume; waiting
radeon 0000:01:00.0: not ready 32767ms after resume; waiting
radeon 0000:01:00.0: not ready 65535ms after resume; giving up
radeon 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
The issue is that the Root Port the dGPU is connected to can't handle the
transition from D3cold to D0 so the dGPU can't properly exit runtime PM.
The existing logic in pci_bridge_d3_possible() checks for systems that are
newer than 2015 to decide that D3 is safe. This would nominally work for
an Ivy Bridge system (which was discontinued in 2015), but this system
appears to have continued to receive BIOS updates until 2017 and so this
existing logic doesn't appropriately capture it.
Add the system to bridge_d3_blacklist to prevent D3cold from being used.
Link: https://lore.kernel.org/r/20240307163709.323-1-mario.limonciello@amd.com
Reported-by: Eric Heintzmann <heintzmann.eric@free.fr>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3229
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Eric Heintzmann <heintzmann.eric@free.fr>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmXw04sUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyT3xAAsp5+c2IcbrXpZZM7figwx4y9PPRp
jcQ4AYSGP41xqTUGXUTcVZYvRorSIAFEOz33U0SL1UNxoOZz8j/M6SD58k8a6XRr
9SSPuKja1OKJjONhS1bzrcbVtuzr71ISrECXfLkvW5hY5hvq+3+anMtu3/UIEHu6
M1vVc+basRjjPJNTixMWvVqS3R+4gDAFeBtdZl/D+U6v0v2xOK+81YZqjfZZCw9v
xmdHHK2dKNEdysNoRJ5cafY3b1NnSsrxlHbIhBnKt+7uRSWKD1dHcBQj7wDc/HrX
yBGca+BZBKitXEJM3p5KcWWs4ijaywGw0GSffUIKrN9i6RIfwnxBH9jUbwDngifU
2IR/kLEqdjYi/WnENxIHpQATLyXhXZ8uEnLS0xMlRIA96u3M5B0mrYOZxaN3bo12
A3aE+aPOTw0u1wf7G8dBX6IdYnjZ/ZuR9Q+fVoKpZBvsYUVaKyiKCtKMCNaVirn5
z7nxR1W71ee+35+37KthPXhiw+YtURGz1wBWt+wWUMjBcpIj2bpzU9wQDE9ZMdYt
XJoJcatrRhJzefO3uzd0egft+vwk0xrj5LQEDhMQyDrnBLC4EgI5niKPWqbay5Nx
Cnll01CI82xAnIF6eu7OOuI1nYGtoFcY8rP3hTC85cWN7Xi8SAOLTZZcVTpfBMUr
l2uEll8p+8dZ6IY=
=AP3I
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Consolidate interrupt related code in irq.c (Ilpo Järvinen)
- Reduce kernel size by replacing sysfs resource macros with
functions (Ilpo Järvinen)
- Reduce kernel size by compiling sysfs support only when
CONFIG_SYSFS=y (Lukas Wunner)
- Avoid using Extended Tags on 3ware-9650SE Root Port to work around
an apparent hardware defect (Jörg Wedekind)
Resource management:
- Fix an MMIO mapping leak in pci_iounmap() (Philipp Stanner)
- Move pci_iomap.c and other PCI-specific devres code to drivers/pci
(Philipp Stanner)
- Consolidate PCI devres code in devres.c (Philipp Stanner)
Power management:
- Avoid D3cold on Asus B1400 PCI-NVMe bridge, where firmware doesn't
know how to return correctly to D0, and remove previous quirk that
wasn't as specific (Daniel Drake)
- Allow runtime PM when the driver enables it but doesn't need any
runtime PM callbacks (Raag Jadav)
- Drain runtime-idle callbacks before driver removal to avoid races
between .remove() and .runtime_idle(), which caused intermittent
page faults when the rtsx .runtime_idle() accessed registers that
its .remove() had already unmapped (Rafael J. Wysocki)
Virtualization:
- Avoid Secondary Bus Reset on LSI FW643 so it can be assigned to VMs
with VFIO, e.g., for professional audio software on many Apple
machines, at the cost of leaking state between VMs (Edmund Raile)
Error handling:
- Print all logged TLP Prefixes, not just the first, after AER or DPC
errors (Ilpo Järvinen)
- Quirk the DPC PIO log size for Intel Raptor Lake Root Ports, which
still don't advertise a legal size (Paul Menzel)
- Ignore expected DPC Surprise Down errors on hot removal (Smita
Koralahalli)
- Block runtime suspend while handling AER errors to avoid races that
prevent the device form being resumed from D3hot (Stanislaw
Gruszka)
Peer-to-peer DMA:
- Use atomic XA allocation in RCU read section (Christophe JAILLET)
ASPM:
- Collect bits of ASPM-related code that we need even without
CONFIG_PCIEASPM into aspm.c (David E. Box)
- Save/restore L1 PM Substates config for suspend/resume (David E.
Box)
- Update save_save when ASPM config is changed, so a .slot_reset()
during error recovery restores the changed config, not the
.probe()-time config (Vidya Sagar)
Endpoint framework:
- Refactor and improve pci_epf_alloc_space() API (Niklas Cassel)
- Clean up endpoint BAR descriptions (Niklas Cassel)
- Fix ntb_register_device() name leak in error path (Yang Yingliang)
- Return actual error code for pci_vntb_probe() failure (Yang
Yingliang)
Broadcom STB PCIe controller driver:
- Fix MDIO write polling, which previously never waited for
completion (Jonathan Bell)
Cadence PCIe endpoint driver:
- Clear the ARI "Next Function Number" of last function (Jasko-EXT
Wojciech)
Freescale i.MX6 PCIe controller driver:
- Simplify by replacing switch statements with function pointers for
different hardware variants (Frank Li)
- Simplify by using clk_bulk*() API (Frank Li)
- Remove redundant DT clock and reg/reg-name details (Frank Li)
- Add i.MX95 DT and driver support for both Root Complex and Endpoint
mode (Frank Li)
Microsoft Hyper-V host bridge driver:
- Reduce memory usage by limiting ring buffer size to 16KB instead of
4 pages (Michael Kelley)
Qualcomm PCIe controller driver:
- Add X1E80100 DT and driver support (Abel Vesa)
- Add DT 'required-opps' for SoCs that require a minimum performance
level (Johan Hovold)
- Make DT 'msi-map-mask' optional, depending on how MSI interrupts
are mapped (Johan Hovold)
- Disable ASPM L0s for sc8280xp, sa8540p and sa8295p because the PHY
configuration isn't tuned correctly for L0s (Johan Hovold)
- Split dt-binding qcom,pcie.yaml into qcom,pcie-common.yaml and
separate files for SA8775p, SC7280, SC8180X, SC8280XP, SM8150,
SM8250, SM8350, SM8450, SM8550 for easier reviewing (Krzysztof
Kozlowski)
- Enable BDF to SID translation by disabling bypass mode (Manivannan
Sadhasivam)
- Add endpoint MHI support for Snapdragon SA8775P SoC (Mrinmay
Sarkar)
Synopsys DesignWare PCIe controller driver:
- Allocate 64-bit MSI address if no 32-bit address is available (Ajay
Agarwal)
- Fix endpoint Resizable BAR to actually advertise the required 1MB
size (Niklas Cassel)
MicroSemi Switchtec management driver:
- Release resources if the .probe() fails (Christophe JAILLET)
Miscellaneous:
- Make pcie_port_bus_type const (Ricardo B. Marliere)"
* tag 'pci-v6.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (77 commits)
PCI/ASPM: Update save_state when configuration changes
PCI/ASPM: Disable L1 before configuring L1 Substates
PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
PCI: hv: Fix ring buffer size calculation
PCI: dwc: endpoint: Fix advertised resizable BAR size
PCI: cadence: Clear the ARI Capability Next Function Number of the last function
PCI: dwc: Strengthen the MSI address allocation logic
PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling
PCI: qcom: Add X1E80100 PCIe support
dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller
PCI: qcom: Enable BDF to SID translation properly
PCI/AER: Generalize TLP Header Log reading
PCI/AER: Use explicit register size for PCI_ERR_CAP
PCI: qcom: Disable ASPM L0s for sc8280xp, sa8540p and sa8295p
dt-bindings: PCI: qcom: Do not require 'msi-map-mask'
dt-bindings: PCI: qcom: Allow 'required-opps'
PCI/AER: Block runtime suspend when handling errors
PCI/ASPM: Move pci_save_ltr_state() to aspm.c
PCI/ASPM: Always build aspm.c
...
- Allow the Energy Model to be updated dynamically (Lukasz Luba).
- Add support for LZ4 compression algorithm to the hibernation image
creation and loading code (Nikhil V).
- Fix and clean up system suspend statistics collection (Rafael
Wysocki).
- Simplify device suspend and resume handling in the power management
core code (Rafael Wysocki).
- Fix PCI hibernation support description (Yiwei Lin).
- Make hibernation take set_memory_ro() return values into account as
appropriate (Christophe Leroy).
- Set mem_sleep_current during kernel command line setup to avoid an
ordering issue with handling it (Maulik Shah).
- Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
driver's system suspend callback (Qingliang Li).
- Simplify pm_runtime_get_if_active() usage and add a replacement for
pm_runtime_put_autosuspend() (Sakari Ailus).
- Add a tracepoint for runtime_status changes tracking (Vilas Bhat).
- Fix section title markdown in the runtime PM documentation (Yiwei
Lin).
- Enable preferred core support in the amd-pstate cpufreq driver (Meng
Li).
- Fix min_perf assignment in amd_pstate_adjust_perf() and make the
min/max limit perf values in amd-pstate always stay within the
(highest perf, lowest perf) range (Tor Vic, Meng Li).
- Allow intel_pstate to assign model-specific values to strings used in
the EPP sysfs interface and make it do so on Meteor Lake (Srinivas
Pandruvada).
- Drop long-unused cpudata::prev_cummulative_iowait from the
intel_pstate cpufreq driver (Jiri Slaby).
- Prevent scaling_cur_freq from exceeding scaling_max_freq when the
latter is an inefficient frequency (Shivnandan Kumar).
- Change default transition delay in cpufreq to 2ms (Qais Yousef).
- Remove references to 10ms minimum sampling rate from comments in the
cpufreq code (Pierre Gondois).
- Honour transition_latency over transition_delay_us in cpufreq (Qais
Yousef).
- Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar).
- General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
Belova).
- Update cpufreq-dt-platdev to block/approve devices (Richard Acayan).
- Make the SCMI cpufreq driver get a transition delay value from
firmware (Pierre Gondois).
- Prevent the haltpoll cpuidle governor from shrinking guest
poll_limit_ns below grow_start (Parshuram Sangle).
- Avoid potential overflow in integer multiplication when computing
cpuidle state parameters (C Cheng).
- Adjust MWAIT hint target C-state computation in the ACPI cpuidle
driver and in intel_idle to return a correct value for C0 (He
Rongguang).
- Address multiple issues in the TPMI RAPL driver and add support for
new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui).
- Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
Lezcano).
- Fix kernel-doc for dtpm_create_hierarchy() (Yang Li).
- Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
Norway Ananda).
- Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil).
- Fix a couple of warnings in the OPP core code related to W=1
builds (Viresh Kumar).
- Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
Kumar).
- Extend dev_pm_opp_data with turbo support (Sibi Sankar).
- dt-bindings: drop maxItems from inner items (David Heidelberg).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmXvI/ISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx24sP/jxg6fOGme8raHQvpTXG3/H56wlGzQ4P
YUvvKUXnfD3yf1zNISsUl7VQebZqDt8rygkwSdymXlUVZX1eubN0RpCFc0F8GZuc
THG/YQhYQr/9zro3FpKhfDj5evk21PCQzjf+dGvfQF9qVMxNPG1JzEFK6PnolT5X
2BvkonY1XFWZjCMbZ83B/jt35lTDb0cmeNbCpfD5UJgcnxmMOtZYpORdyfPWTJpG
GVCwmAFVVXxXlust/AIpt3mmOpKzSA9GnrtJkhtQe5GN+Y4OjnJiFJmTC7EfCctj
JlWgVUA716mtFMUrjXgjfI54firF2oQpqaSa2HG/V/A96JWQqjarGz5dAV1IrPEt
ZmYpvMe4E90S411wF1OWyrEqjXUuDnH1OWUvUdWSt4E7DhFw3esDi/jLW2tyVKAT
hIy+/O4wzbDSTX/h9Cgt1Qjhew6lKUIwvhEXclB3fuJ+JoviWNkC9lnK93e2H0A3
VYfkd/lpUD74035l0FrCJ/49MjX9kqrsn+TipHsIlSXAi8ZRdKbVvxOTD8RYudcI
GvCiDDrkMgNwGlyedgbtTBUepCvSg93b+vVmRj7YMPtBhioOUo3qCn6wpqhxfnth
9BCnPW7JxqUw/NJdlk9hKumaUZq+MK8G+kdYcIDg6xmAkWSUVP2QKlWavfMCxqRP
+dN6T2iHsKFe
=UePT
-----END PGP SIGNATURE-----
Merge tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"From the functional perspective, the most significant change here is
the addition of support for Energy Models that can be updated
dynamically at run time.
There is also the addition of LZ4 compression support for hibernation,
the new preferred core support in amd-pstate, new platforms support in
the Intel RAPL driver, new model-specific EPP handling in intel_pstate
and more.
Apart from that, the cpufreq default transition delay is reduced from
10 ms to 2 ms (along with some related adjustments), the system
suspend statistics code undergoes a significant rework and there is a
usual bunch of fixes and code cleanups all over.
Specifics:
- Allow the Energy Model to be updated dynamically (Lukasz Luba)
- Add support for LZ4 compression algorithm to the hibernation image
creation and loading code (Nikhil V)
- Fix and clean up system suspend statistics collection (Rafael
Wysocki)
- Simplify device suspend and resume handling in the power management
core code (Rafael Wysocki)
- Fix PCI hibernation support description (Yiwei Lin)
- Make hibernation take set_memory_ro() return values into account as
appropriate (Christophe Leroy)
- Set mem_sleep_current during kernel command line setup to avoid an
ordering issue with handling it (Maulik Shah)
- Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
driver's system suspend callback (Qingliang Li)
- Simplify pm_runtime_get_if_active() usage and add a replacement for
pm_runtime_put_autosuspend() (Sakari Ailus)
- Add a tracepoint for runtime_status changes tracking (Vilas Bhat)
- Fix section title markdown in the runtime PM documentation (Yiwei
Lin)
- Enable preferred core support in the amd-pstate cpufreq driver
(Meng Li)
- Fix min_perf assignment in amd_pstate_adjust_perf() and make the
min/max limit perf values in amd-pstate always stay within the
(highest perf, lowest perf) range (Tor Vic, Meng Li)
- Allow intel_pstate to assign model-specific values to strings used
in the EPP sysfs interface and make it do so on Meteor Lake
(Srinivas Pandruvada)
- Drop long-unused cpudata::prev_cummulative_iowait from the
intel_pstate cpufreq driver (Jiri Slaby)
- Prevent scaling_cur_freq from exceeding scaling_max_freq when the
latter is an inefficient frequency (Shivnandan Kumar)
- Change default transition delay in cpufreq to 2ms (Qais Yousef)
- Remove references to 10ms minimum sampling rate from comments in
the cpufreq code (Pierre Gondois)
- Honour transition_latency over transition_delay_us in cpufreq (Qais
Yousef)
- Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar)
- General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
Belova)
- Update cpufreq-dt-platdev to block/approve devices (Richard Acayan)
- Make the SCMI cpufreq driver get a transition delay value from
firmware (Pierre Gondois)
- Prevent the haltpoll cpuidle governor from shrinking guest
poll_limit_ns below grow_start (Parshuram Sangle)
- Avoid potential overflow in integer multiplication when computing
cpuidle state parameters (C Cheng)
- Adjust MWAIT hint target C-state computation in the ACPI cpuidle
driver and in intel_idle to return a correct value for C0 (He
Rongguang)
- Address multiple issues in the TPMI RAPL driver and add support for
new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui)
- Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
Lezcano)
- Fix kernel-doc for dtpm_create_hierarchy() (Yang Li)
- Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
Norway Ananda)
- Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil)
- Fix a couple of warnings in the OPP core code related to W=1 builds
(Viresh Kumar)
- Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
Kumar)
- Extend dev_pm_opp_data with turbo support (Sibi Sankar)
- dt-bindings: drop maxItems from inner items (David Heidelberg)"
* tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits)
dt-bindings: opp: drop maxItems from inner items
OPP: debugfs: Fix warning around icc_get_name()
OPP: debugfs: Fix warning with W=1 builds
cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
OPP: Extend dev_pm_opp_data with turbo support
Fix cpupower-frequency-info.1 man page typo
cpufreq: scmi: Set transition_delay_us
firmware: arm_scmi: Populate fast channel rate_limit
firmware: arm_scmi: Populate perf commands rate_limit
cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
PM: sleep: wakeirq: fix wake irq warning in system suspend
powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
cpufreq: Don't unregister cpufreq cooling on CPU hotplug
PM: suspend: Set mem_sleep_current during kernel command line setup
cpufreq: Honour transition_latency over transition_delay_us
cpufreq: Limit resolving a frequency to policy min/max
Documentation: PM: Fix runtime_pm.rst markdown syntax
cpufreq: amd-pstate: adjust min/max limit perf
cpufreq: Remove references to 10ms min sampling rate
cpufreq: intel_pstate: Update default EPPs for Meteor Lake
...
- Collect interrupt-related code in irq.c (Ilpo Järvinen)
- Mark 3ware-9650SE Root Port Extended Tags as broken (Jörg Wedekind)
* pci/enumeration:
PCI: Mark 3ware-9650SE Root Port Extended Tags as broken
PCI: Place interrupt related code into irq.c
# Conflicts:
# drivers/pci/Makefile
- Unmap MMIO mappings in pci_iounmap() to avoid a leak when
ARCH_HAS_GENERIC_IOPORT_MAP is defined (Philipp Stanner)
- Move pci_iomap.c to drivers/pci/ since it's all PCI-related (Philipp
Stanner)
- Move other PCI-related devres code from lib/devres.c to drivers/pci/
(Philipp Stanner)
- Move other devres code from pci.c to devres.c (Philipp Stanner)
* pci/devres:
PCI: Move devres code from pci.c to devres.c
PCI: Move PCI-specific devres code to drivers/pci/
PCI: Move pci_iomap.c to drivers/pci/
pci_iounmap(): Fix MMIO mapping leak
- Collect ASPM-related code into aspm.c (David E. Box)
- Save and restore ASPM L1 PM Substates configuration so these states
continue working after suspend/resume (David E. Box)
- Move the ASPM L1.2-related LTR save/restore next to the ASPM save/restore
(David E. Box)
- Move the required L1 disable before L1 Substate configuration into
pci_restore_aspm_l1ss_state() (Bjorn Helgaas)
- Update save_save when ASPM config is changed, so a .slot_reset() during
error recovery restores the changed config, not the .probe()-time config
(Vidya Sagar)
* pci/aspm:
PCI/ASPM: Update save_state when configuration changes
PCI/ASPM: Disable L1 before configuring L1 Substates
PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
PCI/ASPM: Move pci_save_ltr_state() to aspm.c
PCI/ASPM: Always build aspm.c
PCI/ASPM: Move pci_configure_ltr() to aspm.c
Per PCIe r6.1, sec 5.5.4, L1 must be disabled while setting ASPM L1 PM
Substates enable bits. Previously this was enforced by clearing
PCI_EXP_LNKCTL_ASPMC before calling pci_restore_aspm_l1ss_state().
Move the L1 (and L0s, although that doesn't seem required) disable into
pci_restore_aspm_l1ss_state() itself so it's closer to the code that
depends on it.
Link: https://lore.kernel.org/r/20240223213733.GA115410@bhelgaas
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") restored the L1 PM Substates Capability after resume,
which reduced power consumption by making the ASPM L1.x states work after
resume.
a7152be79b ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume"") reverted 4ff116d0d5 because resume failed on some
systems, so power consumption after resume increased again.
a7152be79b mentioned that we restore L1 PM substate configuration even
though ASPM L1 may already be enabled. This is due the fact that the
pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state().
Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec
5.5.4 more closely by:
1) Do not restore ASPM configuration in pci_restore_pcie_state() but
do that after PCIe capability is restored in pci_restore_aspm_state()
following PCIe r6.1, sec 5.5.4.
2) If BIOS reenables L1SS, particularly L1.2, we need to clear the
enables in the right order, downstream before upstream. Defer
restoring the L1SS config until we are at the downstream component.
Then update the config for both ends of the link in the prescribed
order.
3) Program ASPM L1 PM substate configuration before L1 enables.
4) Program ASPM L1 PM substate enables last, after rest of the fields
in the capability are programmed.
[bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores
in pci_restore_pcie_state()]
Link: https://lore.kernel.org/r/20240128233212.1139663-3-david.e.box@linux.intel.com
Link: https://lore.kernel.org/r/20240128233212.1139663-4-david.e.box@linux.intel.com
Link: https://lore.kernel.org/r/20240223205851.114931-5-helgaas@kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Co-developed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Co-developed-by: David E. Box <david.e.box@linux.intel.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Tasev Nikola <tasev.stefanoska@skynet.be> # Asus UX305FA
Cc: Mark Enriquez <enriquezmark36@gmail.com>
Cc: Thomas Witt <kernel@witt.link>
Cc: Werner Sembach <wse@tuxedocomputers.com>
Cc: Vidya Sagar <vidyas@nvidia.com>
ioremap_page_range() should be used for ranges within vmalloc range only.
The vmalloc ranges are allocated by get_vm_area(). PCI has "resource"
allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence
introduce vmap_page_range() to be used exclusively to map pages
in PCI address space.
Fixes: 3e49a866c9 ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.")
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/bpf/CANiq72ka4rir+RTN2FQoT=Vvprp_Ao-CvoYEkSNqtSY+RZj+AA@mail.gmail.com
Merge changes related to the runtime power management of devices for
6.9-rc1:
- Simplify pm_runtime_get_if_active() usage and add a replacement for
pm_runtime_put_autosuspend() (Sakari Ailus).
- Add a tracepoint for runtime_status changes tracking (Vilas Bhat).
- Fix section title markdown in the runtime PM documentation (Yiwei
Lin).
* pm-runtime:
Documentation: PM: Fix runtime_pm.rst markdown syntax
PM: runtime: add tracepoint for runtime_status changes
PM: runtime: Add pm_runtime_put_autosuspend() replacement
PM: runtime: Simplify pm_runtime_get_if_active() usage
Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs
7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after
AER as the struct aer_header_log_regs. Also, not all places that handle TLP
Header Log use the struct and the struct members are named individually.
Generalize the struct name and members, and use it consistently where TLP
Header Log is being handled so that a pcie_read_tlp_log() helper can be
easily added.
Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop ixgbe changes for now, tidy whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Even when CONFIG_PCIEASPM is not set, we save and restore the LTR
Capability so that if ASPM L1.2 and LTR were configured by the platform,
ASPM L1.2 will still work after suspend/resume, when that platform
configuration may be lost. See dbbfadf231 ("PCI/ASPM: Save LTR Capability
for suspend/resume").
Since ASPM L1.2 depends on the LTR Capability, move the save/restore code
to the part of aspm.c that is always compiled regardless of
CONFIG_PCIEASPM. No functional change intended.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20240128233212.1139663-5-david.e.box@linux.intel.com
[bhelgaas: commit log, reorder to make this a pure move]
Link: https://lore.kernel.org/r/20240223205851.114931-4-helgaas@kernel.org
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2
state and is only configured when CONFIG_PCIEASPM is set.
Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since
they only build when CONFIG_PCIEASPM is set. No functional change
intended.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com
[bhelgaas: commit log, split build change from function moves]
Link: https://lore.kernel.org/r/20240223205851.114931-2-helgaas@kernel.org
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmXP7J8UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzs1RAApf7x7D+8yr7gURm2iDxaTnsVmhgS
7jqvyqGsKHNZBl8EZ3kJYLLOffZNTx9FtdavunWUT/sho76v/igV1z597d4PgVYJ
nyqo2ASol5aQgSlA/oiXcVJhagghiSyKeXLBEwZe13p2U5LSPMIpCV2DOevpb9Zc
YclgSfTk+Ne9N522GkAQEwjegXNY2F+Nkfmztj3AG4c0nv9ywya3AmjJsiTwbYCs
HwkizpBe7Id9YkfbC65VG4DEK2gMzPskvn8a4R0e2xJDZyRT2HAGiXSBO1OMxfR9
D2j/iXSJwMmLy5YOqgjP8SbLBNW1YwekapBuyht7VDd/TDDyEWNSFwgFLtkOtWs7
P6GEOUuoe8JfBNXxrSf+iGEJu5f6h6E2blrRASqXdd/Rffwi/l6Zi3otcp/9LQEZ
zG2h3fuu884LBBLG6SOdCURh60uFpZOzVmZfsc0malis0M+cOB8zo8Rdb0mkabge
07/S1R/lgdEZbApm6Skd4b/RrYpvo28zTfEe1yJ08UzqhfoGt/bKRStlhiIbfr6d
ZN/hBx1DEnSH3fbgWrlKd/dz+9bcQ2xO65wrRKfMpGbREkBoumAoUHPQkMbfjYbj
Iqr534wYR0Kbuo63g+7Bm+lM4ivcIMmJIRzq3G0jzoRQEyINI49IBD/i3VBefej7
PCEB9+dKE71H/BQ=
=qDwL
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Keep bridges in D0 if we need to poll downstream devices for PME to
resolve a v6.6 regression where we failed to enumerate devices below
bridges put in D3hot by runtime PM, e.g., NVMe drives connected via
Thunderbolt or USB4 docks (Alex Williamson)
- Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
* tag 'pci-v6.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Siddharth Vadapalli as PCI TI DRA7XX/J721E reviewer
PCI: Fix active state requirement in PME polling
The file pci.c is very large and contains a number of devres functions.
These functions should now reside in devres.c.
Move as much devres-specific code from pci.c to devres.c as possible.
There are a few callers left in pci.c that do devres operations. These
should be ported in the future. Add corresponding TODOs.
The reason they are not moved right now in this commit is that PCI's devres
currently implements a sort of "hybrid-mode": pci_request_region(), for
instance, does not have a corresponding pcim_ equivalent, yet. Instead, the
function can be made managed by previously calling pcim_enable_device()
(instead of pci_enable_device()). This makes it unreasonable to move
pci_request_region() to devres.c. Moving the functions would require
changes to PCI's API and is, therefore, left for future work.
In summary, this commit serves as a preparation step for a following
patch series that will cleanly separate the PCI's managed and unmanaged
API.
Link: https://lore.kernel.org/r/20240131090023.12331-5-pstanner@redhat.com
Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There are two ways to opportunistically increment a device's runtime PM
usage count, calling either pm_runtime_get_if_active() or
pm_runtime_get_if_in_use(). The former has an argument to tell whether to
ignore the usage count or not, and the latter simply calls the former with
ign_usage_count set to false. The other users that want to ignore the
usage_count will have to explicitly set that argument to true which is a
bit cumbersome.
To make this function more practical to use, remove the ign_usage_count
argument from the function. The main implementation is in a static
function called pm_runtime_get_conditional() and implementations of
pm_runtime_get_if_active() and pm_runtime_get_if_in_use() are moved to
runtime.c.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Takashi Iwai <tiwai@suse.de> # sound/
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> # drivers/accel/ivpu/
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # drivers/gpu/drm/i915/
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The commit noted in fixes added a bogus requirement that runtime PM managed
devices need to be in the RPM_ACTIVE state for PME polling. In fact, only
devices in low power states should be polled.
However there's still a requirement that the device config space must be
accessible, which has implications for both the current state of the polled
device and the parent bridge, when present. It's not sufficient to assume
the bridge remains in D0 and cases have been observed where the bridge
passes the D0 test, but the PM state indicates RPM_SUSPENDING and config
space of the polled device becomes inaccessible during pci_pme_wakeup().
Therefore, since the bridge is already effectively required to be in the
RPM_ACTIVE state, formalize this in the code and elevate the PM usage count
to maintain the state while polling the subordinate device.
This resolves a regression reported in the bugzilla below where a
Thunderbolt/USB4 hierarchy fails to scan for an attached NVMe endpoint
downstream of a bridge in a D3hot power state.
Link: https://lore.kernel.org/r/20240123185548.1040096-1-alex.williamson@redhat.com
Fixes: d3fcd73603 ("PCI: Fix runtime PM race with PME polling")
Reported-by: Sanath S <sanath.s@amd.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Sanath S <sanath.s@amd.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:
============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
but task is already holding lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);
*** DEADLOCK ***
Call trace:
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.
Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.
Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
Fixes: f93e71aea6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 6.7
- Add support for parsing the Coherent Device Attribute Table (CDAT)
- Add support for calculating a platform CXL QoS class from CDAT data
- Unify the tracing of EFI CXL Events with native CXL Events.
- Add Get Timestamp support
- Miscellaneous cleanups and fixups
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZaHVvAAKCRDfioYZHlFs
Z3sCAQDPHSsHmj845k4lvKbWjys3eh78MKKEFyTXLQgYhOlsGAEAigQY2ZiSum52
nwdIgpOOADNt0Iq6yXuLsmn9xvY9bAU=
=HjCl
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"The bulk of this update is support for enumerating the performance
capabilities of CXL memory targets and connecting that to a platform
CXL memory QoS class. Some follow-on work remains to hook up this data
into core-mm policy, but that is saved for v6.9.
The next significant update is unifying how CXL event records (things
like background scrub errors) are processed between so called
"firmware first" and native error record retrieval. The CXL driver
handler that processes the record retrieved from the device mailbox is
now the handler for that same record format coming from an EFI/ACPI
notification source.
This also contains miscellaneous feature updates, like Get Timestamp,
and other fixups.
Summary:
- Add support for parsing the Coherent Device Attribute Table (CDAT)
- Add support for calculating a platform CXL QoS class from CDAT data
- Unify the tracing of EFI CXL Events with native CXL Events.
- Add Get Timestamp support
- Miscellaneous cleanups and fixups"
* tag 'cxl-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (41 commits)
cxl/core: use sysfs_emit() for attr's _show()
cxl/pci: Register for and process CPER events
PCI: Introduce cleanup helpers for device reference counts and locks
acpi/ghes: Process CXL Component Events
cxl/events: Create a CXL event union
cxl/events: Separate UUID from event structures
cxl/events: Remove passing a UUID to known event traces
cxl/events: Create common event UUID defines
cxl/events: Promote CXL event structures to a core header
cxl: Refactor to use __free() for cxl_root allocation in cxl_endpoint_port_probe()
cxl: Refactor to use __free() for cxl_root allocation in cxl_find_nvdimm_bridge()
cxl: Fix device reference leak in cxl_port_perf_data_calculate()
cxl: Convert find_cxl_root() to return a 'struct cxl_root *'
cxl: Introduce put_cxl_root() helper
cxl/port: Fix missing target list lock
cxl/port: Fix decoder initialization when nr_targets > interleave_ways
cxl/region: fix x9 interleave typo
cxl/trace: Pass UUID explicitly to event traces
cxl/region: use %pap format to print resource_size_t
cxl/region: Add dev_dbg() detail on failure to allocate HPA space
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmWldYsUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyxUhAAs2ctoK/sMAfTOO2b1UAD/ig7CGGz
DlDt38RezFU4uqeY0Ix4heFs3RIt8YGuns76Fejfyevh1I7SOA9lbhFuMLBfO9j0
LU+KuZeGoXtIe5Kd6hCQIUgVvwISs407yp7JUUzqxFQ2rv7bin64xiDb407ZQGaK
5v4oRsnQn1KBhgZ2wfQ/S+adAma9IroK9F3C/Bm+IJ+mpNxJcbWPqnf9+5ExoxzU
MFyu0azan1crqWA/geJBetL4zVoRJx4qNEve0gqwk06vwLeIKyzB2jPO5dmn9pAb
kfAFCQgtTUGZHvZWyBZMWQcMKEQLSupOLYXU4b2Vf+oR9U0jvevqs3LArBsUceM9
vQw8Vg9RZiWs9lVeVYSQErYQecMhdiHYCXFuteaNH9tvATN4PumXiT2ZM9OsX6uy
jrXW7YLawJbGLIDNsAyrn8JESzY/CsRPpCIUq3JzL2VQdInC3mEl18rTEuKTBeZF
zE/RgwudhWDT58/vceS2LHa5KNd/vAzMTmUHEUwHg1N7TV3qkSgpPaVcvx4KklXv
1nKT2KcfD5K1Yy/InjxUYdGhRPYa7azl+l7W4hJ+NCGxwL+tUCg3knp80+empTJ0
mZm6/VSbc245nKjx3ydLlTbQ/xNMQXgHHDKPW6eO4ezZaydJZG2xkK3x6eF1+i0k
PWHSLjUxrK1AGrg=
=ri0M
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Reserve ECAM so we don't assign it to PCI BARs; this works around
bugs where BIOS included ECAM in a PNP0A03 host bridge window,
didn't reserve it via a PNP0C02 motherboard device, and didn't
allocate space for SR-IOV VF BARs (Bjorn Helgaas)
- Add MMCONFIG/ECAM debug logging (Bjorn Helgaas)
- Rename 'MMCONFIG' to 'ECAM' to match spec usage (Bjorn Helgaas)
- Log device type (Root Port, Switch Port, etc) during enumeration
(Bjorn Helgaas)
- Log bridges before downstream devices so the dmesg order is more
logical (Bjorn Helgaas)
- Log resource names (BAR 0, VF BAR 0, bridge window, etc)
consistently instead of a mix of names and "reg 0x10" (Puranjay
Mohan, Bjorn Helgaas)
- Fix 64GT/s effective data rate calculation to use 1b/1b encoding
rather than the 8b/10b or 128b/130b used by lower rates (Ilpo
Järvinen)
- Use PCI_HEADER_TYPE_* instead of literals in x86, powerpc, SCSI
lpfc (Ilpo Järvinen)
- Clean up open-coded PCIBIOS return code mangling (Ilpo Järvinen)
Resource management:
- Restructure pci_dev_for_each_resource() to avoid computing the
address of an out-of-bounds array element (the bounds check was
performed later so the element was never actually *read*, but it's
nicer to avoid even computing an out-of-bounds address) (Andy
Shevchenko)
Driver binding:
- Convert pci-host-common.c platform .remove() callback to
.remove_new() returning 'void' since it's not useful to return
error codes here (Uwe Kleine-König)
- Convert exynos, keystone, kirin from .remove() to .remove_new(),
which returns void instead of int (Uwe Kleine-König)
- Drop unused struct pci_driver.node member (Mathias Krause)
Virtualization:
- Add ACS quirk for more Zhaoxin Root Ports (LeoLiuoc)
Error handling:
- Log AER errors as "Correctable" (not "Corrected") or
"Uncorrectable" to match spec terminology (Bjorn Helgaas)
- Decode Requester ID when no error info found instead of printing
the raw hex value (Bjorn Helgaas)
Endpoint framework:
- Use a unique test pattern for each BAR in the pci_endpoint_test to
make it easier to debug address translation issues (Niklas Cassel)
Broadcom STB PCIe controller driver:
- Add DT property "brcm,clkreq-mode" and driver support for different
CLKREQ# modes to make ASPM L1.x states possible (Jim Quinlan)
Freescale Layerscape PCIe controller driver:
- Add suspend/resume support for Layerscape LS1043a and LS1021a,
including software-managed PME_Turn_Off and transitions between L0,
L2/L3_Ready Link states (Frank Li)
MediaTek PCIe controller driver:
- Clear MSI interrupt status before handler to avoid missing MSIs
that occur after the handler (qizhong cheng)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 translation window setup to handle MMIO space
that is not a power of two in size (Jianjun Wang)
Qualcomm PCIe controller driver:
- Increase qcom iommu-map maxItems to accommodate SDX55 (five
entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)
- Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
Kozlowski)
- Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
Kozlowski)
- Correct the qcom "reset-name" property, previously incorrectly
called "reset-names" (Krzysztof Kozlowski)
- Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil
Armstrong)
Renesas R-Car PCIe controller driver:
- Replace of_device.h with explicit of.h include to untangle header
usage (Rob Herring)
- Add DT and driver support for optional miniPCIe 1.5v and 3.3v
regulators on KingFisher (Wolfram Sang)
SiFive FU740 PCIe controller driver:
- Convert fu740 CONFIG_PCIE_FU740 dependency from SOC_SIFIVE to
ARCH_SIFIVE (Conor Dooley)
Synopsys DesignWare PCIe controller driver:
- Align iATU mapping for endpoint MSI-X (Niklas Cassel)
- Drop "host_" prefix from struct dw_pcie_host_ops members (Yoshihiro
Shimoda)
- Drop "ep_" prefix from struct dw_pcie_ep_ops members (Yoshihiro
Shimoda)
- Rename struct dw_pcie_ep_ops.func_conf_select() to
.get_dbi_offset() to be more descriptive (Yoshihiro Shimoda)
- Add Endpoint DBI accessors to encapsulate offset lookups (Yoshihiro
Shimoda)
TI J721E PCIe driver:
- Add j721e DT and driver support for 'num-lanes' for devices that
support x1, x2, or x4 Links (Matt Ranostay)
- Add j721e DT compatible strings and driver support for j784s4 (Matt
Ranostay)
- Make TI J721E Kconfig depend on ARCH_K3 since the hardware is
specific to those TI SoC parts (Peter Robinson)
TI Keystone PCIe controller driver:
- Hold power management references to all PHYs while enabling them to
avoid a race when one provides clocks to others (Siddharth
Vadapalli)
Xilinx XDMA PCIe controller driver:
- Remove redundant dev_err(), since platform_get_irq() and
platform_get_irq_byname() already log errors (Yang Li)
- Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
(Krzysztof Wilczyński)
- Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
irq_domain_add_linear() fails (Harshit Mogalapalli)
MicroSemi Switchtec management driver:
- Do dma_mrpc cleanup during switchtec_pci_remove() to match its devm
ioremapping in switchtec_pci_probe(). Previously the cleanup was
done in stdev_release(), which used stale pointers if stdev->cdev
happened to be open when the PCI device was removed (Daniel
Stodden)
Miscellaneous:
- Convert interrupt terminology from "legacy" to "INTx" to be more
specific and match spec terminology (Damien Le Moal)
- In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of
deprecated ida_simple_*() API with ida_alloc() and ida_free()
(Christophe JAILLET)"
* tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
PCI: Fix kernel-doc issues
PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device
dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
PCI: mediatek-gen3: Fix translation window size calculation
PCI: mediatek: Clear interrupt status before dispatching handler
PCI: keystone: Fix race condition when initializing PHYs
PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
PCI: rcar-gen4: Fix -Wvoid-pointer-to-enum-cast error
PCI: iproc: Fix -Wvoid-pointer-to-enum-cast warning
PCI: dwc: Add dw_pcie_ep_{read,write}_dbi[2] helpers
PCI: dwc: Rename .func_conf_select to .get_dbi_offset in struct dw_pcie_ep_ops
PCI: dwc: Rename .ep_init to .init in struct dw_pcie_ep_ops
PCI: dwc: Drop host prefix from struct dw_pcie_host_ops members
misc: pci_endpoint_test: Use a unique test pattern for each BAR
PCI: j721e: Make TI J721E depend on ARCH_K3
PCI: j721e: Add TI J784S4 PCIe configuration
PCI/AER: Use explicit register sizes for struct members
PCI/AER: Decode Requester ID when no error info found
PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
...
This reverts commit 08d0cc5f34.
Michael reported that when attempting to resume from suspend to RAM on ASUS
mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f34
("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
with no output, followed by a reboot.
Workarounds include:
- Reverting 08d0cc5f34 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
- Booting with "pcie_aspm=off"
- Booting with "pcie_aspm.policy=performance"
- "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
before suspending
- Connecting a USB flash drive
Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org
Fixes: 08d0cc5f34 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
Reported-by: Michael Schaller <michael@5challer.de>
Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org>
The latency is calculated by dividing the flit size over the bandwidth. Add
support to retrieve the flit size for the CXL switch device and calculate
the latency of the PCIe link. Cache the latency number with cxl_dport.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use the pci_resource_name() to get the name of the resource and use it
while printing log messages.
[bhelgaas: rename to match struct resource * names, also use names in other
BAR messages]
Link: https://lore.kernel.org/r/20211106112606.192563-3-puranjay12@gmail.com
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The PCI log messages print the register offsets at some places and BAR
numbers at other places. There is no uniformity in this logging mechanism.
It would be better to print names than register offsets.
Add a helper function that aids in printing more meaningful information
about the BAR numbers like "VF BAR", "ROM", "bridge window", etc. This
function can be called while printing PCI log messages.
[bhelgaas: fold in Lukas' static array suggestion from
https://lore.kernel.org/all/20211106115831.GA7452@wunner.de/]
Link: https://lore.kernel.org/r/20211106112606.192563-2-puranjay12@gmail.com
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Currently, the time it took a PCI device to become ready after reset is
only printed if it was longer than 1000ms ('PCI_RESET_WAIT'). However,
for debugging purposes it is useful to know this time even if it was
shorter. For example, with the device I am working on, hardware
engineers asked to verify that it becomes ready on the first try (no
delay).
To that end, add a debug level print that can be enabled using dynamic
debug. Example:
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
# dmesg -c | grep ready
# echo "file drivers/pci/pci.c +p" > /sys/kernel/debug/dynamic_debug/control
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
# dmesg -c | grep ready
[ 396.060335] mlxsw_spectrum4 0000:01:00.0: ready 0ms after bus reset
# echo "file drivers/pci/pci.c -p" > /sys/kernel/debug/dynamic_debug/control
# echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
# dmesg -c | grep ready
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Use FIELD_GET()/FIELD_PREP() when possible throughout drivers/pci/ (Ilpo
Järvinen, Bjorn Helgaas)
- Rework DPC control programming for clarity (Ilpo Järvinen)
* pci/field-get:
PCI/portdrv: Use FIELD_GET()
PCI/VC: Use FIELD_GET()
PCI/PTM: Use FIELD_GET()
PCI/PME: Use FIELD_GET()
PCI/ATS: Use FIELD_GET()
PCI/ATS: Show PASID Capability register width in bitmasks
PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk
PCI: Use FIELD_GET()
PCI/MSI: Use FIELD_GET/PREP()
PCI/DPC: Use defines with DPC reason fields
PCI/DPC: Use defined fields with DPC_CTL register
PCI/DPC: Use FIELD_GET()
PCI: hotplug: Use FIELD_GET/PREP()
PCI: dwc: Use FIELD_GET/PREP()
PCI: cadence: Use FIELD_GET()
PCI: Use FIELD_GET() to extract Link Width
PCI: mvebu: Use FIELD_PREP() with Link Width
PCI: tegra194: Use FIELD_GET()/FIELD_PREP() with Link Width fields
# Conflicts:
# drivers/pci/controller/dwc/pcie-tegra194.c
- Simplify config accessor error checking (Ilpo Järvinen)
* pci/config-errs:
scsi: ipr: Do PCI error checks on own line
PCI: xgene: Do PCI error check on own line & keep return value
PCI: Do error check on own line to split long "if" conditions
atm: iphase: Do PCI error checks on own line
sh: pci: Do PCI error check on own line
alpha: Streamline convoluted PCI error handling
Use FIELD_GET() to remove dependences on the field position, i.e., the
shift value. No functional change intended.
Separate because this isn't as trivial as the other FIELD_GET() changes.
See 907830b0fc ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT
Pulse")
Link: https://lore.kernel.org/r/20231010204436.1000644-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Use FIELD_GET() and FIELD_PREP() to remove dependences on the field
position, i.e., the shift value. No functional change intended.
Link: https://lore.kernel.org/r/20231010204436.1000644-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Use FIELD_GET() to extract PCIe Negotiated and Maximum Link Width fields
instead of custom masking and shifting.
Link: https://lore.kernel.org/r/20230919125648.1920-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop duplicate include of <linux/bitfield.h>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Placing PCI error code check inside "if" condition usually results in need
to split lines. Combined with additional conditions the "if" condition
becomes messy.
Convert to the usual error handling pattern with an additional variable to
improve code readability. In addition, reverse the logic in
pci_find_vsec_capability() to get rid of &&.
No functional changes intended.
Link: https://lore.kernel.org/r/20230911125354.25501-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: PCI_POSSIBLE_ERROR()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Replace literals under drivers/pci/ with PCI_HEADER_TYPE_MASK,
PCI_HEADER_TYPE_NORMAL, and PCI_HEADER_TYPE_MFD.
Also replace !! boolean conversions with FIELD_GET().
Link: https://lore.kernel.org/r/20231003125300.5541-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> # for Renesas R-Car
- Reorder struct pci_dev to avoid holes and reduce size (Christophe
JAILLET)
- Change pdev->rom_attr_enabled to single bit since it's only a boolean
value (Christophe JAILLET)
- Use struct_size() in pirq_convert_irt_table() instead of hand-writing it
(Christophe JAILLET)
- Explicitly include correct DT includes to untangle headers (Rob Herring)
- Fix a DOE race between destroy_work_on_stack() and the stack-allocated
task->work struct going out of scope in pci_doe() (Ira Weiny)
- Use pci_dev_id() when possible instead of manually composing ID from
dev->bus->number and dev->devfn (Xiongfeng Wang, Zheng Zengkai)
- Move pci_create_resource_files() declarations to linux/pci.h for alpha
build warnings (Arnd Bergmann)
- Remove unused hotplug function declarations (Yue Haibing)
- Remove unused mvebu struct mvebu_pcie.busn (Pali Rohár)
- Unexport pcie_port_bus_type (Bjorn Helgaas)
- Remove unnecessary sysfs ID local variable initialization (Bjorn Helgaas)
- Fix BAR value printk formatting to accommodate 32-bit values (Bjorn
Helgaas)
- Use consistent pointer types for config access syscall get_user() and
put_user() uses (Bjorn Helgaas)
- Simplify AER_RECOVER_RING_SIZE definition (Bjorn Helgaas)
- Simplify pci_pio_to_address() (Bjorn Helgaas)
- Simplify pci_dev_driver() (Bjorn Helgaas)
- Fix pci_bus_resetable(), pci_slot_resetable() name typos (Bjorn Helgaas)
- Fix code and doc typos and code formatting (Bjorn Helgaas)
- Tidy config space save/restore messages (Bjorn Helgaas)
* pci/misc:
PCI: Tidy config space save/restore messages
PCI: Fix code formatting inconsistencies
PCI: Fix typos in docs and comments
PCI: Fix pci_bus_resetable(), pci_slot_resetable() name typos
PCI: Simplify pci_dev_driver()
PCI: Simplify pci_pio_to_address()
PCI/AER: Simplify AER_RECOVER_RING_SIZE definition
PCI: Use consistent put_user() pointer types
PCI: Fix printk field formatting
PCI: Remove unnecessary initializations
PCI: Unexport pcie_port_bus_type
PCI: mvebu: Remove unused busn member
PCI: Remove unused function declarations
PCI/sysfs: Move declarations to linux/pci.h
PCI/P2PDMA: Use pci_dev_id() to simplify the code
PCI/IOV: Use pci_dev_id() to simplify the code
PCI/AER: Use pci_dev_id() to simplify the code
PCI: apple: Use pci_dev_id() to simplify the code
PCI/DOE: Fix destroy_work_on_stack() race
PCI: Explicitly include correct DT includes
x86/PCI: Use struct_size() in pirq_convert_irt_table()
PCI: Change pdev->rom_attr_enabled to single bit
PCI: Reorder pci_dev fields to reduce holes
- Ensure device is accessible before VPD access via sysfs (Alex Williamson)
- Ensure device doesn't go to a low-power state while we're polling for PME
(Alex Williamson)
* pci/vpd:
PCI: Fix runtime PM race with PME polling
PCI/VPD: Add runtime power management to sysfs interface
- Only read PCI_PM_CTRL register when available, to avoid reading the wrong
register and corrupting dev->current_state (Feiyang Chen)
* pci/pm:
PCI/PM: Only read PCI_PM_CTRL register when available
For a device with no Power Management Capability, pci_power_up() previously
returned 0 (success) if the platform was able to put the device in D0,
which led to pci_set_full_power_state() trying to read PCI_PM_CTRL, even
though it doesn't exist.
Since dev->pm_cap == 0 in this case, pci_set_full_power_state() actually
read the wrong register, interpreted it as PCI_PM_CTRL, and corrupted
dev->current_state. This led to messages like this in some cases:
pci 0000:01:00.0: Refused to change power state from D3hot to D0
To prevent this, make pci_power_up() always return a negative failure code
if the device lacks a Power Management Capability, even if non-PCI platform
power management has been able to put the device in D0. The failure will
prevent pci_set_full_power_state() from trying to access PCI_PM_CTRL.
Fixes: e200904b27 ("PCI/PM: Split pci_power_up()")
Link: https://lore.kernel.org/r/20230824013738.1894965-1-chenfeiyang@loongson.cn
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: stable@vger.kernel.org # v5.19+
Update config space save/restore debug messages so they line up better.
Previously:
nvme 0000:05:00.0: saving config space at offset 0x4 (reading 0x20100006)
nvme 0000:05:00.0: saving config space at offset 0x8 (reading 0x1080200)
nvme 0000:05:00.0: saving config space at offset 0xc (reading 0x0)
nvme 0000:05:00.0: restoring config space at offset 0x4 (was 0x0, writing 0x20100006)
Now:
nvme 0000:05:00.0: save config 0x04: 0x20100006
nvme 0000:05:00.0: save config 0x08: 0x01080200
nvme 0000:05:00.0: save config 0x0c: 0x00000000
nvme 0000:05:00.0: restore config 0x04: 0x00000000 -> 0x20100006
No functional change intended. Enable these messages by setting
CONFIG_DYNAMIC_DEBUG=y and adding 'dyndbg="file drivers/pci/* +p"'
to kernel parameters.
Link: https://lore.kernel.org/r/20230823191831.476579-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Fix typos in the pci_bus_resetable() and pci_slot_resetable() function
names. No functional change intended.
Link: https://lore.kernel.org/r/20230824193712.542167-10-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Simplify pci_pio_to_address() by removing an unnecessary local variable.
No functional change intended.
Link: https://lore.kernel.org/r/20230824193712.542167-8-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Testing that a device is not currently in a low power state provides no
guarantees that the device is not imminently transitioning to such a state.
Increment the PM usage counter before accessing the device. Since we don't
wish to wake the device for PME polling, do so only if the device is
already active by using pm_runtime_get_if_active().
Link: https://lore.kernel.org/r/20230803171233.3810944-3-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Don't assume that the device is fully under the control of PCI core. Use
RMW capability accessors in link retraining which do proper locking to
avoid losing concurrent updates to the register values.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: 4ec73791a6 ("PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum")
Fixes: 7d715a6c1a ("PCI: add PCI Express ASPM support")
Link: https://lore.kernel.org/r/20230717120503.15276-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
- Reduce wait time for secondary bus to be ready to speed up resume (Mika
Westerberg)
- Avoid putting EloPOS E2/S2/H2 (as well as Elo i2) PCIe Ports in D3cold
(Ondrej Zary)
- Call _REG when transitioning D-states so AML that uses the PCI config
space OpRegion works, which fixes some ASMedia GPIO controllers (Mario
Limonciello)
* pci/pm:
PCI/ACPI: Call _REG when transitioning D-states
PCI/ACPI: Validate acpi_pci_set_power_state() parameter
PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold
PCI/PM: Shorten pci_bridge_wait_for_secondary_bus() wait time for slow links
- Add PCI_EXT_CAP_ID_PL_32GT define (Ben Dooks)
- Propagate firmware node by calling device_set_node() for better
modularity (Andy Shevchenko)
- Discover Data Link Layer Link Active Reporting earlier so quirks can take
advantage of it (Maciej W. Rozycki)
- Use cached Data Link Layer Link Active Reporting capability in pciehp,
powerpc/eeh, and mlx5 (Maciej W. Rozycki)
- Run quirk for devices that require OS to clear Retrain Link earlier, so
later quirks can rely on it (Maciej W. Rozycki)
- Export pcie_retrain_link() for use outside ASPM (Maciej W. Rozycki)
- Add Data Link Layer Link Active Reporting as another way for
pcie_retrain_link() to determine the link is up (Maciej W. Rozycki)
- Work around link training failures (especially on the ASMedia ASM2824
switch) by training first at 2.5GT/s and then attempting higher rates
(Maciej W. Rozycki)
* pci/enumeration:
PCI: Add failed link recovery for device reset events
PCI: Work around PCIe link training failures
PCI: Use pcie_wait_for_link_status() in pcie_wait_for_link_delay()
PCI: Add support for polling DLLLA to pcie_retrain_link()
PCI: Export pcie_retrain_link() for use outside ASPM
PCI: Export PCIe link retrain timeout
PCI: Execute quirk_enable_clear_retrain_link() earlier
PCI/ASPM: Factor out waiting for link training to complete
PCI/ASPM: Avoid unnecessary pcie_link_state use
PCI/ASPM: Use distinct local vars in pcie_retrain_link()
net/mlx5: Rely on dev->link_active_reporting
powerpc/eeh: Rely on dev->link_active_reporting
PCI: pciehp: Rely on dev->link_active_reporting
PCI: Initialize dev->link_active_reporting earlier
PCI: of: Propagate firmware node by calling device_set_node()
PCI: Add PCI_EXT_CAP_ID_PL_32GT define
# Conflicts:
# drivers/pci/pcie/aspm.c
Request failed link recovery with any upstream PCIe bridge where a device
has not come back after reset within PCI_RESET_WAIT time. Reset the
polling interval if recovery succeeded, otherwise continue as usual.
[bhelgaas: inline pcie_parent_link_retrain()]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111631050.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Attempt to handle cases such as with a downstream port of the ASMedia
ASM2824 PCIe switch where link training never completes and the link
continues switching between speeds indefinitely with the data link layer
never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch,
using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433,
wired to a SiFive HiFive Unmatched board. In this setup the switches
should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if
necessary.
Instead the link continues oscillating between the two speeds, at the rate
of 34-35 times per second, with link training reported repeatedly active
~84% of the time. Limiting the target link speed to 2.5GT/s with the
upstream ASM2824 device makes the two switches communicate correctly.
Removing the speed restriction afterwards makes the two devices switch to
5.0GT/s then.
Make use of these observations and detect the inability to train the link
by checking for the Data Link Layer Link Active status bit being off while
the Link Bandwidth Management Status indicating that hardware has changed
the link speed or width in an attempt to correct unreliable link operation.
Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this is
successful, lift the restriction, letting the devices negotiate a higher
speed.
Also check for a 2.5GT/s speed restriction the firmware may have already
arranged and lift it too with ports of devices known to continue working
afterwards (currently only ASM2824), that already report their data link
being up.
[bhelgaas: reorder and squash stubs from
https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk
to avoid adding stubs that do nothing]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>