Commit Graph

1351133 Commits

Author SHA1 Message Date
Thomas Gleixner
6a08164de9 Merge irq/cleanup fragments into irq/msi
Pick up the PCI changes to avoid conflicts.
2025-05-16 21:24:38 +02:00
Marc Zyngier
7dd20bf2f0 irqchip/gic-v3-its: Use allocation size from the prepare call
Now that .msi_prepare() gets called at the right time and not
with semi-random parameters, remove the ugly hack that tried
to fix up the number of allocated vectors.

It is now correct by construction.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-6-maz@kernel.org
2025-05-14 12:36:42 +02:00
Marc Zyngier
03c298760e genirq/msi: Engage the .msi_teardown() callback on domain removal
Kindly inform the MSI driver that the domain is torn down, providing the
allocation context previously populated on domain creation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-5-maz@kernel.org
2025-05-14 12:36:42 +02:00
Marc Zyngier
1396e89e09 genirq/msi: Move prepare() call to per-device allocation
The current device MSI infrastructure is subtly broken, as it will issue an
.msi_prepare() callback into the MSI controller driver every time it needs
to allocate an MSI. That's pretty wrong, as the contract (or unwarranted
assumption, depending who you ask) between the MSI controller and the core
code is that .msi_prepare() is called exactly once per device.

This leads to some subtle breakage in some MSI controller drivers, as it
gives the impression that there are multiple endpoints sharing a bus
identifier (RID in PCI parlance, DID for GICv3+). It implies that whatever
allocation the ITS driver (for example) has done on behalf of these devices
cannot be undone, as there is no way to track the shared state. This is
particularly bad for wire-MSI devices, for which .msi_prepare() is called
for each input line.

To address this issue, move the call to .msi_prepare() to take place at the
point of irq domain allocation, which is the only place that makes
sense. The msi_alloc_info_t structure is made part of the
msi_domain_template, so that its life-cycle is that of the domain as well.

Finally, the msi_info::alloc_data field is made to point at this allocation
tracking structure, ensuring that it is carried around the block.

This is all pretty straightforward, except for the non-device-MSI
leftovers, which still have to call .msi_prepare() at the old spot. One
day...

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-4-maz@kernel.org
2025-05-14 12:36:41 +02:00
Marc Zyngier
713335b6ee irqchip/gic-v3-its: Implement .msi_teardown() callback
The ITS driver currently nukes the structure representing an endpoint
device translating via an ITS on freeing the last LPI allocated for it.

That's an unfortunate state of affair, as it is pretty common for a driver
to allocate a single MSI, do something clever, teardown this MSI, and
reallocate a whole bunch of them. The NVME driver does exactly that,
amongst others.

What happens in that case is that the core code is accidentaly issuing
another .msi_prepare() call, even if it shouldn't.  This luckily cancels
the above behaviour and hides the problem.

In order to fix the core code, start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect is that
the its_dev structure will not be freed and that the DID will stay
mapped. Not a big deal, and this will be solved in following patches.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-3-maz@kernel.org
2025-05-14 12:36:41 +02:00
Marc Zyngier
28026cf2dd genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
While the MSI ops do have a .msi_prepare() callback that is responsible for
setting up the relevant (usually per-device) allocation, there is no
callback reversing this setup.

For this purpose, add .msi_teardown() callback.

In order to avoid breaking the ITS driver that suffers from related issues,
do not call the callback just yet.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513163144.2215824-2-maz@kernel.org
2025-05-14 12:36:41 +02:00
Thomas Gleixner
a1d8a83093 Merge branch 'irq/platform-msi' into irq/msi
Pull in the platform MSI/GIC changes which are seperate for the PCI
endpoint driver updates.
2025-05-08 19:17:16 +02:00
Frank Li
f1680d9081 irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
Some platform devices create child devices dynamically and require the
parent device's msi-map to map device IDs to actual sideband information.

A typical use case is using ITS as a PCIe Endpoint Controller(EPC)'s
doorbell function, where PCI hosts send TLP memory writes to the EP
controller. The EP controller converts these writes to AXI transactions
and appends platform-specific sideband information.

EPC's DTS will provide such information by msi-map and msi-mask. A
simplified dts as

pcie-ep@10000000 {
	...
	msi-map = <0 &its 0xc 8>;
                          ^^^ 0xc is implement defined sideband information,
			      which append to AXI write transaction.
	           ^ 0 is function index.

	msi-mask = <0x7>
}

Check msi-map if msi-parent missed to keep compatility with existing systems.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-5-f69b49917464@nxp.com
2025-05-07 17:49:24 +02:00
Frank Li
a6aed6b9c7 dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map
Document the use of (msi|iommu)-map for PCI Endpoint (EP) controllers,
which can use MSI as a doorbell mechanism. Each EP controller can support
up to 8 physical functions and 65,536 virtual functions.

Define how to construct device IDs using function bits [2:0] and virtual
function index bits [31:3], enabling (msi|iommu)-map to associate each
child device with a specific (msi|iommu)-specifier.

The EP cannot rely on PCI Requester ID (RID) because the RID is determined
by the PCI topology of the host system. Since the EP may be connected to
different PCI hosts, the RID can vary between systems and is therefore not
a reliable identifier.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-4-f69b49917464@nxp.com
2025-05-07 17:49:00 +02:00
Frank Li
fd120c38fe irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
Set the IRQ_DOMAIN_FLAG_MSI_IMMUTABLE flag for ITS, as it does not change
the address/data pair after setup.

Ensure compatibility with MSI users, such as PCIe Endpoint Doorbell, which
require the address/data pair to remain unchanged. Enable PCIe endpoints to
use ITS for triggering doorbells from the PCIe Root Complex (RC) side.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-3-f69b49917464@nxp.com
2025-05-07 17:49:00 +02:00
Frank Li
b8c7bfb7a0 irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
Add the flag IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and the API function
irq_domain_is_msi_immutable() to check if the MSI controller retains an
immutable address/data pair during irq_set_affinity().

Ensure compatibility with MSI users like PCIe Endpoint Doorbell, which
require the address/data pair to remain unchanged after setup. Use this
function to verify if the MSI controller is immutable.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-2-f69b49917464@nxp.com
2025-05-07 17:49:00 +02:00
Frank Li
9a958e1fd4 platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
platform_device_msi_init_and_alloc_irqs() performs two tasks: allocating
the MSI domain for a platform device, and allocate a number of MSIs in that
domain.

platform_device_msi_free_irqs_all() only frees the MSIs, and leaves the MSI
domain alive.

Given that platform_device_msi_init_and_alloc_irqs() is the sole tool a
platform device has to allocate platform MSIs, it makes sense for
platform_device_msi_free_irqs_all() to teardown the MSI domain at the same
time as the MSIs.

This avoids warnings and unexpected behaviours when a driver repeatedly
allocates and frees MSIs.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250414-ep-msi-v18-1-f69b49917464@nxp.com
2025-05-07 17:49:00 +02:00
Thomas Gleixner
9357e329cd genirq/msi: Rename msi_[un]lock_descs()
Now that all abuse is gone and the legit users are converted to
guard(msi_descs_lock), rename the lock functions and document them as
internal.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huwei.com>
Link: https://lore.kernel.org/all/20250319105506.864699741@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
e46a28cea2 scsi: ufs: qcom: Remove the MSI descriptor abuse
The driver abuses the MSI descriptors for internal purposes. Aside of core
code and MSI providers nothing has to care about their existence. They have
been encapsulated with a lot of effort because this kind of abuse caused
all sorts of issues including a maintainability nightmare.

Rewrite the code so it uses dedicated storage to hand the required
information to the interrupt handler and use a custom cleanup function to
simplify the error path.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/all/20250319105506.805529593@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
71296eae58 PCI/TPH: Replace the broken MSI-X control word update
The driver walks the MSI descriptors to test whether a descriptor exists
for a given index. That's just abuse of the MSI internals.

The same test can be done with a single function call by looking up whether
there is a Linux interrupt number assigned at the index.

What's worse is that the function is completely unserialized against
modifications of the MSI-X control by operations issued from the interrupt
core. It also brings the PCI/MSI-X internal cached control word out of
sync.

Remove the trainwreck and invoke the function provided by the PCI/MSI core
to update it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.744271447@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
d5124a9957 PCI/MSI: Provide a sane mechanism for TPH
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.

Provide a function, which has the required serialization and keeps the
control word cache in sync.

Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.683663807@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
6552e90e2a PCI: hv: Switch MSI descriptor locking to guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.624838146@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
891146645e PCI/MSI: Switch msix_capability_init() to guard(msi_desc_lock)
Split the lock protected functionality of msix_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.

Simplify the error path in the helper function by utilizing a custom
cleanup to get rid of the remaining gotos.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.564105011@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
f11cc2af8f PCI/MSI: Switch msi_capability_init() to guard(msi_desc_lock)
Split the lock protected functionality of msi_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.504992208@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
5c0ba4f9d2 PCI/MSI: Use __free() for affinity masks
Let cleanup handle the freeing of the affinity mask. That prepares for
switching the MSI descriptor locking to a guard().

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.444764312@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
b0c44a5ec3 PCI/MSI: Set pci_dev:: Msi_enabled late
The comment claiming that pci_dev::msi_enabled has to be set across setup
is a leftover from ancient code versions. Nothing in the setup code
requires the flag to be set anymore.

Set it in the success path and remove the extra goto label.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.383222333@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
497f68cff6 PCI/MSI: Use guard(msi_desc_lock) where applicable
Convert the trivial cases of msi_desc_lock/unlock() pairs.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.322536126@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
8f3315cf7e NTB/msi: Switch MSI descriptor locking to lock guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/all/20250319105506.263456735@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
f25dd9ac48 soc: ti: ti_sci_inta_msi: Switch MSI descriptor locking to guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://lore.kernel.org/all/20250319105506.203802081@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
0dac2b0930 genirq/msi: Use lock guards for MSI descriptor locking
Provide a lock guard for MSI descriptor locking and update the core code
accordingly.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/all/20250319105506.144672678@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
092d00ead7 cleanup: Provide retain_and_null_ptr()
In cases where an allocation is consumed by another function, the
allocation needs to be retained on success or freed on failure. The code
pattern is usually:

	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);
	struct bar *b;

	,,,
	// Initialize f
	...
	if (ret)
		goto free;
        ...
	bar = bar_create(f);
	if (!bar) {
		ret = -ENOMEM;
	   	goto free;
	}
	...
	return 0;
free:
	kfree(f);
	return ret;

This prevents using __free(kfree) on @f because there is no canonical way
to tell the cleanup code that the allocation should not be freed.

Abusing no_free_ptr() by force ignoring the return value is not really a
sensible option either.

Provide an explicit macro retain_and_null_ptr(), which NULLs the cleanup
pointer. That makes it easy to analyze and reason about.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/all/20250319105506.083538907@linutronix.de
2025-04-09 20:47:29 +02:00
Jiri Slaby (SUSE)
fdc348121f irqdomain: pci: Switch to of_fwnode_handle()
of_node_to_fwnode() is irqdomain's reimplementation of the "officially"
defined of_fwnode_handle(). The former is in the process of being removed,
so use the latter instead.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250319092951.37667-8-jirislaby@kernel.org
2025-04-07 12:15:14 -05:00
Linus Torvalds
0af2f6be1b Linux 6.15-rc1 v6.15-rc1 2025-04-06 13:11:33 -07:00
Thomas Weißschuh
0efdedb335 tools/include: make uapi/linux/types.h usable from assembly
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when
included from assembly code.

Mirror this behaviour in the tools/ variant.

Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the
toolchain automatically.

Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/
Fixes: c9fbaa8795 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06 12:55:31 -07:00
Linus Torvalds
710329254d Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:

 - support up to 8192 processors

 - add cpuidle governor debug telemetry, disabled by default

 - update default output to exclude cpuidle invocation counts

 - bug fixes

* tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: v2025.05.06
  tools/power turbostat: disable "cpuidle" invocation counters, by default
  tools/power turbostat: re-factor sysfs code
  tools/power turbostat: Restore GFX sysfs fflush() call
  tools/power turbostat: Document GNR UncMHz domain convention
  tools/power turbostat: report CoreThr per measurement interval
  tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192
  tools/power turbostat: Add idle governor statistics reporting
  tools/power turbostat: Fix names matching
  tools/power turbostat: Allow Zero return value for some RAPL registers
  tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options
2025-04-06 12:32:43 -07:00
Linus Torvalds
59f392fa7c Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:

 - add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc
   driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT

* tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE
2025-04-06 12:04:53 -07:00
Len Brown
03e00e373c tools/power turbostat: v2025.05.06
Support up to 8192 processors
Add cpuidle governor debug telemetry, disabled by default
Update default output to exclude cpuidle invocation counts
Bug fixes

Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 14:49:20 -04:00
Len Brown
ec4acd3166 tools/power turbostat: disable "cpuidle" invocation counters, by default
Create "pct_idle" counter group, the sofware notion of residency
so it can now be singled out, independent of other counter groups.

Create "cpuidle" group, the cpuidle invocation counts.
Disable "cpuidle", by default.

Create "swidle" = "cpuidle" + "pct_idle".
Undocument "sysfs", the old name for "swidle", but keep it working
for backwards compatibilty.

Create "hwidle", all the HW idle counters

Modify "idle", enabled by default
"idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle")

Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 14:29:57 -04:00
Linus Torvalds
dda8887894 Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
 "Fix a perf events time accounting bug"

* tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix child_total_time_enabled accounting bug at task exit
2025-04-06 10:48:12 -07:00
Linus Torvalds
302deb109d Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:

 - Fix a nonsensical Kconfig combination

 - Remove an unnecessary rseq-notification

* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Eliminate useless task_work on execve
  sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-06 10:44:58 -07:00
Linus Torvalds
6f110a5e4f Disable SLUB_TINY for build testing
... and don't error out so hard on missing module descriptions.

Before commit 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnigns (ie 'W=1').

After that commit the warning became an unconditional hard error.

And it turns out not all modules have been converted despite the claims
to the contrary.  As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.

The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself.  And so anybody doing full build tests
didn't actually see this failre.

So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage.  Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.

Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06 10:00:04 -07:00
Len Brown
994633894f tools/power turbostat: re-factor sysfs code
Probe cpuidle "sysfs" residency and counts separately,
since soon we will make one disabled on, and the
other disabled off.

Clarify that some BIC (build-in-counters) are actually "groups".
since we're about to re-name some of those groups.

no functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 12:53:18 -04:00
Zhang Rui
f8b136ef26 tools/power turbostat: Restore GFX sysfs fflush() call
Do fflush() to discard the buffered data, before each read of the
graphics sysfs knobs.

Fixes: ba99a4fc8c ("tools/power turbostat: Remove unnecessary fflush() call")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 12:36:03 -04:00
Len Brown
3ae8508663 tools/power turbostat: Document GNR UncMHz domain convention
Document that on Intel Granite Rapids Systems,
Uncore domains 0-2 are CPU domains, and
uncore domains 3-4 are IO domains.

Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 12:31:59 -04:00
Len Brown
f729775f79 tools/power turbostat: report CoreThr per measurement interval
The CoreThr column displays total thermal throttling events
since boot time.

Change it to report events during the measurement interval.

This is more useful for showing a user the current conditions.
Total events since boot time are still available to the user via
/sys/devices/system/cpu/cpu*/thermal_throttle/*

Document CoreThr on turbostat.8

Fixes: eae97e053f ("turbostat: Support thermal throttle count print")
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
2025-04-06 12:21:25 -04:00
Justin Ernst
eb187540d1 tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output:
	"turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151"

A similar error appears with the use of turbostat --cpu when the inputted cpu
range contains a cpu number >= 1024:
	# turbostat -c 1100-1151
	"--cpu 1100-1151" malformed
	...

Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS.

It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low.
For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS
brings support for parsing cpu numbers >= 1024.

Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64.

Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2025-04-06 12:14:14 -04:00
Linus Torvalds
16cd1c2657 Merge tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
 "A set of final cleanups for the timer subsystem:

   - Convert all del_timer[_sync]() instances over to the new
     timer_delete[_sync]() API and remove the legacy wrappers.

     Conversion was done with coccinelle plus some manual fixups as
     coccinelle chokes on scoped_guard().

   - The final cleanup of the hrtimer_init() to hrtimer_setup()
     conversion.

     This has been delayed to the end of the merge window, so that all
     patches which have been merged through other trees are in mainline
     and all new users are catched.

  Doing this right before rc1 ensures that new code which is merged post
  rc1 is not introducing new instances of the original functionality"

* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/timers: Rename the hrtimer_init event to hrtimer_setup
  hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
  hrtimers: Rename debug_init() to debug_setup()
  hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
  hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
  hrtimers: Make callback function pointer private
  hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
  hrtimers: Switch to use __htimer_setup()
  hrtimers: Delete hrtimer_init()
  treewide: Convert new and leftover hrtimer_init() users
  treewide: Switch/rename to timer_delete[_sync]()
2025-04-06 08:35:37 -07:00
Linus Torvalds
ff0c66685d Merge tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more irq updates from Thomas Gleixner:
 "A set of updates for the interrupt subsystem:

   - A treewide cleanup for the irq_domain code, which makes the naming
     consistent and gets rid of the original oddity of naming domains
     'host'.

     This is a trivial mechanical change and is done late to ensure that
     all instances have been catched and new code merged post rc1 wont
     reintroduce new instances.

   - A trivial consistency fix in the migration code

     The recent introduction of irq_force_complete_move() in the core
     code, causes a problem for the nostalgia crowd who maintains ia64
     out of tree.

     The code assumes that hierarchical interrupt domains are enabled
     and dereferences irq_data::parent_data unconditionally. That works
     in mainline because both architectures which enable that code have
     hierarchical domains enabled. Though it breaks the ia64 build,
     which enables the functionality, but does not have hierarchical
     domains.

     While it's not really a problem for mainline today, this
     unconditional dereference is inconsistent and trivially fixable by
     using the existing helper function irqd_get_parent_data(), which
     has the appropriate #ifdeffery in place"

* tag 'irq-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/migration: Use irqd_get_parent_data() in irq_force_complete_move()
  irqdomain: Stop using 'host' for domain
  irqdomain: Rename irq_get_default_host() to irq_get_default_domain()
  irqdomain: Rename irq_set_default_host() to irq_set_default_domain()
2025-04-06 08:17:43 -07:00
Linus Torvalds
a91c49517d Merge tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A revert to fix a adjtimex() regression:

  The recent change to prevent that time goes backwards for the coarse
  time getters due to immediate multiplier adjustments via adjtimex(),
  changed the way how the timekeeping core treats that.

  That change result in a regression on the adjtimex() side, which is
  user space visible:

   1) The forwarding of the base time moves the update out of the
      original period and establishes a new one. That's changing the
      behaviour of the [PF]LL control, which user space expects to be
      applied periodically.

   2) The clearing of the accumulated NTP error due to #1, changes the
      behaviour as well.

  An attempt to delay the multiplier/frequency update to the next tick
  did not solve the problem as userspace expects that the multiplier or
  frequency updates are in effect, when the syscall returns.

  There is a different solution for the coarse time problem available,
  so revert the offending commit to restore the existing adjtimex()
  behaviour"

* tag 'timers-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "timekeeping: Fix possible inconsistencies in _COARSE clockids"
2025-04-06 08:13:16 -07:00
Linus Torvalds
1f80fbac0b Merge tag 'sh-for-v6.15-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
 "One important fix and one small configuration update.

  The first patch by Artur Rojek fixes an issue with the J2 firmware
  loader not being able to find the location of the device tree blob due
  to insufficient alignment of the .bss section which rendered J2 boards
  unbootable.

  The second patch by Johan Korsnes updates the defconfigs on sh to drop
  the CONFIG_NET_CLS_TCINDEX configuration option which became obsolete
  after 8c710f7525 ("net/sched: Retire tcindex classifier").

  Summary:

   - sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX

   - sh: Align .bss section padding to 8-byte boundary"

* tag 'sh-for-v6.15-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX
  sh: Align .bss section padding to 8-byte boundary
2025-04-06 08:10:45 -07:00
Linus Torvalds
f4d2ef4825 Merge tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Improve performance in gendwarfksyms

 - Remove deprecated EXTRA_*FLAGS and KBUILD_ENABLE_EXTRA_GCC_CHECKS

 - Support CONFIG_HEADERS_INSTALL for ARCH=um

 - Use more relative paths to sources files for better reproducibility

 - Support the loong64 Debian architecture

 - Add Kbuild bash completion

 - Introduce intermediate vmlinux.unstripped for architectures that need
   static relocations to be stripped from the final vmlinux

 - Fix versioning in Debian packages for -rc releases

 - Treat missing MODULE_DESCRIPTION() as an error

 - Convert Nios2 Makefiles to use the generic rule for built-in DTB

 - Add debuginfo support to the RPM package

* tag 'kbuild-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
  kbuild: rpm-pkg: build a debuginfo RPM
  kconfig: merge_config: use an empty file as initfile
  nios2: migrate to the generic rule for built-in DTB
  rust: kbuild: skip `--remap-path-prefix` for `rustdoc`
  kbuild: pacman-pkg: hardcode module installation path
  kbuild: deb-pkg: don't set KBUILD_BUILD_VERSION unconditionally
  modpost: require a MODULE_DESCRIPTION()
  kbuild: make all file references relative to source root
  x86: drop unnecessary prefix map configuration
  kbuild: deb-pkg: add comment about future removal of KDEB_COMPRESS
  kbuild: Add a help message for "headers"
  kbuild: deb-pkg: remove "version" variable in mkdebian
  kbuild: deb-pkg: fix versioning for -rc releases
  Documentation/kbuild: Fix indentation in modules.rst example
  x86: Get rid of Makefile.postlink
  kbuild: Create intermediate vmlinux build with relocations preserved
  kbuild: Introduce Kconfig symbol for linking vmlinux with relocations
  kbuild: link-vmlinux.sh: Make output file name configurable
  kbuild: do not generate .tmp_vmlinux*.map when CONFIG_VMLINUX_MAP=y
  Revert "kheaders: Ignore silly-rename files"
  ...
2025-04-05 15:46:50 -07:00
Linus Torvalds
758e4c86a1 Merge tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Weekly fixes, mostly from the end of last week, this week was very
  quiet, maybe you scared everyone away. It's mostly amdgpu, and xe,
  with some i915, adp and bridge bits, since I think this is overly
  quiet I'd expect rc2 to be a bit more lively.

  bridge:
   - tda998x: Select CONFIG_DRM_KMS_HELPER

  amdgpu:
   - Guard against potential division by 0 in fan code
   - Zero RPM support for SMU 14.0.2
   - Properly handle SI and CIK support being disabled
   - PSR fixes
   - DML2 fixes
   - DP Link training fix
   - Vblank fixes
   - RAS fixes
   - Partitioning fix
   - SDMA fix
   - SMU 13.0.x fixes
   - Rom fetching fix
   - MES fixes
   - Queue reset fix

  xe:
   - Fix NULL pointer dereference on error path
   - Add missing HW workaround for BMG
   - Fix survivability mode not triggering
   - Fix build warning when DRM_FBDEV_EMULATION is not set

  i915:
   - Bounds check for scalers in DSC prefill latency computation
   - Fix build by adding a missing include

  adp:
   - Fix error handling in plane setup"

  # -----BEGIN PGP SIGNATURE-----

* tag 'drm-next-2025-04-05' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
  drm/i2c: tda998x: select CONFIG_DRM_KMS_HELPER
  drm/amdgpu/gfx12: fix num_mec
  drm/amdgpu/gfx11: fix num_mec
  drm/amd/pm: Add gpu_metrics_v1_8
  drm/amdgpu: Prefer shadow rom when available
  drm/amd/pm: Update smu metrics table for smu_v13_0_6
  drm/amd/pm: Remove host limit metrics support
  Remove unnecessary firmware version check for gc v9_4_2
  drm/amdgpu: stop unmapping MQD for kernel queues v3
  Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"
  drm/amdgpu: Parse all deferred errors with UMC aca handle
  drm/amdgpu: Update ta ras block
  drm/amdgpu: Add NPS2 to DPX compatible mode
  drm/amdgpu: Use correct gfx deferred error count
  drm/amd/display: Actually do immediate vblank disable
  drm/amd/display: prevent hang on link training fail
  Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting"
  drm/amd/display: Increase vblank offdelay for PSR panels
  drm/amd: Handle being compiled without SI or CIK support better
  drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2
  ...
2025-04-05 15:35:11 -07:00
Uday Shankar
a7c699d090 kbuild: rpm-pkg: build a debuginfo RPM
The rpm-pkg make target currently suffers from a few issues related to
debuginfo:
1. debuginfo for things built into the kernel (vmlinux) is not available
   in any RPM produced by make rpm-pkg. This makes using tools like
   systemtap against a make rpm-pkg kernel impossible.
2. debug source for the kernel is not available. This means that
   commands like 'disas /s' in gdb, which display source intermixed with
   assembly, can only print file names/line numbers which then must be
   painstakingly resolved to actual source in a separate editor.
3. debuginfo for modules is available, but it remains bundled with the
   .ko files that contain module code, in the main kernel RPM. This is a
   waste of space for users who do not need to debug the kernel (i.e.
   most users).

Address all of these issues by additionally building a debuginfo RPM
when the kernel configuration allows for it, in line with standard
patterns followed by RPM distributors. With these changes:
1. systemtap now works (when these changes are backported to 6.11, since
   systemtap lags a bit behind in compatibility), as verified by the
   following simple test script:

   # stap -e 'probe kernel.function("do_sys_open").call { printf("%s\n", $$parms); }'
   dfd=0xffffffffffffff9c filename=0x7fe18800b160 flags=0x88800 mode=0x0
   ...

2. disas /s works correctly in gdb, with source and disassembly
   interspersed:

   # gdb vmlinux --batch -ex 'disas /s blk_op_str'
   Dump of assembler code for function blk_op_str:
   block/blk-core.c:
   125     {
      0xffffffff814c8740 <+0>:     endbr64

   127
   128             if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
      0xffffffff814c8744 <+4>:     mov    $0xffffffff824a7378,%rax
      0xffffffff814c874b <+11>:    cmp    $0x23,%edi
      0xffffffff814c874e <+14>:    ja     0xffffffff814c8768 <blk_op_str+40>
      0xffffffff814c8750 <+16>:    mov    %edi,%edi

   126             const char *op_str = "UNKNOWN";
      0xffffffff814c8752 <+18>:    mov    $0xffffffff824a7378,%rdx

   127
   128             if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
      0xffffffff814c8759 <+25>:    mov    -0x7dfa0160(,%rdi,8),%rax

   126             const char *op_str = "UNKNOWN";
      0xffffffff814c8761 <+33>:    test   %rax,%rax
      0xffffffff814c8764 <+36>:    cmove  %rdx,%rax

   129                     op_str = blk_op_name[op];
   130
   131             return op_str;
   132     }
      0xffffffff814c8768 <+40>:    jmp    0xffffffff81d01360 <__x86_return_thunk>
   End of assembler dump.

3. The size of the main kernel package goes down substantially,
   especially if many modules are built (quite typical). Here is a
   comparison of installed size of the kernel package (configured with
   allmodconfig, dwarf4 debuginfo, and module compression turned off)
   before and after this patch:

   # rpm -qi kernel-6.13* | grep -E '^(Version|Size)'
   Version     : 6.13.0postpatch+
   Size        : 1382874089
   Version     : 6.13.0prepatch+
   Size        : 17870795887

   This is a ~92% size reduction.

Note that a debuginfo package can only be produced if the following
configs are set:
- CONFIG_DEBUG_INFO=y
- CONFIG_MODULE_COMPRESS=n
- CONFIG_DEBUG_INFO_SPLIT=n

The first of these is obvious - we can't produce debuginfo if the build
does not generate it. The second two requirements can in principle be
removed, but doing so is difficult with the current approach, which uses
a generic rpmbuild script find-debuginfo.sh that processes all packaged
executables. If we want to remove those requirements the best path
forward is likely to add some debuginfo extraction/installation logic to
the modules_install target (controllable by flags). That way, it's
easier to operate on modules before they're compressed, and the logic
can be reused by all packaging targets.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-04-06 06:22:01 +09:00
Daniel Gomez
a26fe287ee kconfig: merge_config: use an empty file as initfile
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).

Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-04-06 06:22:01 +09:00
Masahiro Yamada
3b8241f64c nios2: migrate to the generic rule for built-in DTB
Commit 654102df2a ("kbuild: add generic support for built-in boot
DTBs") introduced generic support for built-in DTBs.

Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.

To keep consistency across architectures, this commit also renames
CONFIG_NIOS2_DTB_SOURCE_BOOL to CONFIG_BUILTIN_DTB, and
CONFIG_NIOS2_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-04-06 06:22:01 +09:00