mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
linux/arch/x86/kernel/cpu
Russell Senior bebe35bb73 x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel that has naturally
evolved over time. I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.

A few years ago, I noticed some trouble, particularly when "warm
booting" (that is, rebooting without removing power): the device
would hang after the kernel message:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.

If I removed power and then restarted, it would boot fine, continuing
past the message above, like this:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
  [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
  [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
  [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
  [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
  [...]

In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched Linux
v6.6.65).

Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. Using preserved builds, I narrowed
the bisection window to late February through late May 2019. During
this period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified the OpenWrt commit that bumped the kernel from
4.14.112 to 4.14.113 as the one that introduced the warm boot hang:

  07aaa7e3d6

Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5

So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.

Presumably, the warm boot hang is due to some register not getting
cleared in the same way that a loss of power does. That is
approximately as much as I understand about the problem.

After more poking and prodding, and coaching from Jonas Gorski, it
looks like this test patch fixes the problem on my board. Tested
against v6.6.67 and v4.14.113.

Fixes: 18fb053f9b ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
2025-02-25 22:44:01 +01:00
mce x86/mce/amd: Remove shared threshold bank plumbing 2025-01-03 19:05:35 +01:00
microcode x86/microcode/AMD: Remove ret local var in early_apply_microcode() 2024-12-31 14:03:41 +01:00
mtrr x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state() 2024-12-04 10:46:19 -08:00
resctrl - Remove the less generic CPU matching infra around struct x86_cpu_desc and 2025-01-21 09:30:59 -08:00
sgx - Use vmalloc_array() instead of vmalloc() 2024-11-22 12:50:00 -08:00
.gitignore
acrn.c x86/traps: Add sysvec_install() to install a system interrupt handler 2024-01-31 22:02:36 +01:00
amd.c - Remove the less generic CPU matching infra around struct x86_cpu_desc and 2025-01-21 09:30:59 -08:00
aperfmperf.c x86/sched: Add basic support for CPU capacity scaling 2024-09-04 13:36:40 +02:00
bugs.c x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit 2025-02-11 10:07:52 -08:00
bus_lock.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
cacheinfo.c x86/cacheinfo: Delete global num_cache_leaves 2024-12-06 13:13:36 +01:00
centaur.c x86/cpu: Use common topology code for Centaur and Zhaoxin 2024-02-15 22:07:37 +01:00
common.c - Remove the less generic CPU matching infra around struct x86_cpu_desc and 2025-01-21 09:30:59 -08:00
cpu.h x86/topology/intel: Unlock CPUID before evaluating anything 2024-05-31 20:25:56 +02:00
cpuid-deps.c x86/cpufeatures: Make AVX-VNNI depend on AVX 2025-02-21 14:19:16 +01:00
cyrix.c x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems 2025-02-25 22:44:01 +01:00
debugfs.c x86/topology: Introduce topology_logical_core_id() 2024-12-02 12:01:35 +01:00
feat_ctl.c x86/cpu: Clarify the error message when BIOS does not support SGX 2024-08-25 14:41:19 +02:00
hygon.c x86/cpu: Use common topology code for HYGON 2024-02-15 22:07:38 +01:00
hypervisor.c
intel_epb.c x86/cpu/intel_epb: Switch to new Intel CPU model defines 2024-04-29 10:31:16 +02:00
intel.c - Remove the less generic CPU matching infra around struct x86_cpu_desc and 2025-01-21 09:30:59 -08:00
Makefile x86/split_lock: Move Split and Bus lock code to a dedicated file 2024-08-08 18:02:15 +02:00
match.c Miscellaneous x86 cleanups and typo fixes, and also the removal 2025-01-21 11:15:29 -08:00
mkcapflags.sh x86/cpufeatures: Flip the /proc/cpuinfo appearance logic 2024-06-20 21:04:22 +02:00
mshyperv.c hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h 2025-01-10 00:54:21 +00:00
perfctr-watchdog.c
powerflags.c
proc.c x86/cpu: Use str_yes_no() helper in show_cpuinfo_misc() 2024-10-26 15:37:15 +02:00
rdrand.c x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h> 2024-03-04 12:01:39 +01:00
scattered.c x86/cpu: Fix formatting of cpuid_bits[] in scattered.c 2024-10-28 13:51:05 +01:00
topology_amd.c x86/cpu: Add CPU type to struct cpuinfo_topology 2024-10-25 20:44:26 +02:00
topology_common.c x86/topology: Introduce topology_logical_core_id() 2024-12-02 12:01:35 +01:00
topology_ext.c x86/cpu/topology: Add support for the AMD 0x80000026 leaf 2024-03-22 11:22:14 +01:00
topology.c Merge branch 'linus' into x86/cleanups, to resolve conflict 2024-12-10 19:33:03 +01:00
topology.h x86/cpu/topology: Retrieve cores per package from topology bitmaps 2024-02-15 22:07:45 +01:00
transmeta.c
tsx.c x86/cpu: Remove redundant extern x86_read_arch_cap_msr() 2023-01-10 12:40:24 +01:00
umc.c
umwait.c x86/umwait: move to use bus_get_dev_root() 2023-03-17 15:29:29 +01:00
vmware.c x86/vmware: Add TDX hypercall support 2024-06-25 17:15:48 +02:00
vortex.c
zhaoxin.c x86/cpu: Use common topology code for Centaur and Zhaoxin 2024-02-15 22:07:37 +01:00