- Add 'Runtime Firmware Activation' support for NVDIMMs that advertise
the relevant capability
- Misc libnvdimm and DAX cleanups
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQT9vPEBxh63bwxRYEEPzq5USduLdgUCXzHodgAKCRAPzq5USduL
djTjAQD1THDmizHn16zd94ueygh/BXfN0zyeVvQH352ol7kdfQEAj2A7YJ9XBbBY
JC6/CNd+OiB9W88lLOUf3Waj1a7cUQ8=
=Q6qn
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updayes from Vishal Verma:
"You'd normally receive this pull request from Dan Williams, but he's
busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm
this cycle.
This adds a new feature in libnvdimm - 'Runtime Firmware Activation',
and a few small cleanups and fixes in libnvdimm and DAX. I'd
originally intended to make separate topic-based pull requests - one
for libnvdimm, and one for DAX, but some of the DAX material fell out
since it wasn't quite ready.
Summary:
- add 'Runtime Firmware Activation' support for NVDIMMs that
advertise the relevant capability
- misc libnvdimm and DAX cleanups"
* tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr
libnvdimm/security: the 'security' attr never show 'overwrite' state
libnvdimm/security: fix a typo
ACPI: NFIT: Fix ARS zero-sized allocation
dax: Fix incorrect argument passed to xas_set_err()
ACPI: NFIT: Add runtime firmware activate support
PM, libnvdimm: Add runtime firmware activation support
libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO()
drivers/dax: Expand lock scope to cover the use of addresses
fs/dax: Remove unused size parameter
dax: print error message by pr_info() in __generic_fsdax_supported()
driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW}
tools/testing/nvdimm: Emulate firmware activation commands
tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation
tools/testing/nvdimm: Add command debug messages
tools/testing/nvdimm: Cleanup dimm index passing
ACPI: NFIT: Define runtime firmware activation commands
ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor
libnvdimm: Validate command family indices
Abstract platform specific mechanics for nvdimm firmware activation
behind a handful of generic ops. At the bus level ->activate_state()
indicates the unified state (idle, busy, armed) of all DIMMs on the bus,
and ->capability() indicates the system state expectations for activate.
At the DIMM level ->activate_state() indicates the per-DIMM state,
->activate_result() indicates the outcome of the last activation
attempt, and ->arm() attempts to transition the DIMM from 'idle' to
'armed'.
A new hibernate_quiet_exec() facility is added to support firmware
activation in an OS defined system quiesce state. It leverages the fact
that the hibernate-freeze state wants to assert that a memory
hibernation snapshot can be taken. This is in contrast to a platform
firmware defined quiesce state that may forcefully quiet the memory
controller independent of whether an individual device-driver properly
supports hibernate-freeze.
The libnvdimm sysfs interface is extended to support detection of a
firmware activate capability. The mechanism supports enumeration and
triggering of firmware activate, optionally in the
hibernate_quiet_exec() context.
[rafael: hibernate_quiet_exec() proposal]
[vishal: fix up sparse warning, grammar in Documentation/]
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
In hibernate.c, some places lack of spaces while some places have
redundant spaces. So fix them.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hibernation concurrency handling is currently delegated to user.c,
where it's also used for regulating the access to the snapshot device.
In the prospective of making user.c a separate configuration option,
such mutual exclusion is brought into hibernate.c and made available
through accessor helpers hereby introduced.
Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the kernel threads are not frozen in software_resume(), so
between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
system_freezable_power_efficient_wq can still try to submit SCSI
commands and this can cause a panic since the low level SCSI driver
(e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
any SCSI commands: https://lkml.org/lkml/2020/4/10/47
At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
to resolve the issue from hv_storvsc, but with the help of
Bart Van Assche, I realized it's better to fix software_resume(),
since this looks like a generic issue, not only pertaining to SCSI.
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If hibernation_restore() fails, the 'error' should not be zero.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hibernation fails when the kernel cannot allocate enough memory
to copy all pages of RAM in use.
Ensure that the failure reason is clearly logged, and clearly
attributable to the hibernation module.
Signed-off-by: Luigi Semenzato <semenzato@google.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is currently no way to verify the resume image when returning
from hibernate. This might compromise the signed modules trust model,
so until we can work with signed hibernate images we disable it when the
kernel is locked down.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
cc: linux-pm@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
sxVEiFZo8ZU9v1IoRb1I
=qU++
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
"Another round of SPDX header file fixes for 5.2-rc4
These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
added, based on the text in the files. We are slowly chipping away at
the 700+ different ways people tried to write the license text. All of
these were reviewed on the spdx mailing list by a number of different
people.
We now have over 60% of the kernel files covered with SPDX tags:
$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
Files checked: 64533
Files with SPDX: 40392
Files with errors: 0
I think the majority of the "easy" fixups are now done, it's now the
start of the longer-tail of crazy variants to wade through"
* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
...
Based on 1 normalized pattern(s):
this file is released under the gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 68 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As explained in
0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
we always, no matter what, have to bring up x86 HT siblings during boot at
least once in order to avoid first MCE bringing the system to its knees.
That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpudile after
going through the online-offline cycle at least once.
This causes a serious issue though when a kernel, which saw 'nosmt' on its
commandline, is going to perform resume from hibernation: if the resume
from the hibernated image is successful, cr3 is flipped in order to point
to the address space of the kernel that is being resumed, which in turn
means that all the HT siblings are all of a sudden mwaiting on address
which is no longer valid.
That results in triple fault shortly after cr3 is switched, and machine
reboots.
Fix this by always waking up all the SMT siblings before initiating the
'restore from hibernation' process; this guarantees that all the HT
siblings will be properly carried over to the resumed kernel waiting in
resume_play_dead(), and acted upon accordingly afterwards, based on the
target kernel configuration.
Symmetricaly, the resumed kernel has to push the SMT siblings to mwait
again in case it has SMT disabled; this means it has to online all
the siblings when resuming (so that they come out of hlt) and offline
them again to let them reach mwait.
Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd2165 ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On systems with ACPI platform firmware the last stage of hibernation
is analogous to system suspend to S3 (suspend-to-RAM), so it should
be handled analogously. In particular, pm_suspend_via_firmware()
should return 'true' in that stage to let the callers of it know that
control will be passed to the platform firmware going forward, so
pm_set_suspend_via_firmware() needs to be called then in analogy with
acpi_suspend_begin().
However, the platform hibernation ->begin() callback is invoked
during the "freeze" transition (before creating a snapshot image of
system memory) as well as during the "hibernate" transition which is
the last stage of it and pm_set_suspend_via_firmware() should be
invoked by that callback in the latter stage only.
In order to implement that redefine the hibernation ->begin()
callback to take a pm_message_t argument to indicate which stage
of hibernation is taking place and rework acpi_hibernation_begin()
and acpi_hibernation_begin_old() to take it into account as needed.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Fix the handling of Performance and Energy Bias Hint (EPB) on
Intel processors and expose it to user space via sysfs to avoid
having to access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor
and rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to
some PM documentation files and unify copyright notices in
them (Rafael Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to
the rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAlzQEwUSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxxXwP/jrxikIXdCOV3CJVioV0NetyebwlOqYp
UsIA7lQBfZ/DY6dHw/oKuAT9LP01vcFg6XGe83Alkta9qczR5KZ/MYHFNSZXjXjL
kEvIMBCS/oykaBuW+Xn9am8Ke3Yq/rBSTKWVom3vzSQY0qvZ9GBwPDrzw+k63Zhz
P3afB4ThyY0e9ftgw4HvSSNm13Kn0ItUIQOdaLatXMMcPqP5aAdnUma5Ibinbtpp
rpTHuHKYx7MSjaCg6wl3kKTJeWbQP4wYO2ISZqH9zEwQgdvSHeFAvfPKTegUkmw9
uUsQnPD1JvdglOKovr2muehD1Ur+zsjKDf2OKERkWsWXHPyWzA/AqaVv1mkkU++b
KaWaJ9pE86kGlJ3EXwRbGfV0dM5rrl+dUUQW6nPI1XJnIOFlK61RzwAbqI26F0Mz
AlKxY4jyPLcM3SpQz9iILqyzHQqB67rm29XvId/9scoGGgoqEI4S+v6LYZqI3Vx6
aeSRu+Yof7p5w4Kg5fODX+HzrtMnMrPmLUTXhbExfsYZMi7hXURcN6s+tMpH0ckM
4yiIpnNGCKUSV4vxHBm8XJdAuUnR4Vcz++yFslszgDVVvw5tkvF7SYeHZ6HqcQVm
af9HdWzx3qajs/oyBwdRBedZYDnP1joC5donBI2ofLeF33NA7TEiPX8Zebw8XLkv
fNikssA7PGdv
=nY9p
-----END PGP SIGNATURE-----
Merge tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These fix the (Intel-specific) Performance and Energy Bias Hint (EPB)
handling and expose it to user space via sysfs, fix and clean up
several cpufreq drivers, add support for two new chips to the qoriq
cpufreq driver, fix, simplify and clean up the cpufreq core and the
schedutil governor, add support for "CPU" domains to the generic power
domains (genpd) framework and provide low-level PSCI firmware support
for that feature, fix the exynos cpuidle driver and fix a couple of
issues in the devfreq subsystem and clean it up.
Specifics:
- Fix the handling of Performance and Energy Bias Hint (EPB) on Intel
processors and expose it to user space via sysfs to avoid having to
access it through the generic MSR I/F (Rafael Wysocki).
- Improve the handling of global turbo changes made by the platform
firmware in the intel_pstate driver (Rafael Wysocki).
- Convert some slow-path static_cpu_has() callers to boot_cpu_has()
in cpufreq (Borislav Petkov).
- Fix the frequency calculation loop in the armada-37xx cpufreq
driver (Gregory CLEMENT).
- Fix possible object reference leaks in multuple cpufreq drivers
(Wen Yang).
- Fix kerneldoc comment in the centrino cpufreq driver (dongjian).
- Clean up the ACPI and maple cpufreq drivers (Viresh Kumar, Mohan
Kumar).
- Add support for lx2160a and ls1028a to the qoriq cpufreq driver
(Vabhav Sharma, Yuantian Tang).
- Fix kobject memory leak in the cpufreq core (Viresh Kumar).
- Simplify the IOwait boosting in the schedutil cpufreq governor and
rework the TSC cpufreq notifier on x86 (Rafael Wysocki).
- Clean up the cpufreq core and statistics code (Yue Hu, Kyle Lin).
- Improve the cpufreq documentation, add SPDX license tags to some PM
documentation files and unify copyright notices in them (Rafael
Wysocki).
- Add support for "CPU" domains to the generic power domains (genpd)
framework and provide low-level PSCI firmware support for that
feature (Ulf Hansson).
- Rearrange the PSCI firmware support code and add support for
SYSTEM_RESET2 to it (Ulf Hansson, Sudeep Holla).
- Improve genpd support for devices in multiple power domains (Ulf
Hansson).
- Unify target residency for the AFTR and coupled AFTR states in the
exynos cpuidle driver (Marek Szyprowski).
- Introduce new helper routine in the operating performance points
(OPP) framework (Andrew-sh.Cheng).
- Add support for passing on-die termination (ODT) and auto power
down parameters from the kernel to Trusted Firmware-A (TF-A) to the
rk3399_dmc devfreq driver (Enric Balletbo i Serra).
- Add tracing to devfreq (Lukasz Luba).
- Make the exynos-bus devfreq driver suspend all devices on system
shutdown (Marek Szyprowski).
- Fix a few minor issues in the devfreq subsystem and clean it up
somewhat (Enric Balletbo i Serra, MyungJoo Ham, Rob Herring,
Saravana Kannan, Yangtao Li).
- Improve system wakeup diagnostics (Stephen Boyd).
- Rework filesystem sync messages emitted during system suspend and
hibernation (Harry Pan)"
* tag 'pm-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: Fix kobject memleak
cpufreq: armada-37xx: fix frequency calculation for opp
cpufreq: centrino: Fix centrino_setpolicy() kerneldoc comment
cpufreq: qoriq: add support for lx2160a
x86: tsc: Rework time_cpufreq_notifier()
PM / Domains: Allow to attach a CPU via genpd_dev_pm_attach_by_id|name()
PM / Domains: Search for the CPU device outside the genpd lock
PM / Domains: Drop unused in-parameter to some genpd functions
PM / Domains: Use the base device for driver_deferred_probe_check_state()
cpufreq: qoriq: Add ls1028a chip support
PM / Domains: Enable genpd_dev_pm_attach_by_id|name() for single PM domain
PM / Domains: Allow OF lookup for multi PM domain case from ->attach_dev()
PM / Domains: Don't kfree() the virtual device in the error path
cpufreq: Move ->get callback check outside of __cpufreq_get()
PM / Domains: remove unnecessary unlikely()
cpufreq: Remove needless bios_limit check in show_bios_limit()
drivers/cpufreq/acpi-cpufreq.c: This fixes the following checkpatch warning
firmware/psci: add support for SYSTEM_RESET2
PM / devfreq: add tracing for scheduling work
trace: events: add devfreq trace event file
...
This adds a function to disable secondary CPUs for suspend that are
not necessarily non-zero / non-boot CPUs. Platforms will be able to
use this to suspend using non-zero CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lkml.kernel.org/r/20190411033448.20842-3-npiggin@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Create a common helper to sync filesystems for system suspend and
hibernation.
Signed-off-by: Harry Pan <harry.pan@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
At present, "systemctl suspend" and "shutdown" can run in parrallel. A
system can suspend after devices_shutdown(), and resume. Then the shutdown
task goes on to power off. This causes many devices are not really shut
off. Hence replacing reboot_mutex with system_transition_mutex (renamed
from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as
system_transition_mutex can be better to reflect the purpose of the mutex.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This addresses Coverity-ID: 114713 ("Missing break in switch").
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
timekeeping suspend/resume calls read_persistent_clock() which takes
rtc_lock. That results in might sleep warnings because at that point
we run with interrupts disabled.
We cannot convert rtc_lock to a raw spinlock as that would trigger
other might sleep warnings.
As a workaround we disable the might sleep warnings by setting
system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and
restoring it to SYSTEM_RUNNING afer sysdev_resume(). There is no lock
contention because hibernate / suspend to RAM is single-CPU at this
point.
In s2idle's case the system_state is set to SYSTEM_SUSPEND before
timekeeping_suspend() which is invoked by the last CPU. In the resume
case it set back to SYSTEM_RUNNING after timekeeping_resume() which is
invoked by the first CPU in the resume case. The other CPUs will block
on tick_freeze_lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Modify the cpuidle poll state implementation to prevent CPUs from
staying in the loop in there for excessive times (Rafael Wysocki).
- Add Intel Cannon Lake chips support to the RAPL power capping
driver (Joe Konno).
- Add reference counting to the device links handling code in the
PM core (Lukas Wunner).
- Avoid reconfiguring GPEs on suspend-to-idle in the ACPI system
suspend code (Rafael Wysocki).
- Allow devices to be put into deeper low-power states via ACPI
if both _SxD and _SxW are missing (Daniel Drake).
- Reorganize the core ACPI suspend-to-idle wakeup code to avoid a
keyboard wakeup issue on Asus UX331UA (Chris Chiu).
- Prevent the PCMCIA library code from aborting suspend-to-idle due
to noirq suspend failures resulting from incorrect assumptions
(Rafael Wysocki).
- Add coupled cpuidle supprt to the Exynos3250 platform (Marek
Szyprowski).
- Add new sysfs file to make it easier to specify the image storage
location during hibernation (Mario Limonciello).
- Add sysfs files for collecting suspend-to-idle usage and time
statistics for CPU idle states (Rafael Wysocki).
- Update the pm-graph utilities (Todd Brandt).
- Reduce the kernel log noise related to reporting Low-power Idle
constraings by the ACPI system suspend code (Rafael Wysocki).
- Make it easier to distinguish dedicated wakeup IRQs in the
/proc/interrupts output (Tony Lindgren).
- Add the frequency table validation in cpufreq to the core and
drop it from a number of cpufreq drivers (Viresh Kumar).
- Drop "cooling-{min|max}-level" for CPU nodes from a couple of
DT bindings (Viresh Kumar).
- Clean up the CPU online error code path in the cpufreq core
(Viresh Kumar).
- Fix assorted issues in the SCPI, CPPC, mediatek and tegra186
cpufreq drivers (Arnd Bergmann, Chunyu Hu, George Cherian,
Viresh Kumar).
- Drop memory allocation error messages from a few places in
cpufreq and cpuildle drivers (Markus Elfring).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJawgTUAAoJEILEb/54YlRxI48QALc6IUfj/O+gLAWAf8qHk+8V
eLn9E1NrZXUtNMNYItBcgZfuMImIj7MC5qRo/BhzYdd0VyUzFYEyd9liUVFBDEXA
SH65jyjRrXORKfLrSP5H8lcCdckTFXfxzonVFN2n4l7Gdv540UFuqloU+vS4Wrfp
wMg9UvKRxr+7LwOI4q2sMFtB8Uki+lySY5UECqRIKUIJKIH6RPo3m73Kps7kw8kU
c2RCU8w/9PoomPaEjvwZ0vT5lNrQXmBbC5hxcMzBHtLS0Cwb3xJsUB4w6niezdGY
e+n6Vx7XeId7+Ujnn4praxUwyVq2wEirJccvAEgKFcZzjmGAXrHl8rOgMLvb3ugN
P+ftkYk+Vizci9hmACeA1LGw4hN/dXMfephnezCsy9Q/QK8QPJV8XO0vxfyaQYhZ
ie6SKcdZimFDzqd6oHLFftRou3imvq8RUvKTx2CR0KVkApnaDDiTeP5ay1Yd+UU3
EomWe7/mxoSgJ9kB/9GlKifQXBof62/fbrWH0AdM1oliONbbOZcLqg5x4DZUmfTK
hTAx3SSxMRZSlc4Zl1CzbrHnFKi9cRBYCs0tPdOSnAO2ZfCsOmokJE+ig5I8lZre
SlaciUpG2Vvf7m61mVlrqLLos8T9rTVM2pqwsoxII7A+dFrWK3HpqoypEN/87tm7
4/zjPF6LK2eTKL9WdTCk
=6JC2
-----END PGP SIGNATURE-----
Merge tag 'pm-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update the cpuidle poll state definition to reduce excessive
energy usage related to it, add new CPU ID to the RAPL power capping
driver, update the ACPI system suspend code to handle some special
cases better, extend the PM core's device links code slightly, add new
sysfs attribute for better suspend-to-idle diagnostics and easier
hibernation handling, update power management tools and clean up
cpufreq quite a bit.
Specifics:
- Modify the cpuidle poll state implementation to prevent CPUs from
staying in the loop in there for excessive times (Rafael Wysocki).
- Add Intel Cannon Lake chips support to the RAPL power capping
driver (Joe Konno).
- Add reference counting to the device links handling code in the PM
core (Lukas Wunner).
- Avoid reconfiguring GPEs on suspend-to-idle in the ACPI system
suspend code (Rafael Wysocki).
- Allow devices to be put into deeper low-power states via ACPI if
both _SxD and _SxW are missing (Daniel Drake).
- Reorganize the core ACPI suspend-to-idle wakeup code to avoid a
keyboard wakeup issue on Asus UX331UA (Chris Chiu).
- Prevent the PCMCIA library code from aborting suspend-to-idle due
to noirq suspend failures resulting from incorrect assumptions
(Rafael Wysocki).
- Add coupled cpuidle supprt to the Exynos3250 platform (Marek
Szyprowski).
- Add new sysfs file to make it easier to specify the image storage
location during hibernation (Mario Limonciello).
- Add sysfs files for collecting suspend-to-idle usage and time
statistics for CPU idle states (Rafael Wysocki).
- Update the pm-graph utilities (Todd Brandt).
- Reduce the kernel log noise related to reporting Low-power Idle
constraings by the ACPI system suspend code (Rafael Wysocki).
- Make it easier to distinguish dedicated wakeup IRQs in the
/proc/interrupts output (Tony Lindgren).
- Add the frequency table validation in cpufreq to the core and drop
it from a number of cpufreq drivers (Viresh Kumar).
- Drop "cooling-{min|max}-level" for CPU nodes from a couple of DT
bindings (Viresh Kumar).
- Clean up the CPU online error code path in the cpufreq core (Viresh
Kumar).
- Fix assorted issues in the SCPI, CPPC, mediatek and tegra186
cpufreq drivers (Arnd Bergmann, Chunyu Hu, George Cherian, Viresh
Kumar).
- Drop memory allocation error messages from a few places in cpufreq
and cpuildle drivers (Markus Elfring)"
* tag 'pm-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits)
ACPI / PM: Fix keyboard wakeup from suspend-to-idle on ASUS UX331UA
cpufreq: CPPC: Use transition_delay_us depending transition_latency
PM / hibernate: Change message when writing to /sys/power/resume
PM / hibernate: Make passing hibernate offsets more friendly
cpuidle: poll_state: Avoid invoking local_clock() too often
PM: cpuidle/suspend: Add s2idle usage and time state attributes
cpuidle: Enable coupled cpuidle support on Exynos3250 platform
cpuidle: poll_state: Add time limit to poll_idle()
cpufreq: tegra186: Don't validate the frequency table twice
cpufreq: speedstep: Don't validate the frequency table twice
cpufreq: sparc: Don't validate the frequency table twice
cpufreq: sh: Don't validate the frequency table twice
cpufreq: sfi: Don't validate the frequency table twice
cpufreq: scpi: Don't validate the frequency table twice
cpufreq: sc520: Don't validate the frequency table twice
cpufreq: s3c24xx: Don't validate the frequency table twice
cpufreq: qoirq: Don't validate the frequency table twice
cpufreq: pxa: Don't validate the frequency table twice
cpufreq: ppc_cbe: Don't validate the frequency table twice
cpufreq: powernow: Don't validate the frequency table twice
...
Using this helper allows us to avoid the in-kernel calls to the
sys_sync() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_sync().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
This file is used both for setting the wakeup device without kernel
command line as well as for actually waking the system (when appropriate
swap header is in place).
To avoid confusion on incorrect logs in system log downgrade the
message to debug and make it clearer.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the only way to specify a hibernate offset for a
swap file is on the kernel command line.
Add a new /sys/power/resume_offset that lets userspace
specify the offset and disk to use when initiating a hibernate
cycle.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Regardless of whether or not debug messages from the core system
suspend/hibernation code are enabled, it is useful to know when
system-wide transitions start and finish (or fail), so print "info"
messages at these points.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mark Salyzyn <salyzyn@android.com>
Debug messages from the system suspend/hibernation infrastructure can
fill up the entire kernel log buffer in some cases and anyway they
are only useful for debugging. They depend on CONFIG_PM_DEBUG, but
that is set as a rule as some generally useful diagnostic facilities
depend on it too.
For this reason, avoid printing those messages by default, but make
it possible to turn them on as needed with the help of a new sysfs
attribute under /sys/power/.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
6332 488 308 7128 1bd8 kernel/power/hibernate.o
File size After adding 'const':
text data bss dec hex filename
6396 424 308 7128 1bd8 kernel/power/hibernate.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull sched.h split-up from Ingo Molnar:
"The point of these changes is to significantly reduce the
<linux/sched.h> header footprint, to speed up the kernel build and to
have a cleaner header structure.
After these changes the new <linux/sched.h>'s typical preprocessed
size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
lines), which is around 40% faster to build on typical configs.
Not much changed from the last version (-v2) posted three weeks ago: I
eliminated quirks, backmerged fixes plus I rebased it to an upstream
SHA1 from yesterday that includes most changes queued up in -next plus
all sched.h changes that were pending from Andrew.
I've re-tested the series both on x86 and on cross-arch defconfigs,
and did a bisectability test at a number of random points.
I tried to test as many build configurations as possible, but some
build breakage is probably still left - but it should be mostly
limited to architectures that have no cross-compiler binaries
available on kernel.org, and non-default configurations"
* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
sched/headers: Clean up <linux/sched.h>
sched/headers: Remove #ifdefs from <linux/sched.h>
sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
sched/core: Remove unused prefetch_stack()
sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
sched/headers: Remove <linux/signal.h> from <linux/sched.h>
sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
sched/headers: Remove the runqueue_is_locked() prototype
sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
...
- Fix for a cpuidle menu governor problem that started to take an
unnecessary spinlock after one of the recent updates and that
did not play well with the RT patch (Rafael Wysocki).
- Fix for the new intel_pstate operation mode switching feature
added recently that did not reinitialize P-state limits properly
when switching operation modes (Rafael Wysocki).
- Removal of unused global notifiers from the PM QoS framework
(Viresh Kumar).
- Generic power domains framework update to make it handle
asynchronous invocations of PM callbacks in the "noirq" phases
of system suspend/hibernation correctly (Ulf Hansson).
- Two hibernation core cleanups (Rafael Wysocki).
- intel_idle cleanup related to the sysfs interface (Len Brown).
- Off-by-one bug fix in the OPP (Operating Performance Points)
framework (Andrzej Hajda).
- OPP framework's documentation fix (Viresh Kumar).
- cpufreq qoriq driver cleanup (Tang Yuantian).
- Fixes for typos in comments in the device runtime PM framework
(Christophe Jaillet).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYuLeGAAoJEILEb/54YlRxJvcP/0BRmh8Hn4Itx/NIWNg71X6j
U+v8Pn8T3MP33gCcleLYlgre2JUIAUDmhdK99+UOx+/abhjMhQSaF3HhTOwYaPtQ
6njoHVS0NnfqUf+x5kp+EpRxBVNucYVbdRTVd1DsHIeLLz/96DFOzb/R7tko/pKx
pFMWvNdotHLLgXOG1UvdRimwTDlFMffxFzD8Se53LPjRXS0S73A5VWfqZOye44Re
j3W1AJ0Idgq5uduA6J8x1MWbaxDq1h+j6CSUm05yvqrINzxXwXt0Hv6stCQTo+Gb
YMdiBd8MujNyAgcchw3jiDQ8Vp+zmfLPcHrfPe//SSefj26eB8LyVNSYelvbUdOz
cNjvyErva37MmaegCL9QC7WbLM+A7VE6bU6YzDCi/rR8jYMJ51Fb9jGiYb/oimry
OLlblEekikUsskWv4hGV1JVt5VhmUMlagWtexxn+lMszATcZro0tfXu/vgQWksYs
noUnwuWJWxvj2aNMsvbzW3HLlTGSmYl2UxJ7IymQQaTDblwF9Kg61rm3+5coUctd
ifceynDVp9Gju25faYgZ+Dq9+o8ktlOGOHRRPdLIRNJ/T+4tUDnlGkdbPb+Tfn03
XUIzYCu74U8/oW8gOk6t0WpmWzvxEXNgdirdEIR6y3loYIC0Jr3v4gyD975Eug74
Hzfrdg7ignAmWV+nf6UY
=SeUF
-----END PGP SIGNATURE-----
Merge tag 'pm-extra-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates deom Rafael Wysocki:
"These fix two bugs introduced by recent power management updates (in
the cpuidle menu governor and intel_pstate) and a few other issues,
clean up things and remove unused code.
Specifics:
- Fix for a cpuidle menu governor problem that started to take an
unnecessary spinlock after one of the recent updates and that did
not play well with the RT patch (Rafael Wysocki).
- Fix for the new intel_pstate operation mode switching feature added
recently that did not reinitialize P-state limits properly when
switching operation modes (Rafael Wysocki).
- Removal of unused global notifiers from the PM QoS framework
(Viresh Kumar).
- Generic power domains framework update to make it handle
asynchronous invocations of PM callbacks in the "noirq" phases of
system suspend/hibernation correctly (Ulf Hansson).
- Two hibernation core cleanups (Rafael Wysocki).
- intel_idle cleanup related to the sysfs interface (Len Brown).
- Off-by-one bug fix in the OPP (Operating Performance Points)
framework (Andrzej Hajda).
- OPP framework's documentation fix (Viresh Kumar).
- cpufreq qoriq driver cleanup (Tang Yuantian).
- Fixes for typos in comments in the device runtime PM framework
(Christophe Jaillet)"
* tag 'pm-extra-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / OPP: Documentation: Fix opp-microvolt in examples
intel_idle: stop exposing platform acronyms in sysfs
cpufreq: intel_pstate: Fix limits issue with operation mode switching
PM / hibernate: Define pr_fmt() and use pr_*() instead of printk()
PM / hibernate: Untangle power_down()
cpuidle: menu: Avoid taking spinlock for accessing QoS values
PM / QoS: Remove global notifiers
PM / runtime: Fix some typos
cpufreq: qoriq: clean up unused code
PM / OPP: fix off-by-one bug in dev_pm_opp_get_max_volt_latency loop
PM / Domains: Power off masters immediately in the power off sequence
PM / Domains: Rename is_async to one_dev_on for genpd_power_off()
PM / Domains: Move genpd_power_off() above genpd_power_on()
We are going to move softlockup APIs out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
<linux/nmi.h> already includes <linux/sched.h>.
Include the <linux/nmi.h> header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Define a pr_fmt() for hibernate.c and convert all of the explicit
printk() calls into corresponding pr_*() so that they use the
pr_fmt() format.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The power_down() routine in the core hibernation code is not exactly
straightforward (to put it lightly), so clean it up to make it avoid
invoking itself recursively, among other things.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Both of these options are poorly named. The features they provide are
necessary for system security and should not be considered debug only.
Change the names to CONFIG_STRICT_KERNEL_RWX and
CONFIG_STRICT_MODULE_RWX to better describe what these options do.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
PAGE_POISONING_ZERO disables zeroing new pages on alloc, they are
poisoned (zeroed) as they become available.
In the hibernate use case, free pages will appear in the system without
being cleared, left there by the loading kernel.
This patch will make sure free pages are cleared on resume when
PAGE_POISONING_ZERO is enabled. We free the pages just after resume
because we can't do it later: going through any device resume code might
allocate some memory and invalidate the free pages bitmap.
Thus we don't need to disable hibernation when PAGE_POISONING_ZERO is
enabled.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Restore the processor state before calling any other functions to
ensure per-CPU variables can be used with KASLR memory randomization.
Tracing functions use per-CPU variables (GS based on x86) and one was
called just before restoring the processor state fully. It resulted
in a double fault when both the tracing & the exception handler
functions tried to use a per-CPU variable.
Fixes: bb3632c610 (PM / sleep: trace events for suspend/resume)
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Reported-by: Jiri Kosina <jikos@kernel.org>
Tested-by: Rafael J. Wysocki <rafael@kernel.org>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Rework the cpufreq governor interface to make it more straightforward
and modify the conservative governor to avoid using transition
notifications (Rafael Wysocki).
- Rework the handling of frequency tables by the cpufreq core to make
it more efficient (Viresh Kumar).
- Modify the schedutil governor to reduce the number of wakeups it
causes to occur in cases when the CPU frequency doesn't need to be
changed (Steve Muckle, Viresh Kumar).
- Fix some minor issues and clean up code in the cpufreq core and
governors (Rafael Wysocki, Viresh Kumar).
- Add Intel Broxton support to the intel_pstate driver (Srinivas
Pandruvada).
- Fix problems related to the config TDP feature and to the validity
of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
Srinivas Pandruvada).
- Make intel_pstate update the cpu_frequency tracepoint even if
the frequency doesn't change to avoid confusing powertop (Rafael
Wysocki).
- Clean up the usage of __init/__initdata in intel_pstate, mark some
of its internal variables as __read_mostly and drop an unused
structure element from it (Jisheng Zhang, Carsten Emde).
- Clean up the usage of some duplicate MSR symbols in intel_pstate
and turbostat (Srinivas Pandruvada).
- Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
Adiga, Viresh Kumar, Ben Dooks).
- Fix a regression (introduced during the 4.5 cycle) in the
pcc-cpufreq driver by reverting the problematic commit (Andreas
Herrmann).
- Add support for Intel Denverton to intel_idle, clean up Broxton
support in it and make it explicitly non-modular (Jacob Pan,
Jan Beulich, Paul Gortmaker).
- Add support for Denverton and Ivy Bridge server to the Intel RAPL
power capping driver and make it more careful about the handing
of MSRs that may not be present (Jacob Pan, Xiaolong Wang).
- Fix resume from hibernation on x86-64 by making the CPU offline
during resume avoid using MONITOR/MWAIT in the "play dead" loop
which may lead to an inadvertent "revival" of a "dead" CPU and
a page fault leading to a kernel crash from it (Rafael Wysocki).
- Make memory management during resume from hibernation more
straightforward (Rafael Wysocki).
- Add debug features that should help to detect problems related
to hibernation and resume from it (Rafael Wysocki, Chen Yu).
- Clean up hibernation core somewhat (Rafael Wysocki).
- Prevent KASAN from instrumenting the hibernation core which leads
to large numbers of false-positives from it (James Morse).
- Prevent PM (hibernate and suspend) notifiers from being called
during the cleanup phase if they have not been called during the
corresponding preparation phase which is possible if one of the
other notifiers returns an error at that time (Lianwei Wang).
- Improve suspend-related debug printout in the tasks freezer and
clean up suspend-related console handling (Roger Lu, Borislav
Petkov).
- Update the AnalyzeSuspend script in the kernel sources to
version 4.2 (Todd Brandt).
- Modify the generic power domains framework to make it handle
system suspend/resume better (Ulf Hansson).
- Make the runtime PM framework avoid resuming devices synchronously
when user space changes the runtime PM settings for them and
improve its error reporting (Rafael Wysocki, Linus Walleij).
- Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus)
and in the core, make some devfreq code explicitly non-modular and
change some of it into tristate (Bartlomiej Zolnierkiewicz,
Peter Chen, Paul Gortmaker).
- Add DT support to the generic PM clocks management code and make
it export some more symbols (Jon Hunter, Paul Gortmaker).
- Make the PCI PM core code slightly more robust against possible
driver errors (Andy Shevchenko).
- Make it possible to change DESTDIR and PREFIX in turbostat
(Andy Shevchenko).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXl7/dAAoJEILEb/54YlRx+VgQAIQJOWvxKew3Yl02c/sdj9OT
5VNnFrzGzdcAPofvvG9qGq8B0Es1vYehJpwwOB21ri8EvYv0riIiU1yrqslObojQ
oaZOkSBpbIoKjGR4CpYA/A+feE+8EqIBdPGd+lx5a6oRdUi7tRVHBG9lyLO3FB/i
jan1q8dMpZsmu+Y+rVVHGnCVuIlIEqr2ZnZfCwDAulO2Arp/QFAh4kH08ELATvrl
bkPa25vq7/VMP/vCDzrfZKD5mUuKogIRu/J5wx4py1nE+FB35cKKyqBOgklLwAeY
UI8vjDhr/myNUs54AZlktOkq47TCYvjvhX9kmOxBjuWqFbRusU012IRek1fYPRIV
ZqbkqNX7UEVQwunAEg9AyFwyzEtOht93dQDT5RLEd4QzKuM76gmHpLeTGGMzE+nu
FnmF9JGl4DVwqpZl9yU2+hR2Mt3bP8OF8qYmNiGUB3KO4emPslhSd+6y8liA5Bx2
SJf0Gb//vaHCh3/uMnwAonYPqRkZvBLOMwuL1VUjNQfRMnQtDdgHMYB1aT/EglPA
8ww6j4J8rVRLAxvYQ3UEmNA/vBNclKXblRR18+JddEZP9/oX0ATfwnCCUpr839uk
xxyQhrm4/AI60+PHWCX4GG80YrKdOGTkF7LXCQZanVWjjuyF17rufegZ2YWLT07v
JU1Cmumfdy2jJluT8xsR
=uVGz
-----END PGP SIGNATURE-----
Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"Again, the majority of changes go into the cpufreq subsystem, but
there are no big features this time. The cpufreq changes that stand
out somewhat are the governor interface rework and improvements
related to the handling of frequency tables. Apart from those, there
are fixes and new device/CPU IDs in drivers, cleanups and an
improvement of the new schedutil governor.
Next, there are some changes in the hibernation core, including a fix
for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
during resume from hibernation, a few core improvements related to
memory management during resume, a couple of additional debug features
and cleanups.
Finally, we have some fixes and cleanups in the devfreq subsystem,
generic power domains framework improvements related to system
suspend/resume, support for some new chips in intel_idle and in the
power capping RAPL driver, a new version of the AnalyzeSuspend utility
and some assorted fixes and cleanups.
Specifics:
- Rework the cpufreq governor interface to make it more
straightforward and modify the conservative governor to avoid using
transition notifications (Rafael Wysocki).
- Rework the handling of frequency tables by the cpufreq core to make
it more efficient (Viresh Kumar).
- Modify the schedutil governor to reduce the number of wakeups it
causes to occur in cases when the CPU frequency doesn't need to be
changed (Steve Muckle, Viresh Kumar).
- Fix some minor issues and clean up code in the cpufreq core and
governors (Rafael Wysocki, Viresh Kumar).
- Add Intel Broxton support to the intel_pstate driver (Srinivas
Pandruvada).
- Fix problems related to the config TDP feature and to the validity
of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
Srinivas Pandruvada).
- Make intel_pstate update the cpu_frequency tracepoint even if the
frequency doesn't change to avoid confusing powertop (Rafael
Wysocki).
- Clean up the usage of __init/__initdata in intel_pstate, mark some
of its internal variables as __read_mostly and drop an unused
structure element from it (Jisheng Zhang, Carsten Emde).
- Clean up the usage of some duplicate MSR symbols in intel_pstate
and turbostat (Srinivas Pandruvada).
- Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
Adiga, Viresh Kumar, Ben Dooks).
- Fix a regression (introduced during the 4.5 cycle) in the
pcc-cpufreq driver by reverting the problematic commit (Andreas
Herrmann).
- Add support for Intel Denverton to intel_idle, clean up Broxton
support in it and make it explicitly non-modular (Jacob Pan, Jan
Beulich, Paul Gortmaker).
- Add support for Denverton and Ivy Bridge server to the Intel RAPL
power capping driver and make it more careful about the handing of
MSRs that may not be present (Jacob Pan, Xiaolong Wang).
- Fix resume from hibernation on x86-64 by making the CPU offline
during resume avoid using MONITOR/MWAIT in the "play dead" loop
which may lead to an inadvertent "revival" of a "dead" CPU and a
page fault leading to a kernel crash from it (Rafael Wysocki).
- Make memory management during resume from hibernation more
straightforward (Rafael Wysocki).
- Add debug features that should help to detect problems related to
hibernation and resume from it (Rafael Wysocki, Chen Yu).
- Clean up hibernation core somewhat (Rafael Wysocki).
- Prevent KASAN from instrumenting the hibernation core which leads
to large numbers of false-positives from it (James Morse).
- Prevent PM (hibernate and suspend) notifiers from being called
during the cleanup phase if they have not been called during the
corresponding preparation phase which is possible if one of the
other notifiers returns an error at that time (Lianwei Wang).
- Improve suspend-related debug printout in the tasks freezer and
clean up suspend-related console handling (Roger Lu, Borislav
Petkov).
- Update the AnalyzeSuspend script in the kernel sources to version
4.2 (Todd Brandt).
- Modify the generic power domains framework to make it handle system
suspend/resume better (Ulf Hansson).
- Make the runtime PM framework avoid resuming devices synchronously
when user space changes the runtime PM settings for them and
improve its error reporting (Rafael Wysocki, Linus Walleij).
- Fix error paths in devfreq drivers (exynos, exynos-ppmu,
exynos-bus) and in the core, make some devfreq code explicitly
non-modular and change some of it into tristate (Bartlomiej
Zolnierkiewicz, Peter Chen, Paul Gortmaker).
- Add DT support to the generic PM clocks management code and make it
export some more symbols (Jon Hunter, Paul Gortmaker).
- Make the PCI PM core code slightly more robust against possible
driver errors (Andy Shevchenko).
- Make it possible to change DESTDIR and PREFIX in turbostat (Andy
Shevchenko)"
* tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
PM / hibernate: Introduce test_resume mode for hibernation
cpufreq: export cpufreq_driver_resolve_freq()
cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
PCI / PM: check all fields in pci_set_platform_pm()
cpufreq: acpi-cpufreq: use cached frequency mapping when possible
cpufreq: schedutil: map raw required frequency to driver frequency
cpufreq: add cpufreq_driver_resolve_freq()
cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
intel_pstate: Update cpu_frequency tracepoint every time
cpufreq: intel_pstate: clean remnant struct element
PM / tools: scripts: AnalyzeSuspend v4.2
x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
cpufreq: powernv: Replacing pstate_id with frequency table index
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
PM / hibernate: Image data protection during restoration
PM / hibernate: Add missing braces in __register_nosave_region()
PM / hibernate: Clean up comments in snapshot.c
PM / hibernate: Clean up function headers in snapshot.c
PM / hibernate: Add missing braces in hibernate_setup()
...
test_resume mode is to verify if the snapshot data
written to swap device can be successfully restored
to memory. It is useful to ease the debugging process
on hibernation, since this mode can not only bypass
the BIOSes/bootloader, but also the system re-initialization.
To avoid the risk to break the filesystm on persistent storage,
this patch resumes the image with tasks frozen.
For example:
echo test_resume > /sys/power/disk
echo disk > /sys/power/state
[ 187.306470] PM: Image saving progress: 70%
[ 187.395298] PM: Image saving progress: 80%
[ 187.476697] PM: Image saving progress: 90%
[ 187.554641] PM: Image saving done.
[ 187.558896] PM: Wrote 594600 kbytes in 0.90 seconds (660.66 MB/s)
[ 187.566000] PM: S|
[ 187.589742] PM: Basic memory bitmaps freed
[ 187.594694] PM: Checking hibernation image
[ 187.599865] PM: Image signature found, resuming
[ 187.605209] PM: Loading hibernation image.
[ 187.665753] PM: Basic memory bitmaps created
[ 187.691397] PM: Using 3 thread(s) for decompression.
[ 187.691397] PM: Loading and decompressing image data (148650 pages)...
[ 187.889719] PM: Image loading progress: 0%
[ 188.100452] PM: Image loading progress: 10%
[ 188.244781] PM: Image loading progress: 20%
[ 189.057305] PM: Image loading done.
[ 189.068793] PM: Image successfully loaded
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On Intel hardware, native_play_dead() uses mwait_play_dead() by
default and only falls back to the other methods if that fails.
That also happens during resume from hibernation, when the restore
(boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
except for the boot one offline.
However, that is problematic, because the address passed to
__monitor() in mwait_play_dead() is likely to be written to in the
last phase of hibernate image restoration and that causes the "dead"
CPU to start executing instructions again. Unfortunately, the page
containing the address in that CPU's instruction pointer may not be
valid any more at that point.
First, that page may have been overwritten with image kernel memory
contents already, so the instructions the CPU attempts to execute may
simply be invalid. Second, the page tables previously used by that
CPU may have been overwritten by image kernel memory contents, so the
address in its instruction pointer is impossible to resolve then.
A report from Varun Koyyalagunta and investigation carried out by
Chen Yu show that the latter sometimes happens in practice.
To prevent it from happening, temporarily change the smp_ops.play_dead
pointer during resume from hibernation so that it points to a special
"play dead" routine which uses hlt_play_dead() and avoids the
inadvertent "revivals" of "dead" CPUs this way.
A slightly unpleasant consequence of this change is that if the
system is hibernated with one or more CPUs offline, it will generally
draw more power after resume than it did before hibernation, because
the physical state entered by CPUs via hlt_play_dead() is higher-power
than the mwait_play_dead() one in the majority of cases. It is
possible to work around this, but it is unclear how much of a problem
that's going to be in practice, so the workaround will be implemented
later if it turns out to be necessary.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
Original-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Make it possible to protect all pages holding image data during
hibernate image restoration by setting them read-only (so as to
catch attempts to write to those pages after image data have been
stored in them).
This adds overhead to image restoration code (it may cause large
page mappings to be split as a result of page flags changes) and
the errors it protects against should never happen in theory, so
the feature is only active after passing hibernate=protect_image
to the command line of the restore kernel.
Also it only is built if CONFIG_DEBUG_RODATA is set.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make hibernate_setup() follow the coding style more closely by adding
some missing braces to the if () statement in it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This makes pm notifier PREPARE/POST symmetrical: if PREPARE
fails, we will only undo what ever happened on PREPARE.
It fixes the unbalanced CPU hotplug enable in CPU PM notifier.
Signed-off-by: Lianwei Wang <lianwei.wang@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the following fix:
70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")
... there is no longer a problem with hibernation resuming a
KASLR-booted kernel image, so remove the restriction.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When suspending to RAM, waking up and later suspending to disk,
we gratuitously runtime resume devices after the thaw phase.
This does not occur if we always suspend to RAM or always to disk.
pm_complete_with_resume_check(), which gets called from
pci_pm_complete() among others, schedules a runtime resume
if PM_SUSPEND_FLAG_FW_RESUME is set. The flag is set during
a suspend-to-RAM cycle. It is cleared at the beginning of
the suspend-to-RAM cycle but not afterwards and it is not
cleared during a suspend-to-disk cycle at all. Fix it.
Fixes: ef25ba0476 (PM / sleep: Add flags to indicate platform firmware involvement)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
By default, page poisoning uses a poison value (0xaa) on free. If this
is changed to 0, the page is not only sanitized but zeroing on alloc
with __GFP_ZERO can be skipped as well. The tradeoff is that detecting
corruption from the poisoning is harder to detect. This feature also
cannot be used with hibernation since pages are not guaranteed to be
zeroed after hibernation.
Credit to Grsecurity/PaX team for inspiring this work
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just fix a typo in a function name in kerneldoc comments.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When disable_nonboot_cpus() fails on some cpu it doesn't bring back all
cpus it managed to offline, a consequent call to enable_nonboot_cpus() is
expected. In hibernation_platform_enter() we don't call
enable_nonboot_cpus() on error so cpus stay offlined.
create_image() and resume_target_kernel() functions handle
disable_nonboot_cpus() faults correctly, hibernation_platform_enter()
is the only one which is doing it wrong.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch migrates swsusp_show_speed and its callers to using ktime_t instead
of 'struct timeval' which suffers from the y2038 problem.
Changes to swsusp_show_speed:
- use ktime_t for start and stop times
- pass start and stop times by value
Calling functions affected:
- load_image
- load_image_lzo
- save_image
- save_image_lzo
- hibernate_preallocate_memory
Design decisions:
- use ktime_t to preserve same granularity of reporting as before
- use centisecs logic as before to avoid 'div by zero' issues caused by
using seconds and nanoseconds directly
- use monotonic time (ktime_get()) since we only care about elapsed time.
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't rollback things correctly calling the thaw and complete
callbacks. This could leave some devices in a suspended state in case of
an error during resuming from hibernation.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ftrace_stop() and ftrace_start() were added to the suspend and hibernate
process because there was some function within the work flow that caused
the system to reboot if it was traced. This function has recently been
found (restore_processor_state()). Now there's no reason to disable
function tracing while we are going into suspend or hibernate, which means
that being able to trace this will help tremendously in debugging any
issues with suspend or hibernate.
This also means that the ftrace_stop/start() functions can be removed
and simplify the function tracing code a bit.
Link: http://lkml.kernel.org/r/1518201.VD9cU33jRU@vostro.rjw.lan
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Changes kASLR from being compile-time selectable (blocked by
CONFIG_HIBERNATION), to being boot-time selectable (with hibernation
available by default) via the "kaslr" kernel command line.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To support using kernel features that are not compatible with hibernation,
this creates the "nohibernate" kernel boot parameter to disable both
hibernation and resume. This allows hibernation support to be a boot-time
choice instead of only a compile-time choice.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adds trace events that give finer resolution into suspend/resume. These
events are graphed in the timelines generated by the analyze_suspend.py
script. They represent large areas of time consumed that are typical to
suspend and resume.
The event is triggered by calling the function "trace_suspend_resume"
with three arguments: a string (the name of the event to be displayed
in the timeline), an integer (case specific number, such as the power
state or cpu number), and a boolean (where true is used to denote the start
of the timeline event, and false to denote the end).
The suspend_resume trace event reproduces the data that the machine_suspend
trace event did, so the latter has been removed.
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the original code "resume_delay" is an int so on 64 bits, the call to
kstrtoul() will cause memory corruption. We may as well fix a style
issue here as well and make "resume_delay" unsigned int, since that's
what we pass to ssleep().
Fixes: 317cf7e5e8 (PM / hibernate: convert simple_strtoul to kstrtoul)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace obsolete function.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reboot logic in kernel/reboot will avoid calling kernel_power_off
when pm_power_off is null, and instead uses kernel_halt. Change
hibernate's power_down to follow the behavior in the reboot call.
Calling the notifier twice (once for SYS_POWER_OFF and again for
SYS_HALT) causes a panic during hibernation on Kirkwood
Openblocks A6 board.
Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Reported-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
do_div() needs 'u64' type, or it reports warning. And negative number
is meaningless for "speed", so change all signed to unsigned within
swsusp_show_speed().
The related warning (with allmodconfig for unicore32):
CC kernel/power/hibernate.o
kernel/power/hibernate.c: In function ‘swsusp_show_speed’:
kernel/power/hibernate.c:237: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the name_to_dev_t call to parse the device name echo'd to
to /sys/power/resume. This imitates the method used in hibernate.c
in software_resume, and allows the resume partition to be specified
using other equivalent device formats as well. By allowing
/sys/debug/resume to accept the same syntax as the resume=device
parameter, we can parse the resume=device in the init script and
use the resume device directly from the kernel command line.
Signed-off-by: Sebastian Capella <sebastian.capella@linaro.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since create_image() only executes platform_leave() if in_suspend is
not set, enable_nonboot_cpus() is run by it with EC transactions
blocked (on ACPI systems) in the image creation code path (that is,
for in_suspend set), which may cause CPU online to fail for the CPUs
in question. In particular, this causes the acpi_cpufreq driver's
initialization to fail for those CPUs on some systems with the
following dmesg:
cpufreq: adding CPU 1
acpi_cpufreq_cpu_init
cpufreq: FREQ: 1401000 - CPU: 0
ACPI Exception: AE_BAD_PARAMETER, Returned by Handler for [EmbeddedControl] (20130725/evregion-287)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC_.EC__.LPMD] (Node ffff88023249ab28), AE_BAD_PARAMETER (20130725/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU0._PPC] (Node ffff88023270e3f8), AE_BAD_PARAMETER (20130725/psparse-536)
ACPI Error: Method parse/execution failed [\_PR_.CPU1._PPC] (Node ffff88023270e290), AE_BAD_PARAMETER (20130725/psparse-536)
ACPI Exception: AE_BAD_PARAMETER, Evaluating _PPC (20130725/processor_perflib-140)
cpufreq: initialization failed
CPU1 is up
To fix this problem, modify create_image() to execute platform_leave()
unconditionally. [rjw: This shouldn't lead to any significant side
effects on ACPI systems.]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To support the ability to implement PM hibernation code as modules
the hibernation_set_ops function requires to be exported.
Similar solution already available for suspend_set_ops
(please refer to commit a5e4fd8783).
Signed-off-by: Leonardo Potenza <leonardo.potenza@intel.com>
Signed-off-by: Edwin Verplanke <edwin.verplanke@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.
Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.
Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Acked-by: Pavel Machek <Pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events
After the recent ACPIPHP changes we've seen some interesting breakage
on a system that triggers device check notifications during boot for
non-existing devices. Although those notifications are really
spurious, we should be able to deal with them nevertheless and that
shouldn't introduce too much overhead. Four commits to make that
work properly.
2) Memory hotplug and hibernation mutual exclusion rework
This was maent to be a cleanup, but it happens to fix a classical
ABBA deadlock between system suspend/hibernation and ACPI memory
hotplug which is possible if they are started roughly at the same
time. Three commits rework memory hotplug so that it doesn't
acquire pm_mutex and make hibernation use device_hotplug_lock
which prevents it from racing with memory hotplug.
3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix
The ACPI LPSS driver crashes during boot on Apple Macbook Air with
Haswell that has slightly unusual BIOS configuration in which one
of the LPSS device's _CRS method doesn't return all of the information
expected by the driver. Fix from Mika Westerberg, for stable.
4) ACPICA fix related to Store->ArgX operation
AML interpreter fix for obscure breakage that causes AML to be
executed incorrectly on some machines (observed in practice). From
Bob Moore.
5) ACPI core fix for PCI ACPI device objects lookup
There still are cases in which there is more than one ACPI device
object matching a given PCI device and we don't choose the one that
the BIOS expects us to choose, so this makes the lookup take more
criteria into account in those cases.
6) Fix to prevent cpuidle from crashing in some rare cases
If the result of cpuidle_get_driver() is NULL, which can happen on
some systems, cpuidle_driver_ref() will crash trying to use that
pointer and the Daniel Fu's fix prevents that from happening.
7) cpufreq fixes related to CPU hotplug
Stephen Boyd reported a number of concurrency problems with cpufreq
related to CPU hotplug which are addressed by a series of fixes
from Srivatsa S Bhat and Viresh Kumar.
8) cpufreq fix for time conversion in time_in_state attribute
Time conversion carried out by cpufreq when user space attempts to
read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state won't
work correcty if cputime_t doesn't map directly to jiffies. Fix
from Andreas Schwab.
9) Revert of a troublesome cpufreq commit
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) was intended to address some known concurrency problems
in cpufreq related to the ordering of transitions, but unfortunately
it introduced several problems of its own, so I decided to revert it
now and address the original problems later in a more robust way.
10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.
11) cpufreq fixes related to system suspend/resume
The recent cpufreq changes that made it preserve CPU sysfs attributes
over suspend/resume cycles introduced a possible NULL pointer
dereference that caused it to crash during the second attempt to
suspend. Three commits from Srivatsa S Bhat fix that problem and a
couple of related issues.
12) cpufreq locking fix
cpufreq_policy_restore() should acquire the lock for reading, but
it acquires it for writing. Fix from Lan Tianyu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSMbdRAAoJEKhOf7ml8uNsiFkQAKSh1iBXuiUCxBApEGZgoQio
8lmnuyWdhNQWdjZTnh7ptjpDxdrWhxcoxvoaGABU++reDObjef1QnyrQtdO3r8dl
oy0C/YGh5kq5SIffIDEwPIb/ipDe/47cgRMW8iBlnViDa1MJBqICuLyefcTRIrKp
QGvv0owUM2o7TXpA10+qm8zXjv6m5mu1DTtxYI+2Eodhwi54neAqb+aKMspa2thy
V9KFcVv3Td4rJrNvw6BhXNM81QbaYpRxaK3DRr1T6SM++EKvbqYFA1jgW24YvqTL
nrCZlDMb6KRww5DCxA/ns9Kx5H+ZyicoRwdtAM3PBYA6MGqsLqPozC/8VKV1fSvZ
sgUdbUSuLqKRAkOqM1bjKAhi9PdCGBvkQAg2AqbRK6IBl4HJC8xhdb5E6eZ/J42G
GyNBpKef7wVJwYKXE2hSChZ5dYjqMizNHWxFHf8Xy1dveExbQ2nmSJmaWMy2A3kx
YOXFkcTV5F6GOIZB8WCRruzUalff9xal4G+iVhGF+AZIOCm7bC+FDXfwIS82uVor
ej2l+uQLLZCB499IRmM6942ZIAXshmtN7eRfGtKBc6jsbSCEdQDqf1Z7oRwqAD6h
WkD/k/zz30CyM8y4snOkAXkZgqAQsZodtqfowE3e9OHd51tfcNiqdht+obwCx+eD
MWXc2xATMAX6NcZTXSZS
=U/Jw
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"All of these commits are fixes that have emerged recently and some of
them fix bugs introduced during this merge window.
Specifics:
1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events
After the recent ACPIPHP changes we've seen some interesting
breakage on a system that triggers device check notifications
during boot for non-existing devices. Although those
notifications are really spurious, we should be able to deal with
them nevertheless and that shouldn't introduce too much overhead.
Four commits to make that work properly.
2) Memory hotplug and hibernation mutual exclusion rework
This was maent to be a cleanup, but it happens to fix a classical
ABBA deadlock between system suspend/hibernation and ACPI memory
hotplug which is possible if they are started roughly at the same
time. Three commits rework memory hotplug so that it doesn't
acquire pm_mutex and make hibernation use device_hotplug_lock
which prevents it from racing with memory hotplug.
3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix
The ACPI LPSS driver crashes during boot on Apple Macbook Air with
Haswell that has slightly unusual BIOS configuration in which one
of the LPSS device's _CRS method doesn't return all of the
information expected by the driver. Fix from Mika Westerberg, for
stable.
4) ACPICA fix related to Store->ArgX operation
AML interpreter fix for obscure breakage that causes AML to be
executed incorrectly on some machines (observed in practice).
From Bob Moore.
5) ACPI core fix for PCI ACPI device objects lookup
There still are cases in which there is more than one ACPI device
object matching a given PCI device and we don't choose the one
that the BIOS expects us to choose, so this makes the lookup take
more criteria into account in those cases.
6) Fix to prevent cpuidle from crashing in some rare cases
If the result of cpuidle_get_driver() is NULL, which can happen on
some systems, cpuidle_driver_ref() will crash trying to use that
pointer and the Daniel Fu's fix prevents that from happening.
7) cpufreq fixes related to CPU hotplug
Stephen Boyd reported a number of concurrency problems with
cpufreq related to CPU hotplug which are addressed by a series of
fixes from Srivatsa S Bhat and Viresh Kumar.
8) cpufreq fix for time conversion in time_in_state attribute
Time conversion carried out by cpufreq when user space attempts to
read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
won't work correcty if cputime_t doesn't map directly to jiffies.
Fix from Andreas Schwab.
9) Revert of a troublesome cpufreq commit
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) was intended to address some known concurrency
problems in cpufreq related to the ordering of transitions, but
unfortunately it introduced several problems of its own, so I
decided to revert it now and address the original problems later
in a more robust way.
10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.
11) cpufreq fixes related to system suspend/resume
The recent cpufreq changes that made it preserve CPU sysfs
attributes over suspend/resume cycles introduced a possible NULL
pointer dereference that caused it to crash during the second
attempt to suspend. Three commits from Srivatsa S Bhat fix that
problem and a couple of related issues.
12) cpufreq locking fix
cpufreq_policy_restore() should acquire the lock for reading, but
it acquires it for writing. Fix from Lan Tianyu"
* tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
cpufreq: Restructure if/else block to avoid unintended behavior
cpufreq: Fix crash in cpufreq-stats during suspend/resume
intel_pstate: Add Haswell CPU models
Revert "cpufreq: make sure frequency transitions are serialized"
cpufreq: Use signed type for 'ret' variable, to store negative error values
cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
cpufreq: Split __cpufreq_remove_dev() into two parts
cpufreq: Fix wrong time unit conversion
cpufreq: serialize calls to __cpufreq_governor()
cpufreq: don't allow governor limits to be changed when it is disabled
ACPI / bind: Prefer device objects with _STA to those without it
ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks
ACPI / hotplug / PCI: Use _OST to notify firmware about notify status
ACPI / hotplug / PCI: Avoid doing too much for spurious notifies
ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.
ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
...
Since all of the memory hotplug operations have to be carried out
under device_hotplug_lock, they won't need to acquire pm_mutex if
device_hotplug_lock is held around hibernation.
For this reason, make the hibernation code acquire
device_hotplug_lock after freezing user space processes and
release it before thawing them. At the same tim drop the
lock_system_sleep() and unlock_system_sleep() calls from
lock_memory_hotplug() and unlock_memory_hotplug(), respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
The hibernation core uses special memory bitmaps during image
creation and restoration and traditionally those bitmaps are
allocated before freezing tasks, because in the past GFP_KERNEL
allocations might not work after all tasks had been frozen.
However, this is an anachronism, because hibernation_snapshot()
now calls hibernate_preallocate_memory() which allocates memory
for the image upfront anyway, so the memory bitmaps may be
allocated after freezing user space safely.
For this reason, move all of the create_basic_memory_bitmaps()
calls after freeze_processes() and all of the corresponding
free_basic_memory_bitmaps() calls before thaw_processes().
This will allow us to hold device_hotplug_lock around hibernation
without the need to worry about freezing issues with user space
processes attempting to acquire it via sysfs attributes after the
creation of memory bitmaps and before the freezing of tasks.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
* ACPI conversion to PM handling based on struct dev_pm_ops.
* Conversion of a number of platform drivers to PM handling based on struct
dev_pm_ops and removal of empty legacy PM callbacks from a couple of PCI
drivers.
* Suspend-to-both for in-kernel hibernation from Bojan Smojver.
* cpuidle fixes and cleanups from ShuoX Liu, Daniel Lezcano and Preeti U Murthy.
* cpufreq bug fixes from Jonghwa Lee and Stephen Boyd.
* Suspend and hibernate fixes from Srivatsa S. Bhat and Colin Cross.
* Generic PM domains framework updates.
* RTC CMOS wakeup signaling update from Paul Fox.
* sparse warnings fixes from Sachin Kamat.
* Build warnings fixes for the generic PM domains framework and PM sysfs code.
* sysfs switch for printing device suspend times from Sameer Nanda.
* Documentation fix from Oskar Schirmer.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJQDF5eAAoJEKhOf7ml8uNsEaAP/2wg4faoOGob5A0/7tLqG3Cw
xnTmGsfL7wG07Q8ykCL1BSlBb1VeJz8L6LTmUpaABI4M//oIBlcYQKyCE0Tat1AO
9bJXFzK7qcHMhkTz6d6LDqtVzR3NGM3ypjZqj8aEXBov07LMR1AXvgNwXXhv25zM
0unwrh1XNinBN3n+oaktpWk1YHUjsa5IMU+2tQJrocuHXcgK30vGXZVrZ4g9w1c2
eS+ED1oKUqOYtFzIUX+aCtaDDheGaPlugk/GOtIB7Sae0s0vMlxH/T5ncB4SxRC+
v3s4OykqQc5Dc8+0bNlBH7ykSVNB0PoQiyKDY67CxtH+q1xQSc9/f3XJqnGMaVDE
17eZUZsL4qSyzRuCbPCGAgwBHmx3qNCMu1i1BcmnSxU+ikPUeCR7mYOP0mRThwPH
OSfs+c/vZ+Ow6CwVE4UFrbm9Jve7ADnCrlZzT2m6XjhHGyjKP7SJlzP9TPsZ0LRk
oxgQDYHmxbo50t9tBCz5L4ZTMKkDp28e78x84/CteP85srcW3GqDxrPyp2uzJu5O
tvIEBvVlc4ucq8sG83RkugQwrG/2cQwG2HO9ERAwq01HHA1BYsuU3A961Jqf5CZo
nFRSnByvVj/imPf47OWpDPAbVEs7jxufJuLEbPwGj1MkttTGDBIRu3zldXt2S6kP
Q4qYU6fDaQQHFc90pqxQ
=vC4/
-----END PGP SIGNATURE-----
Merge tag 'pm-for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
- ACPI conversion to PM handling based on struct dev_pm_ops.
- Conversion of a number of platform drivers to PM handling based on
struct dev_pm_ops and removal of empty legacy PM callbacks from a
couple of PCI drivers.
- Suspend-to-both for in-kernel hibernation from Bojan Smojver.
- cpuidle fixes and cleanups from ShuoX Liu, Daniel Lezcano and Preeti
Murthy.
- cpufreq bug fixes from Jonghwa Lee and Stephen Boyd.
- Suspend and hibernate fixes from Srivatsa Bhat and Colin Cross.
- Generic PM domains framework updates.
- RTC CMOS wakeup signaling update from Paul Fox.
- sparse warnings fixes from Sachin Kamat.
- Build warnings fixes for the generic PM domains framework and PM
sysfs code.
- sysfs switch for printing device suspend times from Sameer Nanda.
- Documentation fix from Oskar Schirmer.
* tag 'pm-for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (70 commits)
cpufreq: Fix sysfs deadlock with concurrent hotplug/frequency switch
EXYNOS: bugfix on retrieving old_index from freqs.old
PM / Sleep: call early resume handlers when suspend_noirq fails
PM / QoS: Use NULL pointer instead of plain integer in qos.c
PM / QoS: Use NULL pointer instead of plain integer in pm_qos.h
PM / Sleep: Require CAP_BLOCK_SUSPEND to use wake_lock/wake_unlock
PM / Sleep: Add missing static storage class specifiers in main.c
cpuilde / ACPI: remove time from acpi_processor_cx structure
cpuidle / ACPI: remove usage from acpi_processor_cx structure
cpuidle / ACPI : remove latency_ticks from acpi_processor_cx structure
rtc-cmos: report wakeups from interrupt handler
PM / Sleep: Fix build warning in sysfs.c for CONFIG_PM_SLEEP unset
PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset
olpc-xo15-sci: Use struct dev_pm_ops for power management
PM / Domains: Replace plain integer with NULL pointer in domain.c file
PM / Domains: Add missing static storage class specifier in domain.c file
PM / crypto / ux500: Use struct dev_pm_ops for power management
PM / IPMI: Remove empty legacy PCI PM callbacks
tpm_nsc: Use struct dev_pm_ops for power management
tpm_tis: Use struct dev_pm_ops for power management
...
Commit a7a20d1039 ("sd: limit the scope of the async probe domain")
make the SCSI device probing run device discovery in it's own async
domain.
However, as a result, the partition detection was no longer synchronized
by async_synchronize_full() (which, despite the name, only synchronizes
the global async space, not all of them). Which in turn meant that
"wait_for_device_probe()" would not wait for the SCSI partitions to be
parsed.
And "wait_for_device_probe()" was what the boot time init code relied on
for mounting the root filesystem.
Now, most people never noticed this, because not only is it
timing-dependent, but modern distributions all use initrd. So the root
filesystem isn't actually on a disk at all. And then before they
actually mount the final disk filesystem, they will have loaded the
scsi-wait-scan module, which not only does the expected
wait_for_device_probe(), but also does scsi_complete_async_scans().
[ Side note: scsi_complete_async_scans() had also been partially broken,
but that was fixed in commit 43a8d39d01 ("fix async probe
regression"), so that same commit a7a20d1039 had actually broken
setups even if you used scsi-wait-scan explicitly ]
Solve this problem by just moving the scsi_complete_async_scans() call
into wait_for_device_probe(). Everybody who wants to wait for device
probing to finish really wants the SCSI probing to complete, so there's
no reason not to do this.
So now "wait_for_device_probe()" really does what the name implies, and
properly waits for device probing to finish. This also removes the now
unnecessary extra calls to scsi_complete_async_scans().
Reported-and-tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If function tracing is enabled for some of the low-level suspend/resume
functions, it leads to triple fault during resume from suspend, ultimately
ending up in a reboot instead of a resume (or a total refusal to come out
of suspended state, on some machines).
This issue was explained in more detail in commit f42ac38c59 (ftrace:
disable tracing for suspend to ram). However, the changes made by that commit
got reverted by commit cbe2f5a6e8 (tracing: allow tracing of
suspend/resume & hibernation code again). So, unfortunately since things are
not yet robust enough to allow tracing of low-level suspend/resume functions,
suspend/resume is still broken when ftrace is enabled.
So fix this by disabling function tracing during suspend/resume & hibernation.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
It is often useful to suspend to memory after hibernation image has been
written to disk. If the battery runs out or power is otherwise lost, the
computer will resume from the hibernated image. If not, it will resume
from memory and hibernation image will be discarded.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Sometimes resume= parameter comes in integer style (e.g. major:minor)
and then name_to_dev_t can not detect partition properly. (especially
async device like usb, mmc).
This patch calls get_gendisk() if resumewait is true and resume_file
is in integer format to work around this problem.
Signed-off-by: Minho Ban <mhban@samsung.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The core suspend/hibernation code calls usermodehelper_disable() to
avoid race conditions between the freezer and the starting of
usermode helpers and each code path has to do that on its own.
However, it is always called right before freeze_processes()
and usermodehelper_enable() is always called right after
thaw_processes(). For this reason, to avoid code duplication and
to make the connection between usermodehelper_disable() and the
freezer more visible, make freeze_processes() call it and remove the
direct usermodehelper_disable() and usermodehelper_enable() calls
from all suspend/hibernation code paths.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
There is no reason to call usermodehelper_disable() before creating
memory bitmaps in hibernate() and software_resume(), so call it right
before freeze_processes(), in accordance with the other suspend and
hibernation code. Consequently, call usermodehelper_enable() right
after the thawing of tasks rather than after freeing the memory
bitmaps.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
If create_basic_memory_bitmaps() fails, usermodehelpers are not re-enabled
before returning. Fix this. And while at it, reword the goto labels so that
they look more meaningful.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The code related to 'freezer_test_done' is needlessly convoluted.
Refactor the code and simplify the implementation.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
In the hibernation call path, the kernel threads are frozen inside
hibernation_snapshot(). If we happen to encounter an error further down
the road or if we are exiting early due to a successful freezer test,
then thaw kernel threads before returning to the caller.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The current device suspend/resume phases during system-wide power
transitions appear to be insufficient for some platforms that want
to use the same callback routines for saving device states and
related operations during runtime suspend/resume as well as during
system suspend/resume. In principle, they could point their
.suspend_noirq() and .resume_noirq() to the same callback routines
as their .runtime_suspend() and .runtime_resume(), respectively,
but at least some of them require device interrupts to be enabled
while the code in those routines is running.
It also makes sense to have device suspend-resume callbacks that will
be executed with runtime PM disabled and with device interrupts
enabled in case someone needs to run some special code in that
context during system-wide power transitions.
Apart from this, .suspend_noirq() and .resume_noirq() were introduced
as a workaround for drivers using shared interrupts and failing to
prevent their interrupt handlers from accessing suspended hardware.
It appears to be better not to use them for other porposes, or we may
have to deal with some serious confusion (which seems to be happening
already).
For the above reasons, introduce new device suspend/resume phases,
"late suspend" and "early resume" (and analogously for hibernation)
whose callback will be executed with runtime PM disabled and with
device interrupts enabled and whose callback pointers generally may
point to runtime suspend/resume routines.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Using [un]lock_system_sleep() is safer than directly using mutex_[un]lock()
on 'pm_mutex', since the latter could lead to freezing failures. Hence convert
all the present users of mutex_[un]lock(&pm_mutex) to use these safe APIs
instead.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* pm-freezer: (26 commits)
Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
Freezer: fix more fallout from the thaw_process rename
freezer: fix wait_event_freezable/__thaw_task races
freezer: kill unused set_freezable_with_signal()
dmatest: don't use set_freezable_with_signal()
usb_storage: don't use set_freezable_with_signal()
freezer: remove unused @sig_only from freeze_task()
freezer: use lock_task_sighand() in fake_signal_wake_up()
freezer: restructure __refrigerator()
freezer: fix set_freezable[_with_signal]() race
freezer: remove should_send_signal() and update frozen()
freezer: remove now unused TIF_FREEZE
freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
cgroup_freezer: prepare for removal of TIF_FREEZE
freezer: clean up freeze_processes() failure path
freezer: kill PF_FREEZING
freezer: test freezable conditions while holding freezer_lock
freezer: make freezing indicate freeze condition in effect
freezer: use dedicated lock instead of task_lock() + memory barrier
freezer: don't distinguish nosig tasks on thaw
...
The hibernation test modes 'test' and 'testproc' are deprecated, because
the 'pm_test' framework offers much more fine-grained control for debugging
suspend and hibernation related problems.
So, remove the deprecated 'test' and 'testproc' hibernation test modes.
Suggested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Commit 2aede851dd (PM / Hibernate: Freeze
kernel threads after preallocating memory) moved the freezing of kernel
threads to hibernation_snapshot() function.
So now, if the call to hibernation_snapshot() returns early due to a
successful hibernation test, the caller has to thaw processes to ensure
that the system gets back to its original state.
But in SNAPSHOT_CREATE_IMAGE hibernation ioctl, the caller does not thaw
processes in case hibernation_snapshot() returned due to a successful
freezer test. Fix this issue. But note we still send the value of 'in_suspend'
(which is now 0) to userspace, because we are not in an error path per-se,
and moreover, the value of in_suspend correctly depicts the situation here.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
In the software_resume() function defined in kernel/power/hibernate.c,
if the call to create_basic_memory_bitmaps() fails, the usermodehelpers
are not enabled (which had been disabled in the previous step). Fix it.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The goto statements in hibernation_snapshot() are a bit complex.
Refactor the code to remove some of them, thereby simplifying the
implementation.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
freezer: fix wait_event_freezable/__thaw_task races
freezer: kill unused set_freezable_with_signal()
dmatest: don't use set_freezable_with_signal()
usb_storage: don't use set_freezable_with_signal()
freezer: remove unused @sig_only from freeze_task()
freezer: use lock_task_sighand() in fake_signal_wake_up()
freezer: restructure __refrigerator()
freezer: fix set_freezable[_with_signal]() race
freezer: remove should_send_signal() and update frozen()
freezer: remove now unused TIF_FREEZE
freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
cgroup_freezer: prepare for removal of TIF_FREEZE
freezer: clean up freeze_processes() failure path
freezer: kill PF_FREEZING
freezer: test freezable conditions while holding freezer_lock
freezer: make freezing indicate freeze condition in effect
freezer: use dedicated lock instead of task_lock() + memory barrier
freezer: don't distinguish nosig tasks on thaw
freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
freezer: rename thaw_process() to __thaw_task() and simplify the implementation
...
The hibernation core code forgets to release memory preallocated
for hibernation if there's an error in its early stages or if test
modes causing hibernation_snapshot() to return early are used. This
causes the system to be hardly usable, because the amount of
preallocated memory is usually huge. Fix this problem.
Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
freeze_processes() failure path is rather messy. Freezing is canceled
for workqueues and tasks which aren't frozen yet but frozen tasks are
left alone and should be thawed by the caller and of course some
callers (xen and kexec) didn't do it.
This patch updates __thaw_task() to handle cancelation correctly and
makes freeze_processes() and freeze_kernel_threads() call
thaw_processes() on failure instead so that the system is fully thawed
on failure. Unnecessary [suspend_]thaw_processes() calls are removed
from kernel/power/hibernate.c, suspend.c and user.c.
While at it, restructure error checking if clause in suspend_prepare()
to be less weird.
-v2: Srivatsa spotted missing removal of suspend_thaw_processes() in
suspend_prepare() and error in commit message. Updated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Commit 2aede851dd
(PM / Hibernate: Freeze kernel threads after preallocating memory)
postponed the freezing of kernel threads to after preallocating memory
for hibernation. But while doing that, the hibernation test TEST_FREEZER
and the test mode HIBERNATION_TESTPROC were not moved accordingly.
As a result, when using these test modes, it only goes upto the freezing of
userspace and exits, when in fact it should go till the complete end of task
freezing stage, namely the freezing of kernel threads as well.
So, move these points of exit to appropriate places so that freezing of
kernel threads is also tested while using these test harnesses.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
These files were getting <linux/module.h> via an implicit non-obvious
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason. Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Use threads for LZO compression/decompression on hibernate/thaw.
Improve buffering on hibernate/thaw.
Calculate/verify CRC32 of the image pages on hibernate/thaw.
In my testing, this improved write/read speed by a factor of about two.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Static and extern variables in kernel/power/hibernate.c need not be
initialized to 0 explicitly, so remove those initializations.
[rjw: Modified subject, added changelog.]
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Patch "PM / Hibernate: Add resumewait param to support MMC-like
devices as resume file" added the resumewait kernel command line
option. The present patch adds resumedelay so that
resumewait/delay were analogous to rootwait/delay.
[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Barry Song <baohua.song@csr.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Some devices like MMC are async detected very slow. For example,
drivers/mmc/host/sdhci.c launches a 200ms delayed work to detect
MMC partitions then add disk.
We have wait_for_device_probe() and scsi_complete_async_scans()
before calling swsusp_check(), but it is not enough to wait for MMC.
This patch adds resumewait kernel param just like rootwait so
that we have enough time to wait until MMC is ready. The difference is
that we wait for resume partition whereas rootwait waits for rootfs
partition (which may be on a different device).
This patch will make hibernation support many embedded products
without SCSI devices, but with devices like MMC.
[rjw: Modified the changelog slightly.]
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Fix a typo in a function name in the kerneldoc comment next to
resume_target_kernel().
[rjw: Changed the subject slightly, added the changelog.]
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers. Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out. Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation. If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).
Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.
Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.
Reported-by: Christoph <cr2005@u-club.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Some of the kerneldoc comments in kernel/power/hibernate.c are
outdated and some of them don't adhere to the kernel's standards.
Update them and make them look in a consistent way.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
All architectures supporting hibernation define
arch_prepare_suspend() as an empty function, so remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Some comments in the core hibernate code are outdated, some aren't
necessary any more and at least one of them is plain wrong. Remove
those comments or update them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
If device drivers allocate substantial amounts of memory (above 1 MB)
in their hibernate .freeze() callbacks (or in their legacy suspend
callbcks during hibernation), the subsequent creation of hibernate
image may fail due to the lack of memory. This is the case, because
the drivers' .freeze() callbacks are executed after the hibernate
memory preallocation has been carried out and the preallocated amount
of memory may be too small to cover the new driver allocations.
Unfortunately, the drivers' .prepare() callbacks also are executed
after the hibernate memory preallocation has completed, so they are
not suitable for allocating additional memory either. Thus the only
way a driver can safely allocate memory during hibernation is to use
a hibernate/suspend notifier. However, the notifiers are called
before the freezing of user space and the drivers wanting to use them
for allocating additional memory may not know how much memory needs
to be allocated at that point.
To let device drivers overcome this difficulty rework the hibernation
sequence so that the memory preallocation is carried out after the
drivers' .prepare() callbacks have been executed, so that the
.prepare() callbacks can be used for allocating additional memory
to be used by the drivers' .freeze() callbacks. Update documentation
to match the new behavior of the code.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* syscore:
PM: Remove sysdev suspend, resume and shutdown operations
PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
ARM / Samsung: Use struct syscore_ops for "core" power management
ARM / PXA: Use struct syscore_ops for "core" power management
ARM / SA1100: Use struct syscore_ops for "core" power management
ARM / Integrator: Use struct syscore_ops for core PM
ARM / OMAP: Use struct syscore_ops for "core" power management
ARM: Use struct syscore_ops instead of sysdevs for PM in common code
Martin reports that on his system hibernation occasionally fails due
to the lack of memory, because the radeon driver apparently allocates
too much of it during the device freeze stage. It turns out that the
amount of memory allocated by radeon during hibernation (and
presumably during system suspend too) depends on the utilization of
the GPU (e.g. hibernating while there are two KDE 4 sessions with
compositing enabled causes radeon to allocate more memory than for
one KDE 4 session).
In principle it should be possible to use image_size to make the
memory preallocation mechanism free enough memory for the radeon
driver, but in practice it is not easy to guess the right value
because of the way the preallocation code uses image_size. For this
reason, it seems reasonable to allow users to control the amount of
memory reserved for driver allocations made after the hibernate
preallocation, which currently is constant and amounts to 1 MB.
Introduce a new sysfs file, /sys/power/reserved_size, whose value
will be used as the amount of memory to reserve for the
post-preallocation reservations made by device drivers, in bytes.
For backwards compatibility, set its default (and initial) value to
the currently used number (1 MB).
References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Since suspend, resume and shutdown operations in struct sysdev_class
and struct sysdev_driver are not used any more, remove them. Also
drop sysdev_suspend(), sysdev_resume() and sysdev_shutdown() used
for executing those operations and modify all of their users
accordingly. This reduces kernel code size quite a bit and reduces
its complexity.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>