2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

317 Commits

Author SHA1 Message Date
Rafael J. Wysocki
6404367862 cpufreq: intel_pstate: Drop pointless initialization of PID parameters
The P-state selection algorithm used by intel_pstate for Atom
processors is not based on the PID controller and the initialization
of PID parametrs for those processors is pointless and confusing, so
drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:07 +02:00
Rafael J. Wysocki
e14cf8857e cpufreq: intel_pstate: Eliminate struct perf_limits
After recent changes the purpose of struct perf_limits is not
particularly clear any more and the code may be made somewhat
easier to follow by eliminating it, so go for that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-28 23:12:07 +02:00
Rafael J. Wysocki
80b120ca1a cpufreq: intel_pstate: Avoid transient updates of cpuinfo.max_freq
Both intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
set policy->cpuinfo.max_freq depending on the turbo status, but the
updates made by them are discarded by the core, because the policy
object passed to them by the core is temporary and cpuinfo.max_freq
from that object is not copied to the final policy object in
cpufreq_set_policy().

However, cpufreq_set_policy() passes the temporary policy object
to the ->setpolicy callback of the driver, so intel_pstate_set_policy()
actually sees the policy->cpuinfo.max_freq value updated by
intel_pstate_verify_policy() and not the final one.  It also
updates policy->max sometimes which basically has no effect after
it returns, because the core discards that update.

To avoid confusion, eliminate policy->cpuinfo.max_freq updates from
intel_pstate_verify_policy() and intel_cpufreq_verify_policy()
entirely and check the maximum frequency explicitly in
intel_pstate_update_perf_limits() instead of relying on the
transiently updated policy->cpuinfo.max_freq value.

Moreover, move the max->policy adjustment carried out in
intel_pstate_set_policy() to a separate function and call that
function from the ->verify driver callbacks to ensure that it will
actually be effective.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:32 +01:00
Rafael J. Wysocki
c5a2ee7dde cpufreq: intel_pstate: Active mode P-state limits rework
The coordination of P-state limits used by intel_pstate in the active
mode (ie. by default) is problematic, because it synchronizes all of
the limits (ie. the global ones and the per-policy ones) so as to use
one common pair of P-state limits (min and max) across all CPUs in
the system.  The drawbacks of that are as follows:

 - If P-states are coordinated in hardware, it is not necessary
   to coordinate them in software on top of that, so in that case
   all of the above activity is in vain.

 - If P-states are not coordinated in hardware, then the processor
   is actually capable of setting different P-states for different
   CPUs and coordinating them at the software level simply doesn't
   allow that capability to be utilized.

 - The coordination works in such a way that setting a per-policy
   limit (eg. scaling_max_freq) for one CPU causes the common
   effective limit to change (and it will affect all of the other
   CPUs too), but subsequent reads from the corresponding sysfs
   attributes for the other CPUs will return stale values (which
   is confusing).

 - Reads from the global P-state limit attributes, min_perf_pct and
   max_perf_pct, return the effective common values and not the last
   values set through these attributes.  However, the last values
   set through these attributes become hard limits that cannot be
   exceeded by writes to scaling_min_freq and scaling_max_freq,
   respectively, and they are not exposed, so essentially users
   have to remember what they are.

All of that is painful enough to warrant a change of the management
of P-state limits in the active mode.

To that end, redesign the active mode P-state limits management in
intel_pstate in accordance with the following rules:

 (1) All CPUs are affected by the global limits (that is, none of
     them can be requested to run faster than the global max and
     none of them can be requested to run slower than the global
     min).

 (2) Each individual CPU is affected by its own per-policy limits
     (that is, it cannot be requested to run faster than its own
     per-policy max and it cannot be requested to run slower than
     its own per-policy min).

 (3) The global and per-policy limits can be set independently.

Also, the global maximum and minimum P-state limits will be always
expressed as percentages of the maximum supported turbo P-state.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:31 +01:00
Rafael J. Wysocki
553953453b cpufreq: intel_pstate: Use load-based P-state selection more widely
Extend the set of systems for which intel_pstate will use the
"powersave" P-state selection algorithm based on CPU load in the
active mode by systems with ACPI preferred profile set to "tablet",
"appliance PC", "desktop", or "workstation" (ie. everything with a
specified preferred profile that is not a "server").

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:31 +01:00
Rafael J. Wysocki
eb5139d1a2 cpufreq: intel_pstate: Support HWP processors in all operation modes
Currently, some processors supporting HWP are only supported by
intel_pstate if HWP is actually going to be used and not supported
otherwise which is confusing.

Specifically, they are not supported if "intel_pstate=no_hwp" is
passed to the kernel in the command line or if the driver is started
in the passive mode ("intel_pstate=passive").

There is no real reason for that, because everything about those
processor is known anyway and the driver can work with them in all
modes, so make that happen, but use the load-based P-state selection
algorithm for the active mode "powersave" policy with them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-24 03:04:31 +01:00
Rafael J. Wysocki
f1a91645b7 Merge back intel_pstate updates for 4.12. 2017-03-24 03:04:10 +01:00
Rafael J. Wysocki
64897b20ee cpufreq: intel_pstate: Fix policy data management in passive mode
The policy->cpuinfo.max_freq and policy->max updates in
intel_cpufreq_turbo_update() are excessive as they are done for no
good reason and may lead to problems in principle, so they should be
dropped.  However, after dropping them intel_cpufreq_turbo_update()
becomes almost entirely pointless, because the check made by it is
made again down the road in intel_pstate_prepare_request().  The
only thing in it that still needs to be done is the call to
update_turbo_state(), so drop intel_cpufreq_turbo_update() altogether
and make its callers invoke update_turbo_state() directly instead of
it.

In addition to that, fix intel_cpufreq_verify_policy() so that it
checks global.no_turbo in addition to global.turbo_disabled when
updating policy->cpuinfo.max_freq to make it consistent with
intel_pstate_verify_policy().

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-21 22:19:07 +01:00
Rafael J. Wysocki
7de32556df cpufreq: intel_pstate: One set of global limits in active mode
In the active mode intel_pstate currently uses two sets of global
limits, each associated with one of the possible scaling_governor
settings in that mode: "powersave" or "performance".

The driver switches over from one of those sets to the other
depending on the scaling_governor setting for the last CPU whose
per-policy cpufreq interface in sysfs was last used to change
parameters exposed in there.  That obviously leads to no end of
issues when the scaling_governor settings differ between CPUs.

The most recent issue was introduced by commit a240c4aa5d (cpufreq:
intel_pstate: Do not reinit performance limits in ->setpolicy)
that eliminated the reinitialization of "performance" limits in
intel_pstate_set_policy() preventing the max limit from being set
to anything below 100, among other things.

Namely, an undesirable side effect of commit a240c4aa5d is that
now, after setting scaling_governor to "performance" in the active
mode, the per-policy limits for the CPU in question go to the highest
level and stay there even when it is switched back to "powersave"
later.

As it turns out, some distributions set scaling_governor to
"performance" temporarily for all CPUs to speed-up system
initialization, so that change causes them to misbehave later.

To fix that, get rid of the performance/powersave global limits
split and use just one set of global limits for everything.

From the user's persepctive, after this modification, when
scaling_governor is switched from "performance" to "powersave"
or the other way around on one CPU, the limits settings (ie. the
global max/min_perf_pct and per-policy scaling_max/min_freq for
any CPUs) will not change.  Still, switching from "performance"
to "powersave" or the other way around changes the way in which
P-states are selected and in particular "performance" causes the
driver to always request the highest P-state it is allowed to ask
for for the given CPU.

Fixes: a240c4aa5d (cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-18 00:57:39 +01:00
Rafael J. Wysocki
e4c204ced0 cpufreq: intel_pstate: Avoid percentages in limits-related computations
Currently, intel_pstate_update_perf_limits() first converts the
policy minimum and maximum limits into percentages of the maximum
turbo frequency (rounding up to an integer) and then converts these
percentages to fractions (by using fixed-point arithmetic to divide
them by 100).

That introduces a rounding error unnecessarily, because the fractions
can be obtained by carrying out fixed-point divisions directly on the
input numbers.

Rework the computations in intel_pstate_hwp_set() to use fractions
instead of percentages (and drop redundant local variables from
there) and modify intel_pstate_update_perf_limits() to compute the
fractions directly and percentages out of them.

While at it, introduce percent_ext_fp() for converting percentages
to fractions (with extended number of fraction bits) and use it in
the computations.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-15 16:52:29 +01:00
Srinivas Pandruvada
3f8ed54aee cpufreq: intel_pstate: Correct frequency setting in the HWP mode
In the functions intel_pstate_hwp_set(), min/max range from HWP capability
MSR along with max_perf_pct and min_perf_pct, is used to set the HWP
request MSR. In some cases this doesn't result in the correct HWP max/min
in HWP request.

For example: In the following case:

HWP capabilities from MSR 0x771
0x70a1220

Here cpufreq min/max frequencies from above MSR dump are 700MHz and 3.2GHz
respectively.

This will result in
hwp_min = 0x07
hwp_max = 0x20

To limit max frequency to 2GHz:

perf_limits->max_perf_pct = 63 (2GHz as a percent of 3.2GHz rounded up)

With the current calculation:
adj_range = max_perf_pct * range / 100;
adj_range = 63 * (32 - 7) / 100
adj_range = 15

max = hw_min + adj_range;
max = 7 + 15 = 22

This will result in HWP request of 0x160f, which will result in a
frequency cap of 2.2GHz not 2GHz.

The problem with the above calculation is that hwp_min of 7 is treated
as 0% in the range. But max_perf_pct is calculated with respect to minimum
as 0 and max as 3.2GHz or hwp_max, so adding hwp_min to it will result in
more than the desired.

Since the min_perf_pct and max_perf_pct is already a percent of max
frequency or hwp_max, this min/max HWP request value can be calculated
directly applying these percentage to hwp_max.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-14 03:56:39 +01:00
Rafael J. Wysocki
6e7408acd0 cpufreq: intel_pstate: Update pid_params.sample_rate_ns in pid_param_set()
Fix the debugfs interface for PID tuning to actually update
pid_params.sample_rate_ns on PID parameters updates, as changing
pid_params.sample_rate_ms via debugfs has no effect now.

Fixes: a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-03-13 23:55:12 +01:00
Rafael J. Wysocki
5f98ced1c9 cpufreq: intel_pstate: Drop redundant wrapper function
intel_pstate_hwp_set_policy() is a wrapper around
intel_pstate_hwp_set(), but the only value it adds is to check
hwp_active before calling the latter and one of its two callers
has already checked hwp_active before that happens, so in that
code path the additional check is redundant and using the wrapper
is rather pointless.

For this reason, drop intel_pstate_hwp_set_policy() and make its
callers invoke intel_pstate_hwp_set() directly (after checking
hwp_active).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-03-12 23:07:58 +01:00
Rafael J. Wysocki
fd8e57d5d3 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
  cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
  cpufreq: intel_pstate: Fix global settings in active mode
  cpufreq: Add the "cpufreq.off=1" cmdline option
  cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
  cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
  cpufreq: intel_pstate: Do not use performance_limits in passive mode
2017-03-09 15:12:27 +01:00
Rafael J. Wysocki
a240c4aa5d cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
If the current P-state selection algorithm is set to "performance"
in intel_pstate_set_policy(), the limits may be initialized from
scratch, but only if no_turbo is not set and the maximum frequency
allowed for the given CPU (i.e. the policy object representing it)
is at least equal to the max frequency supported by the CPU.  In all
of the other cases, the limits will not be updated.

For example, the following can happen:

 # cat intel_pstate/status
 active
 # echo performance > cpufreq/policy0/scaling_governor
 # cat intel_pstate/min_perf_pct
 100
 # echo 94 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 100
 # cat cpufreq/policy0/scaling_max_freq
 3100000
 echo 3000000 > cpufreq/policy0/scaling_max_freq
 # cat intel_pstate/min_perf_pct
 94
 # echo 95 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 95

That is confusing for two reasons.  First, the initial attempt to
change min_perf_pct to 94 seems to have no effect, even though
setting the global limits should always work.  Second, after
changing scaling_max_freq for policy0 the global min_perf_pct
attribute shows 94, even though it should have not been affected
by that operation in principle.

Moreover, the final attempt to change min_perf_pct to 95 worked
as expected, because scaling_max_freq for the only policy with
scaling_governor equal to "performance" was different from the
maximum at that time.

To make all that confusion go away, modify intel_pstate_set_policy()
so that it doesn't reinitialize the limits at all.

At the same time, change intel_pstate_set_performance_limits() to
set min_sysfs_pct to 100 in the "performance" limits set so that
switching the P-state selection algorithm to "performance" causes
intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
min_sysfs_pct in the "performance" limits is set to later).

That requires per-CPU limits to be initialized explicitly rather
than by copying the global limits to avoid setting min_sysfs_pct
in the per-CPU limits to 100.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-06 00:06:05 +01:00
Rafael J. Wysocki
d74b199291 cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
The code added to intel_pstate_verify_policy() by commit 1443ebbacf
(cpufreq: intel_pstate: Fix sysfs limits enforcement for performance
policy) should use perf_limits instead of limits, because otherwise
setting global limits via sysfs may affect policies inconsistently.

For example, in the sequence of shell commands below, the
scaling_min_freq attribute for policy1 and policy2 should be
affected in the same way, because scaling_governor is set in
the same way for both of them:

 # cat cpufreq/policy1/scaling_governor
 powersave
 # cat cpufreq/policy2/scaling_governor
 powersave
 # echo performance > cpufreq/policy0/scaling_governor
 # echo 94 > intel_pstate/min_perf_pct
 # cat cpufreq/policy0/scaling_min_freq
 2914000
 # cat cpufreq/policy1/scaling_min_freq
 2914000
 # cat cpufreq/policy2/scaling_min_freq
 800000

The are affected differently, because intel_pstate_verify_policy()
is invoked with limits set to &performance_limits (left behind by
policy0) for policy1 and with limits set to &powersave_limits (left
behind by policy1) for policy2.  Since perf_limits is set to the
set of limits matching the policy being updated, using it instead
of limits fixes the inconsistency.

Fixes: 1443ebbacf (cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-06 00:06:05 +01:00
Rafael J. Wysocki
cd59b4bed9 cpufreq: intel_pstate: Fix global settings in active mode
Commit 111b8b3fe4 (cpufreq: intel_pstate: Always keep all
limits settings in sync) changed intel_pstate to invoke
cpufreq_update_policy() for every registered CPU on global sysfs
attributes updates, but that led to undesirable effects in the
active mode if the "performance" P-state selection algorithm is
configufred for one CPU and the "powersave" one is chosen for
all of the other CPUs.

Namely, in that case, the following is possible:

 # cd /sys/devices/system/cpu/
 # cat intel_pstate/max_perf_pct
 100
 # cat intel_pstate/min_perf_pct
 26
 # echo performance > cpufreq/policy0/scaling_governor
 # cat intel_pstate/max_perf_pct
 100
 # cat intel_pstate/min_perf_pct
 100
 # echo 94 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 26

The reason why this happens is because intel_pstate attempts to
maintain two sets of global limits in the active mode, one for
the "performance" P-state selection algorithm and one for the
"powersave"  P-state selection algorithm, but the P-state selection
algorithms are set per policy, so the global limits cannot reflect
all of them at the same time if they are different for different
policies.

In the particular situation above, the attempt to change
min_perf_pct to 94 caused cpufreq_update_policy() to be run
for a CPU with the "powersave"  P-state selection algorithm
and intel_pstate_set_policy() called by it silently switched the
global limits to the "powersave" set which finally was reflected
by the sysfs interface.

To prevent that from happening, modify intel_pstate_update_policies()
to always switch back to the set of limits that was used right before
it has been invoked.

Fixes: 111b8b3fe4 (cpufreq: intel_pstate: Always keep all limits settings in sync)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-06 00:06:04 +01:00
Rafael J. Wysocki
6407829901 cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
In the passive mode the cpu_frequency trace event is already
triggered by the cpufreq core or by scaling governors, so
intel_pstate should not trigger it once again for the same
P-state updates.

In addition to that, the frequency returned by
intel_cpufreq_fast_switch() and passed via freqs.new from
intel_cpufreq_target() to cpufreq_freq_transition_end() should
reflect the P-state actually set, so make that happen.

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-04 01:38:42 +01:00
Rafael J. Wysocki
7f17326fc0 cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
The intel_pstate_update_perf_limits() called from
intel_cpufreq_verify_policy() may cause global P-state limits
to change which is generally confusing and unnecessary.

In the passive mode the global limits are only applied to the
frequency selected by the scaling governor (they are not taken
into account by governors when making decisions anyway), so making
them follow the per-policy limits serves no purpose and may go
against user expectations (as it generally causes the global
attributes in sysfs to change even though they have not been
written to in some cases).

Fix that by dropping the intel_pstate_update_perf_limits()
invocation from intel_cpufreq_verify_policy() (which also
reduces the code size by a few lines).

This change does not affect the per-CPU limits case, because those
limits allow any P-state to be set by default in the passive mode
and it removes the only piece of code updating them in that mode,
so the per-policy settings will be the only ones taken into account
in that case as expected.

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-04 01:38:41 +01:00
Rafael J. Wysocki
2bc756e7dd cpufreq: intel_pstate: Do not use performance_limits in passive mode
Using performance_limits in the passive mode doesn't make
sense, because in that mode the global limits are applied to the
frequency selected by the scaling governor.

The maximum and minimum P-state limits in performance_limits are both
set to 100 percent which will put all CPUs into the turbo range
regardless of what governor is used and what frequencies are
selected by it (that is particularly undesirable on CPUs with the
generic powersave governor attached).

For this reason, make intel_pstate_register_driver() always point
limits to powersave_limits in the passive mode.

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-04 01:38:41 +01:00
Linus Torvalds
1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
Linus Torvalds
c82be9d224 Power management turbostat utility updates for v4.11-rc1
These update turbostat significantly and in particular:
 
  - Default output is now verbose, --debug is no longer required to
    get all counters.  As a result, some options have been added to
    specify exactly what output is wanted.
  - Added --quiet to skip system configuration output
  - Added --list, --show and --hide parameters
  - Added --cpu parameter
  - Enhanced Baytrail SoC support
  - Added Gemini Lake SoC support
  - Added sysfs C-state columns
 
 Also the symbol definitions in arch/x86/include/asm/intel-family.h
 and arch/x86/include/asm/msr-index.h are updated and the intel_idle
 and intel_pstate drivers are modified to use the updated symbols.
 
 Credits to Len Brown for all of these changes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYuLyNAAoJEILEb/54YlRxEvkQAJsggzpgGrlhrO6KHSm4yC9M
 CqhBVsdeppX1ZTAVPiMk/pcXQYtL5fZ97ELk2So/CjT5Nh3jwDPMA/ux5n3uiob+
 O2BTdtxnpNLxPQPQM1mW7Dr/uAIRlJug9gSMxKDbFSU9Oe3aET58PUdUTs7xaT59
 nbtLxVSvzrdGk/bX6WO4ic+7F2licJLZPfDGhYidnoika8LxD4M+cIO73gFpgqQi
 yoKrTZyLimvneFT0eAUUvHIyKjkJIxeMfslW57uBpz8rW5my+3UwsdpRG4AIVeWc
 wSBlsNqj+TuR4BBiZ2VR2RoHF3qbH/SceI+k864BqyThfyK/g2q/vV/GvLZQCR/R
 yWcajWD9kvLKvnm1D3XYOIQDBeP4l60j3vVwHytSvmaPYjn5Ms3jq6b+2K6zkXMM
 8y3leW/hgw+rGCacdXPrKIlpBykSV7h+TnD2iMxeeDISNkbefWWDe/WB6HncocAg
 HDtKRvU9ntRq6/MlnTKbCFM5c0oCXWRw4QNjDy3AsjJELgeAIwiqpHWMKO6XltFj
 qU/rdyW/BTCuAlIjWVbjooAIJZ268geupeug3zvE3uGzrxT4DaVIo8W1wtJ+XQrt
 By7sOW/gMQ2EcTJQiuFjS/Gz5gOKQ2F8OLCm6T8Prjh6SxrCUAiuIvP0LmxUCa8i
 KMlx+8c9E2f9j+TTt9AP
 =oMZe
 -----END PGP SIGNATURE-----

Merge tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull turbostat utility updates from Rafael Wysocki:
 "Power management turbostat utility updates.

  These update turbostat significantly and in particular:

   - default output is now verbose, --debug is no longer required to get
     all counters. As a result, some options have been added to specify
     exactly what output is wanted.

   - added --quiet to skip system configuration output

   - added --list, --show and --hide parameters

   - added --cpu parameter

   - enhanced Baytrail SoC support

   - added Gemini Lake SoC support

   - added sysfs C-state columns

  Also the symbol definitions in arch/x86/include/asm/intel-family.h and
  arch/x86/include/asm/msr-index.h are updated and the intel_idle and
  intel_pstate drivers are modified to use the updated symbols.

  Credits to Len Brown for all of these changes"

* tag 'pm-turbostat-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
  tools/power turbostat: version 17.02.24
  tools/power turbostat: bugfix: --add u32 was printed as u64
  tools/power turbostat: show error on exec
  tools/power turbostat: dump p-state software config
  tools/power turbostat: show package number, even without --debug
  tools/power turbostat: support "--hide C1" etc.
  tools/power turbostat: move --Package and --processor into the --cpu option
  tools/power turbostat: turbostat.8 update
  tools/power turbostat: update --list feature
  tools/power turbostat: use wide columns to display large numbers
  tools/power turbostat: Add --list option to show available header names
  tools/power turbostat: fix zero IRQ count shown in one-shot command mode
  tools/power turbostat: add --cpu parameter
  tools/power turbostat: print sysfs C-state stats
  tools/power turbostat: extend --add option to accept /sys path
  tools/power turbostat: skip unused counters on BDX
  tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
  tools/power turbostat: skip unused counters on SKX
  tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
  tools/power turbostat: initial Gemini Lake SOC support
  ...
2017-03-02 17:41:27 -08:00
Ingo Molnar
55687da166 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/cpufreq.h>
We are going to split <linux/sched/cpufreq.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/cpufreq.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:30 +01:00
Rafael J. Wysocki
6bff9c609f Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull changes related to turbostat for v4.11 from Len Brown.

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (44 commits)
  tools/power turbostat: version 17.02.24
  tools/power turbostat: bugfix: --add u32 was printed as u64
  tools/power turbostat: show error on exec
  tools/power turbostat: dump p-state software config
  tools/power turbostat: show package number, even without --debug
  tools/power turbostat: support "--hide C1" etc.
  tools/power turbostat: move --Package and --processor into the --cpu option
  tools/power turbostat: turbostat.8 update
  tools/power turbostat: update --list feature
  tools/power turbostat: use wide columns to display large numbers
  tools/power turbostat: Add --list option to show available header names
  tools/power turbostat: fix zero IRQ count shown in one-shot command mode
  tools/power turbostat: add --cpu parameter
  tools/power turbostat: print sysfs C-state stats
  tools/power turbostat: extend --add option to accept /sys path
  tools/power turbostat: skip unused counters on BDX
  tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits
  tools/power turbostat: skip unused counters on SKX
  tools/power turbostat: Denverton: use HW CC1 counter, skip C3, C7
  tools/power turbostat: initial Gemini Lake SOC support
  ...
2017-03-01 23:34:38 +01:00
Len Brown
92134bdbc6 intel_pstate: use MSR_ATOM_RATIOS definitions from msr-index.h
Originally, these MSRs were locally defined in this driver.
Now the definitions are in msr-index.h -- use them.

Signed-off-by: Len Brown <len.brown@intel.com>
2017-03-01 00:14:03 -05:00
Rafael J. Wysocki
c3a49c8991 cpufreq: intel_pstate: Fix limits issue with operation mode switching
There is a problem with intel_pstate operation mode switching
introduced by commit fb1fe1041c (cpufreq: intel_pstate: Operation
mode control from sysfs), because the global sysfs limits are
preserved across operation modes while per-policy limits are
reinitialized from scratch on a mode switch and both sets of limits
may get out of sync this way.

Fix that by always reinitializing the global limits upon the
registration of the driver.

Fixes: fb1fe1041c (cpufreq: intel_pstate: Operation mode control from sysfs)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2017-02-28 13:55:52 +01:00
Rafael J. Wysocki
56c7303e62 Merge back earlier cpufreq changes for v4.11. 2017-02-09 01:18:14 +01:00
Rafael J. Wysocki
cbf304e420 Merge branches 'pm-core-fixes' and 'pm-cpufreq-fixes'
* pm-core-fixes:
  PM / runtime: Avoid false-positive warnings from might_sleep_if()

* pm-cpufreq-fixes:
  cpufreq: intel_pstate: Disable energy efficiency optimization
  cpufreq: brcmstb-avs-cpufreq: properly retrieve P-state upon suspend
  cpufreq: brcmstb-avs-cpufreq: extend sysfs entry brcm_avs_pmap
2017-02-06 14:52:10 +01:00
Srinivas Pandruvada
6e978b22ef cpufreq: intel_pstate: Disable energy efficiency optimization
Some Kabylake desktop processors may not reach max turbo when running in
HWP mode, even if running under sustained 100% utilization.

This occurs when the HWP.EPP (Energy Performance Preference) is set to
"balance_power" (0x80) -- the default on most systems.

It occurs because the platform BIOS may erroneously enable an
energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not
recommended to be enabled on this SKU.

On the failing systems, this BIOS issue was not discovered when the
desktop motherboard was tested with Windows, because the BIOS also
neglects to provide the ACPI/CPPC table, that Windows requires to enable
HWP, and so Windows runs in legacy P-state mode, where this setting has
no effect.

Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and
so it runs in HWP mode, exposing this incorrect BIOS configuration.

There are several ways to address this problem.

First, Linux can also run in legacy P-state mode on this system.
As intel_pstate is how Linux enables HWP, booting with
"intel_pstate=disable"
will run in acpi-cpufreq/ondemand legacy p-state mode.

Or second, the "performance" governor can be used with intel_pstate,
which will modify HWP.EPP to 0.

Or third, starting in 4.10, the
/sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
attribute in can be updated from "balance_power" to "performance".

Or fourth, apply this patch, which fixes the erroneous setting of
MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
configuration to function as designed.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:11:08 +01:00
Srinivas Pandruvada
8fc7554ae5 cpufreq: intel_pstate: Calculate guaranteed performance for HWP
When HWP is active, turbo activation ratio is not used to calculate max
non turbo ratio. But on these systems the max non turbo ratio is decided
by config TDP settings.

This change removes usage of MSR_TURBO_ACTIVATION_RATIO for HWP systems,
instead directly use TDP ratios, when more than one TDPs are available.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:05:33 +01:00
Srinivas Pandruvada
4e5d3f713b cpufreq: intel_pstate: Make HWP limits compatible with legacy
Under HWP the performance limits are calculated using max_perf_pct
and min_perf_pct using possible performance, not available performance.
The available performance can be reduced by no_turbo setting. To make
compatible with legacy mode, use max/min performance percentage with
respect to available performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:05:32 +01:00
Srinivas Pandruvada
7d9a8a9f4e cpufreq: intel_pstate: Lower frequency than expected under no_turbo
When turbo is not disabled by BIOS, but user disabled from intel P-State
sysfs and changes max/min using cpufreq sysfs, the resultant frequency
is lower than what user requested.

The reason for this, when the perf limits are calculated in set_policy()
callback, they are with reference to max cpu frequency (turbo frequency
), but when enforced in the intel_pstate_get_min_max() they are with
reference to max available performance as documented in the intel_pstate
documentation (in this case max non turbo P-State).

This needs similar change as done in intel_cpufreq_verify_policy() for
passive mode. Set policy->cpuinfo.max_freq based on the turbo status.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:05:32 +01:00
Rafael J. Wysocki
fb1fe1041c cpufreq: intel_pstate: Operation mode control from sysfs
Make it possible to change the operation mode of intel_pstate with
the help of a new sysfs attribute called "status".

There are three possible configurations that can be selected using
this attribute:

 "off"     - The driver is not in use at this time.
 "active"  - The driver works as a P-state governor (default).
 "passive" - The driver works as a regular cpufreq one and collaborates
             with the generic cpufreq governors (it sets P-states as
             requested by those governors).  [This is the same mode
             the driver can be started in by passing intel_pstate=passive
             in the kernel command line.]

The current setting is returned by reads from this attribute.  Writing
one of the above strings to it changes the operation mode as indicated
by that string, if possible.

If HW-managed P-states (HWP) feature is enabled, it is not possible
to change the driver's operation mode and attempts to write to this
attribute will fail.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:05:31 +01:00
Rafael J. Wysocki
0c30b65b3c cpufreq: intel_pstate: Expose global sysfs attributes upfront
Expose the intel_pstate's global sysfs attributes before registering
the driver to prepare for the addition of an attribute that also will
have to work if the driver is not registered.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:05:30 +01:00
Rafael J. Wysocki
ff7e593c9c Merge branches 'pm-sleep' and 'pm-cpufreq'
* pm-sleep:
  Revert "PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag"

* pm-cpufreq:
  cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy
2017-01-27 00:08:59 +01:00
Srinivas Pandruvada
1443ebbacf cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy
A side effect of keeping intel_pstate sysfs limits in sync with cpufreq
is that the now sysfs limits can't enforced under performance policy.

For example, if the max_perf_pct is changed from 100 to 80, this will call
intel_pstate_set_policy(), which will change the max_perf to 100 again for
performance policy. Same issue happens, when no_turbo is set.

This change calculates max and min frequency using sysfs performance
limits in intel_pstate_verify_policy() and adjusts policy limits by
calling cpufreq_verify_within_limits().

Also, it causes the setting of performance limits to be skipped if
no_turbo is set.

Fixes: 111b8b3fe4 (cpufreq: intel_pstate: Always keep all limits settings in sync)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-20 03:35:27 +01:00
Rafael J. Wysocki
3baad65546 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: dt: Add support for APM X-Gene 2
  cpufreq: intel_pstate: Always keep all limits settings in sync
  cpufreq: intel_pstate: Use locking in intel_cpufreq_verify_policy()
  cpufreq: intel_pstate: Use locking in intel_pstate_resume()
  cpufreq: intel_pstate: Do not expose PID parameters in passive mode
2017-01-06 14:34:52 +01:00
Rafael J. Wysocki
111b8b3fe4 cpufreq: intel_pstate: Always keep all limits settings in sync
Make intel_pstate update per-logical-CPU limits when the global
settings are changed to ensure that they are always in sync and
users will not see confusing values in per-logical-CPU sysfs
attributes.

This also fixes the problem that setting the "no_turbo" global
attribute to 1 in the "passive" mode (ie. when intel_pstate acts
as a regular cpufreq driver) when scaling_governor is set to
"performance" has no effect.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-12-31 21:48:44 +01:00
Rafael J. Wysocki
cad3046796 cpufreq: intel_pstate: Use locking in intel_cpufreq_verify_policy()
Race conditions are possible if intel_cpufreq_verify_policy()
is executed in parallel with global limits updates from sysfs,
so the invocation of intel_pstate_update_perf_limits() in it
should be carried out under intel_pstate_limits_lock.

Make that happen.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-12-31 21:48:43 +01:00
Rafael J. Wysocki
aa439248ab cpufreq: intel_pstate: Use locking in intel_pstate_resume()
Theoretically, intel_pstate_resume() may be executed in parallel
with intel_pstate_set_policy(), if the latter is invoked via
cpufreq_update_policy() as a result of a notification, so use
intel_pstate_limits_lock in there too to avoid race conditions.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-12-31 21:48:42 +01:00
Rafael J. Wysocki
366430b5c2 cpufreq: intel_pstate: Do not expose PID parameters in passive mode
If intel_pstate works in the passive mode in which it acts as
a regular cpufreq driver and collaborates with generic cpufreq
governors, the PID parameters are not used, so do not expose
them via debugfs in that case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-27 03:30:11 +01:00
Linus Torvalds
7b9dc3f75f Power management material for v4.10-rc1
- New cpufreq driver for Broadcom STB SoCs and a Device Tree binding
    for it (Markus Mayer).
 
  - Support for ARM Integrator/AP and Integrator/CP in the generic
    DT cpufreq driver and elimination of the old Integrator cpufreq
    driver (Linus Walleij).
 
  - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier,
    and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie,
    Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik).
 
  - cpufreq core fix to eliminate races that may lead to using
    inactive policy objects and related cleanups (Rafael Wysocki).
 
  - cpufreq schedutil governor update to make it use SCHED_FIFO
    kernel threads (instead of regular workqueues) for doing delayed
    work (to reduce the response latency in some cases) and related
    cleanups (Viresh Kumar).
 
  - New cpufreq sysfs attribute for resetting statistics (Markus
    Mayer).
 
  - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis,
    Viresh Kumar).
 
  - Support for using generic cpufreq governors in the intel_pstate
    driver (Rafael Wysocki).
 
  - Support for per-logical-CPU P-state limits and the EPP/EPB
    (Energy Performance Preference/Energy Performance Bias) knobs
    in the intel_pstate driver (Srinivas Pandruvada).
 
  - New CPU ID for Knights Mill in intel_pstate (Piotr Luc).
 
  - intel_pstate driver modification to use the P-state selection
    algorithm based on CPU load on platforms with the system profile
    in the ACPI tables set to "mobile" (Srinivas Pandruvada).
 
  - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki,
    Srinivas Pandruvada).
 
  - cpufreq powernv driver updates including fast switching support
    (for the schedutil governor), fixes and cleanus (Akshay Adiga,
    Andrew Donnellan, Denis Kirjanov).
 
  - acpi-cpufreq driver rework to switch it over to the new CPU
    offline/online state machine (Sebastian Andrzej Siewior).
 
  - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth
    Prakash).
 
  - Idle injection rework (to make it use the regular idle path
    instead of a home-grown custom one) and related powerclamp
    thermal driver updates (Peter Zijlstra, Jacob Pan, Petr Mladek,
    Sebastian Andrzej Siewior).
 
  - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy
    Shevchenko, Piotr Luc).
 
  - intel_idle driver cleanups and switch over to using the new CPU
    offline/online state machine (Anna-Maria Gleixner, Sebastian
    Andrzej Siewior).
 
  - cpuidle DT driver update to support suspend-to-idle properly
    (Sudeep Holla).
 
  - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian,
    Rafael Wysocki).
 
  - Preliminary support for power domains including CPUs in the
    generic power domains (genpd) framework and related DT bindings
    (Lina Iyer).
 
  - Assorted fixes and cleanups in the generic power domains (genpd)
    framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven).
 
  - Preliminary support for devices with multiple voltage regulators
    and related fixes and cleanups in the Operating Performance Points
    (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd).
 
  - System sleep state selection interface rework to make it easier
    to support suspend-to-idle as the default system suspend method
    (Rafael Wysocki).
 
  - PM core fixes and cleanups, mostly related to the interactions
    between the system suspend and runtime PM frameworks (Ulf Hansson,
    Sahitya Tummala, Tony Lindgren).
 
  - Latency tolerance PM QoS framework imorovements (Andrew
    Lutomirski).
 
  - New Knights Mill CPU ID for the Intel RAPL power capping driver
    (Piotr Luc).
 
  - Intel RAPL power capping driver fixes, cleanups and switch over
    to using the new CPU offline/online state machine (Jacob Pan,
    Thomas Gleixner, Sebastian Andrzej Siewior).
 
  - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc,
    rockchip-dfi devfreq drivers and the devfreq core (Axel Lin,
    Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh
    Kumar).
 
  - Fix for false-positive KASAN warnings during resume from ACPI S3
    (suspend-to-RAM) on x86 (Josh Poimboeuf).
 
  - Memory map verification during resume from hibernation on x86 to
    ensure a consistent address space layout (Chen Yu).
 
  - Wakeup sources debugging enhancement (Xing Wei).
 
  - rockchip-io AVS driver cleanup (Shawn Lin).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYTx4+AAoJEILEb/54YlRx9f8P/2SlNHUENW5qh6FtCw00oC2u
 UqJerQJ2L38UgbgxbE/0VYblma9rFABDWC1eO2xN2XdcdW5UPBKPVvNcOgNe1Clh
 gjy3RxZXVpmjfzt2kGfsTLEuGnHqwvx51hTUkeA2LwvkOal45xb8ZESmy8opCtiv
 iG4LwmPHoxdX5Za5nA9ItFKzxyO1EoyNSnBYAVwALDHxmNOfxEcRevfurASt/0M9
 brCCZJA0/sZxeL0lBdy8fNQPIBTUfCoTJG/MtmzGrObJ9wMFvEDfXrVEyZiWs/zA
 AAZ4kQL77enrIKgrLN8e0G6LzTLHoVcvn38Xjf24dKUqhd7ACBhYcnW+jK3+7EAd
 gjZ8efObQsiuyK/EDLUNw35tt96CHOqfrQCj2tIwRVvk9EekLqAGXdIndTCr2kYW
 RpefmP5kMljnm/nQFOVLwMEUQMuVkvUE7EgxADy7DoDmepBFC4ICRDWPye70R2kC
 0O1Tn2PAQq4Fd1tyI9TYYz0YQQkRoaRb5rfYUSzbRbeCdsphUopp4Vhsiyn6IcnF
 XnLbg6pRAat82MoS9n4pfO/VCo8vkErKA8tut9G7TDakkrJoEE7l31PdKW0hP3f6
 sBo6xXy6WTeivU/o/i8TbM6K4mA37pBaj78ooIkWLgg5fzRaS2+0xSPVy2H9x1m5
 LymHcobCK9rSZ1l208Fe
 =vhxI
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "Again, cpufreq gets more changes than the other parts this time (one
  new driver, one old driver less, a bunch of enhancements of the
  existing code, new CPU IDs, fixes, cleanups)

  There also are some changes in cpuidle (idle injection rework, a
  couple of new CPU IDs, online/offline rework in intel_idle, fixes and
  cleanups), in the generic power domains framework (mostly related to
  supporting power domains containing CPUs), and in the Operating
  Performance Points (OPP) library (mostly related to supporting devices
  with multiple voltage regulators)

  In addition to that, the system sleep state selection interface is
  modified to make it easier for distributions with unchanged user space
  to support suspend-to-idle as the default system suspend method, some
  issues are fixed in the PM core, the latency tolerance PM QoS
  framework is improved a bit, the Intel RAPL power capping driver is
  cleaned up and there are some fixes and cleanups in the devfreq
  subsystem

  Specifics:

   - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding
     for it (Markus Mayer)

   - Support for ARM Integrator/AP and Integrator/CP in the generic DT
     cpufreq driver and elimination of the old Integrator cpufreq driver
     (Linus Walleij)

   - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier,
     and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie,
     Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik)

   - cpufreq core fix to eliminate races that may lead to using inactive
     policy objects and related cleanups (Rafael Wysocki)

   - cpufreq schedutil governor update to make it use SCHED_FIFO kernel
     threads (instead of regular workqueues) for doing delayed work (to
     reduce the response latency in some cases) and related cleanups
     (Viresh Kumar)

   - New cpufreq sysfs attribute for resetting statistics (Markus Mayer)

   - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis,
     Viresh Kumar)

   - Support for using generic cpufreq governors in the intel_pstate
     driver (Rafael Wysocki)

   - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy
     Performance Preference/Energy Performance Bias) knobs in the
     intel_pstate driver (Srinivas Pandruvada)

   - New CPU ID for Knights Mill in intel_pstate (Piotr Luc)

   - intel_pstate driver modification to use the P-state selection
     algorithm based on CPU load on platforms with the system profile in
     the ACPI tables set to "mobile" (Srinivas Pandruvada)

   - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki,
     Srinivas Pandruvada)

   - cpufreq powernv driver updates including fast switching support
     (for the schedutil governor), fixes and cleanus (Akshay Adiga,
     Andrew Donnellan, Denis Kirjanov)

   - acpi-cpufreq driver rework to switch it over to the new CPU
     offline/online state machine (Sebastian Andrzej Siewior)

   - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth
     Prakash)

   - Idle injection rework (to make it use the regular idle path instead
     of a home-grown custom one) and related powerclamp thermal driver
     updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej
     Siewior)

   - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy
     Shevchenko, Piotr Luc)

   - intel_idle driver cleanups and switch over to using the new CPU
     offline/online state machine (Anna-Maria Gleixner, Sebastian
     Andrzej Siewior)

   - cpuidle DT driver update to support suspend-to-idle properly
     (Sudeep Holla)

   - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian,
     Rafael Wysocki)

   - Preliminary support for power domains including CPUs in the generic
     power domains (genpd) framework and related DT bindings (Lina Iyer)

   - Assorted fixes and cleanups in the generic power domains (genpd)
     framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven)

   - Preliminary support for devices with multiple voltage regulators
     and related fixes and cleanups in the Operating Performance Points
     (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd)

   - System sleep state selection interface rework to make it easier to
     support suspend-to-idle as the default system suspend method
     (Rafael Wysocki)

   - PM core fixes and cleanups, mostly related to the interactions
     between the system suspend and runtime PM frameworks (Ulf Hansson,
     Sahitya Tummala, Tony Lindgren)

   - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski)

   - New Knights Mill CPU ID for the Intel RAPL power capping driver
     (Piotr Luc)

   - Intel RAPL power capping driver fixes, cleanups and switch over to
     using the new CPU offline/online state machine (Jacob Pan, Thomas
     Gleixner, Sebastian Andrzej Siewior)

   - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc,
     rockchip-dfi devfreq drivers and the devfreq core (Axel Lin,
     Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar)

   - Fix for false-positive KASAN warnings during resume from ACPI S3
     (suspend-to-RAM) on x86 (Josh Poimboeuf)

   - Memory map verification during resume from hibernation on x86 to
     ensure a consistent address space layout (Chen Yu)

   - Wakeup sources debugging enhancement (Xing Wei)

   - rockchip-io AVS driver cleanup (Shawn Lin)"

* tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits)
  devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks
  devfreq: rk3399_dmc: Remove dangling rcu_read_unlock()
  devfreq: exynos: Don't use OPP structures outside of RCU locks
  Documentation: intel_pstate: Document HWP energy/performance hints
  cpufreq: intel_pstate: Support for energy performance hints with HWP
  cpufreq: intel_pstate: Add locking around HWP requests
  PM / sleep: Print active wakeup sources when blocking on wakeup_count reads
  PM / core: Fix bug in the error handling of async suspend
  PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend
  PM / Domains: Fix compatible for domain idle state
  PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
  PM / OPP: Allow platform specific custom set_opp() callbacks
  PM / OPP: Separate out _generic_set_opp()
  PM / OPP: Add infrastructure to manage multiple regulators
  PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
  PM / OPP: Manage supply's voltage/current in a separate structure
  PM / OPP: Don't use OPP structure outside of rcu protected section
  PM / OPP: Reword binding supporting multiple regulators per device
  PM / OPP: Fix incorrect cpu-supply property in binding
  cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state()
  ..
2016-12-13 10:41:53 -08:00
Srinivas Pandruvada
984edbdccc cpufreq: intel_pstate: Support for energy performance hints with HWP
It is possible to provide hints to the HWP algorithms in the processor
to be more performance centric to more energy centric. These hints are
provided by using HWP energy performance preference (EPP) or energy
performance bias (EPB) settings.

The scope of these settings is per logical processor, which means that
each of the logical processors in the package can be programmed with a
different value.

This change provides cpufreq sysfs interface to provide hint. For each
policy, two additional attributes will be available to check and provide
hint. These attributes will only be present when the intel_pstate driver
is using HWP mode.

These attributes are:
 - energy_performance_available_preferences
 - energy_performance_preference

To get list of supported hints:
$ cat energy_performance_available_preferences
default performance balance_performance balance_power power

The current preference can be read or changed via cpufreq sysfs
attribute "energy_performance_preference". Reading from this attribute
will display current effective setting changed via any method. User can
write any of the valid preference string to this attribute. User can
always restore to power-on default by writing "default".

Implementation
Since these hints can be provided by direct MSR write or using some tools
like x86_energy_perf_policy, the driver internally doesn't maintain any
state. The user operation will result in direct read/write of MSR: 0x774
(HWP_REQUEST_MSR). Also driver use read modify write to update other
fields in this MSR.

Summary of changes:
 - struct cpudata field epp_saved is renamed to epp_powersave, as this
   stores the value to restore once policy is switched from performance
   to powersave to restore original powersave EPP value.
 - A new struct cpudata field epp_saved is used to store the raw MSR
   EPP/EPB value when a CPU goes offline or on suspend and restore on
   online/resume. This ensures that EPP value is restored to correct
   value irrespective of the means used to set.
 - EPP/EPB value ranges are fixed for each preference, which can be
   set for the cpufreq sysfs, so user request is mapped to/from this
   range.
 - New attributes are only added when HWP is present.
 - Since EPP value of 0 is valid the fields are initialized to
   -EINVAL when not valid. The field epp_default is read only once
   after powerup to avoid reading on subsequent CPU online operation
 - New suspend callback to store epp on suspend operation
 - Don't invalidate old epp_saved field on resume and online as now
   we can restore last epp value on suspend and this field can still
   have old EPP value sampled during switch to performance from
   powersave.
 - While here optimized setting of cpu_data->epp_powersave = epp in
   intel_pstate_hwp_set() as this was done in both true and false
   paths.
 - epp/epb set function returns error to caller on failure to pass
   on to user space for display.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-08 01:43:05 +01:00
Srinivas Pandruvada
b59fe54053 cpufreq: intel_pstate: Add locking around HWP requests
To avoid race conditions from multiple threads, increase the scope
of intel_pstate_limits_lock to include HWP requests also.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-08 01:43:04 +01:00
Piotr Luc
58bf454272 cpufreq: intel_pstate: Add Knights Mill CPUID
Add Knights Mill (KNM) to the list of CPUIDs supported by intel_pstate.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-01 15:08:12 +01:00
Arnd Bergmann
7a3ba767f6 cpufreq: intel_pstate: fix intel_pstate_exit_perf_limits() prototype
The addition of the generic governor support marked the
intel_pstate_exit_perf_limits as inline(), which fixed a warning,
but it introduced another warning:

drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_exit_perf_limits’:
drivers/cpufreq/intel_pstate.c:483:1: error: no return statement in function returning non-void [-Werror=return-type]

This changes it back to a 'void' return type, and changes the
corresponding intel_pstate_init_acpi_perf_limits() function to
be inline as well for consistency.

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-28 14:24:21 +01:00
Srinivas Pandruvada
8442885fca cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
When user has selected performance policy, then set the EPP (Energy
Performance Preference) or EPB (Energy Performance Bias) to maximum
performance mode.

Also when user switch back to powersave, then restore EPP/EPB to last
EPP/EPB value before entering performance mode. If user has not changed
EPP/EPB manually then it will be power on default value.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-28 14:23:56 +01:00
Rafael J. Wysocki
17669006ad cpufreq/intel_pstate: Use CPPC to get max performance
Use the acpi cppc_lib interface to get CPPC performance limits and update
the per cpu priority for the ITMT scheduler. If the highest performance of
CPUs differs the ITMT feature is enabled.

Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24 20:44:20 +01:00
Srinivas Pandruvada
d5dd33d9de cpufreq: intel_pstate: increase precision of performance limits
Even with round up of limits->min_perf and limits->max_perf, in some
cases resultant performance is 100 MHz less than the desired.

For example when the maximum frequency is 3.50 GHz, setting
scaling_min_frequency to 2.3 GHz always results in 2.2 GHz minimum.

Currently the fixed floating point operation uses 8 bit precision for
calculating limits->min_perf and limits->max_perf. For some operations
in this driver the 14 bit precision is used. Using the 14 bit precision
also for calculating limits->min_perf and limits->max_perf, addresses
this issue.

Introduced fp_ext_toint() equivalent to fp_toint() and int_ext_tofp()
equivalent to int_tofp() with 14 bit precision.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-22 02:31:49 +01:00
Srinivas Pandruvada
46992d6b55 cpufreq: intel_pstate: round up min_perf limits
In some use cases, user wants to enforce a minimum performance limit on
CPUs. But because of simple division the resultant performance is 100 MHz
less than the desired in some cases.

For example when the maximum frequency is 3.50 GHz, setting
scaling_min_frequency to 1.6 GHz always results in 1.5 GHz minimum. With
simple round up, the frequency can be set to 1.6 GHz to minimum in this
case. This round up is already done to max_policy_pct and max_perf, so do
the same for min_policy_pct and min_perf.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-22 02:31:48 +01:00
Rafael J. Wysocki
001c76f05b cpufreq: intel_pstate: Generic governors support
There may be reasons to use generic cpufreq governors (eg. schedutil)
on Intel platforms instead of the intel_pstate driver's internal
governor.  However, that currently can only be done by disabling
intel_pstate altogether and using the acpi-cpufreq driver instead
of it, which is subject to limitations.

First of all, acpi-cpufreq only works on systems where the _PSS
object is present in the ACPI tables for all logical CPUs.  Second,
on those systems acpi-cpufreq will only use frequencies listed by
_PSS which may be suboptimal.  In particular, by convention, the
whole turbo range is represented in _PSS as a single P-state and
the frequency assigned to it is greater by 1 MHz than the greatest
non-turbo frequency listed by _PSS.  That may confuse governors to
use turbo frequencies less frequently which may lead to suboptimal
performance.

For this reason, make it possible to use the intel_pstate driver
with generic cpufreq governors as a "normal" cpufreq driver.  That
mode is enforced by adding intel_pstate=passive to the kernel
command line and cannot be disabled at run time.  In that mode,
intel_pstate provides a cpufreq driver interface including
the ->target() and ->fast_switch() callbacks and is listed in
scaling_driver as "intel_cpufreq".

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
2016-11-21 14:32:32 +01:00
Rafael J. Wysocki
d0ea59e188 cpufreq: intel_pstate: Request P-states control from SMM if needed
Currently, intel_pstate is unable to control P-states on my
IvyBridge-based Acer Aspire S5, because they are controlled by SMM
on that machine by default and it is necessary to request OS control
of P-states from it via the SMI Command register exposed in the ACPI
FADT.  intel_pstate doesn't do that now, but acpi-cpufreq and other
cpufreq drivers for x86 platforms do.

Address this problem by making intel_pstate use the ACPI-defined
mechanism as well.  However, intel_pstate is not modular and it
doesn't need the module refcount tricks played by
acpi_processor_notify_smm(), so export the core of this function
to it as acpi_processor_pstate_control() and make it call that.
[The changes in processor_perflib.c related to this should not
make any functional difference for the acpi_processor_notify_smm()
users].

To be safe, only call acpi_processor_notify_smm() from intel_pstate
if ACPI _PPC support is enabled in it.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-11-17 22:47:47 +01:00
Srinivas Pandruvada
7f7a516ee3 cpufreq: intel_pstate: Use CPU load based algorithm for PM_MOBILE
Use get_target_pstate_use_cpu_load() to calculate target P-State for
devices, with the preferred power management profile in ACPI FADT
set to PM_MOBILE.

This may help in resolving some thermal issues caused by low sustained
cpu bound workloads. The current algorithm tend to over provision in this
case as it doesn't look at the CPU busyness.

Also included the fix from Arnd Bergmann <arnd@arndb.de> to solve compile
issue, when CONFIG_ACPI is not defined.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-14 21:25:23 +01:00
Srinivas Pandruvada
a410c03d66 cpufreq: intel_pstate: protect limits variable
The limits variable gets modified from intel_pstate sysfs and also gets
modified from cpufreq sysfs. So protect with a mutex to keep data
integrity, when they are getting modified from multiple threads.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:10:54 +01:00
Srinivas Pandruvada
5879f87739 cpufreq: intel_pstate: Reduce impact due to rounding error
When policy->max and policy->min are same, in some cases they don't
result in the same frequency cap. The max_policy_pct is rounded up but
not min_perf_pct. So even when they are same, results in different
percentage or maximum and minimum.
Since minimum is a conservative value for power, a lower value without
rounding is better in most of the cases, unless user wants
policy->max = policy->min.
This change uses use the same policy percentage when policy->max and
policy->min are same.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:04:06 +01:00
Srinivas Pandruvada
eae48f046f cpufreq: intel_pstate: Per CPU P-State limits
Intel P-State offers two interface to set performance limits:
- Intel P-State sysfs
	/sys/devices/system/cpu/intel_pstate/max_perf_pct
	/sys/devices/system/cpu/intel_pstate/min_perf_pct
- cpufreq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq

In the current implementation both of the above methods, change limits
to every CPU in the system. Moreover the limits placed using cpufreq
policy interface also presented in the Intel P-State sysfs via modified
max_perf_pct and min_per_pct during sysfs reads. This allows to check
percent of reduced/increased performance, irrespective of method used to
limit.

There are some new generations of processors, where it is possible to
have limits placed on individual CPU cores. Using cpufreq interface it
is possible to set limits on each CPU. But the current processing will
use last limits placed on all CPUs. So the per core limit feature of
CPUs can't be used.

This change brings in capability to set P-States limits for each CPU,
with some limitations. In this case what should be the read of
max_perf_pct and min_perf_pct? It can be most restrictive limits placed
on any CPU or max possible performance on any given CPU on which no
limits are placed. In either case someone will have issue.

So the consensus is, we can't have both sysfs controls present when user
wants to use limit per core limits.
- By default per-core-control feature is not enabled. So no one will
notice any difference.
- The way to enable is by kernel command line
intel_pstate=per_cpu_perf_limits
- When the per-core-controls are enabled there is no display of for both
read and write on
	/sys/devices/system/cpu/intel_pstate/max_perf_pct
	/sys/devices/system/cpu/intel_pstate/min_perf_pct
- User can change limits using
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- User can still observe turbo percent and number of P-States from
	/sys/devices/system/cpu/intel_pstate/turbo_pct
	/sys/devices/system/cpu/intel_pstate/num_pstates
- User can read write system wide turbo status
	/sys/devices/system/cpu/no_turbo

While changing this BUG_ON is changed to WARN_ON, as they are not fatal
errors for the system.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:04:06 +01:00
Rafael J. Wysocki
fe0f59c412 Merge back earlier cpufreq material for v4.10. 2016-10-30 06:12:50 +01:00
Rafael J. Wysocki
2f1d407ada cpufreq: intel_pstate: Always set max P-state in performance mode
The only times at which intel_pstate checks the policy set for
a given CPU is the initialization of that CPU and updates of its
policy settings from cpufreq when intel_pstate_set_policy() is
invoked.

That is insufficient, however, because intel_pstate uses the same
P-state selection function for all CPUs regardless of the policy
setting for each of them and the P-state limits are shared between
them.  Thus if the policy is set to "performance" for a particular
CPU, it may not behave as expected if the cpufreq settings are
changed subsequently for another CPU.

That can be easily demonstrated by writing "performance" to
scaling_governor for all CPUs and then switching it to "powersave"
for one of them in which case all of the CPUs will behave as though
their scaling_governor were all "powersave" (even though the policy
still appears to be "performance" for the remaining CPUs).

Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
always set the P-state to the maximum allowed by the current limits
for all CPUs whose policy is set to "performance".

Note that it still is recommended to always change the policy setting
in the same way for all CPUs even with this fix applied to avoid
confusion.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-24 23:20:25 +02:00
Rafael J. Wysocki
a6c6ead141 cpufreq: intel_pstate: Set P-state upfront in performance mode
After commit a4675fbc4a (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) the cpufreq governor callbacks may not
be invoked on NOHZ_FULL CPUs and, in particular, switching to the
"performance" policy via sysfs may not have any effect on them.  That
is a problem, because it usually is desirable to squeeze the last
bit of performance out of those CPUs, so work around it by setting
the maximum P-state (within the limits) in intel_pstate_set_policy()
upfront when the policy is CPUFREQ_POLICY_PERFORMANCE.

Fixes: a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-21 22:18:22 +02:00
Srinivas Pandruvada
185d82456e cpufreq: intel_pstate: Remove PID debugfs when not used
When target state is calculated using get_target_pstate_use_cpu_load(),
PID controller is not used, hence it has no effect on performance.
So don't present debugfs entries to tune PID controller.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-21 22:16:26 +02:00
Rafael J. Wysocki
1d29815ef2 cpufreq: intel_pstate: Drop boost_iowait flag
The "IOwait boosting" mechanism is only used by the
get_target_pstate_use_cpu_load() governor function and the
boost_iowait flag in pid_params is always set when that function
is in use (and it is never set otherwise).  This means that the
boost_iowait flag is in fact redundant and may be dropped.

For this reason, replace the boost_iowait flag check in
intel_pstate_update_util() with an equivalent check against
pstate_funcs.get_target_pstate and drop that flag.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-21 22:13:51 +02:00
Rafael J. Wysocki
3954517e2f cpufreq: intel_pstate: Fix struct pstate_adjust_policy kerneldoc
It looks like the name of struct pstate_adjust_policy was updated
without updating its kerneldoc comment accordingly, so fix that
mistake.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-12 20:58:14 +02:00
Rafael J. Wysocki
0843e83c1a cpufreq: intel_pstate: Proportional algorithm for Atom
The PID algorithm used by the intel_pstate driver tends to drive
performance to the minimum for workloads with utilization below the
setpoint, which is undesirable, so replace it with a modified
"proportional" algorithm on Atom.

The new algorithm will set the new P-state to be 1.25 times the
available maximum times the (frequency-invariant) utilization during
the previous sampling period except when the target P-state computed
this way is lower than the average P-state during the previous
sampling period.  In the latter case, it will increase the target by
50% of the difference between it and the average P-state to prevent
performance from dropping down too fast in some cases.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-12 20:58:13 +02:00
Rafael J. Wysocki
f00593a4bd cpufreq: intel_pstate: Clarify comment in get_target_pstate_use_performance()
Make the comment explaining the meaning of the perf_scaled variable
in get_target_pstate_use_performance() more straightforward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-09 18:54:57 +02:00
Srinivas Pandruvada
f9f4872df6 cpufreq: intel_pstate: Fix unsafe HWP MSR access
This is a requirement that MSR MSR_PM_ENABLE must be set to 0x01 before
reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is
scheduled on a CPU which is not same as policy->cpu or migrates to a
different CPU before calling msr read for MSR_HWP_CAPABILITIES, it
is possible that MSR_PM_ENABLE was not to set to 0x01 on that CPU.
This will cause GP fault. So like other places in this path
rdmsrl_on_cpu should be used instead of rdmsrl.

Moreover the scope of MSR_HWP_CAPABILITIES is on per thread basis, so it
should be read from the same CPU, for which MSR MSR_HWP_REQUEST is
getting set.

dmesg dump or warning:

[   22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
[   22.014492] unchecked MSR access error: RDMSR from 0x771
[   22.014493] Modules linked in:
[   22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
...
...
[   22.014516] Call Trace:
[   22.014542]  [<ffffffff813d7dd1>] dump_stack+0x63/0x82
[   22.014558]  [<ffffffff8107bc8b>] __warn+0xcb/0xf0
[   22.014561]  [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
[   22.014563]  [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
[   22.014564]  [<ffffffff810677d9>] fixup_exception+0x39/0x50
[   22.014604]  [<ffffffff8102e400>] do_general_protection+0x80/0x150
[   22.014610]  [<ffffffff817f9ec8>] general_protection+0x28/0x30
[   22.014635]  [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
[   22.014642]  [<ffffffff810600c7>] ? native_read_msr+0x7/0x40
[   22.014657]  [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
[   22.014660]  [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
[   22.014662]  [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
[   22.014664]  [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
[   22.014666]  [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
[   22.014669]  [<ffffffff816833a6>] cpufreq_online+0x406/0x820
[   22.014671]  [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
[   22.014717]  [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
[   22.014719]  [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
[   22.014749]  [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
[   22.014751]  [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12

Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-09 18:54:13 +02:00
Rafael J. Wysocki
b6e2511782 Merge branch 'pm-cpufreq-sched' into pm-cpufreq 2016-10-02 01:42:33 +02:00
Srinivas Pandruvada
3ba7bcaa36 cpufreq: intel_pstate: Add io_boost trace
Add io_boost percent to current pstate_sample tracepoint.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-16 23:55:30 +02:00
Rafael J. Wysocki
09c448d3c6 cpufreq: intel_pstate: Use IOWAIT flag in Atom algorithm
Modify the P-state selection algorithm for Atom processors to use
the new SCHED_CPUFREQ_IOWAIT flag instead of the questionable
get_cpu_iowait_time_us() function.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-14 02:28:13 +02:00
Julia Lawall
42ce8921cc intel_pstate: constify local structures
For structure types defined in the same file or local header files, find
top-level static structure declarations that have the following
properties:
1. Never reassigned.
2. Address never taken
3. Not passed to a top-level macro call
4. No pointer or array-typed field passed to a function or stored in a
variable.
Declare structures having all of these properties as const.

Done using Coccinelle.
Based on a suggestion by Joe Perches <joe@perches.com>.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13 02:40:24 +02:00
Rafael J. Wysocki
58919e83c8 cpufreq / sched: Pass flags to cpufreq_update_util()
It is useful to know the reason why cpufreq_update_util() has just
been called and that can be passed as flags to cpufreq_update_util()
and to the ->func() callback in struct update_util_data.  However,
doing that in addition to passing the util and max arguments they
already take would be clumsy, so avoid it.

Instead, use the observation that the schedutil governor is part
of the scheduler proper, so it can access scheduler data directly.
This allows the util and max arguments of cpufreq_update_util()
and the ->func() callback in struct update_util_data to be replaced
with a flags one, but schedutil has to be modified to follow.

Thus make the schedutil governor obtain the CFS utilization
information from the scheduler and use the "RT" and "DL" flags
instead of the special utilization value of ULONG_MAX to track
updates from the RT and DL sched classes.  Make it non-modular
too to avoid having to export scheduler variables to modules at
large.

Next, update all of the other users of cpufreq_update_util()
and the ->func() callback in struct update_util_data accordingly.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-08-16 22:14:55 +02:00
Rafael J. Wysocki
e2b3b80de5 Merge branches 'pm-sleep', 'pm-cpufreq', 'pm-core' and 'pm-opp'
* pm-sleep:
  x86/power/64: Do not refer to __PAGE_OFFSET from assembly code

* pm-cpufreq:
  cpufreq: Do not default-yes CPU_FREQ_STAT
  cpufreq: intel_pstate: Add more out-of-band IDs

* pm-core:
  PM-wakeup: Delete unnecessary checks before three function calls

* pm-opp:
  PM / OPP: optimize dev_pm_opp_set_rate() performance a bit
2016-08-05 15:46:55 +02:00
Srinivas Pandruvada
65c1262f40 cpufreq: intel_pstate: Add more out-of-band IDs
Add Skylake-X and Broadwell-X IDs for out-of-band (OBB) control of
P-States.

For these processors, if MSR_MISC_PWR_MGMT BIT(8) == 1, then the
Intel P-State driver should exit as OS can't control P-States.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-28 23:58:17 +02:00
Rafael J. Wysocki
bc841e260c Merge branch 'pm-cpu'
* pm-cpu:
  x86: remove duplicate turbo ratio limit MSRs
  tools/power turbostat: Replace MSR_NHM_TURBO_RATIO_LIMIT
  cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
2016-07-25 13:46:30 +02:00
Srinivas Pandruvada
da7de91c3e cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
The MSR MSR_HWP_INTERRUPT is valid only when CPUID.06H:EAX[8] = 1, so
check for feature before accessing this MSR.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21 14:29:30 +02:00
Rafael J. Wysocki
bc95a454b6 intel_pstate: Update cpu_frequency tracepoint every time
Currently, intel_pstate only updates the cpu_frequency tracepoint
if the new P-state to set is different from the current one, but
that causes powertop to report 100% idle on an 100% loaded system
sometimes.

Prevent that from happening by updating the cpu_frequency tracepoint
every time intel_pstate_update_pstate() is called.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>-
2016-07-21 14:28:37 +02:00
Carsten Emde
2630abc243 cpufreq: intel_pstate: clean remnant struct element
When I was working with the Intel P state driver I came across a
remnant struct element that is no longer needed after the function
intel_pstate_calc_freq() was retired.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21 14:26:00 +02:00
Jan Kiszka
5fc8f707a2 intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
If MSR_CONFIG_TDP_CONTROL is locked, we currently try to address some
MSR 0x80000648 or so. Mask out the relevant level bits 0 and 1.

Found while running over the Jailhouse hypervisor which became upset
about this strange MSR index.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-11 15:12:30 +02:00
Srinivas Pandruvada
100cf6f277 cpufreq: intel_pstate: Replace MSR_NHM_TURBO_RATIO_LIMIT
Replace MSR_NHM_TURBO_RATIO_LIMIT with MSR_TURBO_RATIO_LIMIT.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-07 15:31:58 +02:00
Rafael J. Wysocki
5d1191ab6c Merge back earlier cpufreq material for v4.8. 2016-07-04 13:21:43 +02:00
Jisheng Zhang
4a7cb7a96a intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly
pid_params is written once by copy_pid_params() during initialization,
and thereafter is mostly read by hot path intel_pstate_update_util().
The read of pid_params gets more after commit a4675fbc4a ("cpufreq:
intel_pstate: Replace timers with utilization update callbacks")

pstate_funcs is written once by copy_cpu_funcs() during initialization,
and thereafter is mostly read by hot path intel_pstate_update_util()

hwp_active is written to once during initialization and thereafter is
mostly read by hot path intel_pstate_update_util().

The fact that they are mostly read and not written to makes them
candidates for __read_mostly declarations.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-28 00:04:04 +02:00
Jisheng Zhang
29327c84ba intel_pstate: add __init/__initdata marker to some functions/variables
These functions/variables are not needed after booting, so mark them
as __init or __initdata.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-28 00:04:04 +02:00
Jisheng Zhang
eed436095e intel_pstate: Fix incorrect placement of __initdata
__initdata should be placed between the variable name and equal sign
(if there is) for the variable to be placed in the intended section.

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-28 00:04:04 +02:00
Rafael J. Wysocki
5ab666e095 intel_pstate: Do not clear utilization update hooks on policy changes
intel_pstate_set_policy() is invoked by the cpufreq core during
driver initialization, on changes of policy attributes (minimim and
maximum frequency, for example) via sysfs and via CPU notifications
from the platform firmware.  On some platforms the latter may occur
relatively often.

Commit bb6ab52f2b (intel_pstate: Do not set utilization update hook
too early) made intel_pstate_set_policy() clear the CPU's utilization
update hook before updating the policy attributes for it (and set the
hook again after doind that), but that involves invoking
synchronize_sched() and adds overhead to the CPU notifications
mentioned above and to the sched-RCU handling in general.

That extra overhead is arguably not necessary, because updating
policy attributes when the CPU's utilization update hook is active
should not lead to any adverse effects, so drop the clearing of
the hook from intel_pstate_set_policy() and make it check if
the hook has been set already when attempting to set it.

Fixes: bb6ab52f2b (intel_pstate: Do not set utilization update hook too early)
Reported-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-27 23:47:15 +02:00
Rafael J. Wysocki
56b7808572 Merge back earlier cpufreq changes for v4.8. 2016-06-20 14:31:41 +02:00
Srinivas Pandruvada
b00345d199 cpufreq: intel_pstate: Adjust _PSS[0] freqeuency if needed
The maximum turbo P-State used by the intel_pstate driver may be
limited by ACPI _PSS table entry 0.  After commit 9522a2ff9c
(cpufreq: intel_pstate: Enforce _PPC limits), the maximum performance
on servers will be capped by the _PSS table entry 0 by default.

Even though that is formally correct, it may lead to preformance
regressions in some cases.  Namely, if the _PSS table entry 0 is
not the maximum turbo P-State, performance measured after commit
9522a2ff9c will not match the performance measured before that
commit on the same system.

For this reason, modify the code to always use the maximum turbo
frequency as the one that corresponds to _PSS table entry 0 if turbo
is enabled in the BIOS.  This way, the performance levels from
before commit 9522a2ff9c will be restored on the affected systems.

Fixes: 9522a2ff9c (cpufreq: intel_pstate: Enforce _PPC limits)
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-15 01:56:47 +02:00
Srinivas Pandruvada
41bad47f76 cpufreq: intel_pstate: Broxton support
Add Broxton CPU model number.

Broxton requires core_params to get performance limits via MSRs, but
it is an Atom platform, which requires more power optimized algorithm.

So the P state selection will use similar algorithm as other Atom
platforms.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-13 23:49:39 +02:00
Rafael J. Wysocki
b77b565108 Merge branch 'x86/cpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into x86/cpu
Pull recent changes related to x86 CPU model representations from tip.
2016-06-13 23:48:23 +02:00
Dave Hansen
5b20c94488 x86/cpufreq: Use Intel family name macros for the intel_pstate cpufreq driver
Another straightforward replacement of magic numbers.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: jacob.jun.pan@intel.com
Cc: linux-pm@vger.kernel.org
Link: http://lkml.kernel.org/r/20160603001945.0F5D02AA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08 13:03:26 +02:00
Srinivas Pandruvada
983e600e88 cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in  1500000 KHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.

One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition the intel_pstate
sysfs limits, shows percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-08 03:22:40 +02:00
Srinivas Pandruvada
2c2c1af449 cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
The limits->max_perf is rounded_up but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While here also added a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee27881 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog ]
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-08 03:22:39 +02:00
Srinivas Pandruvada
6cacd115a8 cpufreq: intel_pstate: Downgrade print level for _PPC
Downgrade pr_info to pr_debug for the "_PPC limits will be enforced"
message.

In server systems with many cores this message is annoying.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-30 15:22:02 +02:00
Rafael J. Wysocki
c749c64f45 intel_pstate: Simplify conditional in intel_pstate_set_policy()
One of the if () statements in intel_pstate_set_policy() causes
another if () to be evaluated if the condition is true and it
doesn't do anything else, so merge the two if () statements into
one.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-05-18 02:26:33 +02:00
Rafael J. Wysocki
1aa7a6e2b8 intel_pstate: Clean up get_target_pstate_use_performance()
The comments and the core_busy variable name in
get_target_pstate_use_performance() are totally confusing,
so modify them to reflect what's going on.

The results of the computations should be the same as before.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:58:38 +02:00
Rafael J. Wysocki
8edb0a6e48 intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying the same division again, so make it do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:58:37 +02:00
Rafael J. Wysocki
a1c9787dc3 intel_pstate: Clarify average performance computation
The core_pct_busy field of struct sample actually contains the
average performace during the last sampling period (in percent)
and not the utilization of the core as suggested by its name
which is confusing.

For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.

Also notice that storing this value as percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:58:37 +02:00
Chen Yu
4578ee7e1d intel_pstate: Avoid unnecessary synchronize_sched() during initialization
Currently, in intel_pstate_clear_update_util_hook(), after
clearing the utilization update hook, we leverage
synchronize_sched() to deal with synchronization, which
is a little bit time-costly because synchronize_sched()
has to wait for all the CPUs to go through a grace period.

Actually, the synchronize_sched() is not necessary if the utilization
update hook has not been set for the given CPU yet, so make the driver
check if that's the case and avoid the synchronize_sched() call then.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371
Tested-by: Tian Ye <yex.tian@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw : Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-11 22:56:34 +02:00
Rafael J. Wysocki
f96fd0c86f intel_pstate: Clean up intel_pstate_get()
intel_pstate_get() contains a local variable that's initialized but
never used and it can be written in fewer lines of code, so clean
it up.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-05-09 22:55:36 +02:00
Rafael J. Wysocki
da43af961b Merge cpufreq fixes going into v4.6.
* pm-cpufreq-fixes:
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform
  cpufreq: intel_pstate: Fix processing for turbo activation ratio
2016-05-06 22:01:14 +02:00
Srinivas Pandruvada
e59a8f7ff4 cpufreq: intel_pstate: Ignore _PPC processing under HWP
When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC
is not used. So ignore processing for _PPC limits.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-05 01:43:47 +02:00
Rafael J. Wysocki
6d45b719cb intel_pstate: Fix intel_pstate_get()
After commit 8fa520af50 "intel_pstate: Remove freq calculation from
intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
to compute the average frequency, which is problematic for two reasons.

First, intel_pstate_get() may be invoked before the driver reads the
CPU feedback registers for the first time and if that happens,
get_avg_frequency() will attempt to divide by zero.

Second, the get_avg_frequency() call in intel_pstate_get() is racy
with respect to intel_pstate_sample() and it may end up returning
completely meaningless values for this reason.

Moreover, after commit 7349ec0470 "intel_pstate: Move
intel_pstate_calc_busy() into get_target_pstate_use_performance()"
sample.core_pct_busy is never computed on Atom, but it is used in
intel_pstate_adjust_busy_pstate() in that case too.

To address those problems notice that if sample.core_pct_busy
was used in the average frequency computation carried out by
get_avg_frequency(), both the divide by zero problem and the
race with respect to intel_pstate_sample() would be avoided.

Accordingly, move the invocation of intel_pstate_calc_busy() from
get_target_pstate_use_performance() to intel_pstate_update_util(),
which also will take care of the uninitialized sample.core_pct_busy
on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
as per the above.

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
Fixes: 8fa520af50 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
Fixes: 7349ec0470 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-04 14:09:16 +02:00