Commit Graph

442426 Commits

Author SHA1 Message Date
Dan Carpenter
fa93384f40 sched: Fix signedness bug in yield_to()
yield_to() is supposed to return -ESRCH if there is no task to
yield to, but because the type is bool that is the same as returning
true.

The only place I see which cares is kvm_vcpu_on_spin().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20140523102042.GA7267@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:52:13 +02:00
Manuel Schölling
2538d960d0 sched/fair: Use time_after() in record_wakee()
To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of plain, error-prone math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1400780723-24626-1-git-send-email-manuel.schoelling@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:52:02 +02:00
Tim Chen
ed61bbc69c sched/balancing: Reduce the rate of needless idle load balancing
The current no_hz idle load balancer do load balancing for *all* idle cpus,
even though the time due to load balance for a particular
idle cpu could be still a while in the future.  This introduces a much
higher load balancing rate than what is necessary.  The patch
changes the behavior by only doing idle load balancing on
behalf of an idle cpu only when it is due for load balancing.

On SGI's systems with over 3000 cores, the cpu responsible for idle balancing
got overwhelmed with idle balancing, and introduces a lot of OS noise
to workloads.  This patch fixes the issue.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Russ Anderson <rja@sgi.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1400621967.2970.280.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:52:01 +02:00
Ben Segall
51f2176d74 sched/fair: Fix unlocked reads of some cfs_b->quota/period
sched_cfs_period_timer() reads cfs_b->period without locks before calling
do_sched_cfs_period_timer(), and similarly unthrottle_offline_cfs_rqs()
would read cfs_b->period without the right lock. Thus a simultaneous
change of bandwidth could cause corruption on any platform where ktime_t
or u64 writes/reads are not atomic.

Extend cfs_b->lock from do_sched_cfs_period_timer() to include the read of
cfs_b->period to solve that issue; unthrottle_offline_cfs_rqs() can just
use 1 rather than the exact quota, much like distribute_cfs_runtime()
does.

There is also an unlocked read of cfs_b->runtime_expires, but a race
there would only delay runtime expiry by a tick. Still, the comparison
should just be != anyway, which clarifies even that problem.

Signed-off-by: Ben Segall <bsegall@google.com>
Tested-by: Roman Gushchin <klamm@yandex-team.ru>
[peterz: Fix compile warn]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140519224945.20303.93530.stgit@sword-of-the-dawn.mtv.corp.google.com
Cc: pjt@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:52:00 +02:00
Rik van Riel
096aa33863 sched/numa: Decay ->wakee_flips instead of zeroing
Affine wakeups have the potential to interfere with NUMA placement.
If a task wakes up too many other tasks, affine wakeups will get
disabled.

However, regardless of how many other tasks it wakes up, it gets
re-enabled once a second, potentially interfering with NUMA
placement of other tasks.

By decaying wakee_wakes in half instead of zeroing it, we can avoid
that problem for some workloads.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: chegu_vinod@hp.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20140516001332.67f91af2@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:41 +02:00
Rik van Riel
b1ad065e65 sched/numa: Update migrate_improves/degrades_locality()
Update the migrate_improves/degrades_locality() functions with
knowledge of pseudo-interleaving.

Do not consider moving tasks around within the set of group's active
nodes as improving or degrading locality. Instead, leave the load
balancer free to balance the load between a numa_group's active nodes.

Also, switch from the group/task_weight functions to the group/task_fault
functions. The "weight" functions involve a division, but both calls use
the same divisor, so there's no point in doing that from these functions.

On a 4 node (x10 core) system, performance of SPECjbb2005 seems
unaffected, though the number of migrations with 2 8-warehouse wide
instances seems to have almost halved, due to the scheduler running
each instance on a single node.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Link: http://lkml.kernel.org/r/20140515130306.61aae7db@cuia.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:39 +02:00
Rik van Riel
e63da03639 sched/numa: Allow task switch if load imbalance improves
Currently the NUMA balancing code only allows moving tasks between NUMA
nodes when the load on both nodes is in balance. This breaks down when
the load was imbalanced to begin with.

Allow tasks to be moved between NUMA nodes if the imbalance is small,
or if the new imbalance is be smaller than the original one.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140514132221.274b3463@annuminas.surriel.com
2014-05-22 11:16:38 +02:00
xiaofeng.yan
4027d08085 sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
Signed-off-by: xiaofeng.yan <xiaofeng.yan@huawei.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399605687-18094-1-git-send-email-xiaofeng.yan@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:37 +02:00
Dongsheng Yang
7aa2c016db sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a568a1e3cc8e78648f41b5035fa5e381d36274da.1399532322.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:36 +02:00
Corey Minyard
a803f0261b sched: Initialize rq->age_stamp on processor start
If the sched_clock time starts at a large value, the kernel will spin
in sched_avg_update for a long time while rq->age_stamp catches up
with rq->clock.

The comment in kernel/sched/clock.c says that there is no strict promise
that it starts at zero.  So initialize rq->age_stamp when a cpu starts up
to avoid this.

I was seeing long delays on a simulator that didn't start the clock at
zero.  This might also be an issue on reboots on processors that don't
re-initialize the timer to zero on reset, and when using kexec.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399574859-11714-1-git-send-email-minyard@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:35 +02:00
Kirill Tkhai
7246544786 sched, nohz: Change rq->nr_running to always use wrappers
Sometimes ->nr_running may cross 2 but interrupt is not being
sent to rq's cpu. In this case we don't reenable the timer.
Looks like this may be the reason for rare unexpected effects,
if nohz is enabled.

Patch replaces all places of direct changing of nr_running
and makes add_nr_running() caring about crossing border.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:33 +02:00
Jason Low
52a08ef1f1 sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
Currently, in idle_balance(), we update rq->next_balance when we pull_tasks.
However, it is also important to update this in the !pulled_tasks case too.

When the CPU is "busy" (the CPU isn't idle), rq->next_balance gets computed
using sd->busy_factor (so we increase the balance interval when the CPU is
busy). However, when the CPU goes idle, rq->next_balance could still be set
to a large value that was computed with the sd->busy_factor.

Thus, we need to also update rq->next_balance in idle_balance() in the cases
where !pulled_tasks too, so that rq->next_balance gets updated without taking
the busy_factor into account when the CPU is about to go idle.

This patch makes rq->next_balance get updated independently of whether or
not we pulled_task. Also, we add logic to ensure that we always traverse
at least 1 of the sched domains to get a proper next_balance value for
updating rq->next_balance.

Additionally, since load_balance() modifies the sd->balance_interval, we
need to re-obtain the sched domain's interval after the call to
load_balance() in rebalance_domains() before we update rq->next_balance.

This patch adds and uses 2 new helper functions, update_next_balance() and
get_sd_balance_interval() to update next_balance and obtain the sched
domain's balance_interval.

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: daniel.lezcano@linaro.org
Cc: alex.shi@linaro.org
Cc: efault@gmx.de
Cc: vincent.guittot@linaro.org
Cc: morten.rasmussen@arm.com
Cc: aswin@hp.com
Link: http://lkml.kernel.org/r/1399596562.2200.7.camel@j-VirtualBox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:32 +02:00
Dongsheng Yang
a9467fa3cd sched: Use clamp() and clamp_val() to make sys_nice() more readable
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399541715-19568-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:31 +02:00
Dietmar Eggemann
caffcdd8d2 sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
There is no need to zero struct sched_group member cpumask and struct
sched_group_power member power since both structures are already allocated
as zeroed memory in __sdt_alloc().

This patch has been tested with
BUG_ON(!cpumask_empty(sched_group_cpus(sg))); and BUG_ON(sg->sgp->power);
in build_sched_groups() on ARM TC2 and INTEL i5 M520 platform including
CPU hotplug scenarios.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1398865178-12577-1-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:30 +02:00
Vincent Guittot
c515db8cd3 sched/numa: Fix initialization of sched_domain_topology for NUMA
Jet Chen has reported a kernel panics when booting qemu-system-x86_64 with
kvm64 cpu. A panic occured while building the sched_domain.

In sched_init_numa, we create a new topology table in which both default
levels and numa levels are copied. The last row of the table must have a null
pointer in the mask field.

The current implementation doesn't add this last row in the computation of the
table size. So we add 1 row in the allocation size that will be used as the
last row of the table. The kzalloc will ensure that the mask field is NULL.

Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fengguang.wu@intel.com
Link: http://lkml.kernel.org/r/1399972261-25693-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:29 +02:00
Rik van Riel
8bf21433f3 sched: Call select_idle_sibling() when not affine_sd
On smaller systems, the top level sched domain will be an affine
domain, and select_idle_sibling is invoked for every SD_WAKE_AFFINE
wakeup. This seems to be working well.

On larger systems, with the node distance between far away NUMA nodes
being > RECLAIM_DISTANCE, select_idle_sibling is only called if the
waker and the wakee are on nodes less than RECLAIM_DISTANCE apart.

This patch leaves in place the policy of not pulling the task across
nodes on such systems, while fixing the issue that select_idle_sibling
is not called at all in certain circumstances.

The code will look for an idle CPU in the same CPU package as the
CPU where the task ran previously.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: morten.rasmussen@arm.com
Cc: george.mccollister@gmail.com
Cc: ktkhai@parallels.com
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20140514114037.2d93266f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:28 +02:00
Michael Kerrisk
2240067494 sched: Simplify return logic in sched_read_attr()
Gotos are chained pointlessly here, and the 'out' label
can be dispensed with.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEC29.9090503@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:27 +02:00
Michael Kerrisk
e78c7bca56 sched: Simplify return logic in sched_copy_attr()
The logic in this function is a little contorted, clean it up:

  * Rather than having chained gotos for the -EFBIG case, just
    return -EFBIG directly.

  * Now, the label 'out' is no longer needed, and 'ret' must be zero
    zero by the time we fall through to this point, so just return 0.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEC24.9080201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:26 +02:00
Ben Segall
3944a9274e sched: Fix exec_start/task_hot on migrated tasks
task_hot checks exec_start on any runnable task, but if it has been
migrated since the it last ran, then exec_start is a clock_task from
another cpu. If the old cpu's clock_task was sufficiently far ahead of
this cpu's then the task will not be considered for another migration
until it has run. Instead reset exec_start whenever a task is migrated,
since it is presumably no longer hot anyway.

Signed-off-by: Ben Segall <bsegall@google.com>
[ Made it compile. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140515225920.7179.13924.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:25 +02:00
Ingo Molnar
6669dc8907 Merge branch 'sched/urgent' into sched/core to avoid conflicts with upcoming changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:55:03 +02:00
Ingo Molnar
ec6e7f4082 Merge branch 'pm-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into sched/core
Pull scheduling related CPU idle updates from Rafael J. Wysocki.

Conflicts:
	kernel/sched/idle.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:37:06 +02:00
Ingo Molnar
65c2ce7004 Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
Peter Zijlstra
842514849a arm64: Remove TIF_POLLING_NRFLAG
The only idle method for arm64 is WFI and it therefore
unconditionally requires the reschedule interrupt when idle.

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20140509170649.GG13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:22:58 +02:00
James Hogan
31e6cdefd3 metag: Remove TIF_POLLING_NRFLAG
The Meta idle function jumps into the interrupt handler which
efficiently blocks waiting for the next interrupt when it reads the
interrupt status register (TXSTATI). No other (polling) idle functions
can be used, therefore TIF_POLLING_NRFLAG is unnecessary, so lets remove
it.

Peter Zijlstra said:
> Most archs have (x86) hlt or (arm) wfi like idle instructions, and if
> that is your only possible idle function, you'll require the interrupt
> to wake up and there's really no point to having the POLLING bit.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEB7E.9080007@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:22:10 +02:00
Lai Jiangshan
6acbfb9697 sched: Fix hotplug vs. set_cpus_allowed_ptr()
Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b55 ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b55 ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:31 +02:00
Peter Zijlstra
4dac0b6383 sched/cpupri: Replace NR_CPUS arrays
Tejun reported that his resume was failing due to order-3 allocations
from sched_domain building.

Replace the NR_CPUS arrays in there with a dynamically allocated
array.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-7cysnkw1gik45r864t1nkudh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:29 +02:00
Peter Zijlstra
944770ab54 sched/deadline: Replace NR_CPUS arrays
Tejun reported that his resume was failing due to order-3 allocations
from sched_domain building.

Replace the NR_CPUS arrays in there with a dynamically allocated
array.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-kat4gl1m5a6dwy6nzuqox45e@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:28 +02:00
Juri Lelli
b0827819b0 sched/deadline: Restrict user params max value to 2^63 ns
Michael Kerrisk noticed that creating SCHED_DEADLINE reservations
with certain parameters (e.g, a runtime of something near 2^64 ns)
can cause a system freeze for some amount of time.

The problem is that in the interface we have

 u64 sched_runtime;

while internally we need to have a signed runtime (to cope with
budget overruns)

 s64 runtime;

At the time we setup a new dl_entity we copy the first value in
the second. The cast turns out with negative values when
sched_runtime is too big, and this causes the scheduler to go crazy
right from the start.

Moreover, considering how we deal with deadlines wraparound

 (s64)(a - b) < 0

we also have to restrict acceptable values for sched_{deadline,period}.

This patch fixes the thing checking that user parameters are always
below 2^63 ns (still large enough for everyone).

It also rewrites other conditions that we check, since in
__checkparam_dl we don't have to deal with deadline wraparounds
and what we have now erroneously fails when the difference between
values is too big.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli<raistlin@linux.it>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140513141131.20d944f81633ee937f256385@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:27 +02:00
Peter Zijlstra
ce5f7f8200 sched/deadline: Change sched_getparam() behaviour vs SCHED_DEADLINE
The way we read POSIX one should only call sched_getparam() when
sched_getscheduler() returns either SCHED_FIFO or SCHED_RR.

Given that we currently return sched_param::sched_priority=0 for all
others, extend the same behaviour to SCHED_DEADLINE.

Requested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: linux-man <linux-man@vger.kernel.org>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140512205034.GH13467@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:26 +02:00
Peter Zijlstra
dbdb22754f sched: Disallow sched_attr::sched_policy < 0
The scheduler uses policy=-1 to preserve the current policy state to
implement sys_sched_setparam(), this got exposed to userspace by
accident through sys_sched_setattr(), cure this.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140509085311.GJ30445@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:26 +02:00
Michael Kerrisk
143cf23df2 sched: Make sched_setattr() correctly return -EFBIG
The documented[1] behavior of sched_attr() in the proposed man page text is:

    sched_attr::size must be set to the size of the structure, as in
    sizeof(struct sched_attr), if the provided structure is smaller
    than the kernel structure, any additional fields are assumed
    '0'. If the provided structure is larger than the kernel structure,
    the kernel verifies all additional fields are '0' if not the
    syscall will fail with -E2BIG.

As currently implemented, sched_copy_attr() returns -EFBIG for
for this case, but the logic in sys_sched_setattr() converts that
error to -EFAULT. This patch fixes the behavior.

[1] http://thread.gmane.org/gmane.linux.kernel/1615615/focus=1697760

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/536CEC17.9070903@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:25 +02:00
Linus Torvalds
4b660a7f5c Linux 3.15-rc6 v3.15-rc6 2014-05-22 06:42:02 +09:00
Linus Torvalds
6538d62521 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull two powerpc fixes from Ben Herrenschmidt:
 "Here are a couple of fixes for 3.15.  One from Anton fixes a nasty
  regression I introduced when trying to fix a loss of irq_work whose
  consequences is that we can completely lose timer interrupts on a
  CPU... not pretty.

  The other one is a change to our PCIe reset hook to use a firmware
  call instead of direct config space accesses to trigger a fundamental
  reset on the root port.  This is necessary so that the FW gets a
  chance to disable the link down error monitoring, which would
  otherwise trip and cause subsequent fatal EEH error"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: irq work racing with timer interrupt can result in timer interrupt hang
  powerpc/powernv: Reset root port in firmware
2014-05-22 05:55:12 +09:00
Linus Torvalds
11da37b263 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull two btrfs fixes from Chris Mason:
 "This has two fixes that we've been testing for 3.16, but since both
  are safe and fix real bugs, it makes sense to send for 3.15 instead"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: send, fix incorrect ref access when using extrefs
  Btrfs: fix EIO on reading file after ioctl clone works on it
2014-05-22 05:40:13 +09:00
Linus Torvalds
3062556903 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two ceph fixes from Sage Weil:
 "The first patch fixes a problem when we have a page count of 0 for
  sendpage which is triggered by zfs.  The second fixes a bug in CRUSH
  that was resolved in the userland code a while back but fell through
  the cracks on the kernel side"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  crush: decode and initialize chooseleaf_vary_r
  libceph: fix corruption when using page_count 0 page in rbd
2014-05-22 05:38:51 +09:00
Linus Torvalds
5e9d9fc4ed Merge tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Dave Chinner:
 "Code inspection of the XFS error number sign translations found a
  bunch of issues, including returning incorrectly signed errors for
  some data integrity operations.

  These leak to userspace and result in applications not getting the
  errors correctly reported.  Hence they need fixing sooner rather than
  later.

  A couple of the bugs are in data integrity operations, a couple more
  are in the new COLLAPSE_RANGE code.  One of these came in through a
  recent ext4 merge and so I had to update the base tree to 3.15-rc5
  before fixing the issues"

* tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: list_lru_init returns a negative error
  xfs: negate xfs_icsb_init_counters error value
  xfs: negate mount workqueue init error value
  xfs: fix wrong err sign on xfs_set_acl()
  xfs: fix wrong errno from xfs_initxattrs
  xfs: correct error sign on COLLAPSE_RANGE errors
  xfs: xfs_commit_metadata returns wrong errno
  xfs: fix incorrect error sign in xfs_file_aio_read
  xfs: xfs_dir_fsync() returns positive errno
2014-05-22 05:36:07 +09:00
Linus Torvalds
80932ec1c0 Merge branch 'renameat2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull renameat2 arch support from Miklos Szeredi:
 "I've collected architecture patches for the renameat2 syscall that
  maintainers acked and/or asked me to queue.

  This adds architecture support for the renameat2 syscall to m68k,
  parisc, ia64 and through asm-generic to arc, arm64, c6x, hexagon,
  metag, openrisc, score, tile, unicore32"

* 'renameat2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  scripts/checksyscalls.sh: Make renameat optional
  asm-generic: Add renameat2 syscall
  ia64: add renameat2 syscall
  parisc: add renameat2 syscall
  m68k: add renameat2 syscall
2014-05-22 05:34:57 +09:00
Linus Torvalds
989d216f86 Merge tag 'iommu-fixes-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
 "Three fixes for the AMD IOMMU driver:
   - fix a locking issue around get_user_pages()
   - fix two issues with device aliasing and exclusion range handling"

* tag 'iommu-fixes-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: fix enabling exclusion range for an exact device
  iommu/amd: Take mmap_sem when calling get_user_pages
  iommu/amd: Fix interrupt remapping for aliased devices
2014-05-22 04:29:39 +09:00
Linus Torvalds
677d1bb0cb Merge tag 'stable/for-linus-3.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft
Pull iscsi_ibft fix from Konrad Rzeszutek Wilk:
 "Fix iBFT regression on Broadcom NICs introduced in 3.2"

* tag 'stable/for-linus-3.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
  iscsi_ibft: Fix finding Broadcom specific ibft sign
2014-05-22 04:28:21 +09:00
Linus Torvalds
f6ce579d91 Merge tag 'renesas-sh-drivers-for-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas
Pull SH driver fix from Simon Horman:
 "Compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI

  This resolves a regression introduced in v3.14 by commit bf98c1eac1
  ("ARM: Rename ARCH_SHMOBILE to ARCH_SHMOBILE_LEGACY")"

* tag 'renesas-sh-drivers-for-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  drivers: sh: compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI
2014-05-22 04:26:23 +09:00
Linus Torvalds
fba69f042a Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "Most of the changes are drivers fixes (rtl28xuu, fc2580, ov7670,
  davinci, gspca, s5p-fimc and s5c73m3).

  There is also a compat32 fix and one infoleak fixup at the media
  controller"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode
  [media] V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space
  [media] media-device: fix infoleak in ioctl media_enum_entities()
  [media] fc2580: fix tuning failure on 32-bit arch
  [media] Prefer gspca_sonixb over sn9c102 for all devices
  [media] media: davinci: vpfe: make sure all the buffers unmapped and released
  [media] staging: media: davinci: vpfe: make sure all the buffers are released
  [media] media: davinci: vpbe_display: fix releasing of active buffers
  [media] media: davinci: vpif_display: fix releasing of active buffers
  [media] media: davinci: vpif_capture: fix releasing of active buffers
  [media] s5p-fimc: Fix YUV422P depth
  [media] s5c73m3: Add missing rename of v4l2_of_get_next_endpoint() function
  [media] rtl28xxu: silence error log about disabled rtl2832_sdr module
  [media] rtl28xxu: do not hard depend on staging SDR module
2014-05-21 19:01:08 +09:00
Linus Torvalds
84e12d992a Merge tag 'staging-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
 "Here are five staging driver fixes for 3.15-rc6 that resolve some
  reported issues.  They are for the imx and rtl8723au drivers"

* tag 'staging-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8723au: Do not reset wdev->iftype in netdev_close()
  staging: rtl8723au: Use correct pipe type for USB interrupts
  imx-drm: imx-tve: correct DDC property name to 'ddc-i2c-bus'
  imx-drm: imx-drm-core: skip components whose parent device is disabled
  imx-drm: imx-drm-core: fix imx_drm_encoder_get_mux_id
2014-05-21 19:00:09 +09:00
Linus Torvalds
439c610992 Merge tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
  some reported issues and a regression from 3.13"

* tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: make sure read buffer is zeroed
  kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs
2014-05-21 18:59:25 +09:00
Linus Torvalds
957cf2582a Merge tag 'pci-v3.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
 "These are fixes for an SHPCHP hotplug regression, a "wait for pending
  transaction" problem (used in device reset paths), and an email
  address update.

  PCI device hotplug:
    - Fix SHPCHP bus speed mismatch issue (Marcel Apfelbaum)

  Miscellaneous:
    - Fix pci_wait_for_pending_transaction() (Gavin Shan)
    - Update email address (Ben Hutchings)"

* tag 'pci-v3.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Wrong register used to check pending traffic
  PCI: shpchp: Check bridge's secondary (not primary) bus speed
  PCI: Update my email address
2014-05-21 18:57:25 +09:00
Linus Torvalds
b84293b23e Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random fix from Ted Ts'o:
 "This fixes a BUG_ON-causing regression that was introduced during the
  last merge window"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: fix BUG_ON caused by accounting simplification
2014-05-21 18:56:35 +09:00
Linus Torvalds
026d68be45 Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux
Pull clock framework fixes from Mike Turquette:
 "Clock framework and driver fixes, all of which fix user-visible
  regressions.

  As usual most fixes are for platform-specific clock drivers, but there
  are also two fixes to the clk core after recent changes to the way
  that clock unregistration is handled"

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
  clk: tegra: Fix wrong value written to PLLE_AUX
  clk: shmobile: clk-mstp: change to using clock-indices
  clk: Fix slab corruption in clk_unregister()
  clk: Fix double free due to devm_clk_register()
  clk: socfpga: fix clock driver for 3.15
  clk: divider: Fix best div calculation for power-of-two and table dividers
  clk: bcm281xx: don't use unnamed structs or unions
2014-05-21 18:55:17 +09:00
Linus Torvalds
b2e3432af1 Merge tag 'spi-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A few core fixes around outlying cases here, nothing that should
  affect most users but useful fixes.  The diffstat is rather larger
  than one might hope due some simple code motion in the fix for
  !CONFIG_DMA, the actual meaningful change is much smaller.

   - Fix handling of unsupported dual and quad mode support on slave
     registration so that drivers that can degrade gracefully do so,
     preventing regressions for drivers this is added.
   - Fix build in !CONFIG_DMA cases following addition of generic DMA
     mapping support.
   - Fix error handling for queue creation which due to wider kernel
     changes can be triggered more easily.
   - A couple of driver specific fixes"

* tag 'spi-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi/pxa2xx: Prevent DMA from transferring too many bytes
  spi: core: Don't destroy master queue if we fail to create it
  spi: qup: Fix return value checking for pm_runtime_get_sync()
  spi: core: Protect DMA code by #ifdef CONFIG_HAS_DMA
  spi: core: Ignore unsupported Dual/Quad Transfer Mode bits
2014-05-21 18:53:55 +09:00
Linus Torvalds
081069ff81 Merge tag 'gpio-v3.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
 - fix a null pointer bug in the ICH6 chipset driver
 - fix device tree registration for the mcp23s08 driver

* tag 'gpio-v3.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: mcp23s08: Bug fix of SPI device tree registration.
  gpio: ich: set regs and reglen for i3100 and ich6 chipset
2014-05-21 18:53:13 +09:00
Linus Torvalds
06eb4cc2e7 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull more cgroup fixes from Tejun Heo:
 "Three more patches to fix cgroup_freezer breakage due to the recent
  cgroup internal locking changes - an operation cgroup_freezer was
  using now requires sleepable context and cgroup_freezer was invoking
  that while holding a spin lock.  cgroup_freezer was using an overly
  elaborate hierarchical locking scheme.

  While it's possible to convert the hierarchical spinlocks directly to
  mutexes, this patch simplifies the overall locking so that it uses a
  global mutex.  This has the added benefit of avoiding iterating
  potentially huge number of tasks under a spinlock.  While the patch is
  on the larger side in the devel cycle, the changes made are mostly
  straight-forward and the locking logic is a lot simpler afterwards"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix rcu_read_lock() leak in update_if_frozen()
  cgroup_freezer: replace freezer->lock with freezer_mutex
  cgroup: introduce task_css_is_root()
2014-05-21 18:36:40 +09:00
Linus Torvalds
6ab9028d00 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Mostly device-specific fixes.  The only thing which isn't is the fix
  for zpodd oops-on-detach bug"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci: imx: PLL clock needs 100us to settle down
  ata: pata_at91 only works on sam9
  libata: clean up ZPODD when a port is detached
  ahci: imx: software workaround for phy reset issue in resume
  ahci: imx: add namespace for register enums
  ahci: disable DEVSLP for Intel Valleyview
2014-05-21 18:35:42 +09:00