- Convert the 32bit syscalls to be pt_regs based which removes the
requirement to push all 6 potential arguments onto the stack and
consolidates the interface with the 64bit variant
- The first small portion of the exception and syscall related entry
code consolidation which aims to address the recently discovered
issues vs. RCU, int3, NMI and some other exceptions which can
interrupt any context. The bulk of the changes is still work in
progress and aimed for 5.8.
- A few lockdep namespace cleanups which have been applied into this
branch to keep the prerequisites for the ongoing work confined.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/TMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYA6EAC7r/bCMxBelljT3b7LkBbiJcocJ+zK
OSzWU9miJGTAvYqn4/ciLKg4dA424b/1rBFlF1hBTCQ0HL5Cv4lajxdKEZCO5WCC
WWTCz+MC60aWFaH3VNoywiLGb39H2IbqWbS9yNPd/wBkLHiMAD6NPQntOvcPaD4j
1lyrMtLzfrWlrHxvxdI3kt5ZpFLYNXr2xk61xQjTz0ROFQBhf2sDsuhHhiYVLPj7
JwYktpbBiPeaw2+I18NPymNPY+VfY8LCTgLl5M+rbKyCqebKaedZQJ7QXFhAEqKC
Y2f+gJsKWtTDzGP2mk/5kF0uP7cd0vJK35ZCXtLZ9BbcNtFZU6w+ADqRo4pJBHRY
QRzo/AWrdkuTJF0CrP6mcneNC7NwWLSdKrE1z77RQCHUPVvhHhRDZsgdLcZ/KKwx
y1ji22trwNB+7LmI2fUOU5RRHZBIuNvQT+mPt24febJuHpZKul62dd3cqTGeSTC+
MYVknYDSg/+jk+83DhuZnTyb9lWTbq/0Q1HRDu6l2LrMIH7YMPpY5Ea64ZFYzWXy
s0+iHEM4mUzltwNauHIntjbwXi3C0l2k1WQyG0gun2eS6SXfu0lb93V4msFj/N1+
oHavH2n2A4XrRr+Ob87fsl7nfXJibWP7R9xPblrWP2sNdqfjSyGd49rnsvpWqWMK
Fj0d7tQ78+/SwA==
=tWXS
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry code updates from Thomas Gleixner:
- Convert the 32bit syscalls to be pt_regs based which removes the
requirement to push all 6 potential arguments onto the stack and
consolidates the interface with the 64bit variant
- The first small portion of the exception and syscall related entry
code consolidation which aims to address the recently discovered
issues vs. RCU, int3, NMI and some other exceptions which can
interrupt any context. The bulk of the changes is still work in
progress and aimed for 5.8.
- A few lockdep namespace cleanups which have been applied into this
branch to keep the prerequisites for the ongoing work confined.
* tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}()
lockdep: Rename trace_softirqs_{on,off}()
lockdep: Rename trace_hardirq_{enter,exit}()
x86/entry: Rename ___preempt_schedule
x86: Remove unneeded includes
x86/entry: Drop asmlinkage from syscalls
x86/entry/32: Enable pt_regs based syscalls
x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments
x86/entry/32: Rename 32-bit specific syscalls
x86/entry/32: Clean up syscall_32.tbl
x86/entry: Remove ABI prefixes from functions in syscall tables
x86/entry/64: Add __SYSCALL_COMMON()
x86/entry: Remove syscall qualifier support
x86/entry/64: Remove ptregs qualifier from syscall table
x86/entry: Move max syscall number calculation to syscallhdr.sh
x86/entry/64: Split X32 syscall table into its own file
x86/entry/64: Move sys_ni_syscall stub to common.c
x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
x86/entry: Refactor SYS_NI macros
...
Continue what commit:
d820ac4c2f ("locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]")
started, rename these to avoid confusing them with tracepoints.
git grep -l "trace_\(soft\|hard\)\(irq_context\|irqs_enabled\)" | while read file;
do
sed -ie 's/trace_\(soft\|hard\)\(irq_context\|irqs_enabled\)/lockdep_\1\2/g' $file;
done
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115859.178626842@infradead.org
Continue what commit:
d820ac4c2f ("locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]")
started, rename these to avoid confusing them with tracepoints.
git grep -l "trace_softirqs_\(on\|off\)" | while read file;
do
sed -ie 's/trace_softirqs_\(on\|off\)/lockdep_softirqs_\1/g' $file;
done
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115859.119434738@infradead.org
Set current->irq_config = 1 for hrtimers which are not marked to expire in
hard interrupt context during hrtimer_init(). These timers will expire in
softirq context on PREEMPT_RT.
Setting this allows lockdep to differentiate these timers. If a timer is
marked to expire in hard interrupt context then the timer callback is not
supposed to acquire a regular spinlock instead of a raw_spinlock in the
expiry callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.534508206@linutronix.de
Extend lockdep to validate lock wait-type context.
The current wait-types are:
LD_WAIT_FREE, /* wait free, rcu etc.. */
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_SLEEP, /* sleeping locks, mutex_t etc.. */
Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).
This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, its a more fancy might_sleep().
Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context and
while it presents a context to nested locks it is not the same as it
got acquired in.
Therefore its necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.
[ To make static initialization easier we have the rule that:
.outer == INV means .outer == .inner; because INV == 0. ]
It further means that its required to find the minimal .inner of the held
stack to compare against the outer of the new lock; because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.
Below is an example output generated by the trivial test code:
raw_spin_lock(&foo);
spin_lock(&bar);
spin_unlock(&bar);
raw_spin_unlock(&foo);
[ BUG: Invalid wait context ]
-----------------------------
swapper/0/1 is trying to lock:
ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
other info that might help us debug this:
1 lock held by swapper/0/1:
#0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187
The way to read it is to look at the new -{n,m} part in the lock
description; -{3:3} for the attempted lock, and try and match that up to
the held locks, which in this case is the one: -{2,2}.
This tells that the acquiring lock requires a more relaxed environment than
presented by the lock stack.
Currently only the normal locks and RCU are converted, the rest of the
lockdep users defaults to .inner = INV which is ignored. More conversions
can be done when desired.
The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently
addressed. The config option allows to identify these problems and to
verify that the solutions found are indeed solving them.
The config switch will be removed and the checks will permanently enabled
once the vast majority of issues has been addressed.
[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
nmi_enter() does lockdep_off() and hence lockdep ignores everything.
And NMI context makes it impossible to do full IN-NMI tracking like we
do IN-HARDIRQ, that could result in graph_lock recursion.
However, since look_up_lock_class() is lockless, we can find the class
of a lock that has prior use and detect IN-NMI after USED, just not
USED after IN-NMI.
NOTE: By shifting the lockdep_off() recursion count to bit-16, we can
easily differentiate between actual recursion and off.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lkml.kernel.org/r/20200221134215.090538203@infradead.org
There were two patterns for lockdep_recursion:
Pattern-A:
if (current->lockdep_recursion)
return
current->lockdep_recursion = 1;
/* do stuff */
current->lockdep_recursion = 0;
Pattern-B:
current->lockdep_recursion++;
/* do stuff */
current->lockdep_recursion--;
But a third pattern has emerged:
Pattern-C:
current->lockdep_recursion = 1;
/* do stuff */
current->lockdep_recursion = 0;
And while this isn't broken per-se, it is highly dangerous because it
doesn't nest properly.
Get rid of all Pattern-C instances and shore up Pattern-A with a
warning.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200313093325.GW12561@hirez.programming.kicks-ass.net
Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep
triggered a warning:
[ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
...
[ ] Call Trace:
[ ] lock_is_held_type+0x5d/0x150
[ ] ? rcu_lockdep_current_cpu_online+0x64/0x80
[ ] rcu_read_lock_any_held+0xac/0x100
[ ] ? rcu_read_lock_held+0xc0/0xc0
[ ] ? __slab_free+0x421/0x540
[ ] ? kasan_kmalloc+0x9/0x10
[ ] ? __kmalloc_node+0x1d7/0x320
[ ] ? kvmalloc_node+0x6f/0x80
[ ] __bfs+0x28a/0x3c0
[ ] ? class_equal+0x30/0x30
[ ] lockdep_count_forward_deps+0x11a/0x1a0
The warning got triggered because lockdep_count_forward_deps() call
__bfs() without current->lockdep_recursion being set, as a result
a lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.
Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency, therefore add the
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps()
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
Once a lock class is zapped, all the lock chains that include the zapped
class are essentially useless. The lock_chain structure itself can be
reused, but not the corresponding chain_hlocks[] entries. Over time,
we will run out of chain_hlocks entries while there are still plenty
of other lockdep array entries available.
To fix this imbalance, we have to make chain_hlocks entries reusable
just like the others. As the freed chain_hlocks entries are in blocks of
various lengths. A simple bitmap like the one used in the other reusable
lockdep arrays isn't applicable. Instead the chain_hlocks entries are
put into bucketed lists (MAX_CHAIN_BUCKETS) of chain blocks. Bucket 0
is the variable size bucket which houses chain blocks of size larger than
MAX_CHAIN_BUCKETS sorted in decreasing size order. Initially, the whole
array is in one chain block (the primordial chain block) in bucket 0.
The minimum size of a chain block is 2 chain_hlocks entries. That will
be the minimum allocation size. In other word, allocation requests
for one chain_hlocks entry will cause 2-entry block to be returned and
hence 1 entry will be wasted.
Allocation requests for the chain_hlocks are fulfilled first by looking
for chain block of matching size. If not found, the first chain block
from bucket[0] (the largest one) is split. That can cause hlock entries
fragmentation and reduce allocation efficiency if a chain block of size >
MAX_CHAIN_BUCKETS is ever zapped and put back to after the primordial
chain block. So the MAX_CHAIN_BUCKETS must be large enough that this
should seldom happen.
By reusing the chain_hlocks entries, we are able to handle workloads
that add and zap a lot of lock classes without the risk of running out
of chain_hlocks entries as long as the total number of outstanding lock
classes at any time remain within a reasonable limit.
Two new tracking counters, nr_free_chain_hlocks & nr_large_chain_blocks,
are added to track the total number of chain_hlocks entries in the
free bucketed lists and the number of large chain blocks in buckets[0]
respectively. The nr_free_chain_hlocks replaces nr_chain_hlocks.
The nr_large_chain_blocks counter enables to see if we should increase
the number of buckets (MAX_CHAIN_BUCKETS) available so as to avoid to
avoid the fragmentation problem in bucket[0].
An internal nfsd test that ran for more than an hour and kept on
loading and unloading kernel modules could cause the following message
to be displayed.
[ 4318.443670] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
The patched kernel was able to complete the test with a lot of free
chain_hlocks entries to spare:
# cat /proc/lockdep_stats
:
dependency chains: 18867 [max: 65536]
dependency chain hlocks: 74926 [max: 327680]
dependency chain hlocks lost: 0
:
zapped classes: 1541
zapped lock chains: 56765
large chain blocks: 1
By changing MAX_CHAIN_BUCKETS to 3 and add a counter for the size of the
largest chain block. The system still worked and We got the following
lockdep_stats data:
dependency chains: 18601 [max: 65536]
dependency chain hlocks used: 73133 [max: 327680]
dependency chain hlocks lost: 0
:
zapped classes: 1541
zapped lock chains: 56702
large chain blocks: 45165
large chain block size: 20165
By running the test again, I was indeed able to cause chain_hlocks
entries to get lost:
dependency chain hlocks used: 74806 [max: 327680]
dependency chain hlocks lost: 575
:
large chain blocks: 48737
large chain block size: 7
Due to the fragmentation, it is possible that the
"MAX_LOCKDEP_CHAIN_HLOCKS too low!" error can happen even if a lot of
of chain_hlocks entries appear to be free.
Fortunately, a MAX_CHAIN_BUCKETS value of 16 should be big enough that
few variable sized chain blocks, other than the initial one, should
ever be present in bucket 0.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-7-longman@redhat.com
Add a new counter nr_zapped_lock_chains to track the number lock chains
that have been removed.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-6-longman@redhat.com
If a lock chain contains a class that is zapped, the whole lock chain is
likely to be invalid. If the zapped class is at the end of the chain,
the partial chain without the zapped class should have been stored
already as the current code will store all its predecessor chains. If
the zapped class is somewhere in the middle, there is no guarantee that
the partial chain will actually happen. It may just clutter up the hash
and make searching slower. I would rather prefer storing the chain only
when it actually happens.
So just dump the corresponding chain_hlocks entries for now. A latter
patch will try to reuse the freed chain_hlocks entries.
This patch also changes the type of nr_chain_hlocks to unsigned integer
to be consistent with the other counters.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-5-longman@redhat.com
The whole point of the lockdep dynamic key patch is to allow unused
locks to be removed from the lockdep data buffers so that existing
buffer space can be reused. However, there is no way to find out how
many unused locks are zapped and so we don't know if the zapping process
is working properly.
Add a new nr_zapped_classes counter to track that and show it in
/proc/lockdep_stats.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-4-longman@redhat.com
There are currently three counters to track the IRQ context of a lock
chain - nr_hardirq_chains, nr_softirq_chains and nr_process_chains.
They are incremented when a new lock chain is added, but they are
not decremented when a lock chain is removed. That causes some of the
statistic counts reported by /proc/lockdep_stats to be incorrect.
IRQ
Fix that by decrementing the right counter when a lock chain is removed.
Since inc_chains() no longer accesses hardirq_context and softirq_context
directly, it is moved out from the CONFIG_TRACE_IRQFLAGS conditional
compilation block.
Fixes: a0b0fd53e1 ("locking/lockdep: Free lock classes that are no longer in use")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-2-longman@redhat.com
If the lockdep code is really running out of the stack_trace entries,
it is likely that buffer overrun can happen and the data immediately
after stack_trace[] will be corrupted.
If there is less than LOCK_TRACE_SIZE_IN_LONGS entries left before
the call to save_trace(), the max_entries computation will leave it
with a very large positive number because of its unsigned nature. The
subsequent call to stack_trace_save() will then corrupt the data after
stack_trace[]. Fix that by changing max_entries to a signed integer
and check for negative value before calling stack_trace_save().
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 12593b7467 ("locking/lockdep: Reduce space occupied by stack traces")
Link: https://lkml.kernel.org/r/20191220135128.14876-1-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This changes "to the list" to "from the list" and also deletes the
obsolete comment about the "@nested" argument.
The "nested" argument was removed in this commit, earlier this year:
5facae4f35 ("locking/lockdep: Remove unused @nested argument from lock_release()").
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20191104091252.GA31509@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull locking updates from Ingo Molnar:
- improve rwsem scalability
- add uninitialized rwsem debugging check
- reduce lockdep's stacktrace memory usage and add diagnostics
- misc cleanups, code consolidation and constification
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mutex: Fix up mutex_waiter usage
locking/mutex: Use mutex flags macro instead of hard code
locking/mutex: Make __mutex_owner static to mutex.c
locking/qspinlock,x86: Clarify virt_spin_lock_key
locking/rwsem: Check for operations on an uninitialized rwsem
locking/rwsem: Make handoff writer optimistically spin on owner
locking/lockdep: Report more stack trace statistics
locking/lockdep: Reduce space occupied by stack traces
stacktrace: Constify 'entries' arguments
locking/lockdep: Make it clear that what lock_class::key points at is not modified
Security is a wonderful thing, but so is the ability to debug based on
lockdep warnings. This commit therefore makes lockdep lock addresses
visible in the clear.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Report the number of stack traces and the number of stack trace hash
chains. These two numbers are useful because these allow to estimate
the number of stack trace hash collisions.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190722182443.216015-5-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Although commit 669de8bda8 ("kernel/workqueue: Use dynamic lockdep keys
for workqueues") unregisters dynamic lockdep keys when a workqueue is
destroyed, a side effect of that commit is that all stack traces
associated with the lockdep key are leaked when a workqueue is destroyed.
Fix this by storing each unique stack trace once. Other changes in this
patch are:
- Use NULL instead of { .nr_entries = 0 } to represent 'no trace'.
- Store a pointer to a stack trace in struct lock_class and struct
lock_list instead of storing 'nr_entries' and 'offset'.
This patch avoids that the following program triggers the "BUG:
MAX_STACK_TRACE_ENTRIES too low!" complaint:
#include <fcntl.h>
#include <unistd.h>
int main()
{
for (;;) {
int fd = open("/dev/infiniband/rdma_cm", O_RDWR);
close(fd);
}
}
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Link: https://lkml.kernel.org/r/20190722182443.216015-4-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does not change the behavior of the lockdep code.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190722182443.216015-2-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As Will Deacon points out, CONFIG_PROVE_LOCKING implies TRACE_IRQFLAGS,
so the conditions I added in the previous patch, and some others in the
same file can be simplified by only checking for the former.
No functional change.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Fixes: 886532aee3 ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")
Link: https://lkml.kernel.org/r/20190628102919.2345242-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The last cleanup patch triggered another issue, as now another function
should be moved into the same section:
kernel/locking/lockdep.c:3580:12: error: 'mark_lock' defined but not used [-Werror=unused-function]
static int mark_lock(struct task_struct *curr, struct held_lock *this,
Move mark_lock() into the same #ifdef section as its only caller, and
remove the now-unused mark_lock_irq() stub helper.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Fixes: 0d2cc3b345 ("locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")
Link: https://lkml.kernel.org/r/20190617124718.1232976-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The sequence
static DEFINE_WW_CLASS(test_ww_class);
struct ww_acquire_ctx ww_ctx;
struct ww_mutex ww_lock_a;
struct ww_mutex ww_lock_b;
struct ww_mutex ww_lock_c;
struct mutex lock_c;
ww_acquire_init(&ww_ctx, &test_ww_class);
ww_mutex_init(&ww_lock_a, &test_ww_class);
ww_mutex_init(&ww_lock_b, &test_ww_class);
ww_mutex_init(&ww_lock_c, &test_ww_class);
mutex_init(&lock_c);
ww_mutex_lock(&ww_lock_a, &ww_ctx);
mutex_lock(&lock_c);
ww_mutex_lock(&ww_lock_b, &ww_ctx);
ww_mutex_lock(&ww_lock_c, &ww_ctx);
mutex_unlock(&lock_c); (*)
ww_mutex_unlock(&ww_lock_c);
ww_mutex_unlock(&ww_lock_b);
ww_mutex_unlock(&ww_lock_a);
ww_acquire_fini(&ww_ctx); (**)
will trigger the following error in __lock_release() when calling
mutex_release() at **:
DEBUG_LOCKS_WARN_ON(depth <= 0)
The problem is that the hlock merging happening at * updates the
references for test_ww_class incorrectly to 3 whereas it should've
updated it to 4 (representing all the instances for ww_ctx and
ww_lock_[abc]).
Fix this by updating the references during merging correctly taking into
account that we can have non-zero references (both for the hlock that we
merge into another hlock or for the hlock we are merging into).
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190524201509.9199-2-imre.deak@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The sequence
static DEFINE_WW_CLASS(test_ww_class);
struct ww_acquire_ctx ww_ctx;
struct ww_mutex ww_lock_a;
struct ww_mutex ww_lock_b;
struct mutex lock_c;
struct mutex lock_d;
ww_acquire_init(&ww_ctx, &test_ww_class);
ww_mutex_init(&ww_lock_a, &test_ww_class);
ww_mutex_init(&ww_lock_b, &test_ww_class);
mutex_init(&lock_c);
ww_mutex_lock(&ww_lock_a, &ww_ctx);
mutex_lock(&lock_c);
ww_mutex_lock(&ww_lock_b, &ww_ctx);
mutex_unlock(&lock_c); (*)
ww_mutex_unlock(&ww_lock_b);
ww_mutex_unlock(&ww_lock_a);
ww_acquire_fini(&ww_ctx);
triggers the following WARN in __lock_release() when doing the unlock at *:
DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1);
The problem is that the WARN check doesn't take into account the merging
of ww_lock_a and ww_lock_b which results in decreasing curr->lockdep_depth
by 2 not only 1.
Note that the following sequence doesn't trigger the WARN, since there
won't be any hlock merging.
ww_acquire_init(&ww_ctx, &test_ww_class);
ww_mutex_init(&ww_lock_a, &test_ww_class);
ww_mutex_init(&ww_lock_b, &test_ww_class);
mutex_init(&lock_c);
mutex_init(&lock_d);
ww_mutex_lock(&ww_lock_a, &ww_ctx);
mutex_lock(&lock_c);
mutex_lock(&lock_d);
ww_mutex_lock(&ww_lock_b, &ww_ctx);
mutex_unlock(&lock_d);
ww_mutex_unlock(&ww_lock_b);
ww_mutex_unlock(&ww_lock_a);
mutex_unlock(&lock_c);
ww_acquire_fini(&ww_ctx);
In general both of the above two sequences are valid and shouldn't
trigger any lockdep warning.
Fix this by taking the decrement due to the hlock merging into account
during lock release and hlock class re-setting. Merging can't happen
during lock downgrading since there won't be a new possibility to merge
hlocks in that case, so add a WARN if merging still happens then.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: ville.syrjala@linux.intel.com
Link: https://lkml.kernel.org/r/20190524201509.9199-1-imre.deak@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In mark_lock_irq(), the following checks are performed:
----------------------------------
| -> | unsafe | read unsafe |
|----------------------------------|
| safe | F B | F* B* |
|----------------------------------|
| read safe | F? B* | - |
----------------------------------
Where:
F: check_usage_forwards
B: check_usage_backwards
*: check enabled by STRICT_READ_CHECKS
?: check enabled by the !dir condition
From checking point of view, the special F? case does not make sense,
whereas it perhaps is made for peroformance concern. As later patch will
address this issue, remove this exception, which makes the checks
consistent later.
With STRICT_READ_CHECKS = 1 which is default, there is no functional
change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-24-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new bit can be any possible lock usage except it is garbage, so the
cases in switch can be made simpler. Warn early on if wrong usage bit is
passed without taking locks. No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-23-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As Peter has put it all sound and complete for the cause, I simply quote:
"It (check_redundant) was added for cross-release (which has since been
reverted) which would generate a lot of redundant links (IIRC) but
having it makes the reports more convoluted -- basically, if we had an
A-B-C relation, then A-C will not be added to the graph because it is
already covered. This then means any report will include B, even though
a shorter cycle might have been possible."
This would increase the number of direct dependencies. For a simple workload
(make clean; reboot; make vmlinux -j8), the data looks like this:
CONFIG_LOCKDEP_SMALL: direct dependencies: 6926
!CONFIG_LOCKDEP_SMALL: direct dependencies: 9052 (+30.7%)
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-21-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
These two functions now handle different check results themselves. A new
check_path function is added to check whether there is a path in the
dependency graph. No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-20-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The @nested is not used in __release_lock so remove it despite that it
is not used in lock_release in the first place.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-19-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In check_deadlock(), the third argument read comes from the second
argument hlock so that it can be removed. No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-18-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The breadth-first search is implemented as flat-out non-recursive now, but
the comments are still describing it as recursive, update the comments in
that regard.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-16-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In search of a dependency in the lock graph, there is contant checks for
forward or backward search. Directly reference the field offset of the
struct that differentiates the type of search to avoid those checks.
No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-15-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the change, we can slightly adjust the code to iterate the queue in BFS
search, which simplifies the code. No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-14-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The element field is an array in struct circular_queue to keep track of locks
in the search. Making it the same type as the locks avoids type cast. Also
fix a typo and elaborate the comment above struct circular_queue.
No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-13-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lockdep_map argument in them is not used, remove it.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-11-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
held_lock->class_idx is used to point to the class of the held lock. The
index is shifted by 1 to make index 0 mean no class, which results in class
index shifting back and forth but is not worth doing so.
The reason is: (1) there will be no "no-class" held_lock to begin with, and
(2) index 0 seems to be used for error checking, but if something wrong
indeed happened, the index can't be counted on to distinguish it as that
something won't set the class_idx to 0 on purpose to tell us it is wrong.
Therefore, change the index to start from 0. This saves a lot of
back-and-forth shifts and a class slot back to lock_classes.
Since index 0 is now used for lock class, we change the initial chain key to
-1 to avoid key collision, which is due to the fact that __jhash_mix(0, 0, 0) = 0.
Actually, the initial chain key can be any arbitrary value other than 0.
In addition, a bitmap is maintained to keep track of the used lock classes,
and we check the validity of the held lock against that bitmap.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-10-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Chain keys are computed using Jenkins hash function, which needs an initial
hash to start with. Dedicate a macro to make this clear and configurable. A
later patch changes this initial chain key.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-9-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Despite that there is a lockdep_init_task() which does nothing, lockdep
initiates tasks by assigning lockdep fields and does so inconsistently. Fix
this by using lockdep_init_task().
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-8-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lock usage bit characters are defined and determined with tricks.
Add some explanation to make it a bit clearer, then adjust the logic to
check the usage, which optimizes the code a bit.
No functional change.
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-4-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since none of the print_*() function's return value is necessary, change
their return type to void. No functional change.
In cases where an invariable return value is used, this change slightly
improves readability, i.e.:
print_x();
return 0;
is definitely better than:
return print_x(); /* where print_x() always returns 0 */
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bvanassche@acm.org
Cc: frederic@kernel.org
Cc: ming.lei@redhat.com
Cc: will.deacon@arm.com
Link: https://lkml.kernel.org/r/20190506081939.74287-2-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
gcc warns that function print_lock_trace() is unused if
CONFIG_PROVE_LOCKING isn't set:
../kernel/locking/lockdep.c:2820:13: warning: ‘print_lock_trace’ defined but not used [-Wunused-function]
Rework so we remove the function if CONFIG_PROVE_LOCKING isn't set.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: will.deacon@arm.com
Fixes: c120bce780 ("lockdep: Simplify stack trace handling")
Link: http://lkml.kernel.org/r/20190516191326.27003-1-anders.roxell@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is possible to ignore the validation for a certain lock by using:
lockdep_set_novalidate_class()
on it. Each invocation will assign a new name to the class it created
for created __lockdep_no_validate__. That means that once
lockdep_set_novalidate_class() has been used on two locks then
class->name won't match lock->name for the first lock triggering the
warning.
So ignore changed non-matching ->name pointer for the special
__lockdep_no_validate__ class.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20190517212234.32611-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this
will allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc0CCEAAoJEDjwexyKj9rgjmkH/A3e2drvuP/hSF3xfCKTQFdx
/PoLHQVCqENB3HU3FA/ljoXuG6jMgwj61looqlxBNumXFpIfTg0E1JC5S4wRGJ+K
cOVhIKV53gcuZkRcCJQp0WMnGzpk1Daf7iYXYmAl+7e+mREUPxOuJ0Ei6vXvRGZS
8cQrUCGrtPgkAeLlndypHI2M2TDDGJIMczOGbOZau8+8Lo7Wq9zt5y0h/v0ew37g
ogA0eGh6koU1435dt2pclZRiZ1XOcar3Uin9ioT+RnSgJ4pr1Pza/F6IGO0RdQa+
rva990lqGFp5r9lE4rMCwK9LWb/rfHdVPd35t9XPwphnQ/ORoWUwLk3uc5XOHow=
=dbuy
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this will
allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
* tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits)
s390/vdso: drop unnecessary cc-ldoption
s390: fix clang -Wpointer-sign warnigns in boot code
s390: drop CONFIG_VIRT_TO_BUS
s390: boot, purgatory: pass $(CLANG_FLAGS) where needed
s390: only build for new CPUs with clang
s390: simplify disabled_wait
s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
s390/unwind: introduce stack unwind API
s390/opcodes: add missing instructions to the disassembler
s390/bug: add entry size to the __bug_table section
s390: use proper expoline sections for .dma code
s390/nospec: rename assembler generated expoline thunks
s390: add missing ENDPROC statements to assembler functions
locking/lockdep: check for freed initmem in static_obj()
s390/kernel: add support for kernel address space layout randomization (KASLR)
s390/kernel: introduce .dma sections
s390/sclp: do not use static sccbs
s390/kprobes: use static buffer for insn_page
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
s390/kernel: build a relocatable kernel
...
Pull locking updates from Ingo Molnar:
"Here are the locking changes in this cycle:
- rwsem unification and simpler micro-optimizations to prepare for
more intrusive (and more lucrative) scalability improvements in
v5.3 (Waiman Long)
- Lockdep irq state tracking flag usage cleanups (Frederic
Weisbecker)
- static key improvements (Jakub Kicinski, Peter Zijlstra)
- misc updates, cleanups and smaller fixes"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
locking/lockdep: Remove unnecessary unlikely()
locking/static_key: Don't take sleeping locks in __static_key_slow_dec_deferred()
locking/static_key: Factor out the fast path of static_key_slow_dec()
locking/static_key: Add support for deferred static branches
locking/lockdep: Test all incompatible scenarios at once in check_irq_usage()
locking/lockdep: Avoid bogus Clang warning
locking/lockdep: Generate LOCKF_ bit composites
locking/lockdep: Use expanded masks on find_usage_*() functions
locking/lockdep: Map remaining magic numbers to lock usage mask names
locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
locking/rwsem: Prevent unneeded warning during locking selftest
locking/rwsem: Optimize rwsem structure for uncontended lock acquisition
locking/rwsem: Enable lock event counting
locking/lock_events: Don't show pvqspinlock events on bare metal
locking/lock_events: Make lock_events available for all archs & other locks
locking/qspinlock_stat: Introduce generic lockevent_*() counting APIs
locking/rwsem: Enhance DEBUG_RWSEMS_WARN_ON() macro
locking/rwsem: Add debug check for __down_read*()
locking/rwsem: Micro-optimize rwsem_try_read_lock_unqueued()
locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h
...
Pull stack trace updates from Ingo Molnar:
"So Thomas looked at the stacktrace code recently and noticed a few
weirdnesses, and we all know how such stories of crummy kernel code
meeting German engineering perfection end: a 45-patch series to clean
it all up! :-)
Here's the changes in Thomas's words:
'Struct stack_trace is a sinkhole for input and output parameters
which is largely pointless for most usage sites. In fact if embedded
into other data structures it creates indirections and extra storage
overhead for no benefit.
Looking at all usage sites makes it clear that they just require an
interface which is based on a storage array. That array is either on
stack, global or embedded into some other data structure.
Some of the stack depot usage sites are outright wrong, but
fortunately the wrongness just causes more stack being used for
nothing and does not have functional impact.
Another oddity is the inconsistent termination of the stack trace
with ULONG_MAX. It's pointless as the number of entries is what
determines the length of the stored trace. In fact quite some call
sites remove the ULONG_MAX marker afterwards with or without nasty
comments about it. Not all architectures do that and those which do,
do it inconsistenly either conditional on nr_entries == 0 or
unconditionally.
The following series cleans that up by:
1) Removing the ULONG_MAX termination in the architecture code
2) Removing the ULONG_MAX fixups at the call sites
3) Providing plain storage array based interfaces for stacktrace
and stackdepot.
4) Cleaning up the mess at the callsites including some related
cleanups.
5) Removing the struct stack_trace based interfaces
This is not changing the struct stack_trace interfaces at the
architecture level, but it removes the exposure to the generic
code'"
* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/stacktrace: Use common infrastructure
stacktrace: Provide common infrastructure
lib/stackdepot: Remove obsolete functions
stacktrace: Remove obsolete functions
livepatch: Simplify stack trace retrieval
tracing: Remove the last struct stack_trace usage
tracing: Simplify stack trace retrieval
tracing: Make ftrace_trace_userstack() static and conditional
tracing: Use percpu stack trace buffer more intelligently
tracing: Simplify stacktrace retrieval in histograms
lockdep: Simplify stack trace handling
lockdep: Remove save argument from check_prev_add()
lockdep: Remove unused trace argument from print_circular_bug()
drm: Simplify stacktrace handling
dm persistent data: Simplify stack trace handling
dm bufio: Simplify stack trace retrieval
btrfs: ref-verify: Simplify stack trace retrieval
dma/debug: Simplify stracktrace retrieval
fault-inject: Simplify stacktrace retrieval
mm/page_owner: Simplify stack trace handling
...
DEBUG_LOCKS_WARN_ON() already contains an unlikely(), there is no need
for another one.
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: houtao1@huawei.com
Link: http://lkml.kernel.org/r/1556540791-23110-1-git-send-email-zhengbin13@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Replace the indirection through struct stack_trace by using the storage
array based interfaces and storing the information is a small lockdep
specific data structure.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: linux-mm@kvack.org
Cc: David Rientjes <rientjes@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: dm-devel@redhat.com
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: linux-arch@vger.kernel.org
Link: https://lkml.kernel.org/r/20190425094802.891724020@linutronix.de
The following warning occurred on s390:
WARNING: CPU: 0 PID: 804 at kernel/locking/lockdep.c:1025 lockdep_register_key+0x30/0x150
This is because the check in static_obj() assumes that all memory within
[_stext, _end] belongs to static objects, which at least for s390 isn't
true. The init section is also part of this range, and freeing it allows
the buddy allocator to allocate memory from it. We have virt == phys for
the kernel on s390, so that such allocations would then have addresses
within the range [_stext, _end].
To fix this, introduce arch_is_kernel_initmem_freed(), similar to
arch_is_kernel_text/data(), and add it to the checks in static_obj().
This will always return 0 on architectures that do not define
arch_is_kernel_initmem_freed. On s390, it will return 1 if initmem has
been freed and the address is in the range [__init_begin, __init_end].
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
check_prev_add_irq() tests all incompatible scenarios one after the
other while adding a lock (@next) to a tree dependency (@prev):
LOCK_USED_IN_HARDIRQ vs LOCK_ENABLED_HARDIRQ
LOCK_USED_IN_HARDIRQ_READ vs LOCK_ENABLED_HARDIRQ
LOCK_USED_IN_SOFTIRQ vs LOCK_ENABLED_SOFTIRQ
LOCK_USED_IN_SOFTIRQ_READ vs LOCK_ENABLED_SOFTIRQ
Also for these four scenarios, we must at least iterate the @prev
backward dependency. Then if it matches the relevant LOCK_USED_* bit,
we must also iterate the @next forward dependency.
Therefore in the best case we iterate 4 times, in the worst case 8 times.
A different approach can let us divide the number of branch iterations
by 4:
1) Iterate through @prev backward dependencies and accumulate all the IRQ
uses in a single mask. In the best case where the current lock hasn't
been used in IRQ, we stop here.
2) Iterate through @next forward dependencies and try to find a lock
whose usage is exclusive to the accumulated usages gathered in the
previous step. If we find one (call it @lockA), we have found an
incompatible use, otherwise we stop here. Only bad locking scenario
go further. So a sane verification stop here.
3) Iterate again through @prev backward dependency and find the lock
whose usage matches @lockA in term of incompatibility. Call that
lock @lockB.
4) Report the incompatible usages of @lockA and @lockB
If no incompatible use is found, the verification never goes beyond
step 2 which means at most two iterations.
The following compares the execution measurements of the function
check_prev_add_irq():
Number of calls | Avg (ns) | Stdev (ns) | Total time (ns)
------------------------------------------------------------------------
Mainline 8452 | 2652 | 11962 | 22415143
This patch 8452 | 1518 | 7090 | 12835602
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190402160244.32434-5-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to optimize check_irq_usage() and factorize all the IRQ usage
validations we'll need to be able to check multiple lock usage bits at
once. Prepare the low level usage mask check functions for that purpose.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190402160244.32434-4-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Clarify the code with mapping some more constant numbers that haven't
been named after their corresponding LOCK_USAGE_* symbol.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190402160244.32434-3-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
valid_state() and print_usage_bug*() functions are not used beyond
irq locking correctness checks under CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING.
Sadly the "unused function" warning wouldn't fire because valid_state()
is inline so the unused case has remained unseen until now.
So move them inside the appropriate CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
section.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190402160244.32434-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If lockdep_register_key() and lockdep_unregister_key() are called with
debug_locks == false then the following warning is reported:
WARNING: CPU: 2 PID: 15145 at kernel/locking/lockdep.c:4920 lockdep_unregister_key+0x1ad/0x240
That warning is reported because lockdep_unregister_key() ignores the
value of 'debug_locks' and because the behavior of lockdep_register_key()
depends on whether or not 'debug_locks' is set. Fix this inconsistency
by making lockdep_unregister_key() take 'debug_locks' again into
account.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: shenghui <shhuiw@foxmail.com>
Fixes: 90c1cba2b3 ("locking/lockdep: Zap lock classes even with lock debugging disabled")
Link: http://lkml.kernel.org/r/20190415170538.23491-1-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No architecture terminates the stack trace with ULONG_MAX anymore. Remove
the cruft.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190410103644.485737321@linutronix.de
The following commit:
a0b0fd53e1 ("locking/lockdep: Free lock classes that are no longer in use")
changed the behavior of lockdep_free_key_range() from
unconditionally zapping lock classes into only zapping lock classes if
debug_lock == true. Not zapping lock classes if debug_lock == false leaves
dangling pointers in several lockdep datastructures, e.g. lock_class::name
in the all_lock_classes list.
The shell command "cat /proc/lockdep" causes the kernel to iterate the
all_lock_classes list. Hence the "unable to handle kernel paging request" cash
that Shenghui encountered by running cat /proc/lockdep.
Since the new behavior can cause cat /proc/lockdep to crash, restore the
pre-v5.1 behavior.
This patch avoids that cat /proc/lockdep triggers the following crash
with debug_lock == false:
BUG: unable to handle kernel paging request at fffffbfff40ca448
RIP: 0010:__asan_load1+0x28/0x50
Call Trace:
string+0xac/0x180
vsnprintf+0x23e/0x820
seq_vprintf+0x82/0xc0
seq_printf+0x92/0xb0
print_name+0x34/0xb0
l_show+0x184/0x200
seq_read+0x59e/0x6c0
proc_reg_read+0x11f/0x170
__vfs_read+0x4d/0x90
vfs_read+0xc5/0x1f0
ksys_read+0xab/0x130
__x64_sys_read+0x43/0x50
do_syscall_64+0x71/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: shenghui <shhuiw@foxmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: a0b0fd53e1 ("locking/lockdep: Free lock classes that are no longer in use") # v5.1-rc1.
Link: https://lkml.kernel.org/r/20190403233552.124673-1-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
init_data_structures_once() is called for the first time before RCU has
been initialized. Make sure that init_rcu_head() is called before the
RCU head is used and after RCU has been initialized.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: longman@redhat.com
Link: https://lkml.kernel.org/r/c20aa0f0-42ab-a884-d931-7d4ec2bf0cdc@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Clang warns about a tentative array definition without a length:
kernel/locking/lockdep.c:845:12: error: tentative array definition assumed to have one element [-Werror]
There is no real reason to do this here, so just set the same length as
in the real definition later in the same file. It has to be hidden in
an #ifdef or annotated __maybe_unused though, to avoid the unused-variable
warning if CONFIG_PROVE_LOCKING is disabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20190307075222.3424524-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf updates from Ingo Molnar:
"Lots of tooling updates - too many to list, here's a few highlights:
- Various subcommand updates to 'perf trace', 'perf report', 'perf
record', 'perf annotate', 'perf script', 'perf test', etc.
- CPU and NUMA topology and affinity handling improvements,
- HW tracing and HW support updates:
- Intel PT updates
- ARM CoreSight updates
- vendor HW event updates
- BPF updates
- Tons of infrastructure updates, both on the build system and the
library support side
- Documentation updates.
- ... and lots of other changes, see the changelog for details.
Kernel side updates:
- Tighten up kprobes blacklist handling, reduce the number of places
where developers can install a kprobe and hang/crash the system.
- Fix/enhance vma address filter handling.
- Various PMU driver updates, small fixes and additions.
- refcount_t conversions
- BPF updates
- error code propagation enhancements
- misc other changes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
perf script python: Add Python3 support to syscall-counts-by-pid.py
perf script python: Add Python3 support to syscall-counts.py
perf script python: Add Python3 support to stat-cpi.py
perf script python: Add Python3 support to stackcollapse.py
perf script python: Add Python3 support to sctop.py
perf script python: Add Python3 support to powerpc-hcalls.py
perf script python: Add Python3 support to net_dropmonitor.py
perf script python: Add Python3 support to mem-phys-addr.py
perf script python: Add Python3 support to failed-syscalls-by-pid.py
perf script python: Add Python3 support to netdev-times.py
perf tools: Add perf_exe() helper to find perf binary
perf script: Handle missing fields with -F +..
perf data: Add perf_data__open_dir_data function
perf data: Add perf_data__(create_dir|close_dir) functions
perf data: Fail check_backup in case of error
perf data: Make check_backup work over directories
perf tools: Add rm_rf_perf_data function
perf tools: Add pattern name checking to rm_rf
perf tools: Add depth checking to rm_rf
perf data: Add global path holder
...
And move the whole lot under CONFIG_DEBUG_LOCKDEP.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A shortcoming of the current lockdep implementation is that it requires
lock keys to be allocated statically. That forces all instances of lock
objects that occur in a given data structure to share a lock key. Since
lock dependency analysis groups lock objects per key sharing lock keys
can cause false positive lockdep reports. Make it possible to avoid
such false positive reports by allowing lock keys to be allocated
dynamically. Require that dynamically allocated lock keys are
registered before use by calling lockdep_register_key(). Complain about
attempts to register the same lock key pointer twice without calling
lockdep_unregister_key() between successive registration calls.
The purpose of the new lock_keys_hash[] data structure that keeps
track of all dynamic keys is twofold:
- Verify whether the lockdep_register_key() and lockdep_unregister_key()
functions are used correctly.
- Avoid that lockdep_init_map() complains when encountering a dynamically
allocated key.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-19-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Debugging lockdep data structure inconsistencies is challenging. Add
code that verifies data structure consistency at runtime. That code is
disabled by default because it is very CPU intensive.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-17-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A previous patch introduced a lock chain leak. Fix that leak by reusing
lock chains that have been freed.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-16-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reflect that add_chain_cache() is always called with the graph lock held.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-15-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does not change any functionality but makes the next patch in
this series easier to read.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-14-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of abandoning elements of list_entries[] that are no longer in
use, make alloc_list_entry() reuse array elements that have been freed.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-13-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of leaving lock classes that are no longer in use in the
lock_classes array, reuse entries from that array that are no longer in
use. Maintain a linked list of free lock classes with list head
'free_lock_class'. Only add freed lock classes to the free_lock_classes
list after a grace period to avoid that a lock_classes[] element would
be reused while an RCU reader is accessing it. Since the lockdep
selftests run in a context where sleeping is not allowed and since the
selftests require that lock resetting/zapping works with debug_locks
off, make the behavior of lockdep_free_key_range() and
lockdep_reset_lock() depend on whether or not these are called from
the context of the lockdep selftests.
Thanks to Peter for having shown how to modify get_pending_free()
such that that function does not have to sleep.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-12-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
synchronize_sched() has been removed recently. Update the comments that
refer to synchronize_sched().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Fixes: 51959d85f3 ("lockdep: Replace synchronize_sched() with synchronize_rcu()") # v5.0-rc1
Link: https://lkml.kernel.org/r/20190214230058.196511-11-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The patch that frees unused lock classes will modify the behavior of
lockdep_free_key_range() and lockdep_reset_lock() depending on whether
or not these functions are called from the context of the lockdep
selftests. Hence make it easy to detect whether or not lockdep code
is called from the context of a lockdep selftest.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-10-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does not change the behavior of these functions but makes the
patch that frees unused lock classes easier to read.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-9-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does not change any functionality. A later patch will reuse
lock classes that have been freed. In combination with that patch this
patch wil have the effect of initializing lock class order lists once
instead of every time a lock class structure is reinitialized.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-8-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make sure that all lock order entries that refer to a class are removed
from the list_entries[] array when a kernel module is unloaded.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-7-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make sure that add_chain_cache() returns 0 and does not modify the
chain hash if nr_chain_hlocks == MAX_LOCKDEP_CHAIN_HLOCKS before this
function is called.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-5-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lock chains are only tracked with CONFIG_PROVE_LOCKING=y. Do not report
the memory required for the lock chain array if CONFIG_PROVE_LOCKING=n.
See also commit:
ca58abcb4a ("lockdep: sanitise CONFIG_PROVE_LOCKING")
Include the size of the chain_hlocks[] array.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-4-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the sizeof(array element time) * (array size) expressions into
sizeof(array). This fixes the size computations of the classhash_table[]
and chainhash_table[] arrays.
The reason is that commit:
a63f38cc4c ("locking/lockdep: Convert hash tables to hlists")
changed the type of the elements of that array from 'struct list_head' into
'struct hlist_head'.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-3-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use %zu to format size_t instead of %lu to avoid that the compiler
complains about a mismatch between format specifier and argument on
32-bit systems.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20190214230058.196511-2-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some lockdep functions can be involved in breakpoint handling
and probing on those functions can cause a breakpoint recursion.
Prohibit probing on those functions by blacklist.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/154998810578.31052.1680977921449292812.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
warning right after a previous lockdep warning. It is likely that the
previous warning turned off lock debugging causing the lockdep to have
inconsistency states leading to the lock downgrade warning.
Fix that by add a check for debug_locks at the beginning of
__lock_downgrade().
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It makes the code more self-explanatory and tells throughout the code
what magic number refers to:
- state (Hardirq/Softirq)
- direction (used in or enabled above state)
- read or write
We can even remove some comments that were compensating for the lack of
those constant names.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1545973321-24422-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The enum mark_type appears a bit artificial here. We can directly pass
the base enum lock_usage_bit value to mark_held_locks(). All we need
then is to add the read index for each lock if necessary. It makes the
code clearer.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1545973321-24422-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
warning right after a previous lockdep warning. It is likely that the
previous warning turned off lock debugging causing the lockdep to have
inconsistency states leading to the lock downgrade warning.
Fix that by add a check for debug_locks at the beginning of
__lock_downgrade().
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull locking updates from Ingo Molnar:
"The main change in this cycle are initial preparatory bits of dynamic
lockdep keys support from Bart Van Assche.
There are also misc changes, a comment cleanup and a data structure
cleanup"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Clean up comment in nohz_idle_balance()
locking/lockdep: Stop using RCU primitives to access 'all_lock_classes'
locking/lockdep: Make concurrent lockdep_reset_lock() calls safe
locking/lockdep: Remove a superfluous INIT_LIST_HEAD() statement
locking/lockdep: Introduce lock_class_cache_is_registered()
locking/lockdep: Inline __lockdep_init_map()
locking/lockdep: Declare local symbols static
tools/lib/lockdep/tests: Test the lockdep_reset_lock() implementation
tools/lib/lockdep: Add dummy print_irqtrace_events() implementation
tools/lib/lockdep: Rename "trywlock" into "trywrlock"
tools/lib/lockdep/tests: Run lockdep tests a second time under Valgrind
tools/lib/lockdep/tests: Improve testing accuracy
tools/lib/lockdep/tests: Fix shellcheck warnings
tools/lib/lockdep/tests: Display compiler warning and error messages
locking/lockdep: Remove ::version from lock_class structure
Due to the previous patch all code that accesses the 'all_lock_classes'
list holds the graph lock. Hence use regular list primitives instead of
their RCU variants to access this list.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20181207011148.251812-14-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since zap_class() removes items from the all_lock_classes list and the
classhash_table, protect all zap_class() calls against concurrent
data structure modifications with the graph lock.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20181207011148.251812-13-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Initializing a list entry just before it is passed to list_add_tail_rcu()
is not necessary because list_add_tail_rcu() overwrites the next and prev
pointers anyway. Hence remove the INIT_LIST_HEAD() statement.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20181207011148.251812-12-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch does not change any functionality but makes the
lockdep_reset_lock() function easier to read.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20181207011148.251812-11-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the function __lockdep_init_map() only has one caller, inline it
into its caller. This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20181207011148.251812-10-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch avoids that sparse complains about a missing declaration for
the lock_classes array when building with CONFIG_DEBUG_LOCKDEP=n.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Link: https://lkml.kernel.org/r/20181207011148.251812-9-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu(). This commit therefore makes this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.
Further analysis into the cause of the slowdown traced back to the
frequent call to debug_locks_off() from the __lock_acquired() function
probably due to some inconsistent lockdep states with debug_locks
off. The debug_locks_off() function did an unconditional atomic xchg
to write a 0 value into debug_locks which had already been set to 0.
This led to severe cacheline contention in the cacheline that held
debug_locks. As debug_locks is being referenced in quite a few different
places in the kernel, this greatly slow down the system performance.
To prevent that trashing of debug_locks cacheline, lock_acquired()
and lock_contended() now checks the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Remove the duplicated 'lock_class_ops' percpu array that is not used
anywhere.
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 8ca2b56cd7 ("locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y")
Link: http://lkml.kernel.org/r/1539380547-16726-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A sizable portion of the CPU cycles spent on the __lock_acquire() is used
up by the atomic increment of the class->ops stat counter. By taking it out
from the lock_class structure and changing it to a per-cpu per-lock-class
counter, we can reduce the amount of cacheline contention on the class
structure when multiple CPUs are trying to acquire locks of the same
class simultaneously.
To limit the increase in memory consumption because of the percpu nature
of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP
config option. So the memory consumption increase will only occur if
CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however,
is reduced in size by 16 bytes on 64-bit archs after ops removal and
a minor restructuring of the fields.
This patch also fixes a bug in the increment code as the counter is of
the 'unsigned long' type, but atomic_inc() was used to increment it.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/d66681f3-8781-9793-1dcf-2436a284550b@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When __lock_release() is called, the most likely unlock scenario is
on the innermost lock in the chain. In this case, we can skip some of
the checks and provide a faster path to completion.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1538511560-10090-4-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The static __lock_acquire() function has only two callers:
1) lock_acquire()
2) reacquire_held_locks()
In lock_acquire(), raw_local_irq_save() is called beforehand. So
IRQs must have been disabled. So the check:
DEBUG_LOCKS_WARN_ON(!irqs_disabled())
is kind of redundant in this case. So move the above check
to reacquire_held_locks() to eliminate redundant code in the
lock_acquire() path.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1538511560-10090-3-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The inline function add_chain_cache_classes() is defined, but has no
caller. Just remove it.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1538511560-10090-2-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
c3bc8fd637 ("tracing: Centralize preemptirq tracepoints and unify their usage")
added the inclusion of <trace/events/preemptirq.h>.
liblockdep doesn't have a stub version of that header so now fails to build.
However, commit:
bff1b208a5 ("tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"")
removed the use of functions declared in that header. So delete the #include.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: bff1b208a5 ("tracing: Partial revert of "tracing: Centralize ...")
Fixes: c3bc8fd637 ("tracing: Centralize preemptirq tracepoints ...")
Link: http://lkml.kernel.org/r/20180828203315.GD18030@decadent.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Joel Fernandes created a nice patch that cleaned up the duplicate hooks used
by lockdep and irqsoff latency tracer. It made both use tracepoints. But it
caused lockdep to trigger several false positives. We have not figured out
why yet, but removing lockdep from using the trace event hooks and just call
its helper functions directly (like it use to), makes the problem go away.
This is a partial revert of the clean up patch c3bc8fd637 ("tracing:
Centralize preemptirq tracepoints and unify their usage") that adds direct
calls for lockdep, but also keeps most of the clean up done to get rid of
the horrible preprocessor if statements.
Link: http://lkml.kernel.org/r/20180806155058.5ee875f4@gandalf.local.home
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Fixes: c3bc8fd637 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This patch detaches the preemptirq tracepoints from the tracers and
keeps it separate.
Advantages:
* Lockdep and irqsoff event can now run in parallel since they no longer
have their own calls.
* This unifies the usecase of adding hooks to an irqsoff and irqson
event, and a preemptoff and preempton event.
3 users of the events exist:
- Lockdep
- irqsoff and preemptoff tracers
- irqs and preempt trace events
The unification cleans up several ifdefs and makes the code in preempt
tracer and irqsoff tracers simpler. It gets rid of all the horrific
ifdeferry around PROVE_LOCKING and makes configuration of the different
users of the tracepoints more easy and understandable. It also gets rid
of the time_* function calls from the lockdep hooks used to call into
the preemptirq tracer which is not needed anymore. The negative delta in
lines of code in this patch is quite large too.
In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
as a single point for registering probes onto the tracepoints. With
this,
the web of config options for preempt/irq toggle tracepoints and its
users becomes:
PREEMPT_TRACER PREEMPTIRQ_EVENTS IRQSOFF_TRACER PROVE_LOCKING
| | \ | |
\ (selects) / \ \ (selects) /
TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS
\ /
\ (depends on) /
PREEMPTIRQ_TRACEPOINTS
Other than the performance tests mentioned in the previous patch, I also
ran the locking API test suite. I verified that all tests cases are
passing.
I also injected issues by not registering lockdep probes onto the
tracepoints and I see failures to confirm that the probes are indeed
working.
This series + lockdep probes not registered (just to inject errors):
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
[ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
With this series + lockdep probes registered, all locking tests pass:
[ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
get_cpu_var disables preemption which has the potential to call into the
preemption disable trace points causing some complications. There's also
no need to disable preemption in uses of get_lock_stats anyway since
preempt is already disabled. So lets simplify the code.
Link: http://lkml.kernel.org/r/20180730222423.196630-2-joel@joelfernandes.org
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
While debugging where things were going wrong with mapping
enabling/disabling interrupts with the lockdep state and actual real
enabling and disabling interrupts, I had to silent the IRQ
disabling/enabling in debug_check_no_locks_freed() because it was
always showing up as it was called before the splat was.
Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed()
but for all internal lockdep functions, as they hide useful information
about where interrupts were used incorrectly last.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Calling lockdep_print_held_locks() on a running thread is considered unsafe.
Since all callers should follow that rule and the sanity check is not heavy,
this patch moves the sanity check to inside lockdep_print_held_locks().
As a side effect of this patch, the number of locks held by running threads
will be printed as well. This change will be preferable when we want to
know which threads might be relevant to a problem but are unable to print
any clues because that thread is running.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1523011279-8206-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
debug_show_all_locks() tries to grab the tasklist_lock for two seconds, but
calling while_each_thread() without tasklist_lock held is not safe.
See the following commit for more information:
4449a51a7c ("vm_is_stack: use for_each_thread() rather then buggy while_each_thread()")
Change debug_show_all_locks() from "do_each_thread()/while_each_thread()
with possibility of missing tasklist_lock" to "for_each_process_thread()
with RCU", and add a call to touch_all_softlockup_watchdogs() like
show_state_filter() does.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1523011279-8206-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lock debug output in print_lock() has a few shortcomings:
- It prints the hlock->acquire_ip field in %px and %pS format. That's
redundant information.
- It lacks information about the lock object itself. The lock class is
not helpful to identify a particular instance of a lock.
Change the output so it prints:
- hlock->instance to allow identification of a particular lock instance.
- only the %pS format of hlock->ip_acquire which is sufficient to decode
the actual code line with faddr2line.
The resulting output is:
3 locks held by a.out/31106:
#0: 00000000b0f753ba (&mm->mmap_sem){++++}, at: copy_process.part.41+0x10d5/0x1fe0
#1: 00000000ef64d539 (&mm->mmap_sem/1){+.+.}, at: copy_process.part.41+0x10fe/0x1fe0
#2: 00000000b41a282e (&mapping->i_mmap_rwsem){++++}, at: copy_process.part.41+0x12f2/0x1fe0
[ tglx: Massaged changelog ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/201803271941.GBE57310.tVSOJLQOFFOHFM@I-love.SAKURA.ne.jp
Show unadorned pointers in lockdep reports - lockdep is a debugging
facility and hashing pointers there doesn't make a whole lotta sense.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20180226134926.23069-1-bp@alien8.de
Pull locking updates from Ingo Molnar:
"The main changes relate to making lock_is_held() et al (and external
wrappers of them) work on const data types - this requires const
propagation through the depths of lockdep.
This removes a number of ugly type hacks the external helpers used"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Convert some users to const
lockdep: Make lockdep checking constant
lockdep: Assign lock keys on registration
debug_show_all_locks() iterates all tasks and print held locks whole
holding tasklist_lock. This can take a while on a slow console device
and may end up triggering NMI hardlockup detector if someone else ends
up waiting for tasklist_lock.
Touch the NMI watchdog while printing the held locks to avoid
spuriously triggering the hardlockup detector.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are several places in the kernel which would like to pass a const
pointer to lockdep_is_held(). Constify the entire path so nobody has to
trick the compiler.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20180117151414.23686-3-willy@infradead.org
Lockdep is assigning lock keys when a lock was looked up. This is
unnecessary; if the lock has never been registered then it is known that it
is not locked. It also complicates the calling convention. Switch to
assigning the lock key in register_lock_class().
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/20180117151414.23686-2-willy@infradead.org
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
while it found a number of old bugs initially, was also causing too many
false positives that caused people to disable lockdep - which is arguably
a worse overall outcome.
If we disable cross-release by default but keep the code upstream then
in practice the most likely outcome is that we'll allow the situation
to degrade gradually, by allowing entropy to introduce more and more
false positives, until it overwhelms maintenance capacity.
Another bad side effect was that people were trying to work around
the false positives by uglifying/complicating unrelated code. There's
a marked difference between annotating locking operations and
uglifying good code just due to bad lock debugging code ...
This gradual decrease in quality happened to a number of debugging
facilities in the kernel, and lockdep is pretty complex already,
so we cannot risk this outcome.
Either cross-release checking can be done right with no false positives,
or it should not be included in the upstream kernel.
( Note that it might make sense to maintain it out of tree and go through
the false positives every now and then and see whether new bugs were
introduced. )
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Patch series "kmemcheck: kill kmemcheck", v2.
As discussed at LSF/MM, kill kmemcheck.
KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow). KASan is already upstream.
We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).
The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.
Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.
This patch (of 4):
Remove kmemcheck annotations, and calls to kmemcheck from the kernel.
[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johan Hovold reported a heavy performance regression caused by lockdep
cross-release:
> Boot time (from "Linux version" to login prompt) had in fact doubled
> since 4.13 where it took 17 seconds (with my current config) compared to
> the 35 seconds I now see with 4.14-rc4.
>
> I quick bisect pointed to lockdep and specifically the following commit:
>
> 28a903f63e ("locking/lockdep: Handle non(or multi)-acquisition
> of a crosslock")
>
> which I've verified is the commit which doubled the boot time (compared
> to 28a903f63ec0^) (added by lockdep crossrelease series [1]).
Currently cross-release performs unwind on every acquisition, but that
is very expensive.
This patch makes unwind optional and disables it by default and only
records acquire_ip.
Full stack traces are sometimes required for full analysis, in which
case a boot paramter, crossrelease_fullstack, can be specified.
On my qemu Ubuntu machine (x86_64, 4 cores, 512M), the regression was
fixed. We measure boot times with 'perf stat --null --repeat 10 $QEMU',
where $QEMU launches a kernel with init=/bin/true:
1. No lockdep enabled:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
2.756558155 seconds time elapsed ( +- 0.09% )
2. Lockdep enabled:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
2.968710420 seconds time elapsed ( +- 0.12% )
3. Lockdep enabled + cross-release enabled:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
3.153839636 seconds time elapsed ( +- 0.31% )
4. Lockdep enabled + cross-release enabled + this patch applied:
Performance counter stats for 'qemu_booting_time.sh bzImage' (10 runs):
2.963669551 seconds time elapsed ( +- 0.11% )
I.e. lockdep cross-release performance is now indistinguishable from
vanilla lockdep.
Bisected-by: Johan Hovold <johan@kernel.org>
Analyzed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: amir73il@gmail.com
Cc: axboe@kernel.dk
Cc: darrick.wong@oracle.com
Cc: david@fromorbit.com
Cc: hch@infradead.org
Cc: idryomov@gmail.com
Cc: johannes.berg@intel.com
Cc: kernel-team@lge.com
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1508921765-15396-5-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is some complication between check_prevs_add() and
check_prev_add() wrt. saving stack traces. The problem is that we want
to be frugal with saving stack traces, since it consumes static
resources.
We'll only know in check_prev_add() if we need the trace, but we can
call into it multiple times. So we want to do on-demand and re-use.
A further complication is that check_prev_add() can drop graph_lock
and mess with our static resources.
In any case, the current state; after commit:
ce07a9415f ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
is that we'll assume the trace contains valid data once
check_prev_add() returns '2'. However, as noted by Josh, this is
false, check_prev_add() can return '2' before having saved a trace,
this then result in the possibility of using uninitialized data.
Testing, as reported by Wu, shows a NULL deref.
So simplify.
Since the graph_lock() thing is a debug path that hasn't
really been used in a long while, take it out back and avoid the
head-ache.
Further initialize the stack_trace to a known 'empty' state; as long
as nr_entries == 0, nothing should deref entries. We can then use the
'entries == NULL' test for a valid trace / on-demand saving.
Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ce07a9415f ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to
ensure the temporal IRQ events don't interact with task state, the
XHLOCK_PROC is a fundament different beast that just happens to share
the interface.
The purpose of XHLOCK_PROC is to annotate independent execution inside
one task. For example workqueues, each work should appear to run in its
own 'pristine' 'task'.
Remove XHLOCK_PROC in favour of its own interface to avoid confusion.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: kernel-team@lge.com
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new completion/crossrelease annotations interact unfavourable with
the extant flush_work()/flush_workqueue() annotations.
The problem is that when a single work class does:
wait_for_completion(&C)
and
complete(&C)
in different executions, we'll build dependencies like:
lock_map_acquire(W)
complete_acquire(C)
and
lock_map_acquire(W)
complete_release(C)
which results in the dependency chain: W->C->W, which lockdep thinks
spells deadlock, even though there is no deadlock potential since
works are ran concurrently.
One possibility would be to change the work 'lock' to recursive-read,
but that would mean hitting a lockdep limitation on recursive locks.
Also, unconditinoally switching to recursive-read here would fail to
detect the actual deadlock on single-threaded workqueues, which do
have a problem with this.
For now, forcefully disregard these locks for crossrelease.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As Boqun Feng pointed out, current->hist_id should be aligned with the
latest valid xhlock->hist_id so that hist_id_save[] storing current->hist_id
can be comparable with xhlock->hist_id. Fix it.
Additionally, the condition for overwrite-detection should be the
opposite. Fix the code and the comments as well.
<- direction to visit
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh (h: history)
^^ ^
|| start from here
|previous entry
current entry
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: linux-mm@kvack.org
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502694052-16085-3-git-send-email-byungchul.park@lge.com
[ Improve the comments some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No acquisition might be in progress on commit of a crosslock. Completion
operations enabling crossrelease are the case like:
CONTEXT X CONTEXT Y
--------- ---------
trigger completion context
complete AX
commit AX
wait_for_complete AX
acquire AX
wait
where AX is a crosslock.
When no acquisition is in progress, we should not perform commit because
the lock does not exist, which might cause incorrect memory access. So
we have to track the number of acquisitions of a crosslock to handle it.
Moreover, in case that more than one acquisition of a crosslock are
overlapped like:
CONTEXT W CONTEXT X CONTEXT Y CONTEXT Z
--------- --------- --------- ---------
acquire AX (gen_id: 1)
acquire A
acquire AX (gen_id: 10)
acquire B
commit AX
acquire C
commit AX
where A, B and C are typical locks and AX is a crosslock.
Current crossrelease code performs commits in Y and Z with gen_id = 10.
However, we can use gen_id = 1 to do it, since not only 'acquire AX in X'
but 'acquire AX in W' also depends on each acquisition in Y and Z until
their commits. So make it use gen_id = 1 instead of 10 on their commits,
which adds an additional dependency 'AX -> A' in the example above.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-8-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The ring buffer can be overwritten by hardirq/softirq/work contexts.
That cases must be considered on rollback or commit. For example,
|<------ hist_lock ring buffer size ----->|
ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
where 'p' represents an acquisition in process context,
'i' represents an acquisition in irq context.
On irq exit, crossrelease tries to rollback idx to original position,
but it should not because the entry already has been invalid by
overwriting 'i'. Avoid rollback or commit for entries overwritten.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-7-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.
However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. However, synchronization primitives like page
locks or completions, which are allowed to be released in any context,
also create dependencies and can cause a deadlock.
So lockdep should track these locks to do a better job. The 'crossrelease'
implementation makes these primitives also be tracked.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, a space for stack_trace is pinned in check_prev_add(), that
makes us not able to use external stack_trace. The simplest way to
achieve it is to pass an external stack_trace as an argument.
A more suitable solution is to pass a callback additionally along with
a stack_trace so that callers can decide the way to save or whether to
save. Actually crossrelease needs to do other than saving a stack_trace.
So pass a stack_trace and callback to handle it, to check_prev_add().
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Firstly, return 1 instead of 2 when 'prev -> next' dependency already
exists. Since the value 2 is not referenced anywhere, just return 1
indicating success in this case.
Secondly, return 2 instead of 1 when successfully added a lock_list
entry with saving stack_trace. With that, a caller can decide whether
to avoid redundant save_trace() on the caller site.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-4-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Two boots + a make defconfig, the first didn't have the redundant bit
in, the second did:
lock-classes: 1168 1169 [max: 8191]
direct dependencies: 7688 5812 [max: 32768]
indirect dependencies: 25492 25937
all direct dependencies: 220113 217512
dependency chains: 9005 9008 [max: 65536]
dependency chain hlocks: 34450 34366 [max: 327680]
in-hardirq chains: 55 51
in-softirq chains: 371 378
in-process chains: 8579 8579
stack-trace entries: 108073 88474 [max: 524288]
combined max dependencies: 178738560 169094640
max locking depth: 15 15
max bfs queue depth: 320 329
cyclic checks: 9123 9190
redundant checks: 5046
redundant links: 1828
find-mask forwards checks: 2564 2599
find-mask backwards checks: 39521 39789
So it saves nearly 2k links and a fair chunk of stack-trace entries, but
as expected, makes no real difference on the indirect dependencies.
At the same time, you see the max BFS depth increase, which is also
expected, although it could easily be boot variance -- these numbers are
not entirely stable between boots.
The down side is that the cycles in the graph become larger and thus
the reports harder to read.
XXX: do we want this as a CONFIG variable, implied by LOCKDEP_SMALL?
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: iamjoonsoo.kim@lge.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Link: http://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A while ago someone, and I cannot find the email just now, asked if we
could not implement the RECLAIM_FS inversion stuff with a 'fake' lock
like we use for other things like workqueues etc. I think this should
be possible which allows reducing the 'irq' states and will reduce the
amount of __bfs() lookups we do.
Removing the 1 IRQ state results in 4 less __bfs() walks per
dependency, improving lockdep performance. And by moving this
annotation out of the lockdep code it becomes easier for the mm people
to extend.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: iamjoonsoo.kim@lge.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The PROVE_RCU_REPEATEDLY Kconfig option was initially added due to
the volume of messages from PROVE_RCU: Doing just one per boot would
have required excessive numbers of boots to locate them all. However,
PROVE_RCU messages are now relatively rare, so there is no longer any
reason to need more than one such message per boot. This commit therefore
removes the PROVE_RCU_REPEATEDLY Kconfig option.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Commit a5dd63efda ("lockdep: Use "WARNING" tag on lockdep splats")
substituted pr_warn() for printk() in places called out by Dmitry Vyukov.
However, this resulted in an ugly mix of pr_warn() and printk(). This
commit therefore changes printk() to pr_warn() or pr_cont(), depending
on the absence or presence of KERN_CONT. This is done in all functions
that had printk() changed to pr_warn() by the aforementioned commit.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull RCU updates from Ingo Molnar:
"The main changes are:
- Debloat RCU headers
- Parallelize SRCU callback handling (plus overlapping patches)
- Improve the performance of Tree SRCU on a CPU-hotplug stress test
- Documentation updates
- Miscellaneous fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
rcu: Open-code the rcu_cblist_n_lazy_cbs() function
rcu: Open-code the rcu_cblist_n_cbs() function
rcu: Open-code the rcu_cblist_empty() function
rcu: Separately compile large rcu_segcblist functions
srcu: Debloat the <linux/rcu_segcblist.h> header
srcu: Adjust default auto-expediting holdoff
srcu: Specify auto-expedite holdoff time
srcu: Expedite first synchronize_srcu() when idle
srcu: Expedited grace periods with reduced memory contention
srcu: Make rcutorture writer stalls print SRCU GP state
srcu: Exact tracking of srcu_data structures containing callbacks
srcu: Make SRCU be built by default
srcu: Fix Kconfig botch when SRCU not selected
rcu: Make non-preemptive schedule be Tasks RCU quiescent state
srcu: Expedite srcu_schedule_cbs_snp() callback invocation
srcu: Parallelize callback handling
kvm: Move srcu_struct fields to end of struct kvm
rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
rcu: Use true/false in assignment to bool
rcu: Use bool value directly
...
Merge misc updates from Andrew Morton:
- a few misc things
- most of MM
- KASAN updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
kasan: separate report parts by empty lines
kasan: improve double-free report format
kasan: print page description after stacks
kasan: improve slab object description
kasan: change report header
kasan: simplify address description logic
kasan: change allocation and freeing stack traces headers
kasan: unify report headers
kasan: introduce helper functions for determining bug type
mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
mm: hwpoison: call shake_page() unconditionally
mm/swapfile.c: fix swap space leak in error path of swap_free_entries()
mm/gup.c: fix access_ok() argument type
mm/truncate: avoid pointless cleancache_invalidate_inode() calls.
mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty
fs/block_dev: always invalidate cleancache in invalidate_bdev()
fs: fix data invalidation in the cleancache during direct IO
zram: reduce load operation in page_same_filled
zram: use zram_free_page instead of open-coded
zram: introduce zram data accessor
...
GFP_NOFS context is used for the following 5 reasons currently:
- to prevent from deadlocks when the lock held by the allocation
context would be needed during the memory reclaim
- to prevent from stack overflows during the reclaim because the
allocation is performed from a deep context already
- to prevent lockups when the allocation context depends on other
reclaimers to make a forward progress indirectly
- just in case because this would be safe from the fs POV
- silence lockdep false positives
Unfortunately overuse of this allocation context brings some problems to
the MM. Memory reclaim is much weaker (especially during heavy FS
metadata workloads), OOM killer cannot be invoked because the MM layer
doesn't have enough information about how much memory is freeable by the
FS layer.
In many cases it is far from clear why the weaker context is even used
and so it might be used unnecessarily. We would like to get rid of
those as much as possible. One way to do that is to use the flag in
scopes rather than isolated cases. Such a scope is declared when really
necessary, tracked per task and all the allocation requests from within
the context will simply inherit the GFP_NOFS semantic.
Not only this is easier to understand and maintain because there are
much less problematic contexts than specific allocation requests, this
also helps code paths where FS layer interacts with other layers (e.g.
crypto, security modules, MM etc...) and there is no easy way to convey
the allocation context between the layers.
Introduce memalloc_nofs_{save,restore} API to control the scope of
GFP_NOFS allocation context. This is basically copying
memalloc_noio_{save,restore} API we have for other restricted allocation
context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
just an alias for PF_FSTRANS which has been xfs specific until recently.
There are no more PF_FSTRANS users anymore so let's just drop it.
PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
their semantic. kmem_flags_convert() doesn't need to evaluate the flag
anymore.
This patch shouldn't introduce any functional changes.
Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use a properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.
[akpm@linux-foundation.org: fix comment typo, reflow comment]
Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current implementation of the reclaim lockup detection can lead to
false positives and those even happen and usually lead to tweak the code
to silence the lockdep by using GFP_NOFS even though the context can use
__GFP_FS just fine.
See
http://lkml.kernel.org/r/20160512080321.GA18496@dastard
as an example.
=================================
[ INFO: inconsistent lock state ]
4.5.0-rc2+ #4 Tainted: G O
---------------------------------
inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]
{RECLAIM_FS-ON-R} state was registered at:
mark_held_locks+0x79/0xa0
lockdep_trace_alloc+0xb3/0x100
kmem_cache_alloc+0x33/0x230
kmem_zone_alloc+0x81/0x120 [xfs]
xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
__xfs_refcount_find_shared+0x75/0x580 [xfs]
xfs_refcount_find_shared+0x84/0xb0 [xfs]
xfs_getbmap+0x608/0x8c0 [xfs]
xfs_vn_fiemap+0xab/0xc0 [xfs]
do_vfs_ioctl+0x498/0x670
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x12/0x6f
CPU0
----
lock(&xfs_nondir_ilock_class);
<Interrupt>
lock(&xfs_nondir_ilock_class);
*** DEADLOCK ***
3 locks held by kswapd0/543:
stack backtrace:
CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4
Call Trace:
lock_acquire+0xd8/0x1e0
down_write_nested+0x5e/0xc0
xfs_ilock+0x177/0x200 [xfs]
xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
evict+0xc5/0x190
dispose_list+0x39/0x60
prune_icache_sb+0x4b/0x60
super_cache_scan+0x14f/0x1a0
shrink_slab.part.63.constprop.79+0x1e9/0x4e0
shrink_zone+0x15e/0x170
kswapd+0x4f1/0xa80
kthread+0xf2/0x110
ret_from_fork+0x3f/0x70
To quote Dave:
"Ignoring whether reflink should be doing anything or not, that's a
"xfs_refcountbt_init_cursor() gets called both outside and inside
transactions" lockdep false positive case. The problem here is lockdep
has seen this allocation from within a transaction, hence a GFP_NOFS
allocation, and now it's seeing it in a GFP_KERNEL context. Also note
that we have an active reference to this inode.
So, because the reclaim annotations overload the interrupt level
detections and it's seen the inode ilock been taken in reclaim
("interrupt") context, this triggers a reclaim context warning where
it thinks it is unsafe to do this allocation in GFP_KERNEL context
holding the inode ilock..."
This sounds like a fundamental problem of the reclaim lock detection.
It is really impossible to annotate such a special usecase IMHO unless
the reclaim lockup detection is reworked completely. Until then it is
much better to provide a way to add "I know what I am doing flag" and
mark problematic places. This would prevent from abusing GFP_NOFS flag
which has a runtime effect even on configurations which have lockdep
disabled.
Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
skip the current allocation request.
While we are at it also make sure that the radix tree doesn't
accidentaly override tags stored in the upper part of the gfp_mask.
Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "scope GFP_NOFS api", v5.
This patch (of 7):
Commit 21caf2fc19 ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore)
functions to enable people to modify the MM behavior by disabling I/O
during memory allocation.
This was further extended in commit 934f3072c1 ("mm: clear __GFP_FS
when PF_MEMALLOC_NOIO is set").
memalloc_noio_* functions prevent allocation paths recursing back into
the filesystem without explicitly changing the flags for every
allocation site.
However, lockdep hasn't been keeping up with the changes and it entirely
misses handling the memalloc_noio adjustments. Instead, it is left to
the callers of __lockdep_trace_alloc to call the function after they
have shaven the respective GFP flags which can lead to false positives:
=================================
[ INFO: inconsistent lock state ]
4.10.0-nbor #134 Not tainted
---------------------------------
inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
{IN-RECLAIM_FS-W} state was registered at:
__lock_acquire+0x62a/0x17c0
lock_acquire+0xc5/0x220
down_write_nested+0x4f/0x90
xfs_ilock+0x141/0x230
xfs_reclaim_inode+0x12a/0x320
xfs_reclaim_inodes_ag+0x2c8/0x4e0
xfs_reclaim_inodes_nr+0x33/0x40
xfs_fs_free_cached_objects+0x19/0x20
super_cache_scan+0x191/0x1a0
shrink_slab+0x26f/0x5f0
shrink_node+0xf9/0x2f0
kswapd+0x356/0x920
kthread+0x10c/0x140
ret_from_fork+0x31/0x40
irq event stamp: 173777
hardirqs last enabled at (173777): __local_bh_enable_ip+0x70/0xc0
hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0
softirqs last enabled at (173776): _xfs_buf_find+0x67a/0xb70
softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&xfs_nondir_ilock_class);
<Interrupt>
lock(&xfs_nondir_ilock_class);
*** DEADLOCK ***
4 locks held by fsstress/3365:
#0: (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50
#1: (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0
#2: (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140
#3: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
stack backtrace:
CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
kmem_cache_alloc_node_trace+0x3a/0x2c0
vm_map_ram+0x2a1/0x510
_xfs_buf_map_pages+0x77/0x140
xfs_buf_get_map+0x185/0x2a0
xfs_attr_rmtval_set+0x233/0x430
xfs_attr_leaf_addname+0x2d2/0x500
xfs_attr_set+0x214/0x420
xfs_xattr_set+0x59/0xb0
__vfs_setxattr+0x76/0xa0
__vfs_setxattr_noperm+0x5e/0xf0
vfs_setxattr+0xae/0xb0
setxattr+0x15e/0x1a0
path_setxattr+0x8f/0xc0
SyS_lsetxattr+0x11/0x20
entry_SYSCALL_64_fastpath+0x23/0xc6
Let's fix this by making lockdep explicitly do the shaving of respective
GFP flags.
Fixes: 934f3072c1 ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set")
Link: http://lkml.kernel.org/r/20170306131408.9828-2-mhocko@kernel.org
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZCTzvAAoJEAx081l5xIa+9kcQAJsQiija4/7QGx6IzakOMqjx
WulJ3zYG/cU/HLwCBcuWRDF6wAj+7iWNeLCPmolHwEazcI8tQVdgMlWtbdMbDh8U
ckzD3FBXsEVfIfab+u6tyoUkm3l/VDhMXbjkUK7NTo/+dkRqe5LuFfZPCGN09jft
Y+5salkRXzDhXPSFsqmjfzhx1v7PTgf0a5HUenKWEWOv+sJQaW4/iPvcDSIcg5qR
l9WjAqro1NpFYhUodnh6DkLeledL1U5whdtp/yvrUAck8y+WP/jwGYmQ7pZ0UkQm
f0M3kV6K67ox9eqN++jsGX5o8sB1qF01Uh95kBAnyzYzsw4ZlMCx6pV7PDX+J88M
UBNMEqX10hrLkNJA9lGjPWx+/6fudcwg9anKvTRO3Uyx7MbYoJAgjzAM+yBqqtV0
8Otxa4Bw0V2pmUD+0lqJDERRvE77VCXkLb8SaI5lQo0MHpQqT2cZA+GD+B+rZHO6
Ie5LDFY87vM2GG1IECufG+xOa3v6sn2FfQ1ouu1KNGKOAMBKcQCQyQx3kGVuNW2i
HDACVXALJgXdRlVLm4jydOCZdRoguX7AWmRjtdwxgaO+lBcGfLhkXdjLQ7Ho+29p
32ArJfkZPfA53vMB6lHxAfbtrs1q2RzyVnPHj/KqeJnGZbABKTsF2HQ5BQc4Xq/J
mqXoz6Oubdvk4Pwyx7Ne
=UxFF
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux
Pull drm u pdates from Dave Airlie:
"This is the main drm pull request for v4.12. Apart from two fixes
pulls, everything should have been in drm-next for at least 2 weeks.
The biggest thing in here is AMD released the public headers for their
upcoming VEGA GPUs. These as always are quite a sizeable chunk of
header files. They've also added initial non-display support for those
GPUs, though they aren't available in production yet.
Otherwise it's pretty much normal.
New bridge drivers:
- megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++
- generic LVDS bridge support.
Core:
- Displayport link train failure reporting to userspace
- debugfs interface cleaned up
- subsystem TODO in kerneldoc now
- Extended fbdev support (flipping and vblank wait)
- drm_platform removed
- EDP CRC support in helper
- HF-VSDB SCDC support in EDID parser
- Lots of code cleanups and header extraction
- Thunderbolt external GPU awareness
- Atomic helper improvements
- Documentation improvements
panel:
- Sitronix and Samsung new panel support
amdgpu:
- Preliminary vega10 support
- Multi-level page table support
- GPU sensor support for userspace
- PRT support for sparse buffers
- SR-IOV improvements
- Non-contig VRAM CPU mapping
i915:
- Atomic modesetting enabled by default on Gen5+
- LSPCON improvements
- Atomic state handling for cdclk
- GPU reset improvements
- In-kernel unit tests
- Geminilake improvements and color manager support
- Designware i2c fixes
- vblank evasion improvements
- Hotplug safe connector iterators
- GVT scheduler QoS support
- GVT Kabylake support
nouveau:
- Acceleration support for Pascal (GP10x).
- Rearchitecture of code handling proprietary signed firmware
- Fix GTX 970 with odd MMU configuration
- GP10B support
- GP107 acceleration support
vmwgfx:
- Atomic modesetting support for vmwgfx
omapdrm:
- Support for render nodes
- Refactor omapdss code
- Fix some probe ordering issues
- Fix too dark RGB565 rendering
sunxi:
- prelim rework for multiple pipes.
mali-dp:
- Color management support
- Plane scaling
- Power management improvements
imx-drm:
- Prefetch Resolve Engine/Gasket on i.MX6QP
- Deferred plane disabling
- Separate alpha support
mediatek:
- Mediatek SoC MT2701 support
rcar-du:
- Gen3 HDMI support
msm:
- 4k support for newer chips
- OPP bindings for gpu
- prep work for per-process pagetables
vc4:
- HDMI audio support
- fixes
qxl:
- minor fixes.
dw-hdmi:
- PHY improvements
- CSC fixes
- Amlogic GX SoC support"
* tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits)
drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection
drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr()
drm/nouveau/kms: Increase max retries in scanout position queries.
drm/nouveau/bios/bitP: check that table is long enough for optional pointers
drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine
drm: mali-dp: use div_u64 for expensive 64-bit divisions
drm/i915: Confirm the request is still active before adding it to the await
drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio
drm/i915/selftests: Allocate inode/file dynamically
drm/i915: Fix system hang with EI UP masked on Haswell
drm/i915: checking for NULL instead of IS_ERR() in mock selftests
drm/i915: Perform link quality check unconditionally during long pulse
drm/i915: Fix use after free in lpe_audio_platdev_destroy()
drm/i915: Use the right mapping_gfp_mask for final shmem allocation
drm/i915: Make legacy cursor updates more unsynced
drm/i915: Apply a cond_resched() to the saturated signaler
drm/i915: Park the signaler before sleeping
drm: mali-dp: Check the mclk rate and allow up/down scaling
drm: mali-dp: Enable image enhancement when scaling
drm: mali-dp: Add plane upscaling support
...
This commit changes lockdep splats to begin lines with "WARNING" and
to use pr_warn() instead of printk(). This change eases scripted
analysis of kernel console output.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYzznuAAoJEHm+PkMAQRiGAzMIAJDBo5otTMMLhg8eKj8Cnab4
2NyaoWDN6mtU427rzEKEfZlTtp3gIBVdFex5x442weIdw6BgRQW0dvF/uwEn08yI
9Wx7VJmIUyH9M8VmhDtkUTFrhwUGr29qb3JhENMd7tv/CiJaehGRHCT3xqo5BDdu
xiyPcwSkwP/NH24TS91G87gV6r0I0oKLSAxu+KifEFESrb8gaZaduslzpEj3m/Ds
o9EPpfzaiGAdW5EdNfPtviYbBk7ZOXwtxdMV+zlvsLcaqtYnFEsJZd2WyZL0zGML
VXBVxaYtlyTeA7Mt8YYUL+rDHELSOtCeN5zLfxUvYt+Yc0Y6LFBLDOE5h8b3eCw=
=uKUo
-----END PGP SIGNATURE-----
BackMerge tag 'v4.11-rc3' into drm-next
Linux 4.11-rc3 as requested by Daniel
If a PER_CPU struct which contains a spin_lock is statically initialized
via:
DEFINE_PER_CPU(struct foo, bla) = {
.lock = __SPIN_LOCK_UNLOCKED(bla.lock)
};
then lockdep assigns a seperate key to each lock because the logic for
assigning a key to statically initialized locks is to use the address as
the key. With per CPU locks the address is obvioulsy different on each CPU.
That's wrong, because all locks should have the same key.
To solve this the following modifications are required:
1) Extend the is_kernel/module_percpu_addr() functions to hand back the
canonical address of the per CPU address, i.e. the per CPU address
minus the per CPU offset.
2) Check the lock address with these functions and if the per CPU check
matches use the returned canonical address as the lock key, so all per
CPU locks have the same key.
3) Move the static_obj(key) check into look_up_lock_class() so this check
can be avoided for statically initialized per CPU locks. That's
required because the canonical address fails the static_obj(key) check
for obvious reasons.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Merged Dan's fixups for !MODULES and !SMP into this patch. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Murphy <dmurphy@ti.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>