2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

629 Commits

Author SHA1 Message Date
Arnd Bergmann
4bf5d56d42 x86/pti: Mark constant arrays as __initconst
I'm seeing build failures from the two newly introduced arrays that
are marked 'const' and '__initdata', which are mutually exclusive:

arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init'
arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init'

The correct annotation is __initconst.

Fixes: fec9434a12 ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180202213959.611210-1-arnd@arndb.de
2018-02-02 23:13:56 +01:00
David Woodhouse
7fcae1118f x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
Despite the fact that all the other code there seems to be doing it, just
using set_cpu_cap() in early_intel_init() doesn't actually work.

For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
c->c_init() has set those feature bits. That resets those bits back to what
was queried from the hardware.

Turning the bits off for bad microcode is easy to fix. That can just use
setup_clear_cpu_cap() to force them off for all CPUs.

I was less keen on forcing the feature bits *on* that way, just in case
of inconsistencies. I appreciate that the kernel is going to get this
utterly wrong if CPU features are not consistent, because it has already
applied alternatives by the time secondary CPUs are brought up.

But at least if setup_force_cpu_cap() isn't being used, we might have a
chance of *detecting* the lack of the corresponding bit and either
panicking or refusing to bring the offending CPU online.

So ensure that the appropriate feature bits are set within get_cpu_cap()
regardless of how many extra times it's called.

Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: karahmed@amazon.de
Cc: peterz@infradead.org
Cc: bp@alien8.de
Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk
2018-01-30 22:35:05 +01:00
Ingo Molnar
7e86548e2c Linux 4.15
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
 CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
 oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
 xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
 hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
 oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
 =sOml
 -----END PGP SIGNATURE-----

Merge tag 'v4.15' into x86/pti, to be able to merge dependent changes

Time has come to switch PTI development over to a v4.15 base - we'll still
try to make sure that all PTI fixes backport cleanly to v4.14 and earlier.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-30 15:08:27 +01:00
David Woodhouse
fec9434a12 x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
Also, for CPUs which don't speculate at all, don't report that they're
vulnerable to the Spectre variants either.

Leave the cpu_no_meltdown[] match table with just X86_VENDOR_AMD in it
for now, even though that could be done with a simple comparison, on the
assumption that we'll have more to add.

Based on suggestions from Dave Hansen and Alan Cox.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: ak@linux.intel.com
Cc: ashok.raj@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1516896855-7642-6-git-send-email-dwmw@amazon.co.uk
2018-01-26 15:53:18 +01:00
David Woodhouse
95ca0ee863 x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
This is a pure feature bits leaf. There are two AVX512 feature bits in it
already which were handled as scattered bits, and three more from this leaf
are going to be added for speculation control features.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: ak@linux.intel.com
Cc: ashok.raj@intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1516896855-7642-2-git-send-email-dwmw@amazon.co.uk
2018-01-26 15:53:16 +01:00
Linus Torvalds
40548c6b6c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "This contains:

   - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
     disabled. This seems to cause issues on a virtual machine at least
     and is incorrect according to the AMD manual.

   - a PTI bugfix which disables the perf BTS facility if PTI is
     enabled. The BTS AUX buffer is not globally visible and causes the
     CPU to fault when the mapping disappears on switching CR3 to user
     space. A full fix which restores BTS on PTI is non trivial and will
     be worked on.

   - PTI bugfixes for EFI and trusted boot which make sure that the user
     space visible page table entries have the NX bit cleared

   - removal of dead code in the PTI pagetable setup functions

   - add PTI documentation

   - add a selftest for vsyscall to verify that the kernel actually
     implements what it advertises.

   - a sysfs interface to expose vulnerability and mitigation
     information so there is a coherent way for users to retrieve the
     status.

   - the initial spectre_v2 mitigations, aka retpoline:

      + The necessary ASM thunk and compiler support

      + The ASM variants of retpoline and the conversion of affected ASM
        code

      + Make LFENCE serializing on AMD so it can be used as speculation
        trap

      + The RSB fill after vmexit

   - initial objtool support for retpoline

  As I said in the status mail this is the most of the set of patches
  which should go into 4.15 except two straight forward patches still on
  hold:

   - the retpoline add on of LFENCE which waits for ACKs

   - the RSB fill after context switch

  Both should be ready to go early next week and with that we'll have
  covered the major holes of spectre_v2 and go back to normality"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  x86,perf: Disable intel_bts when PTI
  security/Kconfig: Correct the Documentation reference for PTI
  x86/pti: Fix !PCID and sanitize defines
  selftests/x86: Add test_vsyscall
  x86/retpoline: Fill return stack buffer on vmexit
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/spectre: Add boot time option to select Spectre v2 mitigation
  x86/retpoline: Add initial retpoline support
  objtool: Allow alternatives to be ignored
  objtool: Detect jumps to retpoline thunks
  x86/pti: Make unpoison of pgd for trusted boot work for real
  x86/alternatives: Fix optimize_nops() checking
  sysfs/cpu: Fix typos in vulnerability documentation
  x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
  ...
2018-01-14 09:51:25 -08:00
David Woodhouse
da28512156 x86/spectre: Add boot time option to select Spectre v2 mitigation
Add a spectre_v2= option to select the mitigation used for the indirect
branch speculation vulnerability.

Currently, the only option available is retpoline, in its various forms.
This will be expanded to cover the new IBRS/IBPB microcode features.

The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
serializing instruction, which is indicated by the LFENCE_RDTSC feature.

[ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
  	integration becomes simple ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:29 +01:00
David Woodhouse
76b043848f x86/retpoline: Add initial retpoline support
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.

This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.

On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.

Do not align the retpoline in the altinstr section, because there is no
guarantee that it stays aligned when it's copied over the oldinstr during
alternative patching.

[ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
[ tglx: Put actual function CALL/JMP in front of the macros, convert to
  	symbolic labels ]
[ dwmw2: Convert back to numeric labels, merge objtool fixes ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
2018-01-12 00:14:28 +01:00
David Woodhouse
99c6fa2511 x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
Add the bug bits for spectre v1/2 and force them unconditionally for all
cpus.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1515239374-23361-2-git-send-email-dwmw@amazon.co.uk
2018-01-06 21:57:19 +01:00
Linus Torvalds
abb7099dbc Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull  more x86 pti fixes from Thomas Gleixner:
 "Another small stash of fixes for fallout from the PTI work:

   - Fix the modules vs. KASAN breakage which was caused by making
     MODULES_END depend of the fixmap size. That was done when the cpu
     entry area moved into the fixmap, but now that we have a separate
     map space for that this is causing more issues than it solves.

   - Use the proper cache flush methods for the debugstore buffers as
     they are mapped/unmapped during runtime and not statically mapped
     at boot time like the rest of the cpu entry area.

   - Make the map layout of the cpu_entry_area consistent for 4 and 5
     level paging and fix the KASLR vaddr_end wreckage.

   - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak
     nvidia gfx drivers by dropping the GPL export. The subject line of
     the commit tells it the other way around, but I noticed that too
     late.

   - Fix the ASM alternative macros so they can be used in the middle of
     an inline asm block.

   - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack
     vector is properly identified. The Spectre mitigations will come
     with their own bug bits later"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
  x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
  x86/tlb: Drop the _GPL from the cpu_tlbstate export
  x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
  x86/kaslr: Fix the vaddr_end mess
  x86/mm: Map cpu_entry_area at the same place on 4/5 level
  x86/mm: Set MODULES_END to 0xffffffffff000000
2018-01-05 12:23:57 -08:00
Thomas Gleixner
de791821c2 x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
Use the name associated with the particular attack which needs page table
isolation for mitigation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Koshina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Lutomirski  <luto@amacapital.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Greg KH <gregkh@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
2018-01-05 15:34:43 +01:00
Linus Torvalds
00a5ae218d Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 page table isolation fixes from Thomas Gleixner:
 "A couple of urgent fixes for PTI:

   - Fix a PTE mismatch between user and kernel visible mapping of the
     cpu entry area (differs vs. the GLB bit) and causes a TLB mismatch
     MCE on older AMD K8 machines

   - Fix the misplaced CR3 switch in the SYSCALL compat entry code which
     causes access to unmapped kernel memory resulting in double faults.

   - Fix the section mismatch of the cpu_tss_rw percpu storage caused by
     using a different mechanism for declaration and definition.

   - Two fixes for dumpstack which help to decode entry stack issues
     better

   - Enable PTI by default in Kconfig. We should have done that earlier,
     but it slipped through the cracks.

   - Exclude AMD from the PTI enforcement. Not necessarily a fix, but if
     AMD is so confident that they are not affected, then we should not
     burden users with the overhead"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/process: Define cpu_tss_rw in same section as declaration
  x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
  x86/dumpstack: Print registers for first stack frame
  x86/dumpstack: Fix partial register dumps
  x86/pti: Make sure the user/kernel PTEs match
  x86/cpu, x86/pti: Do not enable PTI on AMD processors
  x86/pti: Enable PTI by default
2018-01-03 16:41:07 -08:00
Tom Lendacky
694d99d409 x86/cpu, x86/pti: Do not enable PTI on AMD processors
AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against.  The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.

Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI
is set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171227054354.20369.94587.stgit@tlendack-t1.amdoffice.net
2018-01-03 15:57:59 +01:00
Linus Torvalds
5aa90a8458 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 page table isolation updates from Thomas Gleixner:
 "This is the final set of enabling page table isolation on x86:

   - Infrastructure patches for handling the extra page tables.

   - Patches which map the various bits and pieces which are required to
     get in and out of user space into the user space visible page
     tables.

   - The required changes to have CR3 switching in the entry/exit code.

   - Optimizations for the CR3 switching along with documentation how
     the ASID/PCID mechanism works.

   - Updates to dump pagetables to cover the user space page tables for
     W+X scans and extra debugfs files to analyze both the kernel and
     the user space visible page tables

  The whole functionality is compile time controlled via a config switch
  and can be turned on/off on the command line as well"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  x86/ldt: Make the LDT mapping RO
  x86/mm/dump_pagetables: Allow dumping current pagetables
  x86/mm/dump_pagetables: Check user space page table for WX pages
  x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
  x86/mm/pti: Add Kconfig
  x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
  x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
  x86/mm: Use INVPCID for __native_flush_tlb_single()
  x86/mm: Optimize RESTORE_CR3
  x86/mm: Use/Fix PCID to optimize user/kernel switches
  x86/mm: Abstract switching CR3
  x86/mm: Allow flushing for future ASID switches
  x86/pti: Map the vsyscall page if needed
  x86/pti: Put the LDT in its own PGD if PTI is on
  x86/mm/64: Make a full PGD-entry size hole in the memory map
  x86/events/intel/ds: Map debug buffers in cpu_entry_area
  x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
  x86/mm/pti: Map ESPFIX into user space
  x86/mm/pti: Share entry text PMD
  x86/entry: Align entry text section to PMD boundary
  ...
2017-12-29 17:02:49 -08:00
Thomas Gleixner
8d4b067895 x86/mm/pti: Force entry through trampoline when PTI active
Force the entry through the trampoline only when PTI is active. Otherwise
go through the normal entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23 21:13:00 +01:00
Thomas Gleixner
a89f040fa3 x86/cpufeatures: Add X86_BUG_CPU_INSECURE
Many x86 CPUs leak information to user space due to missing isolation of
user space and kernel space page tables. There are many well documented
ways to exploit that.

The upcoming software migitation of isolating the user and kernel space
page tables needs a misfeature flag so code can be made runtime
conditional.

Add the BUG bits which indicates that the CPU is affected and add a feature
bit which indicates that the software migitation is enabled.

Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be
made later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23 21:12:59 +01:00
Linus Torvalds
caf9a82657 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI preparatory patches from Thomas Gleixner:
 "Todays Advent calendar window contains twentyfour easy to digest
  patches. The original plan was to have twenty three matching the date,
  but a late fixup made that moot.

   - Move the cpu_entry_area mapping out of the fixmap into a separate
     address space. That's necessary because the fixmap becomes too big
     with NRCPUS=8192 and this caused already subtle and hard to
     diagnose failures.

     The top most patch is fresh from today and cures a brain slip of
     that tall grumpy german greybeard, who ignored the intricacies of
     32bit wraparounds.

   - Limit the number of CPUs on 32bit to 64. That's insane big already,
     but at least it's small enough to prevent address space issues with
     the cpu_entry_area map, which have been observed and debugged with
     the fixmap code

   - A few TLB flush fixes in various places plus documentation which of
     the TLB functions should be used for what.

   - Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
     more than sysenter now and keeping the name makes backtraces
     confusing.

   - Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
     which is only invoked on fork().

   - Make vysycall more robust.

   - A few fixes and cleanups of the debug_pagetables code. Check
     PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
     C89 initialization of the address hint array which already was out
     of sync with the index enums.

   - Move the ESPFIX init to a different place to prepare for PTI.

   - Several code moves with no functional change to make PTI
     integration simpler and header files less convoluted.

   - Documentation fixes and clarifications"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
  init: Invoke init_espfix_bsp() from mm_init()
  x86/cpu_entry_area: Move it out of the fixmap
  x86/cpu_entry_area: Move it to a separate unit
  x86/mm: Create asm/invpcid.h
  x86/mm: Put MMU to hardware ASID translation in one place
  x86/mm: Remove hard-coded ASID limit checks
  x86/mm: Move the CR3 construction functions to tlbflush.h
  x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
  x86/mm: Remove superfluous barriers
  x86/mm: Use __flush_tlb_one() for kernel memory
  x86/microcode: Dont abuse the TLB-flush interface
  x86/uv: Use the right TLB-flush API
  x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
  x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
  x86/mm/64: Improve the memory map documentation
  x86/ldt: Prevent LDT inheritance on exec
  x86/ldt: Rework locking
  arch, mm: Allow arch_dup_mmap() to fail
  x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
  ...
2017-12-23 11:53:04 -08:00
Thomas Gleixner
ed1bbc40a0 x86/cpu_entry_area: Move it to a separate unit
Separate the cpu_entry_area code out of cpu/common.c and the fixmap.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22 20:13:04 +01:00
Dave Hansen
4fe2d8b11a x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
If the kernel oopses while on the trampoline stack, it will print
"<SYSENTER>" even if SYSENTER is not involved.  That is rather confusing.

The "SYSENTER" stack is used for a lot more than SYSENTER now.  Give it a
better string to display in stack dumps, and rename the kernel code to
match.

Also move the 32-bit code over to the new naming even though it still uses
the entry stack only for SYSENTER.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22 20:13:02 +01:00
Linus Torvalds
64a48099b3 Merge branch 'WIP.x86-pti.entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 syscall entry code changes for PTI from Ingo Molnar:
 "The main changes here are Andy Lutomirski's changes to switch the
  x86-64 entry code to use the 'per CPU entry trampoline stack'. This,
  besides helping fix KASLR leaks (the pending Page Table Isolation
  (PTI) work), also robustifies the x86 entry code"

* 'WIP.x86-pti.entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  x86/cpufeatures: Make CPU bugs sticky
  x86/paravirt: Provide a way to check for hypervisors
  x86/paravirt: Dont patch flush_tlb_single
  x86/entry/64: Make cpu_entry_area.tss read-only
  x86/entry: Clean up the SYSENTER_stack code
  x86/entry/64: Remove the SYSENTER stack canary
  x86/entry/64: Move the IST stacks into struct cpu_entry_area
  x86/entry/64: Create a per-CPU SYSCALL entry trampoline
  x86/entry/64: Return to userspace from the trampoline stack
  x86/entry/64: Use a per-CPU trampoline stack for IDT entries
  x86/espfix/64: Stop assuming that pt_regs is on the entry stack
  x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
  x86/entry: Remap the TSS into the CPU entry area
  x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
  x86/dumpstack: Handle stack overflow on all stacks
  x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  x86/kasan/64: Teach KASAN about the cpu_entry_area
  x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area
  x86/entry/gdt: Put per-CPU GDT remaps in ascending order
  x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
  ...
2017-12-18 08:59:15 -08:00
Thomas Gleixner
6cbd2171e8 x86/cpufeatures: Make CPU bugs sticky
There is currently no way to force CPU bug bits like CPU feature bits. That
makes it impossible to set a bug bit once at boot and have it stick for all
upcoming CPUs.

Extend the force set/clear arrays to handle bug bits as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.992156574@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 14:27:53 +01:00
Andy Lutomirski
c482feefe1 x86/entry/64: Make cpu_entry_area.tss read-only
The TSS is a fairly juicy target for exploits, and, now that the TSS
is in the cpu_entry_area, it's no longer protected by kASLR.  Make it
read-only on x86_64.

On x86_32, it can't be RO because it's written by the CPU during task
switches, and we use a task gate for double faults.  I'd also be
nervous about errata if we tried to make it RO even on configurations
without double fault handling.

[ tglx: AMD confirmed that there is no problem on 64-bit with TSS RO.  So
  	it's probably safe to assume that it's a non issue, though Intel
  	might have been creative in that area. Still waiting for
  	confirmation. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.733700132@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 14:27:52 +01:00
Andy Lutomirski
0f9a48100f x86/entry: Clean up the SYSENTER_stack code
The existing code was a mess, mainly because C arrays are nasty.  Turn
SYSENTER_stack into a struct, add a helper to find it, and do all the
obvious cleanups this enables.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.653244723@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 14:27:51 +01:00
Andy Lutomirski
40e7f949e0 x86/entry/64: Move the IST stacks into struct cpu_entry_area
The IST stacks are needed when an IST exception occurs and are accessed
before any kernel code at all runs.  Move them into struct cpu_entry_area.

The IST stacks are unlike the rest of cpu_entry_area: they're used even for
entries from kernel mode.  This means that they should be set up before we
load the final IDT.  Move cpu_entry_area setup to trap_init() for the boot
CPU and set it up for all possible CPUs at once in native_smp_prepare_cpus().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.480598743@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 14:27:51 +01:00
Andy Lutomirski
3386bc8aed x86/entry/64: Create a per-CPU SYSCALL entry trampoline
Handling SYSCALL is tricky: the SYSCALL handler is entered with every
single register (except FLAGS), including RSP, live.  It somehow needs
to set RSP to point to a valid stack, which means it needs to save the
user RSP somewhere and find its own stack pointer.  The canonical way
to do this is with SWAPGS, which lets us access percpu data using the
%gs prefix.

With PAGE_TABLE_ISOLATION-like pagetable switching, this is
problematic.  Without a scratch register, switching CR3 is impossible, so
%gs-based percpu memory would need to be mapped in the user pagetables.
Doing that without information leaks is difficult or impossible.

Instead, use a different sneaky trick.  Map a copy of the first part
of the SYSCALL asm at a different address for each CPU.  Now RIP
varies depending on the CPU, so we can use RIP-relative memory access
to access percpu memory.  By putting the relevant information (one
scratch slot and the stack address) at a constant offset relative to
RIP, we can make SYSCALL work without relying on %gs.

A nice thing about this approach is that we can easily switch it on
and off if we want pagetable switching to be configurable.

The compat variant of SYSCALL doesn't have this problem in the first
place -- there are plenty of scratch registers, since we don't care
about preserving r8-r15.  This patch therefore doesn't touch SYSCALL32
at all.

This patch actually seems to be a small speedup.  With this patch,
SYSCALL touches an extra cache line and an extra virtual page, but
the pipeline no longer stalls waiting for SWAPGS.  It seems that, at
least in a tight loop, the latter outweights the former.

Thanks to David Laight for an optimization tip.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.403607157@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 14:27:50 +01:00
Andy Lutomirski
7f2590a110 x86/entry/64: Use a per-CPU trampoline stack for IDT entries
Historically, IDT entries from usermode have always gone directly
to the running task's kernel stack.  Rearrange it so that we enter on
a per-CPU trampoline stack and then manually switch to the task's stack.
This touches a couple of extra cachelines, but it gives us a chance
to run some code before we touch the kernel stack.

The asm isn't exactly beautiful, but I think that fully refactoring
it can wait.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150606.225330557@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 14:27:38 +01:00
Andy Lutomirski
72f5e08dbb x86/entry: Remap the TSS into the CPU entry area
This has a secondary purpose: it puts the entry stack into a region
with a well-controlled layout.  A subsequent patch will take
advantage of this to streamline the SYSCALL entry code to be able to
find it more easily.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bpetkov@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.962042855@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:56 +01:00
Andy Lutomirski
1a935bc3d4 x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
SYSENTER_stack should have reliable overflow detection, which
means that it needs to be at the bottom of a page, not the top.
Move it to the beginning of struct tss_struct and page-align it.

Also add an assertion to make sure that the fixed hardware TSS
doesn't cross a page boundary.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.881827433@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:56 +01:00
Andy Lutomirski
7fb983b4dd x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
A future patch will move SYSENTER_stack to the beginning of cpu_tss
to help detect overflow.  Before this can happen, fix several code
paths that hardcode assumptions about the old layout.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.722425540@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:55 +01:00
Andy Lutomirski
ef8813ab28 x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area
Currently, the GDT is an ad-hoc array of pages, one per CPU, in the
fixmap.  Generalize it to be an array of a new 'struct cpu_entry_area'
so that we can cleanly add new things to it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.563271721@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:54 +01:00
Andy Lutomirski
1a79797b58 x86/entry/64: Allocate and enable the SYSENTER stack
This will simplify future changes that want scratch variables early in
the SYSENTER handler -- they'll be able to spill registers to the
stack.  It also lets us get rid of a SWAPGS_UNSAFE_STACK user.

This does not depend on CONFIG_IA32_EMULATION=y because we'll want the
stack space even without IA32 emulation.

As far as I can tell, the reason that this wasn't done from day 1 is
that we use IST for #DB and #BP, which is IMO rather nasty and causes
a lot more problems than it solves.  But, since #DB uses IST, we don't
actually need a real stack for SYSENTER (because SYSENTER with TF set
will invoke #DB on the IST stack rather than the SYSENTER stack).

I want to remove IST usage from these vectors some day, and this patch
is a prerequisite for that as well.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Link: https://lkml.kernel.org/r/20171204150605.312726423@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 13:59:53 +01:00
Ricardo Neri
770c775577 x86/umip: Print a line in the boot log that UMIP has been enabled
Indicate that this feature has been enabled.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1510640985-18412-3-git-send-email-ricardo.neri-calderon@linux.intel.com
[ Changelog tweaks. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-14 08:38:09 +01:00
Linus Torvalds
6a9f70b0a5 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "Three smaller changes:

   - clang fix

   - boot message beautification

   - unnecessary header inclusion removal"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Disable Clang warnings about GNU extensions
  x86/boot: Remove unnecessary #include <generated/utsrelease.h>
  x86/boot: Spell out "boot CPU" for BP
2017-11-13 16:32:30 -08:00
Ricardo Neri
aa35f89697 x86/umip: Enable User-Mode Instruction Prevention at runtime
User-Mode Instruction Prevention (UMIP) is enabled by setting/clearing a
bit in %cr4.

It makes sense to enable UMIP at some point while booting, before user
spaces come up. Like SMAP and SMEP, is not critical to have it enabled
very early during boot. This is because UMIP is relevant only when there is
a user space to be protected from. Given these similarities, UMIP can be
enabled along with SMAP and SMEP.

At the moment, UMIP is disabled by default at build time. It can be enabled
at build time by selecting CONFIG_X86_INTEL_UMIP. If enabled at build time,
it can be disabled at run time by adding clearcpuid=514 to the kernel
parameters.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: ricardo.neri@intel.com
Link: http://lkml.kernel.org/r/1509935277-22138-10-git-send-email-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 11:16:23 +01:00
Andy Lutomirski
20bb83443e x86/entry/64: Stop initializing TSS.sp0 at boot
In my quest to get rid of thread_struct::sp0, I want to clean up or
remove all of its readers.  Two of them are in cpu_init() (32-bit and
64-bit), and they aren't needed.  This is because we never enter
userspace at all on the threads that CPUs are initialized in.

Poison the initial TSS.sp0 and stop initializing it on CPU init.

The comment text mostly comes from Dave Hansen.  Thanks!

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ee4a00540ad28c6cff475fbcc7769a4460acc861.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02 11:04:46 +01:00
Andy Lutomirski
da51da189a x86/entry/64: Pass SP0 directly to load_sp0()
load_sp0() had an odd signature:

  void load_sp0(struct tss_struct *tss, struct thread_struct *thread);

Simplify it to:

  void load_sp0(unsigned long sp0);

Also simplify a few get_cpu()/put_cpu() sequences to
preempt_disable()/preempt_enable().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2655d8b42ed940aa384fe18ee1129bbbcf730a08.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02 11:04:44 +01:00
Andi Kleen
0c2a3913d6 x86/fpu: Parse clearcpuid= as early XSAVE argument
With a followon patch we want to make clearcpuid affect the XSAVE
configuration. But xsave is currently initialized before arguments
are parsed. Move the clearcpuid= parsing into the special
early xsave argument parsing code.

Since clearcpuid= contains a = we need to keep the old __setup
around as a dummy, otherwise it would end up as a environment
variable in init's environment.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171013215645.23166-4-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-17 17:14:57 +02:00
Jean Delvare
a1652bb8a0 x86/boot: Spell out "boot CPU" for BP
It's not obvious to everybody that BP stands for boot processor. At
least it was not for me. And BP is also a CPU register on x86, so it
is ambiguous. Spell out "boot CPU" everywhere instead.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-03 18:41:23 +02:00
Andy Lutomirski
b8b7abaed7 x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
Otherwise we might have the PCID feature bit set during cpu_init().

This is just for robustness.  I haven't seen any actual bugs here.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cba4671af7 ("x86/mm: Disable PCID on 32-bit kernels")
Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17 18:59:08 +02:00
Andy Lutomirski
c7ad5ad297 x86/mm/64: Initialize CR4.PCIDE early
cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs.  It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.

Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems.  First, we'd crash in the trampoline code.  That's
fixable, and I tried that.  It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.

This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate.  This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.

Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().

( Yes, this is ugly.  I think we should have improved mmu_cr4_features
  to actually control CR4 during secondary bootup, but that would be
  fairly intrusive at this stage. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c922 ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13 09:54:43 +02:00
Andy Lutomirski
1c9fe4409c x86/mm: Document how CR4.PCIDE restore works
While debugging a problem, I thought that using
cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be
helpful.  It turns out to be counterproductive.

Add a comment documenting how this works.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 20:12:57 -07:00
Andy Lutomirski
72c0098d92 x86/mm: Reinitialize TLB state on hotplug and resume
When Linux brings a CPU down and back up, it switches to init_mm and then
loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
of masking off the ASID bits in CR3.

This can result in some confusion in the TLB handling code.  If we
bring a CPU down and back up with any ASID other than 0, we end up
with the wrong ASID active on the CPU after resume.  This could
cause our internal state to become corrupt, although major
corruption is unlikely because init_mm doesn't have any user pages.
More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
in the next context switch.  The result of *that* is a failure to
resume from suspend with probability 1 - 1/6^(cpus-1).

Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 10af6235e0 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 20:12:57 -07:00
Linus Torvalds
24e700e291 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This update provides:

   - Cleanup of the IDT management including the removal of the extra
     tracing IDT. A first step to cleanup the vector management code.

   - The removal of the paravirt op adjust_exception_frame. This is a
     XEN specific issue, but merged through this branch to avoid nasty
     merge collisions

   - Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
     disabled the TSC DEADLINE timer in CPUID.

   - Adjust a debug message in the ioapic code to print out the
     information correctly"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  x86/idt: Fix the X86_TRAP_BP gate
  x86/xen: Get rid of paravirt op adjust_exception_frame
  x86/eisa: Add missing include
  x86/idt: Remove superfluous ALIGNment
  x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
  x86/idt: Remove the tracing IDT leftovers
  x86/idt: Hide set_intr_gate()
  x86/idt: Simplify alloc_intr_gate()
  x86/idt: Deinline setup functions
  x86/idt: Remove unused functions/inlines
  x86/idt: Move interrupt gate initialization to IDT code
  x86/idt: Move APIC gate initialization to tables
  x86/idt: Move regular trap init to tables
  x86/idt: Move IST stack based traps to table init
  x86/idt: Move debug stack init to table based
  x86/idt: Switch early trap init to IDT tables
  x86/idt: Prepare for table based init
  x86/idt: Move early IDT setup out of 32-bit asm
  x86/idt: Move early IDT handler setup to IDT code
  x86/idt: Consolidate IDT invalidation
  ...
2017-09-04 17:43:56 -07:00
Thomas Gleixner
d8ed9d4826 x86/idt: Create file for IDT related code
IDT related code lives scattered around in various places. Create a new
source file in arch/x86/kernel/idt.c to hold it.

Move the idt_tables and descriptors to it for a start. Follow up patches
will gradually move more code over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.367081121@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:25 +02:00
Andy Lutomirski
660da7c922 x86/mm: Enable CR4.PCIDE on supported systems
We can use PCID if the CPU has PCID and PGE and we're not on Xen.

By itself, this has no effect. A followup patch will start using PCID.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:58 +02:00
Andy Lutomirski
0790c9aad8 x86/mm: Add the 'nopcid' boot option to turn off PCID
The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8bbb2e65bcd249a5f18bfb8128b4689f08ac2b60.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:57 +02:00
Juergen Gross
65f9d65443 x86/cpu: remove hypervisor specific set_cpu_features
There is no user of x86_hyper->set_cpu_features() any more. Remove it.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02 11:14:30 +02:00
Andy Lutomirski
b23adb7d3f x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for the GDT fixup
Xen imposes special requirements on the GDT.  Rather than using a
global variable for the pgprot, just use an explicit special case
for Xen -- this makes it clearer what's going on.  It also debloats
64-bit kernels very slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e9ea96abbfd6a8c87753849171bb5987ecfeb523.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:08 +01:00
Andy Lutomirski
aa4ea67552 x86/gdt: Fix setup_fixmap_gdt() to use the correct PA
__pa() cannot be used on percpu pointers because they may be
virtually mapped.  Use per_cpu_ptr_to_phys() instead.

This fixes a boot crash on a some 32-bit configurations.  I assume
this is related to which allocation strategy is chosen by the percpu
core.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 69218e4799 x86: ("Remap GDT tables in the fixmap section")
Link: http://lkml.kernel.org/r/22e0069c29fba31998f193201e359eebfdac4960.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-23 08:25:07 +01:00
Thomas Garnier
45fc8757d1 x86: Make the GDT remapping read-only on 64-bit
This patch makes the GDT remapped pages read-only, to prevent accidental
(or intentional) corruption of this key data structure.

This change is done only on 64-bit, because 32-bit needs it to be writable
for TSS switches.

The native_load_tr_desc function was adapted to correctly handle a
read-only GDT. The LTR instruction always writes to the GDT TSS entry.
This generates a page fault if the GDT is read-only. This change checks
if the current GDT is a remap and swap GDTs as needed. This function was
tested by booting multiple machines and checking hibernation works
properly.

KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the
per-cpu variable was removed for functions to fetch the original GDT.
Instead of reloading the previous GDT, VMX will reload the fixmap GDT as
expected. For testing, VMs were started and restored on multiple
configurations.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-3-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:06:35 +01:00
Thomas Garnier
69218e4799 x86: Remap GDT tables in the fixmap section
Each processor holds a GDT in its per-cpu structure. The sgdt
instruction gives the base address of the current GDT. This address can
be used to bypass KASLR memory randomization. With another bug, an
attacker could target other per-cpu structures or deduce the base of
the main memory section (PAGE_OFFSET).

This patch relocates the GDT table for each processor inside the
fixmap section. The space is reserved based on number of supported
processors.

For consistency, the remapping is done by default on 32 and 64-bit.

Each processor switches to its remapped GDT at the end of
initialization. For hibernation, the main processor returns with the
original GDT and switches back to the remapping at completion.

This patch was tested on both architectures. Hibernation and KVM were
both tested specially for their usage of the GDT.

Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
recommending changes for Xen support.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16 09:06:35 +01:00
Linus Torvalds
609b07b72d Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A fix for KVM's scheduler clock which (erroneously) was always marked
  unstable, a fix for RT/DL load balancing, plus latency fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
  sched/core: Fix pick_next_task() for RT,DL
  sched/fair: Make select_idle_cpu() more aggressive
2017-03-07 14:42:34 -08:00
Ingo Molnar
68e21be291 sched/headers: Move task->mm handling methods to <linux/sched/mm.h>
Move the following task->mm helper APIs into a new header file,
<linux/sched/mm.h>, to further reduce the size and complexity
of <linux/sched.h>.

Here are how the APIs are used in various kernel files:

  # mm_alloc():
  arch/arm/mach-rpc/ecard.c
  fs/exec.c
  include/linux/sched/mm.h
  kernel/fork.c

  # __mmdrop():
  arch/arc/include/asm/mmu_context.h
  include/linux/sched/mm.h
  kernel/fork.c

  # mmdrop():
  arch/arm/mach-rpc/ecard.c
  arch/m68k/sun3/mmu_emu.c
  arch/x86/mm/tlb.c
  drivers/gpu/drm/amd/amdkfd/kfd_process.c
  drivers/gpu/drm/i915/i915_gem_userptr.c
  drivers/infiniband/hw/hfi1/file_ops.c
  drivers/vfio/vfio_iommu_spapr_tce.c
  fs/exec.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  fs/proc/task_nommu.c
  fs/userfaultfd.c
  include/linux/mmu_notifier.h
  include/linux/sched/mm.h
  kernel/fork.c
  kernel/futex.c
  kernel/sched/core.c
  mm/khugepaged.c
  mm/ksm.c
  mm/mmu_context.c
  mm/mmu_notifier.c
  mm/oom_kill.c
  virt/kvm/kvm_main.c

  # mmdrop_async_fn():
  include/linux/sched/mm.h

  # mmdrop_async():
  include/linux/sched/mm.h
  kernel/fork.c

  # mmget_not_zero():
  fs/userfaultfd.c
  include/linux/sched/mm.h
  mm/oom_kill.c

  # mmput():
  arch/arc/include/asm/mmu_context.h
  arch/arc/kernel/troubleshoot.c
  arch/frv/mm/mmu-context.c
  arch/powerpc/platforms/cell/spufs/context.c
  arch/sparc/include/asm/mmu_context_32.h
  drivers/android/binder.c
  drivers/gpu/drm/etnaviv/etnaviv_gem.c
  drivers/gpu/drm/i915/i915_gem_userptr.c
  drivers/infiniband/core/umem.c
  drivers/infiniband/core/umem_odp.c
  drivers/infiniband/core/uverbs_main.c
  drivers/infiniband/hw/mlx4/main.c
  drivers/infiniband/hw/mlx5/main.c
  drivers/infiniband/hw/usnic/usnic_uiom.c
  drivers/iommu/amd_iommu_v2.c
  drivers/iommu/intel-svm.c
  drivers/lguest/lguest_user.c
  drivers/misc/cxl/fault.c
  drivers/misc/mic/scif/scif_rma.c
  drivers/oprofile/buffer_sync.c
  drivers/vfio/vfio_iommu_type1.c
  drivers/vhost/vhost.c
  drivers/xen/gntdev.c
  fs/exec.c
  fs/proc/array.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  fs/proc/task_nommu.c
  fs/userfaultfd.c
  include/linux/sched/mm.h
  kernel/cpuset.c
  kernel/events/core.c
  kernel/events/uprobes.c
  kernel/exit.c
  kernel/fork.c
  kernel/ptrace.c
  kernel/sys.c
  kernel/trace/trace_output.c
  kernel/tsacct.c
  mm/memcontrol.c
  mm/memory.c
  mm/mempolicy.c
  mm/migrate.c
  mm/mmu_notifier.c
  mm/nommu.c
  mm/oom_kill.c
  mm/process_vm_access.c
  mm/rmap.c
  mm/swapfile.c
  mm/util.c
  virt/kvm/async_pf.c

  # mmput_async():
  include/linux/sched/mm.h
  kernel/fork.c
  mm/oom_kill.c

  # get_task_mm():
  arch/arc/kernel/troubleshoot.c
  arch/powerpc/platforms/cell/spufs/context.c
  drivers/android/binder.c
  drivers/gpu/drm/etnaviv/etnaviv_gem.c
  drivers/infiniband/core/umem.c
  drivers/infiniband/core/umem_odp.c
  drivers/infiniband/hw/mlx4/main.c
  drivers/infiniband/hw/mlx5/main.c
  drivers/infiniband/hw/usnic/usnic_uiom.c
  drivers/iommu/amd_iommu_v2.c
  drivers/iommu/intel-svm.c
  drivers/lguest/lguest_user.c
  drivers/misc/cxl/fault.c
  drivers/misc/mic/scif/scif_rma.c
  drivers/oprofile/buffer_sync.c
  drivers/vfio/vfio_iommu_type1.c
  drivers/vhost/vhost.c
  drivers/xen/gntdev.c
  fs/proc/array.c
  fs/proc/base.c
  fs/proc/task_mmu.c
  include/linux/sched/mm.h
  kernel/cpuset.c
  kernel/events/core.c
  kernel/exit.c
  kernel/fork.c
  kernel/ptrace.c
  kernel/sys.c
  kernel/trace/trace_output.c
  kernel/tsacct.c
  mm/memcontrol.c
  mm/memory.c
  mm/mempolicy.c
  mm/migrate.c
  mm/mmu_notifier.c
  mm/nommu.c
  mm/util.c

  # mm_access():
  fs/proc/base.c
  include/linux/sched/mm.h
  kernel/fork.c
  mm/process_vm_access.c

  # mm_release():
  arch/arc/include/asm/mmu_context.h
  fs/exec.c
  include/linux/sched/mm.h
  include/uapi/linux/sched.h
  kernel/exit.c
  kernel/fork.c

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-03 01:43:28 +01:00
Peter Zijlstra
f94c8d1169 sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface
Wanpeng Li reported that since the following commit:

  acb04058de ("sched/clock: Fix hotplug crash")

... KVM always runs with unstable sched-clock even though KVM's
kvm_clock _is_ stable.

The problem is that we've tied clear_sched_clock_stable() to the TSC
state, and overlooked that sched_clock() is a paravirt function.

Solve this by doing two things:

 - tie the sched_clock() stable state more clearly to the TSC stable
   state for the normal (!paravirt) case.

 - only call clear_sched_clock_stable() when we mark TSC unstable
   when we use native_sched_clock().

The first means we can actually run with stable sched_clock in more
situations then before, which is good. And since commit:

  12907fbb1a ("sched/clock, clocksource: Add optional cs::mark_unstable() method")

... this should be reliable. Since any detection of TSC fail now results
in marking the TSC unstable.

Reported-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: acb04058de ("sched/clock: Fix hotplug crash")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:50:49 +01:00
Ingo Molnar
9164bb4a18 sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>
Update all usage sites first.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:38 +01:00
Ingo Molnar
e601757102 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:27 +01:00
Vegard Nossum
f1f1007644 mm: add new mmgrab() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Linus Torvalds
280d7a1ede Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The main changes relate to fixes between (lack of) CPUID and FPU
  detection that should only affect old or weird CPUs, by Andy
  Lutomirski"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Fix the "Giving up, no FPU found" test
  x86/fpu: Fix CPUID-less FPU detection
  x86/fpu: Fix "x86/fpu: Legacy x87 FPU detected" message
  x86/cpu: Re-apply forced caps every time CPU caps are re-read
  x86/cpu: Factor out application of forced CPU caps
  x86/cpu: Add X86_FEATURE_CPUID
  x86/fpu/xstate: Move XSAVES state init to a function
2017-02-20 15:03:51 -08:00
Linus Torvalds
8a9365a472 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "The main changes in this cycle were related to enable ring-3
  MONITOR/MWAIT instructions support on supported CPUs, by Grzegorz
  Andrejczuk and Piotr Luc"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpufeature: Move RING3MWAIT feature to avoid conflicts
  x86/cpufeature: Enable RING3MWAIT for Knights Mill
  x86/cpufeature: Enable RING3MWAIT for Knights Landing
  x86/cpufeature: Add RING3MWAIT to CPU features
  x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
  x86/msr: Add MSR_MISC_FEATURE_ENABLES and RING3MWAIT bit
  x86/cpufeature: Add AVX512_VPOPCNTDQ feature
2017-02-20 14:37:08 -08:00
Linus Torvalds
828cad8ea0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this (fairly busy) cycle were:

   - There was a class of scheduler bugs related to forgetting to update
     the rq-clock timestamp which can cause weird and hard to debug
     problems, so there's a new debug facility for this: which uncovered
     a whole lot of bugs which convinced us that we want to keep the
     debug facility.

     (Peter Zijlstra, Matt Fleming)

   - Various cputime related updates: eliminate cputime and use u64
     nanoseconds directly, simplify and improve the arch interfaces,
     implement delayed accounting more widely, etc. - (Frederic
     Weisbecker)

   - Move code around for better structure plus cleanups (Ingo Molnar)

   - Move IO schedule accounting deeper into the scheduler plus related
     changes to improve the situation (Tejun Heo)

   - ... plus a round of sched/rt and sched/deadline fixes, plus other
     fixes, updats and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
  sched/core: Remove unlikely() annotation from sched_move_task()
  sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
  sched/topology: Split out scheduler topology code from core.c into topology.c
  sched/core: Remove unnecessary #include headers
  sched/rq_clock: Consolidate the ordering of the rq_clock methods
  delayacct: Include <uapi/linux/taskstats.h>
  sched/core: Clean up comments
  sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
  sched/clock: Add dummy clear_sched_clock_stable() stub function
  sched/cputime: Remove generic asm headers
  sched/cputime: Remove unused nsec_to_cputime()
  s390, sched/cputime: Remove unused cputime definitions
  powerpc, sched/cputime: Remove unused cputime definitions
  s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
  ia64, sched/cputime: Remove unused cputime definitions
  ia64: Convert vtime to use nsec units directly
  ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
  sched/cputime: Remove jiffies based cputime
  sched/cputime, vtime: Return nsecs instead of cputime_t to account
  sched/cputime: Complete nsec conversion of tick based accounting
  ...
2017-02-20 12:52:55 -08:00
Borislav Petkov
79a8b9aa38 x86/CPU/AMD: Bring back Compute Unit ID
Commit:

  a33d331761 ("x86/CPU/AMD: Fix Bulldozer topology")

restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.

Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.

That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.

When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.

Reported-by: Yves Dionne <yves.dionne@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.9
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20170205105022.8705-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-05 12:18:45 +01:00
Grzegorz Andrejczuk
0274f9551e x86/elf: Add HWCAP2 to expose ring 3 MONITOR/MWAIT
Introduce ELF_HWCAP2 variable for x86 and reserve its bit 0 to expose the
ring 3 MONITOR/MWAIT.

HWCAP variables contain bitmasks which can be used by userspace
applications to detect which instruction sets are supported by CPU.  On x86
architecture information about CPU capabilities can be checked via CPUID
instructions, unfortunately presence of ring 3 MONITOR/MWAIT feature cannot
be checked this way. ELF_HWCAP cannot be used as well, because on x86 it is
set to CPUID[1].EDX which means that all bits are reserved there.

HWCAP2 approach was chosen because it reuses existing solution present
in other architectures, so only minor modifications are required to the
kernel and userspace applications. When ELF_HWCAP2 is defined
kernel maps it to AT_HWCAP2 during the start of the application.
This way the ring 3 MONITOR/MWAIT feature can be detected using getauxval()
API in a simple and fast manner. ELF_HWCAP2 type is u32 to be consistent
with x86 ELF_HWCAP type.

Signed-off-by: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Piotr.Luc@intel.com
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/r/1484918557-15481-3-git-send-email-grzegorz.andrejczuk@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 08:51:09 +01:00
Ingo Molnar
ed5c8c854f Merge branch 'linus' into sched/core, to pick up fixes and refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:12:25 +01:00
Andy Lutomirski
60d3450167 x86/cpu: Re-apply forced caps every time CPU caps are re-read
Calling get_cpu_cap() will reset a bunch of CPU features.  This will
cause the system to lose track of force-set and force-cleared
features in the words that are reset until the end of CPU
initialization.  This can cause X86_FEATURE_FPU, for example, to
change back and forth during boot and potentially confuse CPU setup.

To minimize the chance of confusion, re-apply forced caps every time
get_cpu_cap() is called.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/c817eb373d2c67c2c81413a70fc9b845fa34a37e.1484705016.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 10:12:41 +01:00
Andy Lutomirski
8bf1ebca21 x86/cpu: Factor out application of forced CPU caps
There are multiple call sites that apply forced CPU caps.  Factor
them into a helper.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/623ff7555488122143e4417de09b18be2085ad06.1484705016.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 10:12:40 +01:00
Borislav Petkov
78d1b29684 x86/cpu: Add X86_FEATURE_CPUID
Add a synthetic CPUID flag denoting whether the CPU sports the CPUID
instruction or not. This will come useful later when accomodating
CPUID-less CPUs.

Signed-off-by: Borislav Petkov <bp@suse.de>
[ Slightly prettified. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/dcb355adae3ab812c79397056a61c212f1a0c7cc.1484705016.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-25 10:12:39 +01:00
Peter Zijlstra
acb04058de sched/clock: Fix hotplug crash
Mike reported that he could trigger the WARN_ON_ONCE() in
set_sched_clock_stable() using hotplug.

This exposed a fundamental problem with the interface, we should never
mark the TSC stable if we ever find it to be unstable. Therefore
set_sched_clock_stable() is a broken interface.

The reason it existed is that not having it is a pain, it means all
relevant architecture code needs to call clear_sched_clock_stable()
where appropriate.

Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64
and parisc are trivial in that they never called
set_sched_clock_stable(), so add an unconditional call to
clear_sched_clock_stable() to them.

For x86 the story is a lot more involved, and what this patch tries to
do is ensure we preserve the status quo. So even is Cyrix or Transmeta
have usable TSC they never called set_sched_clock_stable() so they now
get an explicit mark unstable.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9881b024b7 ("sched/clock: Delay switching sched_clock to stable")
Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-20 02:38:46 +01:00
Lukasz Odzioba
dd853fd216 x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
A negative number can be specified in the cmdline which will be used as
setup_clear_cpu_cap() argument. With that we can clear/set some bit in
memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel
to misbehave. This patch adds lower bound check to setup_disablecpuid().

Boris Petkov reproduced a crash:

  [    1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540
  [    1.236535] IP: memcpy_erms+0x6/0x10

Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andi.kleen@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: slaoub@gmail.com
Fixes: ac72e7888a ("x86: add generic clearcpuid=... option")
Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-05 08:54:34 +01:00
Andy Lutomirski
3df8d92085 x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
A typo (or mis-merge?) resulted in leaf 6 only being probed if
cpuid_level >= 7.

Fixes: 2ccd71f1b2 ("x86/cpufeature: Move some of the scattered feature bits to x86_capability")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Link: http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-19 11:50:24 +01:00
Thomas Gleixner
9d85eb9119 x86/smpboot: Make logical package management more robust
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg, These are obviously wrong.

 - Virtual machines have their own interesting of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is not longer
required.

Fixes: d49597fd3b ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: M. Vefa Bicakci <m.v.b@runbox.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Cc: Charles (Chas) Williams <ciwillia@brocade.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-13 10:22:39 +01:00
Linus Torvalds
991bc36254 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode update from Ingo Molnar:
 "The biggest change (by Borislav Petkov) is a thorough rewrite of the
  Intel microcode loader and its interactions with the core code.

  The biggest conceptual change is the decoupling of the microcode
  loading on boot and application processors (which load the microcode
  in different scenarios), so that both parse the input patches with as
  few assumptions as possible - this also fixes various kernel address
  space randomization bugs. (The AP side then goes on and caches the
  result to improve boot performance.)

  Since the AMD side already did this, this change also opened up the
  path towards more unification/simplification of the core microcode
  loading infrastructure:

     10 files changed, 647 insertions(+), 940 deletions(-)

  which speaks for itself"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Bump driver version, update copyrights
  x86/microcode: Rework microcode loading
  x86/microcode/intel: Remove intel_lib.c
  x86/microcode/amd: Move private inlines to .c and mark local functions static
  x86/microcode: Collect CPU info on resume
  x86/microcode: Issue the debug printk on resume only on success
  x86/microcode/amd: Hand down the CPU family
  x86/microcode: Export the microcode cache linked list
  x86/microcode: Remove one #ifdef clause
  x86/microcode/intel: Simplify generic_load_microcode()
  x86/microcode: Move driver authors to CREDITS
  x86/microcode: Run the AP-loading routine only on the application processors
2016-12-12 15:23:02 -08:00
Linus Torvalds
212f30008a Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 idle updates from Ingo Molnar:
 "There were two bigger changes in this development cycle:

   - remove idle notifiers:

       32 files changed, 74 insertions(+), 803 deletions(-)

     These notifiers were of questionable value and the main usecase,
     the i7300 driver, was essentially unmaintained and can be removed,
     plus modern power management concepts don't need the callback - so
     use this golden opportunity and get rid of this opaque and fragile
     callback from a latency sensitive code path.

     (Len Brown, Thomas Gleixner)

   - improve the AMD Erratum 400 workaround that used high overhead MSR
     polling in the idle loop (Borisla Petkov, Thomas Gleixner)"

* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove empty idle.h header
  x86/amd: Simplify AMD E400 aware idle routine
  x86/amd: Check for the C1E bug post ACPI subsystem init
  x86/bugs: Separate AMD E400 erratum and C1E bug
  x86/cpufeature: Provide helper to set bugs bits
  x86/idle: Remove enter_idle(), exit_idle()
  x86: Remove x86_test_and_clear_bit_percpu()
  x86/idle: Remove is_idle flag
  x86/idle: Remove idle_notifier
  i7300_idle: Remove this driver
2016-12-12 14:55:04 -08:00
Linus Torvalds
535b2f73f6 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU updates from Ingo Molnar:
 "The changes in this development cycle were:

   - AMD CPU topology enhancements that are cleanups on current CPUs but
     which enable future Fam17 hardware. (Yazen Ghannam)

   - unify bugs.c and bugs_64.c (Borislav Petkov)

   - remove the show_msr= boot option (Borislav Petkov)

   - simplify a boot message (Borislav Petkov)"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature
  x86/cpu: Get rid of the show_msr= boot option
  x86/cpu: Merge bugs.c and bugs_64.c
  x86/cpu: Remove the printk format specifier in "CPU0: "
2016-12-12 14:25:21 -08:00
Borislav Petkov
07c94a3812 x86/amd: Simplify AMD E400 aware idle routine
Reorganize the E400 detection now that we have everything in place:
switch the CPUs to broadcast mode after the LAPIC has been initialized
and remove the facilities that were used previously on the idle path.

Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine
because alternatives have been applied when the actual detection happens,
so the static switching does not take effect and the test will stay
false. Use boot_cpu_has_bug() instead which is definitely an improvement
over the RDMSR and the cpumask handling.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:21 +01:00
Thomas Gleixner
d49597fd3b x86/cpu: Deal with broken firmware (VMWare/XEN)
Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used printout purposes.

If the physical package IDs differ yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>,
Reported-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 21:05:01 +01:00
Borislav Petkov
777284b66f x86/microcode: Run the AP-loading routine only on the application processors
cpu_init() is run also on the BSP (in addition to the APs):

 x86_64_start_kernel
 |-> x86_64_start_reservations
 |-> start_kernel
 |-> trap_init
 |-> cpu_init
 |-> load_ucode_ap
 ...

but we run the AP (Application Processors) microcode loading routine
there too even though we have a BSP-specific routine for that:
load_ucode_bsp().

Which is unnecessary. So let's limit the AP microcode loading routine to
the APs only.

Remove a useless comment while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
59c6f278bd x86/cpu: Get rid of the show_msr= boot option
It is useless as it dumps the MSRs too early BUT(!) we do set MSRs later too.
Also, it dumps only BSP MSRs as it gets called only for CPU 0.

And the MSR range array would need constant updating anyway, and so on
and so on...

Oh, and we have msr.ko and msr-tools which are the much better solution
anyway. So off it goes...

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:48:50 +02:00
Linus Torvalds
1a4a2bc460 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull low-level x86 updates from Ingo Molnar:
 "In this cycle this topic tree has become one of those 'super topics'
  that accumulated a lot of changes:

   - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on
     x86 - preceded by an array of changes. v4.8 saw preparatory changes
     in this area already - this is the rest of the work. Includes the
     thread stack caching performance optimization. (Andy Lutomirski)

   - switch_to() cleanups and all around enhancements. (Brian Gerst)

   - A large number of dumpstack infrastructure enhancements and an
     unwinder abstraction. The secret long term plan is safe(r) live
     patching plus maybe another attempt at debuginfo based unwinding -
     but all these current bits are standalone enhancements in a frame
     pointer based debug environment as well. (Josh Poimboeuf)

   - More __ro_after_init and const annotations. (Kees Cook)

   - Enable KASLR for the vmemmap memory region. (Thomas Garnier)"

[ The virtually mapped stack changes are pretty fundamental, and not
  x86-specific per se, even if they are only used on x86 right now. ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  x86/asm: Get rid of __read_cr4_safe()
  thread_info: Use unsigned long for flags
  x86/alternatives: Add stack frame dependency to alternative_call_2()
  x86/dumpstack: Fix show_stack() task pointer regression
  x86/dumpstack: Remove dump_trace() and related callbacks
  x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
  oprofile/x86: Convert x86_backtrace() to use the new unwinder
  x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder
  perf/x86: Convert perf_callchain_kernel() to use the new unwinder
  x86/unwind: Add new unwind interface and implementations
  x86/dumpstack: Remove NULL task pointer convention
  fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
  sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
  lib/syscall: Pin the task stack in collect_syscall()
  x86/process: Pin the target stack in get_wchan()
  x86/dumpstack: Pin the target stack when dumping it
  kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
  sched/core: Add try_get_task_stack() and put_task_stack()
  x86/entry/64: Fix a minor comment rebase error
  iommu/amd: Don't put completion-wait semaphore on stack
  ...
2016-10-03 16:13:28 -07:00
Andy Lutomirski
05fb3c199b x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
Otherwise arch_task_struct_size == 0 and we die.  While we're at it,
set X86_FEATURE_ALWAYS, too.

Reported-by: David Saggiorato <david@saggiorato.net>
Tested-by: David Saggiorato <david@saggiorato.net>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: aaeb5c01c5b ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Link: http://lkml.kernel.org/r/8de723afbf0811071185039f9088733188b606c9.1475103911.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-30 13:53:04 +02:00
Ingo Molnar
2b3061c77c Merge branch 'x86/mm' into x86/asm, to unify the two branches for simplicity
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:41:52 +02:00
Borislav Petkov
556b672368 x86/entry: Remove outdated comment about SYSCALL targets
The comment probably meant some old AMD64 incarnation which most likely
never saw the light of day. STAR and LSTAR are two different registers
and STAR sets CS/SS(DS) selectors for *all* modes, not only 32-bit.

So simply remove that comment.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160823172356.15879-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 11:20:31 +02:00
Josh Poimboeuf
4950d6d48a x86/dumpstack: Remove 64-byte gap at end of irq stack
There has been a 64-byte gap at the end of the irq stack for at least 12
years.  It predates git history, and I can't find any good reason for
it.  Remove it.  What's the worst that could happen?

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14f9281c5475cc44af95945ea7546bff2e3836db.1471535549.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18 18:41:33 +02:00
Kees Cook
404f6aac9b x86: Apply more __ro_after_init and const
Guided by grsecurity's analogous __read_only markings in arch/x86,
this applies several uses of __ro_after_init to structures that are
only updated during __init, and const for some structures that are
never updated.  Additionally extends __init markings to some functions
that are only used during __init, and cleans up some missing C99 style
static initializers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:55:05 +02:00
Linus Torvalds
aeb35d6b74 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header cleanups from Ingo Molnar:
 "This tree is a cleanup of the x86 tree reducing spurious uses of
  module.h - which should improve build performance a bit"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
  x86/apic: Remove duplicated include from probe_64.c
  x86/ce4100: Remove duplicated include from ce4100.c
  x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
  x86/platform: Delete extraneous MODULE_* tags fromm ts5500
  x86: Audit and remove any remaining unnecessary uses of module.h
  x86/kvm: Audit and remove any unnecessary uses of module.h
  x86/xen: Audit and remove any unnecessary uses of module.h
  x86/platform: Audit and remove any unnecessary uses of module.h
  x86/lib: Audit and remove any unnecessary uses of module.h
  x86/kernel: Audit and remove any unnecessary uses of module.h
  x86/mm: Audit and remove any unnecessary uses of module.h
  x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-08-01 14:23:42 -04:00
Andy Lutomirski
fb59831b49 x86/smp: Remove stack_smp_processor_id()
It serves no purpose -- raw_smp_processor_id() works fine.  This
change will be needed to move thread_info off the stack.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Paul Gortmaker
186f43608a x86/kernel: Audit and remove any unnecessary uses of module.h
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.  Build testing
revealed some implicit header usage that was fixed up accordingly.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 15:06:41 +02:00
Ingo Molnar
06cd3d8c14 Merge branch 'linus' into x86/urgent, to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-20 09:09:26 +02:00
Dave Hansen
0f6ff2bce0 x86/mm/mpx: Work around MPX erratum SKD046
This erratum essentially causes the CPU to forget which privilege
level it is operating on (kernel vs. user) for the purposes of MPX.

This erratum can only be triggered when a system is not using
Supervisor Mode Execution Prevention (SMEP).  Our workaround for
the erratum is to ensure that MPX can only be used in cases where
SMEP is present in the processor and is enabled.

This erratum only affects Core processors.  Atom is unaffected.
But, there is no architectural way to determine Atom vs. Core.
So, we just apply this workaround to all processors.  It's
possible that it will mistakenly disable MPX on some Atom
processsors or future unaffected Core processors.  There are
currently no processors that have MPX and not SMEP.  It would
take something akin to a hypervisor masking SMEP out on an Atom
processor for this to present itself on current hardware.

More details can be found at:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf

"
  SKD046 Branch Instructions May Initialize MPX Bound Registers Incorrectly

  Problem:

  Depending on the current Intel MPX (Memory Protection
  Extensions) configuration, execution of certain branch
  instructions (near CALL, near RET, near JMP, and Jcc
  instructions) without a BND prefix (F2H) initialize the MPX bound
  registers. Due to this erratum, such a branch instruction that is
  executed both with CPL = 3 and with CPL < 3 may not use the
  correct MPX configuration register (BNDCFGU or BNDCFGS,
  respectively) for determining whether to initialize the bound
  registers; it may thus initialize the bound registers when it
  should not, or fail to initialize them when it should.

  Implication:

  A branch instruction that has executed both in user mode and in
  supervisor mode (from the same linear address) may cause a #BR
  (bound range fault) when it should not have or may not cause a
  #BR when it should have.  Workaround An operating system can
  avoid this erratum by setting CR4.SMEP[bit 20] to enable
  supervisor-mode execution prevention (SMEP). When SMEP is
  enabled, no code can be executed both with CPL = 3 and with CPL < 3.
"

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160512220400.3B35F1BC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-20 09:07:40 +02:00
Linus Torvalds
168f1a7163 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - MSR access API fixes and enhancements (Andy Lutomirski)

   - early exception handling improvements (Andy Lutomirski)

   - user-space FS/GS prctl usage fixes and improvements (Andy
     Lutomirski)

   - Remove the cpu_has_*() APIs and replace them with equivalents
     (Borislav Petkov)

   - task switch micro-optimization (Brian Gerst)

   - 32-bit entry code simplification (Denys Vlasenko)

   - enhance PAT handling in enumated CPUs (Toshi Kani)

  ... and lots of other cleanups/fixlets"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  x86/arch_prctl/64: Restore accidentally removed put_cpu() in ARCH_SET_GS
  x86/entry/32: Remove asmlinkage_protect()
  x86/entry/32: Remove GET_THREAD_INFO() from entry code
  x86/entry, sched/x86: Don't save/restore EFLAGS on task switch
  x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs
  selftests/x86/ldt_gdt: Test set_thread_area() deletion of an active segment
  x86/tls: Synchronize segment registers in set_thread_area()
  x86/asm/64: Rename thread_struct's fs and gs to fsbase and gsbase
  x86/arch_prctl/64: Remove FSBASE/GSBASE < 4G optimization
  x86/segments/64: When load_gs_index fails, clear the base
  x86/segments/64: When loadsegment(fs, ...) fails, clear the base
  x86/asm: Make asm/alternative.h safe from assembly
  x86/asm: Stop depending on ptrace.h in alternative.h
  x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()
  x86/asm: Make sure verify_cpu() has a good stack
  x86/extable: Add a comment about early exception handlers
  x86/msr: Set the return value to zero when native_rdmsr_safe() fails
  x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=y
  x86/paravirt: Add paravirt_{read,write}_msr()
  x86/msr: Carry on after a non-"safe" MSR access fails
  ...
2016-05-16 15:15:17 -07:00
Dave Hansen
e8df1a95b6 x86/cpufeature, x86/mm/pkeys: Fix broken compile-time disabling of pkeys
When I added support for the Memory Protection Keys processor
feature, I had to reindent the REQUIRED/DISABLED_MASK macros, and
also consult the later cpufeature words.

I'm not quite sure how I bungled it, but I consulted the wrong
word at the end.  This only affected required or disabled cpu
features in cpufeature words 14, 15 and 16.  So, only Protection
Keys itself was screwed over here.

The result was that if you disabled pkeys in your .config, you
might still see some code show up that should have been compiled
out.  There should be no functional problems, though.

In verifying this patch I also realized that the DISABLE_PKU/OSPKE
macros were defined backwards and that the cpu_has() check in
setup_pku() was not doing the compile-time disabled checks.

So also fix the macro for DISABLE_PKU/OSPKE and add a compile-time
check for pkeys being enabled in setup_pku().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: dfb4a70f20 ("x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions")
Link: http://lkml.kernel.org/r/20160513221328.C200930B@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-16 12:59:23 +02:00
Yazen Ghannam
71faad4306 x86/cpu: Add detection of AMD RAS Capabilities
Add a new CPUID leaf to hold the contents of CPUID 0x80000007_EBX (RasCap).

Define bits that are currently in use:

 Bit 0: McaOverflowRecov
 Bit 1: SUCCOR
 Bit 3: ScalableMca

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Shorten comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-12 09:08:22 +02:00
Andy Lutomirski
45e876f794 x86/segments/64: When loadsegment(fs, ...) fails, clear the base
On AMD CPUs, a failed loadsegment currently may not clear the FS
base.  Fix it.

While we're at it, prevent loadsegment(gs, xyz) from even compiling
on 64-bit kernels.  It shouldn't be used.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a084c1b93b7b1408b58d3fd0b5d6e47da8e7d7cf.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:56:41 +02:00
Borislav Petkov
59e21e3d00 x86/cpufeature: Replace cpu_has_tsc with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Link: http://lkml.kernel.org/r/1459801503-15600-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Andy Lutomirski
0230bb038f x86/cpu: Move X86_BUG_ESPFIX initialization to generic_identify()
It was in detect_nopl(), which was either a mistake by me or some kind
of mis-merge.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ff236456f072 ("x86/cpu: Move X86_BUG_ESPFIX initialization to generic_identify")
Link: http://lkml.kernel.org/r/0949337f13660461edca08ab67d1a841441289c9.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:20:42 +02:00
Andy Lutomirski
7a5d670487 x86/cpu: Probe the behavior of nulling out a segment at boot time
AMD and Intel do different things when writing zero to a segment
selector.  Since neither vendor documents the behavior well and it's
easy to test the behavior, try nulling fs to see what happens.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/61588ba0e0df35beafd363dc8b68a4c5878ef095.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:20:41 +02:00
Borislav Petkov
b3edfda438 x86/cpu: Do the feature test first in enable_sep_cpu()
... before assigning local vars. Kill out label too and simplify.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458130769-24963-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:57:12 +02:00
Linus Torvalds
3fa2fe2ce0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains various perf fixes on the kernel side, plus three
  hw/event-enablement late additions:

   - Intel Memory Bandwidth Monitoring events and handling
   - the AMD Accumulated Power Mechanism reporting facility
   - more IOMMU events

  ... and a final round of perf tooling updates/fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  perf llvm: Use strerror_r instead of the thread unsafe strerror one
  perf llvm: Use realpath to canonicalize paths
  perf tools: Unexport some methods unused outside strbuf.c
  perf probe: No need to use formatting strbuf method
  perf help: Use asprintf instead of adhoc equivalents
  perf tools: Remove unused perf_pathdup, xstrdup functions
  perf tools: Do not include stringify.h from the kernel sources
  tools include: Copy linux/stringify.h from the kernel
  tools lib traceevent: Remove redundant CPU output
  perf tools: Remove needless 'extern' from function prototypes
  perf tools: Simplify die() mechanism
  perf tools: Remove unused DIE_IF macro
  perf script: Remove lots of unused arguments
  perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
  perf machine: Rename perf_event__preprocess_sample to machine__resolve
  perf tools: Add cpumode to struct perf_sample
  perf tests: Forward the perf_sample in the dwarf unwind test
  perf tools: Remove misplaced __maybe_unused
  perf list: Fix documentation of :ppp
  perf bench numa: Fix assertion for nodes bitfield
  ...
2016-03-24 10:02:14 -07:00
Linus Torvalds
d88f48e128 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix hotplug bugs
   - fix irq live lock
   - fix various topology handling bugs
   - fix APIC ACK ordering
   - fix PV iopl handling
   - fix speling
   - fix/tweak memcpy_mcsafe() return value
   - fix fbcon bug
   - remove stray prototypes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/msr: Remove unused native_read_tscp()
  x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
  x86/oprofile/nmi: Add missing hotplug FROZEN handling
  x86/hpet: Use proper mask to modify hotplug action
  x86/apic/uv: Fix the hotplug notifier
  x86/apb/timer: Use proper mask to modify hotplug action
  x86/topology: Use total_cpus not nr_cpu_ids for logical packages
  x86/topology: Fix Intel HT disable
  x86/topology: Fix logical package mapping
  x86/irq: Cure live lock in fixup_irqs()
  x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
  x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
  x86/iopl: Fix iopl capability check on Xen PV
  x86/iopl/64: Properly context-switch IOPL on Xen PV
  selftests/x86: Add an iopl test
  x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
  x86/video: Don't assume all FB devices are PCI devices
  arch/x86/irq: Purge useless handler declarations from hw_irq.h
  x86: Fix misspellings in comments
2016-03-24 09:47:32 -07:00
Vikas Shivappa
33c3cc7acf perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring)
and initializes the perf events and datastructures for monitoring the
memory b/w.

Its based on original patch series by Tony Luck and Kanaka Juvva.

Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. It supports both 'local
bandwidth' and 'total bandwidth' monitoring for the socket. Local
bandwidth measures the amount of data sent through the memory controller
on the socket and total b/w measures the total system bandwidth.

Extending the cache quality of service monitoring (CQM) we add two
more events to the perf infrastructure:

  intel_cqm_llc/local_bytes - bytes sent through local socket memory controller
  intel_cqm_llc/total_bytes - total L3 external bytes sent

The tasks are associated with a Resouce Monitoring ID (RMID) just like
in CQM and OS uses a MSR write to indicate the RMID of the task during
scheduling.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:19 +01:00
Linus Torvalds
643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Ingo Molnar
00f5268501 Merge branch 'x86/cleanups' into x86/urgent
Pull in some merge window leftovers.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:44:57 +01:00
Linus Torvalds
13c76ad872 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Enable full ASLR randomization for 32-bit programs (Hector
     Marco-Gisbert)

   - Add initial minimal INVPCI support, to flush global mappings (Andy
     Lutomirski)

   - Add KASAN enhancements (Andrey Ryabinin)

   - Fix mmiotrace for huge pages (Karol Herbst)

   - ... misc cleanups and small enhancements"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/32: Enable full randomization on i386 and X86_32
  x86/mm/kmmio: Fix mmiotrace for hugepages
  x86/mm: Avoid premature success when changing page attributes
  x86/mm/ptdump: Remove paravirt_enabled()
  x86/mm: Fix INVPCID asm constraint
  x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()
  x86/mm: If INVPCID is available, use it to flush global mappings
  x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
  x86/mm: Add INVPCID helpers
  x86/kasan: Write protect kasan zero shadow
  x86/kasan: Clear kasan_zero_page after TLB flush
  x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug()
  x86/mm/numa: Clean up numa_clear_kernel_node_hotplug()
  x86/mm: Make kmap_prot into a #define
  x86/mm/32: Set NX in __supported_pte_mask before enabling paging
  x86/mm: Streamline and restore probe_memory_block_size()
2016-03-15 10:45:39 -07:00
Linus Torvalds
ba33ea811e Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "This is another big update. Main changes are:

   - lots of x86 system call (and other traps/exceptions) entry code
     enhancements.  In particular the complex parts of the 64-bit entry
     code have been migrated to C code as well, and a number of dusty
     corners have been refreshed.  (Andy Lutomirski)

   - vDSO special mapping robustification and general cleanups (Andy
     Lutomirski)

   - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

   - lots of other changes ..."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/cpufeature: Enable new AVX-512 features
  x86/entry/traps: Show unhandled signal for i386 in do_trap()
  x86/entry: Call enter_from_user_mode() with IRQs off
  x86/entry/32: Change INT80 to be an interrupt gate
  x86/entry: Improve system call entry comments
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/asm-offsets: Remove PARAVIRT_enabled
  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
  uprobes: __create_xol_area() must nullify xol_mapping.fault
  x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
  ...
2016-03-15 09:32:27 -07:00
Andy Lutomirski
58a5aac533 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
x86_64 has very clean espfix handling on paravirt: espfix64 is set
up in native_iret, so paravirt systems that override iret bypass
espfix64 automatically.  This is robust and straightforward.

x86_32 is messier.  espfix is set up before the IRET paravirt patch
point, so it can't be directly conditionalized on whether we use
native_iret.  We also can't easily move it into native_iret without
regressing performance due to a bizarre consideration.  Specifically,
on 64-bit kernels, the logic is:

  if (regs->ss & 0x4)
          setup_espfix;

On 32-bit kernels, the logic is:

  if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
      (regs->flags & X86_EFLAGS_VM) == 0)
          setup_espfix;

The performance of setup_espfix itself is essentially irrelevant, but
the comparison happens on every IRET so its performance matters.  On
x86_64, there's no need for any registers except flags to implement
the comparison, so we fold the whole thing into native_iret.  On
x86_32, we don't do that because we need a free register to
implement the comparison efficiently.  We therefore do espfix setup
before restoring registers on x86_32.

This patch gets rid of the explicit paravirt_enabled check by
introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
to skip espfix on paravirt systems where iret != native_iret.  This is
also messy, but it's at least in line with other things we do.

This improves espfix performance by removing a branch, but no one
cares.  More importantly, it removes a paravirt_enabled user, which is
good because paravirt_enabled is ill-defined and is going away.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 14:16:44 +01:00
Thomas Gleixner
1f12e32f4c x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of CPU
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after boot

- makes sure that the package id is within the number of possible packages. If
  the apic id is outside we map it to a logical package id if there is enough
  space available.

Provide interfaces for drivers to query the mapping and do translations from
physcial to logical ids.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:18 +01:00
Adam Buchbinder
6a6256f9e0 x86: Fix misspellings in comments
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:44:58 +01:00
Dave Hansen
0697694564 x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
This sets the bit in 'cr4' to actually enable the protection
keys feature.  We also include a boot-time disable for the
feature "nopku".

Seting X86_CR4_PKE will cause the X86_FEATURE_OSPKE cpuid
bit to appear set.  At this point in boot, identify_cpu()
has already run the actual CPUID instructions and populated
the "cpu features" structures.  We need to go back and
re-run identify_cpu() to make sure it gets updated values.

We *could* simply re-populate the 11th word of the cpuid
data, but this is probably quick enough.

Also note that with the cpu_has() check and X86_FEATURE_PKU
present in disabled-features.h, we do not need an #ifdef
for setup_pku().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210229.6708027C@viggo.jf.intel.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:30 +01:00
Dave Hansen
dfb4a70f20 x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions
There are two CPUID bits for protection keys.  One is for whether
the CPU contains the feature, and the other will appear set once
the OS enables protection keys.  Specifically:

	Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable
	Protection keys (and the RDPKRU/WRPKRU instructions)

This is because userspace can not see CR4 contents, but it can
see CPUID contents.

X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:

	CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3]

X86_FEATURE_OSPKE is "OSPKU":

	CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4]

These are the first CPU features which need to look at the
ECX word in CPUID leaf 0x7, so this patch also includes
fetching that word in to the cpuinfo->x86_capability[] array.

Add it to the disabled-features mask when its config option is
off.  Even though we are not using it here, we also extend the
REQUIRED_MASK_BIT_SET() macro to keep it mirroring the
DISABLED_MASK_BIT_SET() version.

This means that in almost all code, you should use:

	cpu_has(c, X86_FEATURE_PKU)

and *not* the CONFIG option.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:13 +01:00
Ingo Molnar
1fe3f29e4a Merge branches 'x86/fpu', 'x86/mm' and 'x86/asm' into x86/pkeys
Provide a stable basis for the pkeys patches, which touches various
x86 details.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 09:37:37 +01:00
Andy Lutomirski
d12a72b844 x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
This adds a chicken bit to turn off INVPCID in case something goes
wrong.  It's an early_param() because we do TLB flushes before we
parse __setup() parameters.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 13:36:10 +01:00
Chen Yucong
1b74dde7c4 x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)
- Use the more current logging style pr_<level>(...) instead of the old
   printk(KERN_<LEVEL> ...).

 - Convert pr_warning() to pr_warn().

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454384702-21707-1-git-send-email-slaoub@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-03 10:30:03 +01:00
Brian Gerst
2476f2fa20 x86/alternatives: Discard dynamic check after init
Move the code to do the dynamic check to the altinstr_aux
section so that it is discarded after alternatives have run and
a static branch has been chosen.

This way we're changing the dynamic branch from C code to
assembly, which makes it *substantially* smaller while avoiding
a completely unnecessary call to an out of line function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
[ Changed it to do TESTB, as hpa suggested. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452972124-7380-1-git-send-email-brgerst@gmail.com
Link: http://lkml.kernel.org/r/20160127084525.GC30712@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:22 +01:00
Borislav Petkov
bc696ca05f x86/cpufeature: Replace the old static_cpu_has() with safe variant
So the old one didn't work properly before alternatives had run.
And it was supposed to provide an optimized JMP because the
assumption was that the offset it is jumping to is within a
signed byte and thus a two-byte JMP.

So I did an x86_64 allyesconfig build and dumped all possible
sites where static_cpu_has() was used. The optimization amounted
to all in all 12(!) places where static_cpu_has() had generated
a 2-byte JMP. Which has saved us a whopping 36 bytes!

This clearly is not worth the trouble so we can remove it. The
only place where the optimization might count - in __switch_to()
- we will handle differently. But that's not subject of this
patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:18 +01:00
Linus Torvalds
671d5532aa Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Improved CPU ID handling code and related enhancements (Borislav
     Petkov)

   - RDRAND fix (Len Brown)"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Replace RDRAND forced-reseed with simple sanity check
  x86/MSR: Chop off lower 32-bit value
  x86/cpu: Fix MSR value truncation issue
  x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
  kvm: Add accessors for guest CPU's family, model, stepping
  x86/cpu: Unify CPU family, model, stepping calculation
2016-01-11 16:46:20 -08:00
Borislav Petkov
362f924b64 x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.

The remaining ones need more careful inspection before a conversion can
happen. On the TODO.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Borislav Petkov
39c06df4dc x86/cpufeature: Cleanup get_cpu_cap()
Add an enum for the ->x86_capability array indices and cleanup
get_cpu_cap() by killing some redundant local vars.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:54 +01:00
Borislav Petkov
2ccd71f1b2 x86/cpufeature: Move some of the scattered feature bits to x86_capability
Turn the CPUID leafs which are proper CPUID feature bit leafs into
separate ->x86_capability words.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-2-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:53 +01:00
Borislav Petkov
31ac34ca56 x86/cpu: Fix MSR value truncation issue
So sparse rightfully complains that the u64 MSR value we're
writing into the STAR MSR, i.e. 0xc0000081, is being truncated:

./arch/x86/include/asm/msr.h:193:36: warning: cast truncates
bits from constant value (23001000000000 becomes 0)

because the actual value doesn't fit into the unsigned 32-bit
quantity which are the @low and @high wrmsrl() parameters.

This is not a problem, practically, because gcc is actually
being smart enough here and does the right thing:

  .loc 3 87 0
  xorl    %esi, %esi		# we needz a 32-bit zero
  movl    $2293776, %edx	# 0x00230010 == (__USER32_CS << 16) | __KERNEL_CS go into the high bits
  movl    $-1073741695, %ecx	# MSR_STAR, i.e., 0xc0000081
  movl    %esi, %eax		# low order 32 bits in the MSR which are 0
  #APP
  # 87 "./arch/x86/include/asm/msr.h" 1
          wrmsr

More specifically, MSR_STAR[31:0] is being set to 0. That field
is reserved on Intel and on AMD it is 32-bit SYSCALL Target EIP.

I'd strongly guess because Intel doesn't have SYSCALL in
compat/legacy mode and we're using SYSENTER and INT80 there. And
for compat syscalls in long mode we use CSTAR.

So let's fix the sparse warning by writing SYSRET and SYSCALL CS
and SS into the high 32-bit half of STAR and 0 in the low half
explicitly.

 [ Actually, if we had to be precise, we would have to read what's in
   STAR[31:0] and write it back unchanged on Intel and write 0 on AMD. I
   guess the current writing to 0 is still ok since Intel can apparently
   stomach it. ]

The resulting code is identical to what we have above:

  .loc 3 87 0
  xorl    %esi, %esi      # tmp104
  movl    $2293776, %eax  #, tmp103
  movl    $-1073741695, %ecx      #, tmp102
  movl    %esi, %edx      # tmp104, tmp104

  ...

        wrmsr

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:55 +01:00
Borislav Petkov
99f925ce92 x86/cpu: Unify CPU family, model, stepping calculation
Add generic functions which calc family, model and stepping from
the CPUID_1.EAX leaf and stick them into the library we have.

Rename those which do call CPUID with the prefix "x86_cpuid" as
suggested by Paolo Bonzini.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Andrew Cooper
581b7f158f x86/cpu: Fix SMAP check in PVOPS environments
There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Wan Zongshun
2167ceabf3 x86/cpu: Add CLZERO detection
AMD Fam17h processors introduce support for the CLZERO
instruction. It zeroes out the 64 byte cache line specified in
RAX.

Add the bit here to allow /proc/cpuinfo to list the feature.

Boris: we're adding this as a separate ->x86_capability leaf
because CPUID_80000008_EBX is going to contain more feature bits
and it will fill out with time.

Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ Wrap code in patch form, fix comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:23 +01:00
Borislav Petkov
7c5b190e11 x86/cpu: Print family/model/stepping in hex
924e101a7a ("x86/debug: Dump family, model, stepping of the
boot CPU") had its good intentions to dump the exact F/M/S as an
aid during debugging sessions but its output can be ambiguous.
Fix that:

-smpboot: CPU0: Intel Core Processor (Broadwell) (fam: 06, model: 47, stepping: 02)
+smpboot: CPU0: Intel Core Processor (Broadwell) (family: 0x6, model: 0x47, stepping: 0x2)

Also, spell out "family".

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441914927-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 09:30:07 +02:00
Linus Torvalds
6b2282aa37 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes: a suspend/resume quirk and a new CPUID bit definition"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit
  x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume
2015-09-01 09:41:03 -07:00
Andy Lutomirski
47edb65178 x86/asm/msr: Make wrmsrl() a function
As of cf991de2f6 ("x86/asm/msr: Make wrmsrl_safe() a
function"), wrmsrl_safe is a function, but wrmsrl is still a
macro.  The wrmsrl macro performs invalid shifts if the value
argument is 32 bits. This makes it unnecessarily awkward to
write code that puts an unsigned long into an MSR.

To make this work, syscall_init needs tweaking to stop passing
a function pointer to wrmsrl.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-23 13:25:38 +02:00
Andy Lutomirski
37868fe113 x86/ldt: Make modify_ldt synchronous
modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org <security@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31 10:23:23 +02:00
Laura Abbott
b51ef52df7 x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume
MSR_IA32_ENERGY_PERF_BIAS is lost after suspend/resume:

	x86_energy_perf_policy -r before

	cpu0: 0x0000000000000006
	cpu1: 0x0000000000000006
	cpu2: 0x0000000000000006
	cpu3: 0x0000000000000006
	cpu4: 0x0000000000000006
	cpu5: 0x0000000000000006
	cpu6: 0x0000000000000006
	cpu7: 0x0000000000000006

	after

	cpu0: 0x0000000000000000
	cpu1: 0x0000000000000006
	cpu2: 0x0000000000000006
	cpu3: 0x0000000000000006
	cpu4: 0x0000000000000006
	cpu5: 0x0000000000000006
	cpu6: 0x0000000000000006
	cpu7: 0x0000000000000006

Resulting in inconsistent energy policy settings across CPUs.

This register is set via init_intel() at bootup. During resume,
the secondary CPUs are brought online again and init_intel() is
callled which re-initializes the register. The boot CPU however
never reinitializes the register.

Add a syscore callback to reinitialize the register for the boot CPU.

Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437428878-4105-1-git-send-email-labbott@fedoraproject.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 07:51:38 +02:00
Ingo Molnar
db52ef74b3 x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled
Mike Galbraith reported:

  " My i7-4790 box is having one hell of a time with this merge
    window, dead in the water.

    BIOS setting "Limit CPUID Maximum" upsets new fpu code
    mightily. "

It turns out that Linux does a double workaround here, as per:

  066941bd4e ("x86: unmask CPUID levels on Intel CPUs")

it undoes the BIOS workaround - but as a side effect the CPUID
state is not completely constant during early init anymore,
and the new FPU init code did not take this into account.

So what happened is that the xstate init code did not have full
CPUID available, which broke subsequent attempts to use xstate
features.

Fix this by ordering the early FPU init code to after we've
stabilized the CPUID state.

Reported-bisected-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150627082514.GA10894@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-30 07:22:10 +02:00
Linus Torvalds
d70b3ef54c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
 "There were so many changes in the x86/asm, x86/apic and x86/mm topics
  in this cycle that the topical separation of -tip broke down somewhat -
  so the result is a more traditional architecture pull request,
  collected into the 'x86/core' topic.

  The topics were still maintained separately as far as possible, so
  bisectability and conceptual separation should still be pretty good -
  but there were a handful of merge points to avoid excessive
  dependencies (and conflicts) that would have been poorly tested in the
  end.

  The next cycle will hopefully be much more quiet (or at least will
  have fewer dependencies).

  The main changes in this cycle were:

   * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
     Gleixner)

     - This is the second and most intrusive part of changes to the x86
       interrupt handling - full conversion to hierarchical interrupt
       domains:

          [IOAPIC domain]   -----
                                 |
          [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                                 |   (optional)          |
          [HPET MSI domain] -----                        |
                                                         |
          [DMAR domain]     -----------------------------
                                                         |
          [Legacy domain]   -----------------------------

       This now reflects the actual hardware and allowed us to distangle
       the domain specific code from the underlying parent domain, which
       can be optional in the case of interrupt remapping.  It's a clear
       separation of functionality and removes quite some duct tape
       constructs which plugged the remap code between ioapic/msi/hpet
       and the vector management.

     - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
       injection into guests (Feng Wu)

   * x86/asm changes:

     - Tons of cleanups and small speedups, micro-optimizations.  This
       is in preparation to move a good chunk of the low level entry
       code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
       Brian Gerst)

     - Moved all system entry related code to a new home under
       arch/x86/entry/ (Ingo Molnar)

     - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
       Conversion to C will reintroduce many of them - but meanwhile
       they are only getting in the way, and the upstream kernel does
       not rely on them (Ingo Molnar)

     - NOP handling refinements. (Borislav Petkov)

   * x86/mm changes:

     - Big PAT and MTRR rework: making the code more robust and
       preparing to phase out exposing direct MTRR interfaces to drivers -
       in favor of using PAT driven interfaces (Toshi Kani, Luis R
       Rodriguez, Borislav Petkov)

     - New ioremap_wt()/set_memory_wt() interfaces to support
       Write-Through cached memory mappings.  This is especially
       important for good performance on NVDIMM hardware (Toshi Kani)

   * x86/ras changes:

     - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

       This is an important RAS feature which adds hardware support for
       poisoned data.  That means roughly that the hardware marks data
       which it has detected as corrupted but wasn't able to correct, as
       poisoned data and raises an APIC interrupt to signal that in the
       form of a deferred error.  It is the OS's responsibility then to
       take proper recovery action and thus prolonge system lifetime as
       far as possible.

     - Add support for Intel "Local MCE"s: upcoming CPUs will support
       CPU-local MCE interrupts, as opposed to the traditional system-
       wide broadcasted MCE interrupts (Ashok Raj)

     - Misc cleanups (Borislav Petkov)

   * x86/platform changes:

     - Intel Atom SoC updates

  ... and lots of other cleanups, fixlets and other changes - see the
  shortlog and the Git log for details"

* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
  x86/hpet: Use proper hpet device number for MSI allocation
  x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
  x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
  x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
  x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
  genirq: Prevent crash in irq_move_irq()
  genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
  iommu, x86: Properly handle posted interrupts for IOMMU hotplug
  iommu, x86: Provide irq_remapping_cap() interface
  iommu, x86: Setup Posted-Interrupts capability for Intel iommu
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Avoid migrating VT-d posted interrupts
  iommu, x86: Save the mode (posted or remapped) of an IRTE
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  iommu: dmar: Provide helper to copy shared irte fields
  iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
  iommu: Add new member capability to struct irq_remap_ops
  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
  x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
  x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
  ...
2015-06-22 17:59:09 -07:00
Linus Torvalds
e75c73ad64 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "This tree contains two main changes:

   - The big FPU code rewrite: wide reaching cleanups and reorganization
     that pulls all the FPU code together into a clean base in
     arch/x86/fpu/.

     The resulting code is leaner and faster, and much easier to
     understand.  This enables future work to further simplify the FPU
     code (such as removing lazy FPU restores).

     By its nature these changes have a substantial regression risk: FPU
     code related bugs are long lived, because races are often subtle
     and bugs mask as user-space failures that are difficult to track
     back to kernel side backs.  I'm aware of no unfixed (or even
     suspected) FPU related regression so far.

   - MPX support rework/fixes.  As this is still not a released CPU
     feature, there were some buglets in the code - should be much more
     robust now (Dave Hansen)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (250 commits)
  x86/fpu: Fix double-increment in setup_xstate_features()
  x86/mpx: Allow 32-bit binaries on 64-bit kernels again
  x86/mpx: Do not count MPX VMAs as neighbors when unmapping
  x86/mpx: Rewrite the unmap code
  x86/mpx: Support 32-bit binaries on 64-bit kernels
  x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps
  x86/mpx: Introduce new 'directory entry' to 'addr' helper function
  x86/mpx: Add temporary variable to reduce masking
  x86: Make is_64bit_mm() widely available
  x86/mpx: Trace allocation of new bounds tables
  x86/mpx: Trace the attempts to find bounds tables
  x86/mpx: Trace entry to bounds exception paths
  x86/mpx: Trace #BR exceptions
  x86/mpx: Introduce a boot-time disable flag
  x86/mpx: Restrict the mmap() size check to bounds tables
  x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK
  x86/mpx: Clean up the code by not passing a task pointer around when unnecessary
  x86/mpx: Use the new get_xsave_field_ptr()API
  x86/fpu/xstate: Wrap get_xsave_addr() to make it safer
  x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions
  ...
2015-06-22 17:16:11 -07:00
Dave Hansen
8c3641e957 x86/mpx: Introduce a boot-time disable flag
MPX has the _potential_ to cause some issues.  Say part of your
init system tried to protect one of its components from buffer
overflows with MPX.  If there were a false positive, it's
possible that MPX could keep a system from booting.

MPX could also potentially cause performance issues since it is
present in hot paths like the unmap path.

Allow it to be disabled at boot time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150607183702.2E8B77AB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:31 +02:00
Ingo Molnar
9dda1658a9 Merge branch 'x86/asm' into x86/core, to prepare for new patch
Collect all changes to arch/x86/entry/entry_64.S, before applying
patch that changes most of the file.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 20:48:20 +02:00
Ingo Molnar
b2502b418e x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32
The 'system_call' entry points differ starkly between native 32-bit and 64-bit
kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on
64-bit it's the SYSCALL entry point.

This is pretty confusing when looking at generic code, and it also obscures
the nature of the entry point at the assembly level.

So unangle this by splitting the name into its two uses:

	system_call (32) -> entry_INT80_32
	system_call (64) -> entry_SYSCALL_64

As per the generic naming scheme for x86 system call entry points:

	entry_MNEMONIC_qualifier

where 'qualifier' is one of _32, _64 or _compat.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 09:14:21 +02:00
Ingo Molnar
4c8cd0c50d x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: entry_SYSENTER_32 and entry_SYSENTER_compat
So the SYSENTER instruction is pretty quirky and it has different behavior
depending on bitness and CPU maker.

Yet we create a false sense of coherency by naming it 'ia32_sysenter_target'
in both of the cases.

Split the name into its two uses:

	ia32_sysenter_target (32)    -> entry_SYSENTER_32
	ia32_sysenter_target (64)    -> entry_SYSENTER_compat

As per the generic naming scheme for x86 system call entry points:

	entry_MNEMONIC_qualifier

where 'qualifier' is one of _32, _64 or _compat.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 08:47:46 +02:00
Ingo Molnar
2cd23553b4 x86/asm/entry: Rename compat syscall entry points
Rename the following system call entry points:

	ia32_cstar_target       -> entry_SYSCALL_compat
	ia32_syscall            -> entry_INT80_compat

The generic naming scheme for x86 system call entry points is:

	entry_MNEMONIC_qualifier

where 'qualifier' is one of _32, _64 or _compat.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08 08:47:36 +02:00
Borislav Petkov
c8e56d20f2 x86: Kill CONFIG_X86_HT
In talking to Aravind recently about making certain AMD topology
attributes available to the MCE injection module, it seemed like
that CONFIG_X86_HT thing is more or less superfluous. It is
def_bool y, depends on SMP and gets enabled in the majority of
.configs - distro and otherwise - out there.

So let's kill it and make code behind it depend directly on SMP.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Walter <dwalter@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:33:44 +02:00
Borislav Petkov
ee098e1aed x86/cpu: Trim model ID whitespace
We did try trimming whitespace surrounding the 'model name'
field in /proc/cpuinfo since reportedly some userspace uses it
in string comparisons and there were discrepancies:

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

However, there were issues with overlapping buffers, string
sizes and non-byte-sized copies in the previous proposed
solutions; see Link tags below for the whole farce.

So, instead of diddling with this more, let's simply extend what
was there originally with trimming any present trailing
whitespace. Final result is really simple and obvious.

Testing with the most insane model IDs qemu can generate, looks
good:

  .model_id = "            My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            My funny model ID CPU",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            ",
  ______4_model_name      :__

  .model_id = "",
  ______4_model_name      :_15/02

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 10:38:11 +02:00
Prarit Bhargava
adafb98da6 x86/cpu: Strip any /proc/cpuinfo model name field whitespace
When comparing the 'model name' field of each core in
/proc/cpuinfo it was noticed that there is a whitespace
difference between the cores' model names.

After some quick investigation it was noticed that the model
name fields were actually different -- processor 0's model name
field had trailing whitespace removed, while the other
processors did not.

Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

which shows the discrepancy.

This occurs because the kernel calls strim() on cpu 0's
x86_model_id field to output a pretty message to the console in
print_cpu_info(), and as a result strips the whitespace at the
end of the ->x86_model_id field.

But, the ->x86_model_id field should be the same for the all
identical CPUs in the box. Thus, we need to remove both leading
and trailing whitespace.

As a result, the print_cpu_info() output looks like

  smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)

and the x86_model_id field is correct on all processors on AMD
platforms:

  _____64_model_name      :_AMD_Opteron(TM)_Processor_6272

Output is still correct on an Intel box:

  ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:38:24 +02:00
Ingo Molnar
7cf82d33b6 x86/fpu/init: Move __setup() functions to fpu/init.c
We had a number of FPU init related boot option handlers
in arch/x86/kernel/cpu/common.c - move them over into
arch/x86/kernel/fpu/init.c to have them all in a
single place.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:35:42 +02:00
Ingo Molnar
d364a7656c x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
I tried to simulate an ancient CPU via this option, and
found that it still has fxsr_opt enabled, confusing the
FPU code.

Make the 'nofxsr' option also clear FXSR_OPT flag.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
c66e3f2823 x86/fpu: Remove the extra fpu__detect() layer
Now that fpu__detect() has become an empty layer around
fpu__init_system(), eliminate it and make fpu__init_system()
the main system initialization routine.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:46 +02:00
Ingo Molnar
21c4cd108a x86/fpu: Simplify fpu__cpu_init()
After the latest round of cleanups, fpu__cpu_init() has become
a simple call to fpu__init_cpu().

Rename fpu__init_cpu() to fpu__cpu_init() and remove the
extra layer.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:44 +02:00
Ingo Molnar
78f7f1e54b x86/fpu: Rename fpu-internal.h to fpu/internal.h
This unifies all the FPU related header files under a unified, hiearchical
naming scheme:

 - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                         widely included in almost all kernel code, and hence kept
                         as small as possible.

 - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.

 - asm/fpu/internal.h:   FPU subsystem internal methods

 - asm/fpu/xsave.h:      XSAVE support internal methods

(Also standardize the header guard in asm/fpu/internal.h.)

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00
Ingo Molnar
b0c050c5ba x86/fpu: Move 'PER_CPU(fpu_owner_task)' to fpu/core.c
Move it closer to other per-cpu FPU data structures.

This also unifies the 32-bit and 64-bit code.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:23 +02:00
Ingo Molnar
f89e32e0a3 x86/fpu: Fix header file dependencies of fpu-internal.h
Fix a minor header file dependency bug in asm/fpu-internal.h: it
relies on i387.h but does not include it. All users of fpu-internal.h
included it explicitly.

Also remove unnecessary includes, to reduce compilation time.

This also makes it easier to use it as a standalone header file
for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:16 +02:00
Ingo Molnar
3a9c4b0d7e x86/fpu: Rename fpu_init() to fpu__cpu_init()
fpu_init() is a bit of a misnomer in that it (falsely) creates the
impression that it's related to the (old) fpu_finit() function,
which initializes FPU ctx state.

Rename it to fpu__cpu_init() to make its boot time initialization
clear, and to move it to the fpu__*() namespace.

Also fix and extend its comment block to point out that it's
called not only on the boot CPU, but on secondary CPUs as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:14 +02:00
Ingo Molnar
1a7dc0db71 x86/fpu: Rename fpu_detect() to fpu__detect()
Use the fpu__*() namespace to organize FPU ops better.

Also document fpu__detect() a bit.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:10 +02:00
Denys Vlasenko
fed7c3f0f7 x86/entry: Remove unused 'kernel_stack' per-cpu variable
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 13:49:43 +02:00
Linus Torvalds
6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Borislav Petkov
6b51311c97 x86/asm/entry/64: Use a define for an invalid segment selector
... instead of a naked number, for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:29:13 +02:00
Borislav Petkov
7c74d5b7b7 x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value
Commit:

  d56fe4bf5f ("x86/asm/entry/64: Always set up SYSENTER MSRs")

missed to add "ULL" to the 0 and wrmsrl_safe() complains:

  arch/x86/kernel/cpu/common.c: In function ‘syscall_init’:
  arch/x86/kernel/cpu/common.c:1226:2: warning: right shift count >= width of type wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0);

Fix it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 15:29:12 +02:00
Andy Lutomirski
cf9328cc99 x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1
We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
and we unnecessarily cache the value in tss.sp1.  We never
read the cached value.

Remove all of the caching.  It serves no purpose.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:30:44 +02:00
Denys Vlasenko
487d1edb9a x86/asm/entry/64: Fix comment about SYSENTER MSRs
The comment is ancient, it dates to the time when only AMD's
x86_64 implementation existed. AMD wasn't (and still isn't)
supporting SYSENTER, so these writes were "just in case" back
then.

This has changed: Intel's x86_64 appeared, and Intel does
support SYSENTER in long mode. "Some future 64-bit CPU" is here
already.

The code may appear "buggy" for AMD as it stands, since
MSR_IA32_SYSENTER_EIP is only 32-bit for AMD CPUs. Writing a
kernel function's address to it would drop high bits. Subsequent
use of this MSR for branch via SYSENTER seem to allow user to
transition to CPL0 while executing his code. Scary, eh?

Explain why that is not a bug: because SYSENTER insn would not
work on AMD CPU.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427453956-21931-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 12:23:16 +01:00
Ingo Molnar
936c663aed Merge branch 'perf/x86' into perf/core, because it's ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:19 +01:00
Ingo Molnar
d56fe4bf5f x86/asm/entry/64: Always set up SYSENTER MSRs
On CONFIG_IA32_EMULATION=y kernels we set up
MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
kernels we leave them unchanged.

Clear them to make sure the instruction is disabled properly.

SYSCALL is set up properly in both cases.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 20:57:25 +01:00
Denys Vlasenko
ef593260f0 x86/asm/entry: Get rid of KERNEL_STACK_OFFSET
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Denys Vlasenko
a76c7f4604 x86/asm/entry/64: Fold syscall32_cpu_init() into its sole user
Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.

Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 08:20:51 +01:00
Ingo Molnar
8b6c0ab1a1 x86/asm/entry: Document and clean up the enable_sep_cpu() and syscall32_cpu_init() functions
Clean up the flow and document the functions a bit better.

Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Denys Vlasenko
d828c71fba x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better
Before the patch, the 'tss_struct::stack' field was not referenced anywhere.

It was used only to set SYSENTER's stack to point after the last byte
of tss_struct, thus the trailing field, stack[64], was used.

But grep would not know it. You can comment it out, compile,
and kernel will even run until an unlucky NMI corrupts
io_bitmap[] (which is also not easily detectable).

This patch changes code so that the purpose and usage of this
field is not mysterious anymore, and can be easily grepped for.

This does change generated code, for a subtle reason:
since tss_struct is ____cacheline_aligned, there happens to be
5 longs of padding at the end. Old code was using the padding
too; new code will strictly use it only for SYSENTER_stack[].

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Andy Lutomirski
a7fcf28d43 x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32
I broke 32-bit kernels.  The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized.  It has the following issues:

 - Init's sp0 is inconsistent with everything else's: non-init tasks
   are offset by 8 bytes.  (I have no idea why, and the comment is unhelpful.)

 - vm86 does crazy things to sp0.

Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski
24933b82c0 x86/asm/entry: Rename 'init_tss' to 'cpu_tss'
It has nothing to do with init -- there's only one TSS per cpu.

Other names considered include:

 - current_tss: Confusing because we never switch the tss.
 - singleton_tss: Too long.

This patch was generated with 's/init_tss/cpu_tss/g'.  Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Steven Rostedt
5b2bdbc845 x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
Commit:

   1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")

added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.

The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.

Unfortunately, there's two cpu_init()s in common.c. There's
one for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.

Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4ccc "x86: Store a per-cpu shadow copy of CR4"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:04:20 +01:00
Peter P Waskiewicz Jr
cbc82b1726 x86: Add support for Intel Cache QoS Monitoring (CQM) detection
This patch adds support for the new Cache QoS Monitoring (CQM)
feature found in future Intel Xeon processors.  It includes the
new values to track CQM resources to the cpuinfo_x86 structure,
plus the CPUID detection routines for CQM.

CQM allows a process, or set of processes, to be tracked by the CPU
to determine the cache usage of that task group.  Using this data
from the CPU, software can be written to extract this data and
report cache usage and occupancy for a particular process, or
group of processes.

More information about Cache QoS Monitoring can be found in the
Intel (R) x86 Architecture Software Developer Manual, section 17.14.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Webb <chris@arachsys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Honeyman <stevenhoneyman@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:31 +01:00
Linus Torvalds
37507717de Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf updates from Ingo Molnar:
 "This series tightens up RDPMC permissions: currently even highly
  sandboxed x86 execution environments (such as seccomp) have permission
  to execute RDPMC, which may leak various perf events / PMU state such
  as timing information and other CPU execution details.

  This 'all is allowed' RDPMC mode is still preserved as the
  (non-default) /sys/devices/cpu/rdpmc=2 setting.  The new default is
  that RDPMC access is only allowed if a perf event is mmap-ed (which is
  needed to correctly interpret RDPMC counter values in any case).

  As a side effect of these changes CR4 handling is cleaned up in the
  x86 code and a shadow copy of the CR4 value is added.

  The extra CR4 manipulation adds ~ <50ns to the context switch cost
  between rdpmc-capable and rdpmc-non-capable mms"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
  perf/x86: Only allow rdpmc if a perf_event is mapped
  perf: Pass the event to arch_perf_update_userpage()
  perf: Add pmu callbacks to track event mapping and unmapping
  x86: Add a comment clarifying LDT context switching
  x86: Store a per-cpu shadow copy of CR4
  x86: Clean up cr4 manipulation
2015-02-16 14:58:12 -08:00
Linus Torvalds
80f33a5fdf Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/rtc: Remove duplicate const specifier
  x86, early_serial_console: Remove unnecessary check
  x86, early_serial_console: Remove unused macro XMTRDY
  x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H
  x86, CPU: Fix trivial printk formatting issues with dmesg
2015-02-09 17:50:09 -08:00
Andy Lutomirski
1e02ce4ccc x86: Store a per-cpu shadow copy of CR4
Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.

To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm.  The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:42 +01:00
Andy Lutomirski
375074cc73 x86: Clean up cr4 manipulation
CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:41 +01:00
Thomas Gleixner
659006bf3a x86/x2apic: Split enable and setup function
enable_x2apic() is a convoluted unreadable mess because it is used for
both enablement in early boot and for setup in cpu_init().

Split the code into x2apic_enable() for enablement and x2apic_setup()
for setup of (secondary cpus). Make use of the new state tracking to
simplify the logic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Steven Honeyman
f94fe119f2 x86, CPU: Fix trivial printk formatting issues with dmesg
dmesg (from util-linux) currently has two methods for reading the kernel
message ring buffer: /dev/kmsg and syslog(2). Since kernel 3.5.0 kmsg
has been the default, which escapes control characters (e.g. new lines)
before they are shown.

This change means that when dmesg is using /dev/kmsg, a 2 line printk
makes the output messy, because the second line does not get a
timestamp.

For example:

[    0.012863] CPU0: Thermal monitoring enabled (TM1)
[    0.012869] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    0.012958] Freeing SMP alternatives memory: 28K (ffffffff81d86000 - ffffffff81d8d000)
[    0.014961] dmar: Host address width 39

Because printk.c intentionally escapes control characters, they should
not be there in the first place. This patch fixes two occurrences of
this.

Signed-off-by: Steven Honeyman <stevenhoneyman@gmail.com>
Link: https://lkml.kernel.org/r/1414856696-8094-1-git-send-email-stevenhoneyman@gmail.com
[ Boris: make cpu_detect_tlb() static, while at it. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-11 01:54:54 +01:00
Linus Torvalds
3100e448e7 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Ingo Molnar:
 "Various vDSO updates from Andy Lutomirski, mostly cleanups and
  reorganization to improve maintainability, but also some
  micro-optimizations and robustization changes"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/vsyscall: Restore orig_ax after vsyscall seccomp
  x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
  x86_64,vsyscall: Make vsyscall emulation configurable
  x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code
  x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none
  x86,vdso: Use LSL unconditionally for vgetcpu
  x86: vdso: Fix build with older gcc
  x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls
  x86_64/vdso: Remove jiffies from the vvar page
  x86/vdso: Make the PER_CPU segment 32 bits
  x86/vdso: Make the PER_CPU segment start out accessed
  x86/vdso: Change the PER_CPU segment to use struct desc_struct
  x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c
  x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c
2014-12-10 14:24:20 -08:00
Dave Hansen
2cd3949f70 x86: Require exact match for 'noxsave' command line option
We have some very similarly named command-line options:

arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);

__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:

	__setup("foo", x86_foo_func...);

The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar".  If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.

This makes the "noxsave" handler only return success when it finds
an *exact* match.

[ tglx: We really need to make __setup() more robust. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 12:13:16 +01:00
Andy Lutomirski
e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu
LSL is faster than RDTSCP and works everywhere; there's no need to
switch between them depending on CPU.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/72f73d5ec4514e02bba345b9759177ef03742efb.1414706021.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-03 13:41:53 +01:00
Linus Torvalds
0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Linus Torvalds
dfe2c6dcc8 Merge branch 'akpm' (patches from Andrew Morton)
Merge second patch-bomb from Andrew Morton:
 - a few hotfixes
 - drivers/dma updates
 - MAINTAINERS updates
 - Quite a lot of lib/ updates
 - checkpatch updates
 - binfmt updates
 - autofs4
 - drivers/rtc/
 - various small tweaks to less used filesystems
 - ipc/ updates
 - kernel/watchdog.c changes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
  mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
  kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
  ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
  frv: remove unused declarations of __start___ex_table and __stop___ex_table
  kvm: ensure hard lockup detection is disabled by default
  kernel/watchdog.c: control hard lockup detection default
  staging: rtl8192u: use %*pEn to escape buffer
  staging: rtl8192e: use %*pEn to escape buffer
  staging: wlan-ng: use %*pEhp to print SN
  lib80211: remove unused print_ssid()
  wireless: hostap: proc: print properly escaped SSID
  wireless: ipw2x00: print SSID via %*pE
  wireless: libertas: print esaped string via %*pE
  lib/vsprintf: add %*pE[achnops] format specifier
  lib / string_helpers: introduce string_escape_mem()
  lib / string_helpers: refactoring the test suite
  lib / string_helpers: move documentation to c-file
  include/linux: remove strict_strto* definitions
  arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
  fs: check bh blocknr earlier when searching lru
  ...
2014-10-14 03:54:50 +02:00
Linus Torvalds
2fd7476de9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc smaller fixes that missed the v3.17 cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Add arch/x86/purgatory/ make generated files to gitignore
  x86: Fix section conflict for numachip
  x86: Reject x32 executables if x32 ABI not supported
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14 02:28:16 +02:00
Linus Torvalds
708d0b41a2 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "This tree includes the following changes:

   - Introduce DISABLED_MASK to list disabled CPU features, to simplify
     CPU feature handling and avoid excessive #ifdefs

   - Remove the lightly used cpu_has_pae() primitive"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add more disabled features
  x86: Introduce disabled-features
  x86: Axe the lightly-used cpu_has_pae
2014-10-14 02:19:47 +02:00
Andrew Morton
e48510f451 arch/x86/kernel/cpu/common.c: fix unused symbol warning
x86_64 allnoconfig:

arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:23 +02:00
Linus Torvalds
19e00d593e Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 bootup updates from Ingo Molnar:
 "The changes in this cycle were:

   - Fix rare SMP-boot hang (mostly in virtual environments)

   - Fix build warning with certain (rare) toolchains"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/relocs: Make per_cpu_load_addr static
  x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
2014-10-13 18:16:32 +02:00
Andy Lutomirski
8c7aa698ba x86_64, entry: Filter RFLAGS.NT on entry from userspace
The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Cc: <stable@vger.kernel.org>
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-10-06 10:53:26 -07:00
Igor Mammedov
ce4b1b1650 x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:

  https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.

To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1403266991-12233-1-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16 11:11:32 +02:00
Dave Hansen
9298b815ef x86: Add more disabled features
The original motivation for these patches was for an Intel CPU
feature called MPX.  The patch to add a disabled feature for it
will go in with the other parts of the support.

But, in the meantime, there are a few other features than MPX
that we can make assumptions about at compile-time based on
compile options.  Add them to disabled-features.h and check them
with cpu_feature_enabled().

Note that this gets rid of the last things that needed an #ifdef
CONFIG_X86_64 in cpufeature.h.  Yay!

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11 14:30:17 -07:00
Christoph Lameter
89cbc76768 x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:49 -04:00
Josh Triplett
9def39be4e x86: Support compiling out human-friendly processor feature names
The table mapping CPUID bits to human-readable strings takes up a
non-trivial amount of space, and only exists to support /proc/cpuinfo
and a couple of kernel messages.  Since programs depend on the format of
/proc/cpuinfo, force inclusion of the table when building with /proc
support; otherwise, support omitting that table to save space, in which
case the kernel messages will print features numerically instead.

In addition to saving 1408 bytes out of vmlinux, this also saves 1373
bytes out of the uncompressed setup code, which contributes directly to
the size of bzImage.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 15:54:00 -07:00
Linus Torvalds
7453f33b2e Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/xsave changes from Peter Anvin:
 "This is a patchset to support the XSAVES instruction required to
  support context switch of supervisor-only features in upcoming
  silicon.

  This patchset missed the 3.16 merge window, which is why it is based
  on 3.15-rc7"

* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, xsave: Add forgotten inline annotation
  x86/xsaves: Clean up code in xstate offsets computation in xsave area
  x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
  Define kernel API to get address of each state in xsave area
  x86/xsaves: Enable xsaves/xrstors
  x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
  x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
  x86/xsaves: Add xsaves and xrstors support for booting time
  x86/xsaves: Clear reserved bits in xsave header
  x86/xsaves: Use xsave/xrstor for saving and restoring user space context
  x86/xsaves: Use xsaves/xrstors for context switch
  x86/xsaves: Use xsaves/xrstors to save and restore xsave area
  x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
  x86/xsaves: Define macros for xsave instructions
  x86/xsaves: Change compacted format xsave area header
  x86/alternative: Add alternative_input_2 to support alternative with two features and input
  x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
2014-08-13 18:20:04 -06:00
Linus Torvalds
ce47479632 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "The main change in this cycle is the rework of the TLB range flushing
  code, to simplify, fix and consolidate the code.  By Dave Hansen"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Set TLB flush tunable to sane value (33)
  x86/mm: New tunable for single vs full TLB flush
  x86/mm: Add tracepoints for TLB flushes
  x86/mm: Unify remote INVLPG code
  x86/mm: Fix missed global TLB flush stat
  x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
  x86/mm: Clean up the TLB flushing code
  x86/smep: Be more informative when signalling an SMEP fault
2014-08-04 17:15:45 -07:00
Dave Hansen
e9f4e0a9fe x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
I think the flush_tlb_mm_range() code that tries to tune the
flush sizes based on the CPU needs to get ripped out for
several reasons:

1. It is obviously buggy.  It uses mm->total_vm to judge the
   task's footprint in the TLB.  It should certainly be using
   some measure of RSS, *NOT* ->total_vm since only resident
   memory can populate the TLB.
2. Haswell, and several other CPUs are missing from the
   intel_tlb_flushall_shift_set() function.  Thus, it has been
   demonstrated to bitrot quickly in practice.
3. It is plain wrong in my vm:
	[    0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
	[    0.037444] tlb_flushall_shift: 6
   Which leads to it to never use invlpg.
4. The assumptions about TLB refill costs are wrong:
	http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com
    (more on this in later patches)
5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59
   I believe the sample times were too short.  Running the
   benchmark in a loop yields times that vary quite a bit.

Note that this leaves us with a static ceiling of 1 page.  This
is a conservative, dumb setting, and will be revised in a later
patch.

This also removes the code which attempts to predict whether we
are flushing data or instructions.  We expect instruction flushes
to be relatively rare and not worth tuning for explicitly.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-07-31 08:48:50 -07:00
H. Peter Anvin
03ab3da3b2 Linux 3.16-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTnmhkAAoJEHm+PkMAQRiGYyoIAJ+dCxQjx0Jmu5VLs48yvUkx
 AVbnJeEq0ScgFPBAlfrGnnXczOyZGD+wveNRwb3bN4f6hGkWHPncfExAeTU35QQ+
 92kCLScPU6kn4tnTqjac4ZrUhBCfgg5hFFamXOPidVDG4f5wYwql3IvP4O3eMOcZ
 IOx7kgweqoPm4A/LjXQ3L21kghs88NXySWMwwfWFdXaV++ey07slCWLzGF5rMDh/
 xfBDfgTS4gdrbGeE+1z/qkoWyHwKnCad8Uh2Fu5CcprElwCjLLhrLPccYSRKO2IR
 2ZSj/mcMb1FhH7AOyXBYMVbjhOH5MCIlHvJYYp7kwHfs66UTmnkczOJxq1ynKP0=
 =Whty
 -----END PGP SIGNATURE-----

Merge tag 'v3.16-rc1' into x86/cpufeature

Linux 3.16-rc1

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-18 15:26:19 -07:00
Linus Torvalds
3737a12761 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Ingo Molnar:
 "A second round of perf updates:

   - wide reaching kprobes sanitization and robustization, with the hope
     of fixing all 'probe this function crashes the kernel' bugs, by
     Masami Hiramatsu.

   - uprobes updates from Oleg Nesterov: tmpfs support, corner case
     fixes and robustization work.

   - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo
     et al:
        * Add support to accumulate hist periods (Namhyung Kim)
        * various fixes, refactorings and enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  perf: Differentiate exec() and non-exec() comm events
  perf: Fix perf_event_comm() vs. exec() assumption
  uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates
  perf/documentation: Add description for conditional branch filter
  perf/x86: Add conditional branch filtering support
  perf/tool: Add conditional branch filter 'cond' to perf record
  perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND'
  uprobes: Teach copy_insn() to support tmpfs
  uprobes: Shift ->readpage check from __copy_insn() to uprobe_register()
  perf/x86: Use common PMU interrupt disabled code
  perf/ARM: Use common PMU interrupt disabled code
  perf: Disable sampled events if no PMU interrupt
  perf: Fix use after free in perf_remove_from_context()
  perf tools: Fix 'make help' message error
  perf record: Fix poll return value propagation
  perf tools: Move elide bool into perf_hpp_fmt struct
  perf tools: Remove elide setup for SORT_MODE__MEMORY mode
  perf tools: Fix "==" into "=" in ui_browser__warning assignment
  perf tools: Allow overriding sysfs and proc finding with env var
  perf tools: Consider header files outside perf directory in tags target
  ...
2014-06-12 19:18:49 -07:00
Fenghua Yu
b6f42a4a3c x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
This patch adds a kernel parameter noxsaves to disable xsaves/xrstors feature.
The kernel will fall back to use xsaveopt and xrstor to save and restor
xstates. By using this parameter, xsave area occupies more memory because
standard form of xsave area in xsaveopt/xrstor occupies more memory than
compacted form of xsave area.

This patch adds a description of the kernel parameter noxsaveopt in doc.
The code to support the parameter noxsaveopt has been in the kernel before.
This patch just adds the description of this parameter in the doc.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:24:52 -07:00
Fenghua Yu
6229ad278c x86/xsaves: Detect xsaves/xrstors feature
Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended
state enumberation sub-leaf (eax=0x0d, ecx=1):
Bit 00: XSAVEOPT is available
Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set
Bit 02: Supports XGETBV with ECX = 1 if set
Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set

The above features are defined in the new word 10 in cpu features.

The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies
the state components that software has enabled xsaves and xrstors to manage.
If the bit corresponding to a state component is clear in XCR0 | IA32_XSS,
xsaves and xrstors will not operate on that state component, regardless of
the value of the instruction mask.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:24:28 -07:00
Andy Lutomirski
f40c330091 x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
This makes the 64-bit and x32 vdsos use the same mechanism as the
32-bit vdso.  Most of the churn is deleting all the old fixmap code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:19:01 -07:00
Andy Lutomirski
cfda7bb9ec x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
This code is used during CPU setup, and it isn't strictly speaking
related to the 32-bit vdso.  It's easier to understand how this
works when the code is closer to its callers.

This also lets syscall32_cpu_init be static, which might save some
trivial amount of kernel text.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:47 -07:00
Masami Hiramatsu
0f46efeb44 kprobes, x86: Prohibit probing on debug_stack_*()
Prohibit probing on debug_stack_reset and debug_stack_set_zero.
Since the both functions are called from TRACE_IRQS_ON/OFF_DEBUG
macros which run in int3 ist entry, probing it may cause a soft
lockup.

This happens when the kernel built with CONFIG_DYNAMIC_FTRACE=y
and CONFIG_TRACE_IRQFLAGS=y.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/20140417081712.26341.32994.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-24 10:02:57 +02:00
Linus Torvalds
99f7b025bf Merge branch 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 threadinfo changes from Ingo Molnar:
 "The main change here is the consolidation/unification of 32 and 64 bit
  thread_info handling methods, from Steve Rostedt"

* 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, threadinfo: Redo "x86: Use inline assembler to get sp"
  x86: Clean up dumpstack_64.c code
  x86: Keep thread_info on thread stack in x86_32
  x86: Prepare removal of previous_esp from i386 thread_info structure
  x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
  x86: Nuke the supervisor_stack field in i386 thread_info
2014-04-01 10:17:18 -07:00
Steven Rostedt
198d208df4 x86: Keep thread_info on thread stack in x86_32
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.

x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.

Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:55 -08:00
H. Peter Anvin
da4aaa7d86 x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT
If we explicitly disable the use of CLFLUSH, we should disable the use
of CLFLUSHOPT as well.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-jtdv7btppr4jgzxm3sxx1e74@git.kernel.org
2014-02-27 08:36:31 -08:00
H. Peter Anvin
840d2830e6 x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH
We call this "clflush" in /proc/cpuinfo, and have
cpu_has_clflush()... let's be consistent and just call it that.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org
2014-02-27 08:31:30 -08:00
H. Peter Anvin
03bbd596ac x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
If SMAP support is not compiled into the kernel, don't enable SMAP in
CR4 -- in fact, we should clear it, because the kernel doesn't contain
the proper STAC/CLAC instructions for SMAP support.

Found by Fengguang Wu's test system.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.7+
2014-02-13 07:50:25 -08:00
Kirill A. Shutemov
dd360393f4 x86, cpu: Detect more TLB configuration
The Intel Software Developer’s Manual covers few more TLB
configurations exposed as CPUID 2 descriptors:

61H Instruction TLB: 4 KByte pages, fully associative, 48 entries
63H Data TLB: 1 GByte pages, 4-way set associative, 4 entries
76H Instruction TLB: 2M/4M pages, fully associative, 8 entries
B5H Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
B6H Instruction TLB: 4KByte pages, 8-way set associative, 128 entries
C1H Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
C2H DTLB DTLB: 2 MByte/$MByte pages, 4-way associative, 16 entries

Let's detect them as well.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/1387801018-14499-1-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-03 14:35:42 -08:00
Linus Torvalds
6df1e7f2e9 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu changes from Ingo Molnar:
 "The biggest change that stands out is the increase of the
  CONFIG_NR_CPUS range from 4096 to 8192 - as real hardware out there
  already went beyond 4k CPUs ...

  We only allow more than 512 CPUs if offstack cpumasks are enabled.

  CONFIG_MAXSMP=y remains to be the 'you are nuts!' extreme testcase,
  which now means a max of 8192 CPUs"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Increase max CPU count to 8192
  x86/cpu: Allow higher NR_CPUS values
  x86/cpu: Always print SMP information in /proc/cpuinfo
  x86/cpu: Track legacy CPU model data only on 32-bit kernels
2013-11-12 10:46:43 +09:00
Jan Beulich
09dc68d958 x86/cpu: Track legacy CPU model data only on 32-bit kernels
struct cpu_dev's c_models is only ever set inside CONFIG_X86_32
conditionals (or code that's being built for 32-bit only), so
there's no use of reserving the (empty) space for the model
names in a 64-bit kernel.

Similarly, c_size_cache is only used in the #else of a
CONFIG_X86_64 conditional, so reserving space for (and in one
case even initializing) that field is pointless for 64-bit
kernels too.

While moving both fields to the end of the structure, I also
noticed that:

 - the c_models array size was one too small, potentially causing
   table_lookup_model() to return garbage on Intel CPUs (intel.c's
   instance was lacking the sentinel with family being zero), so the
   patch bumps that by one,

 - c_models' vendor sub-field was unused (and anyway redundant
   with the base structure's c_x86_vendor field), so the patch deletes it.

Also rename the legacy fields so that their legacy nature stands out
and comment their declarations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5265036802000078000FC4DB@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-26 13:34:39 +02:00