Files
linux/arch/mips/include/asm/page.h
David Hildenbrand 8e38607aa4 treewide: provide a generic clear_user_page() variant
Patch series "mm: folio_zero_user: clear page ranges", v11.

This series adds clearing of contiguous page ranges for hugepages.

The series improves on the current discontiguous clearing approach in two
ways:

  - clear pages in a contiguous fashion.
  - use batched clearing via clear_pages() wherever exposed.

The first is useful because it allows us to make much better use of
hardware prefetchers.

The second, enables advertising the real extent to the processor.  Where
specific instructions support it (ex.  string instructions on x86; "mops"
on arm64 etc), a processor can optimize based on this because, instead of
seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a
larger unit being operated on.

For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to
a mode where they start eliding cacheline allocation.  This is helpful not
just because it results in higher bandwidth, but also because now the
cache is not evicting useful cachelines and replacing them with zeroes.

Demand faulting a 64GB region shows performance improvement:

 $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5

                       baseline              +series
                   (GBps +- %stdev)      (GBps +- %stdev)

   pg-sz=2MB       11.76 +- 1.10%        25.34 +- 1.18% [*]   +115.47%  	preempt=*

   pg-sz=1GB       24.85 +- 2.41%        39.22 +- 2.32%       + 57.82%  	preempt=none|voluntary
   pg-sz=1GB         (similar)           52.73 +- 0.20% [#]   +112.19%  	preempt=full|lazy

 [*] This improvement is because switching to sequential clearing
  allows the hardware prefetchers to do a much better job.

 [#] For pg-sz=1GB a large part of the improvement is because of the
  cacheline elision mentioned above. preempt=full|lazy improves upon
  that because, not needing explicit invocations of cond_resched() to
  ensure reasonable preemption latency, it can clear the full extent
  as a single unit. In comparison the maximum extent used for
  preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB).

  When provided the full extent the processor forgoes allocating
  cachelines on this path almost entirely.

  (The hope is that eventually, in the fullness of time, the lazy
   preemption model will be able to do the same job that none or
   voluntary models are used for, allowing us to do away with
   cond_resched().)

Raghavendra also tested previous version of the series on AMD Genoa and
sees similar improvement [1] with preempt=lazy.

  $ perf bench mem map -p $page-size -f populate -s 64GB -l 10

                    base               patched              change
   pg-sz=2MB       12.731939 GB/sec    26.304263 GB/sec     106.6%
   pg-sz=1GB       26.232423 GB/sec    61.174836 GB/sec     133.2%


This patch (of 8):

Let's drop all variants that effectively map to clear_page() and provide
it in a generic variant instead.

We'll use the macro clear_user_page to indicate whether an architecture
provides it's own variant.

Also, clear_user_page() is only called from the generic variant of
clear_user_highpage(), so define it only if the architecture does not
provide a clear_user_highpage().  And, for simplicity define it in
linux/highmem.h.

Note that for parisc, clear_page() and clear_user_page() map to
clear_page_asm(), so we can just get rid of the custom clear_user_page()
implementation.  There is a clear_user_page_asm() function on parisc, that
seems to be unused.  Not sure what's up with that.

Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com
Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:39 -08:00

228 lines
6.5 KiB
C

/*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*
* Copyright (C) 1994 - 1999, 2000, 03 Ralf Baechle
* Copyright (C) 1999, 2000 Silicon Graphics, Inc.
*/
#ifndef _ASM_PAGE_H
#define _ASM_PAGE_H
#include <spaces.h>
#include <linux/const.h>
#include <linux/kernel.h>
#include <asm/mipsregs.h>
#include <vdso/page.h>
/*
* This is used for calculating the real page sizes
* for FTLB or VTLB + FTLB configurations.
*/
static inline unsigned int page_size_ftlb(unsigned int mmuextdef)
{
switch (mmuextdef) {
case MIPS_CONF4_MMUEXTDEF_FTLBSIZEEXT:
if (PAGE_SIZE == (1 << 30))
return 5;
if (PAGE_SIZE == (1llu << 32))
return 6;
if (PAGE_SIZE > (256 << 10))
return 7; /* reserved */
fallthrough;
case MIPS_CONF4_MMUEXTDEF_VTLBSIZEEXT:
return (PAGE_SHIFT - 10) / 2;
default:
panic("Invalid FTLB configuration with Conf4_mmuextdef=%d value\n",
mmuextdef >> 14);
}
}
#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
#define HPAGE_SHIFT (PAGE_SHIFT + PAGE_SHIFT - 3)
#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT)
#define HPAGE_MASK (~(HPAGE_SIZE - 1))
#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
#else /* !CONFIG_MIPS_HUGE_TLB_SUPPORT */
#define HPAGE_SHIFT ({BUILD_BUG(); 0; })
#define HPAGE_SIZE ({BUILD_BUG(); 0; })
#define HPAGE_MASK ({BUILD_BUG(); 0; })
#define HUGETLB_PAGE_ORDER ({BUILD_BUG(); 0; })
#endif /* CONFIG_MIPS_HUGE_TLB_SUPPORT */
#include <linux/pfn.h>
extern void build_clear_page(void);
extern void build_copy_page(void);
/*
* It's normally defined only for FLATMEM config but it's
* used in our early mem init code for all memory models.
* So always define it.
*/
#ifdef CONFIG_MIPS_AUTO_PFN_OFFSET
extern unsigned long ARCH_PFN_OFFSET;
# define ARCH_PFN_OFFSET ARCH_PFN_OFFSET
#else
# define ARCH_PFN_OFFSET PFN_UP(PHYS_OFFSET)
#endif
extern void clear_page(void * page);
extern void copy_page(void * to, void * from);
extern unsigned long shm_align_mask;
static inline unsigned long pages_do_alias(unsigned long addr1,
unsigned long addr2)
{
return (addr1 ^ addr2) & shm_align_mask;
}
struct page;
static inline void clear_user_page(void *addr, unsigned long vaddr,
struct page *page)
{
extern void (*flush_data_cache_page)(unsigned long addr);
clear_page(addr);
if (pages_do_alias((unsigned long) addr, vaddr & PAGE_MASK))
flush_data_cache_page((unsigned long)addr);
}
#define clear_user_page clear_user_page
struct vm_area_struct;
extern void copy_user_highpage(struct page *to, struct page *from,
unsigned long vaddr, struct vm_area_struct *vma);
#define __HAVE_ARCH_COPY_USER_HIGHPAGE
/*
* These are used to make use of C type-checking..
*/
#ifdef CONFIG_PHYS_ADDR_T_64BIT
#ifdef CONFIG_CPU_MIPS32
typedef struct { unsigned long pte_low, pte_high; } pte_t;
#define pte_val(x) ((x).pte_low | ((unsigned long long)(x).pte_high << 32))
#define __pte(x) ({ pte_t __pte = {(x), ((unsigned long long)(x)) >> 32}; __pte; })
#else
typedef struct { unsigned long long pte; } pte_t;
#define pte_val(x) ((x).pte)
#define __pte(x) ((pte_t) { (x) } )
#endif
#else
typedef struct { unsigned long pte; } pte_t;
#define pte_val(x) ((x).pte)
#define __pte(x) ((pte_t) { (x) } )
#endif
typedef struct page *pgtable_t;
/*
* Right now we don't support 4-level pagetables, so all pud-related
* definitions come from <asm-generic/pgtable-nopud.h>.
*/
/*
* Finall the top of the hierarchy, the pgd
*/
typedef struct { unsigned long pgd; } pgd_t;
#define pgd_val(x) ((x).pgd)
#define __pgd(x) ((pgd_t) { (x) } )
/*
* Manipulate page protection bits
*/
typedef struct { unsigned long pgprot; } pgprot_t;
#define pgprot_val(x) ((x).pgprot)
#define __pgprot(x) ((pgprot_t) { (x) } )
#define pte_pgprot(x) __pgprot(pte_val(x) & ~_PFN_MASK)
/*
* On R4000-style MMUs where a TLB entry is mapping a adjacent even / odd
* pair of pages we only have a single global bit per pair of pages. When
* writing to the TLB make sure we always have the bit set for both pages
* or none. This macro is used to access the `buddy' of the pte we're just
* working on.
*/
#define ptep_buddy(x) ((pte_t *)((unsigned long)(x) ^ sizeof(pte_t)))
/*
* __pa()/__va() should be used only during mem init.
*/
static inline unsigned long ___pa(unsigned long x)
{
if (IS_ENABLED(CONFIG_64BIT)) {
/*
* For MIPS64 the virtual address may either be in one of
* the compatibility segments ckseg0 or ckseg1, or it may
* be in xkphys.
*/
return x < CKSEG0 ? XPHYSADDR(x) : CPHYSADDR(x);
}
if (!IS_ENABLED(CONFIG_EVA)) {
/*
* We're using the standard MIPS32 legacy memory map, ie.
* the address x is going to be in kseg0 or kseg1. We can
* handle either case by masking out the desired bits using
* CPHYSADDR.
*/
return CPHYSADDR(x);
}
/*
* EVA is in use so the memory map could be anything, making it not
* safe to just mask out bits.
*/
return x - PAGE_OFFSET + PHYS_OFFSET;
}
#define __pa(x) ___pa((unsigned long)(x))
#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET - PHYS_OFFSET))
#include <asm/io.h>
/*
* RELOC_HIDE was originally added by 6007b903dfe5f1d13e0c711ac2894bdd4a61b1ad
* (lmo) rsp. 8431fd094d625b94d364fe393076ccef88e6ce18 (kernel.org). The
* discussion can be found in
* https://lore.kernel.org/lkml/a2ebde260608230500o3407b108hc03debb9da6e62c@mail.gmail.com
*
* It is unclear if the misscompilations mentioned in
* https://lore.kernel.org/lkml/1281303490-390-1-git-send-email-namhyung@gmail.com
* also affect MIPS so we keep this one until GCC 3.x has been retired
* before we can apply https://patchwork.linux-mips.org/patch/1541/
*/
#define __pa_symbol_nodebug(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
#ifdef CONFIG_DEBUG_VIRTUAL
extern phys_addr_t __phys_addr_symbol(unsigned long x);
#else
#define __phys_addr_symbol(x) __pa_symbol_nodebug(x)
#endif
#ifndef __pa_symbol
#define __pa_symbol(x) __phys_addr_symbol((unsigned long)(x))
#endif
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
#define virt_to_pfn(kaddr) PFN_DOWN(virt_to_phys((void *)(kaddr)))
#define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr))
extern bool __virt_addr_valid(const volatile void *kaddr);
#define virt_addr_valid(kaddr) \
__virt_addr_valid((const volatile void *) (kaddr))
#define VM_DATA_DEFAULT_FLAGS VM_DATA_FLAGS_TSK_EXEC
extern unsigned long __kaslr_offset;
static inline unsigned long kaslr_offset(void)
{
return __kaslr_offset;
}
#include <asm-generic/memory_model.h>
#include <asm-generic/getorder.h>
#endif /* _ASM_PAGE_H */