Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
   creating a pte which addresses the first page in a folio and reduces
   the amount of plumbing which architectures must implement to provide
   this.
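
   As a rough illustration of the call-site shape this enables (a sketch,
   not code from the series; it assumes the helper's 6.16 signature
   pte_t folio_mk_pte(struct folio *folio, pgprot_t pgprot)):

      /* Hypothetical fault-path fragment: build a pte for the first page
       * of a freshly allocated folio without going through struct page. */
      static void map_new_folio(struct vm_area_struct *vma, struct folio *folio,
                                unsigned long addr, pte_t *ptep)
      {
              pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);

              entry = pte_mkyoung(entry);
              if (vma->vm_flags & VM_WRITE)
                      entry = pte_mkwrite(pte_mkdirty(entry), vma);
              set_pte_at(vma->vm_mm, addr, ptep, entry);
      }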

 - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
   largely unrelated folio infrastructure changes which clean things up
   and better prepare us for future work.

 - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
   Price adds early-init code to prevent x86 from leaving physical
   memory unused when physical address regions are not aligned to memory
   block size.

 - "mm/compaction: allow more aggressive proactive compaction" from
   Michal Clapinski provides some tuning of the (sadly, hard-coded (more
   sadly, not auto-tuned)) thresholds for our invocation of proactive
   compaction. In a simple test case, the reduction of a guest VM's
   memory consumption was dramatic.

 - "Minor cleanups and improvements to swap freeing code" from Kemeng
   Shi provides some code cleanups and a small efficiency improvement to
   this part of our swap handling code.

 - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
   adds the ability for a ptracer to modify syscalls arguments. At this
   time we can alter only "system call information that are used by
   strace system call tampering, namely, syscall number, syscall
   arguments, and syscall return value.

   This series should have been incorporated into mm.git's "non-MM"
   branch, but I goofed.
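
   A rough user-space sketch of the intended use, assuming the new
   request mirrors PTRACE_GET_SYSCALL_INFO's calling convention (the
   addr argument carries the buffer size and data points at a
   struct ptrace_syscall_info) and that 6.16 uapi headers provide the
   constant:

      #include <string.h>
      #include <sys/types.h>
      #include <sys/ptrace.h>
      #include <linux/ptrace.h>

      /* Rewrite the syscall number of a tracee stopped at syscall entry. */
      static long retarget_syscall(pid_t pid, unsigned long long new_nr)
      {
              struct ptrace_syscall_info info;

              memset(&info, 0, sizeof(info));
              if (ptrace(PTRACE_GET_SYSCALL_INFO, pid, sizeof(info), &info) < 0)
                      return -1;
              if (info.op != PTRACE_SYSCALL_INFO_ENTRY)
                      return -1;
              info.entry.nr = new_nr;
              return ptrace(PTRACE_SET_SYSCALL_INFO, pid, sizeof(info), &info);
      }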

 - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
   Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
   against /proc/pid/pagemap. This permits CRIU to more efficiently get
   at the info about guard regions.

 - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
   implements that fix. No runtime effect is expected because
   validate_page_before_insert() happens to fix up this error.

 - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
   Hildenbrand basically brings uprobe text poking into the current
   decade. It removes a bunch of hand-rolled implementation in favor of
   more current facilities.

 - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
   Khandual provides enhancements and generalizations to the pte dumping
   code. This might be needed when 128-bit Page Table Descriptors are
   enabled for ARM.

 - "Always call constructor for kernel page tables" from Kevin Brodsky
   ensures that the ctor/dtor is always called for kernel pgtables, as
   it already is for user pgtables.

   This permits the addition of more functionality such as "insert hooks
   to protect page tables". This change does result in various
    architectures performing unnecessary work, but this is fixed up where
   it is anticipated to occur.

 - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
   Ryhl adds plumbing to permit Rust access to core MM structures.

 - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
   Stoakes takes advantage of some VMA merging opportunities which we've
   been missing for 15 years.

 - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
   SeongJae Park optimizes process_madvise()'s TLB flushing.

   Instead of flushing each address range in the provided iovec, we
   batch the flushing across all the iovec entries. The syscall's cost
   was approximately halved with a microbenchmark which was designed to
   load this particular operation.
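
   For reference, the call whose flushes are being batched looks roughly
   like this (a sketch; it assumes a libc that exposes the
   process_madvise(2) wrapper, otherwise go through syscall(2)):

      #define _GNU_SOURCE
      #include <sys/mman.h>
      #include <sys/uio.h>

      /* Advise two ranges of another process in a single call so the
       * kernel can batch the TLB flushes across the iovec entries. */
      static int drop_ranges(int pidfd, void *a, size_t a_len,
                             void *b, size_t b_len)
      {
              struct iovec iov[2] = {
                      { .iov_base = a, .iov_len = a_len },
                      { .iov_base = b, .iov_len = b_len },
              };

              return process_madvise(pidfd, iov, 2, MADV_DONTNEED, 0) < 0 ? -1 : 0;
      }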

 - "Track node vacancy to reduce worst case allocation counts" from
   Sidhartha Kumar makes the maple tree smarter about its node
   preallocation.

   stress-ng mmap performance increased by single-digit percentages and
    the amount of unnecessarily preallocated memory was dramatically
   reduced.

 - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
   a few unnecessary things which Baoquan noted when reading the code.

 - ""Enhance sysfs handling for memory hotplug in weighted interleave"
   from Rakie Kim "enhances the weighted interleave policy in the memory
   management subsystem by improving sysfs handling, fixing memory
   leaks, and introducing dynamic sysfs updates for memory hotplug
   support". Fixes things on error paths which we are unlikely to hit.

 - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
   from SeongJae Park introduces new DAMOS quota goal metrics which
   eliminate the manual tuning which is required when utilizing DAMON
   for memory tiering.

 - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
   provides cleanups and small efficiency improvements which Baoquan
   found via code inspection.

 - "vmscan: enforce mems_effective during demotion" from Gregory Price
   changes reclaim to respect cpuset.mems_effective during demotion when
   possible, because presently reclaim explicitly ignores
   cpuset.mems_effective when demoting, which may cause the cpuset
   settings to be violated.

   This is useful for isolating workloads on a multi-tenant system from
   certain classes of memory more consistently.

 - "Clean up split_huge_pmd_locked() and remove unnecessary folio
   pointers" from Gavin Guo provides minor cleanups and efficiency gains
    in the huge page splitting and migrating code.

 - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
   for `struct mem_cgroup', yielding improved memory utilization.
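
   The underlying change is the usual pattern of giving a frequently
   allocated structure its own slab cache instead of a kmalloc size
   class; schematically (illustrative only, not the series' exact code):

      /* Illustrative: back 'struct mem_cgroup' allocations with a
       * dedicated slab cache. */
      static struct kmem_cache *memcg_cachep;

      static void __init memcg_cache_init(void)
      {
              memcg_cachep = KMEM_CACHE(mem_cgroup, SLAB_PANIC);
      }

      static struct mem_cgroup *memcg_alloc(void)
      {
              return kmem_cache_zalloc(memcg_cachep, GFP_KERNEL);
      }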

 - "add max arg to swappiness in memory.reclaim and lru_gen" from
   Zhongkun He adds a new "max" argument to the "swappiness=" argument
   for memory.reclaim and MGLRU's lru_gen.

   This directs proactive reclaim to reclaim from only anon folios
   rather than file-backed folios.
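
   In practice that means writing the new key to memory.reclaim; a
   minimal sketch, assuming cgroup2 is mounted at /sys/fs/cgroup and an
   illustrative cgroup path:

      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      /* Request 512M of proactive, anon-only reclaim from one cgroup. */
      static int reclaim_anon_only(const char *cgroup)
      {
              char path[256];
              const char req[] = "512M swappiness=max";
              int fd, ret;

              snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/memory.reclaim", cgroup);
              fd = open(path, O_WRONLY);
              if (fd < 0)
                      return -1;
              ret = write(fd, req, strlen(req)) < 0 ? -1 : 0;
              close(fd);
              return ret;
      }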

 - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
   first step on the path to permitting the kernel to maintain existing
   VMs while replacing the host kernel via file-based kexec. At this
   time only memblock's reserve_mem is preserved.

 - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
   and uses a smarter way of looping over a pfn range, skipping ranges
   of invalid pfns.
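
   A sketch of what the new iteration style looks like at a call site
   (assuming the helper takes the loop variable plus a [start, end) pfn
   range, as described by the series):

      /* Count pfns in [start_pfn, end_pfn) that actually have a valid
       * memmap entry; the helper skips over invalid ranges instead of
       * calling pfn_valid() on every single pfn. */
      static unsigned long count_valid_pfns(unsigned long start_pfn,
                                            unsigned long end_pfn)
      {
              unsigned long pfn, nr = 0;

              for_each_valid_pfn(pfn, start_pfn, end_pfn)
                      nr++;

              return nr;
      }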

 - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
   cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
    when a task is pinned to a single NUMA node.

   Dramatic performance benefits were seen in some real world cases.

 - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
   Garg addresses a warning which occurs during memory compaction when
   using JFS.

 - "move all VMA allocation, freeing and duplication logic to mm" from
   Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
   appropriate mm/vma.c.

 - "mm, swap: clean up swap cache mapping helper" from Kairui Song
   provides code consolidation and cleanups related to the folio_index()
   function.

 - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.

 - "memcg: Fix test_memcg_min/low test failures" from Waiman Long
   addresses some bogus failures which are being reported by the
   test_memcontrol selftest.

 - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
   Stoakes commences the deprecation of file_operations.mmap() in favor
   of the new file_operations.mmap_prepare().

   The latter is more restrictive and prevents drivers from messing with
   things in ways which, amongst other problems, may defeat VMA merging.
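
   For a driver, the conversion looks roughly like this (a hypothetical
   driver; the vm_area_desc field names are illustrative rather than
   confirmed against the final API):

      /* Hypothetical switch from .mmap to .mmap_prepare: validate and
       * configure the mapping via the descriptor before the VMA is
       * finalised, instead of modifying a live vm_area_struct. */
      static int mydrv_mmap_prepare(struct vm_area_desc *desc)
      {
              if (desc->end - desc->start > MYDRV_MAX_MAP_SIZE) /* hypothetical limit */
                      return -EINVAL;

              desc->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
              desc->vm_ops = &mydrv_vm_ops;
              return 0;
      }

      static const struct file_operations mydrv_fops = {
              .owner        = THIS_MODULE,
              .mmap_prepare = mydrv_mmap_prepare,
      };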

 - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
   the per-cpu memcg charge cache from the objcg's one.

   This is a step along the way to making memcg and objcg charging
   NMI-safe, which is a BPF requirement.

 - "mm/damon: minor fixups and improvements for code, tests, and
   documents" from SeongJae Park is yet another batch of miscellaneous
   DAMON changes. Fix and improve minor problems in code, tests and
   documents.

 - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
   stats to be irq safe. Another step along the way to making memcg
   charging and stats updates NMI-safe, a BPF requirement.

 - "Let unmap_hugepage_range() and several related functions take folio
   instead of page" from Fan Ni provides folio conversions in the
   hugetlb code.

* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
  mm: pcp: increase pcp->free_count threshold to trigger free_high
  mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
  mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: pass folio instead of page to unmap_ref_private()
  memcg: objcg stock trylock without irq disabling
  memcg: no stock lock for cpu hot-unplug
  memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
  memcg: make count_memcg_events re-entrant safe against irqs
  memcg: make mod_memcg_state re-entrant safe against irqs
  memcg: move preempt disable to callers of memcg_rstat_updated
  memcg: memcg_rstat_updated re-entrant safe against irqs
  mm: khugepaged: decouple SHMEM and file folios' collapse
  selftests/eventfd: correct test name and improve messages
  alloc_tag: check mem_profiling_support in alloc_tag_init
  Docs/damon: update titles and brief introductions to explain DAMOS
  selftests/damon/_damon_sysfs: read tried regions directories in order
  mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
  mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
  mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
  ...
commit 00c010e130
Linus Torvalds, 2025-05-31 15:44:16 -07:00
318 changed files with 11621 additions and 4277 deletions


@@ -2336,7 +2336,7 @@ D: Author of the dialog utility, foundation
 D: for Menuconfig's lxdialog.
 N: Christoph Lameter
-E: christoph@lameter.com
+E: cl@gentwo.org
 D: Digiboard PC/Xe and PC/Xi, Digiboard EPCA
 D: NUMA support, Slab allocators, Page migration
 D: Scalability, Time subsystem


@@ -283,6 +283,12 @@ Contact: SeongJae Park <sj@kernel.org>
 Description:	Writing to and reading from this file sets and gets the current
		value of the goal metric.

+What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/nid
+Date:		Apr 2025
+Contact:	SeongJae Park <sj@kernel.org>
+Description:	Writing to and reading from this file sets and gets the nid
+		parameter of the goal.
+
 What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
 Date:		Mar 2022
 Contact:	SeongJae Park <sj@kernel.org>


@@ -20,6 +20,35 @@ Description: Weight configuration interface for nodeN
		Minimum weight: 1
		Maximum weight: 255
-		Writing an empty string or `0` will reset the weight to the
-		system default. The system default may be set by the kernel
-		or drivers at boot or during hotplug events.
+		Writing invalid values (i.e. any values not in [1,255],
+		empty string, ...) will return -EINVAL.
+		Changing the weight to a valid value will automatically
+		switch the system to manual mode as well.
+
+What:		/sys/kernel/mm/mempolicy/weighted_interleave/auto
+Date:		May 2025
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Auto-weighting configuration interface
+
+		Configuration mode for weighted interleave. 'true' indicates
+		that the system is in auto mode, and a 'false' indicates that
+		the system is in manual mode.
+
+		In auto mode, all node weights are re-calculated and overwritten
+		(visible via the nodeN interfaces) whenever new bandwidth data
+		is made available during either boot or hotplug events.
+
+		In manual mode, node weights can only be updated by the user.
+		Note that nodes that are onlined with previously set weights
+		will reuse those weights. If they were not previously set or
+		are onlined with missing bandwidth data, the weights will use
+		a default weight of 1.
+
+		Writing any true value string (e.g. Y or 1) will enable auto
+		mode, while writing any false value string (e.g. N or 0) will
+		enable manual mode. All other strings are ignored and will
+		return -EINVAL.
+
+		Writing a new weight to a node directly via the nodeN interface
+		will also automatically switch the system to manual mode.


@@ -16,9 +16,13 @@ Description: Enable/disable demoting pages during reclaim
		Allowing page migration during reclaim enables these
		systems to migrate pages from fast tiers to slow tiers
		when the fast tier is under pressure. This migration
-		is performed before swap. It may move data to a NUMA
-		node that does not fall into the cpuset of the
-		allocating process which might be construed to violate
-		the guarantees of cpusets. This should not be enabled
-		on systems which need strict cpuset location
-		guarantees.
+		is performed before swap if an eligible numa node is
+		present in cpuset.mems for the cgroup (or if cpuset v1
+		is being used). If cpusets.mems changes at runtime, it
+		may move data to a NUMA node that does not fall into the
+		cpuset of the new cpusets.mems, which might be construed
+		to violate the guarantees of cpusets. Shared memory,
+		such as libraries, owned by another cgroup may still be
+		demoted and result in memory use on a node not present
+		in cpusets.mem. This should not be enabled on systems
+		which need strict cpuset location guarantees.


@@ -2,7 +2,7 @@ What: /sys/kernel/slab
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The /sys/kernel/slab directory contains a snapshot of the
	internal state of the SLUB allocator for each cache. Certain
@@ -14,7 +14,7 @@ What: /sys/kernel/slab/<cache>/aliases
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The aliases file is read-only and specifies how many caches
	have merged into this cache.
@@ -23,7 +23,7 @@ What: /sys/kernel/slab/<cache>/align
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The align file is read-only and specifies the cache's object
	alignment in bytes.
@@ -32,7 +32,7 @@ What: /sys/kernel/slab/<cache>/alloc_calls
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The alloc_calls file is read-only and lists the kernel code
	locations from which allocations for this cache were performed.
@@ -43,7 +43,7 @@ What: /sys/kernel/slab/<cache>/alloc_fastpath
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The alloc_fastpath file shows how many objects have been
	allocated using the fast path. It can be written to clear the
@@ -54,7 +54,7 @@ What: /sys/kernel/slab/<cache>/alloc_from_partial
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The alloc_from_partial file shows how many times a cpu slab has
	been full and it has been refilled by using a slab from the list
@@ -66,7 +66,7 @@ What: /sys/kernel/slab/<cache>/alloc_refill
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The alloc_refill file shows how many times the per-cpu freelist
	was empty but there were objects available as the result of
@@ -77,7 +77,7 @@ What: /sys/kernel/slab/<cache>/alloc_slab
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The alloc_slab file is shows how many times a new slab had to
	be allocated from the page allocator. It can be written to
@@ -88,7 +88,7 @@ What: /sys/kernel/slab/<cache>/alloc_slowpath
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The alloc_slowpath file shows how many objects have been
	allocated using the slow path because of a refill or
@@ -100,7 +100,7 @@ What: /sys/kernel/slab/<cache>/cache_dma
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The cache_dma file is read-only and specifies whether objects
	are from ZONE_DMA.
@@ -110,7 +110,7 @@ What: /sys/kernel/slab/<cache>/cpu_slabs
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The cpu_slabs file is read-only and displays how many cpu slabs
	are active and their NUMA locality.
@@ -119,7 +119,7 @@ What: /sys/kernel/slab/<cache>/cpuslab_flush
 Date: April 2009
 KernelVersion: 2.6.31
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The file cpuslab_flush shows how many times a cache's cpu slabs
	have been flushed as the result of destroying or shrinking a
@@ -132,7 +132,7 @@ What: /sys/kernel/slab/<cache>/ctor
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The ctor file is read-only and specifies the cache's object
	constructor function, which is invoked for each object when a
@@ -142,7 +142,7 @@ What: /sys/kernel/slab/<cache>/deactivate_empty
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The deactivate_empty file shows how many times an empty cpu slab
	was deactivated. It can be written to clear the current count.
@@ -152,7 +152,7 @@ What: /sys/kernel/slab/<cache>/deactivate_full
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The deactivate_full file shows how many times a full cpu slab
	was deactivated. It can be written to clear the current count.
@@ -162,7 +162,7 @@ What: /sys/kernel/slab/<cache>/deactivate_remote_frees
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The deactivate_remote_frees file shows how many times a cpu slab
	has been deactivated and contained free objects that were freed
@@ -173,7 +173,7 @@ What: /sys/kernel/slab/<cache>/deactivate_to_head
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The deactivate_to_head file shows how many times a partial cpu
	slab was deactivated and added to the head of its node's partial
@@ -184,7 +184,7 @@ What: /sys/kernel/slab/<cache>/deactivate_to_tail
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The deactivate_to_tail file shows how many times a partial cpu
	slab was deactivated and added to the tail of its node's partial
@@ -195,7 +195,7 @@ What: /sys/kernel/slab/<cache>/destroy_by_rcu
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The destroy_by_rcu file is read-only and specifies whether
	slabs (not objects) are freed by rcu.
@@ -204,7 +204,7 @@ What: /sys/kernel/slab/<cache>/free_add_partial
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_add_partial file shows how many times an object has
	been freed in a full slab so that it had to added to its node's
@@ -215,7 +215,7 @@ What: /sys/kernel/slab/<cache>/free_calls
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_calls file is read-only and lists the locations of
	object frees if slab debugging is enabled (see
@@ -225,7 +225,7 @@ What: /sys/kernel/slab/<cache>/free_fastpath
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_fastpath file shows how many objects have been freed
	using the fast path because it was an object from the cpu slab.
@@ -236,7 +236,7 @@ What: /sys/kernel/slab/<cache>/free_frozen
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_frozen file shows how many objects have been freed to
	a frozen slab (i.e. a remote cpu slab). It can be written to
@@ -247,7 +247,7 @@ What: /sys/kernel/slab/<cache>/free_remove_partial
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_remove_partial file shows how many times an object has
	been freed to a now-empty slab so that it had to be removed from
@@ -259,7 +259,7 @@ What: /sys/kernel/slab/<cache>/free_slab
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_slab file shows how many times an empty slab has been
	freed back to the page allocator. It can be written to clear
@@ -270,7 +270,7 @@ What: /sys/kernel/slab/<cache>/free_slowpath
 Date: February 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The free_slowpath file shows how many objects have been freed
	using the slow path (i.e. to a full or partial slab). It can
@@ -281,7 +281,7 @@ What: /sys/kernel/slab/<cache>/hwcache_align
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The hwcache_align file is read-only and specifies whether
	objects are aligned on cachelines.
@@ -301,7 +301,7 @@ What: /sys/kernel/slab/<cache>/object_size
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The object_size file is read-only and specifies the cache's
	object size.
@@ -310,7 +310,7 @@ What: /sys/kernel/slab/<cache>/objects
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The objects file is read-only and displays how many objects are
	active and from which nodes they are from.
@@ -319,7 +319,7 @@ What: /sys/kernel/slab/<cache>/objects_partial
 Date: April 2008
 KernelVersion: 2.6.26
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The objects_partial file is read-only and displays how many
	objects are on partial slabs and from which nodes they are
@@ -329,7 +329,7 @@ What: /sys/kernel/slab/<cache>/objs_per_slab
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The file objs_per_slab is read-only and specifies how many
	objects may be allocated from a single slab of the order
@@ -339,7 +339,7 @@ What: /sys/kernel/slab/<cache>/order
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The order file specifies the page order at which new slabs are
	allocated. It is writable and can be changed to increase the
@@ -356,7 +356,7 @@ What: /sys/kernel/slab/<cache>/order_fallback
 Date: April 2008
 KernelVersion: 2.6.26
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The order_fallback file shows how many times an allocation of a
	new slab has not been possible at the cache's order and instead
@@ -369,7 +369,7 @@ What: /sys/kernel/slab/<cache>/partial
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The partial file is read-only and displays how long many
	partial slabs there are and how long each node's list is.
@@ -378,7 +378,7 @@ What: /sys/kernel/slab/<cache>/poison
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The poison file specifies whether objects should be poisoned
	when a new slab is allocated.
@@ -387,7 +387,7 @@ What: /sys/kernel/slab/<cache>/reclaim_account
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The reclaim_account file specifies whether the cache's objects
	are reclaimable (and grouped by their mobility).
@@ -396,7 +396,7 @@ What: /sys/kernel/slab/<cache>/red_zone
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The red_zone file specifies whether the cache's objects are red
	zoned.
@@ -405,7 +405,7 @@ What: /sys/kernel/slab/<cache>/remote_node_defrag_ratio
 Date: January 2008
 KernelVersion: 2.6.25
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The file remote_node_defrag_ratio specifies the percentage of
	times SLUB will attempt to refill the cpu slab with a partial
@@ -419,7 +419,7 @@ What: /sys/kernel/slab/<cache>/sanity_checks
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The sanity_checks file specifies whether expensive checks
	should be performed on free and, at minimum, enables double free
@@ -430,7 +430,7 @@ What: /sys/kernel/slab/<cache>/shrink
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The shrink file is used to reclaim unused slab cache
	memory from a cache. Empty per-cpu or partial slabs
@@ -446,7 +446,7 @@ What: /sys/kernel/slab/<cache>/slab_size
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The slab_size file is read-only and specifies the object size
	with metadata (debugging information and alignment) in bytes.
@@ -455,7 +455,7 @@ What: /sys/kernel/slab/<cache>/slabs
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The slabs file is read-only and displays how long many slabs
	there are (both cpu and partial) and from which nodes they are
@@ -465,7 +465,7 @@ What: /sys/kernel/slab/<cache>/store_user
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The store_user file specifies whether the location of
	allocation or free should be tracked for a cache.
@@ -474,7 +474,7 @@ What: /sys/kernel/slab/<cache>/total_objects
 Date: April 2008
 KernelVersion: 2.6.26
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The total_objects file is read-only and displays how many total
	objects a cache has and from which nodes they are from.
@@ -483,7 +483,7 @@ What: /sys/kernel/slab/<cache>/trace
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	The trace file specifies whether object allocations and frees
	should be traced.
@@ -492,7 +492,7 @@ What: /sys/kernel/slab/<cache>/validate
 Date: May 2007
 KernelVersion: 2.6.22
 Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
-	Christoph Lameter <cl@linux-foundation.org>
+	Christoph Lameter <cl@gentwo.org>
 Description:
	Writing to the validate file causes SLUB to traverse all of its
	cache's objects and check the validity of metadata.
@@ -506,14 +506,14 @@ Description:
 What: /sys/kernel/slab/<cache>/slabs_cpu_partial
 Date: Aug 2011
-Contact: Christoph Lameter <cl@linux.com>
+Contact: Christoph Lameter <cl@gentwo.org>
 Description:
	This read-only file shows the number of partialli allocated
	frozen slabs.

 What: /sys/kernel/slab/<cache>/cpu_partial
 Date: Aug 2011
-Contact: Christoph Lameter <cl@linux.com>
+Contact: Christoph Lameter <cl@gentwo.org>
 Description:
	This read-only file shows the number of per cpu partial
	pages to keep around.


@@ -317,6 +317,26 @@ a single line of text and contains the following stats separated by whitespace:
 Optional Feature
 ================

+IDLE pages tracking
+-------------------
+
+zram has built-in support for idle pages tracking (that is, allocated but
+not used pages). This feature is useful for e.g. zram writeback and
+recompression. In order to mark pages as idle, execute the following command::
+
+	echo all > /sys/block/zramX/idle
+
+This will mark all allocated zram pages as idle. The idle mark will be
+removed only when the page (block) is accessed (e.g. overwritten or freed).
+Additionally, when CONFIG_ZRAM_TRACK_ENTRY_ACTIME is enabled, pages can be
+marked as idle based on how many seconds have passed since the last access to
+a particular zram page::
+
+	echo 86400 > /sys/block/zramX/idle
+
+In this example, all pages which haven't been accessed in more than 86400
+seconds (one day) will be marked idle.
+
 writeback
 ---------
@@ -331,24 +351,7 @@ If admin wants to use incompressible page writeback, they could do it via::
	echo huge > /sys/block/zramX/writeback

-To use idle page writeback, first, user need to declare zram pages
-as idle::
-
-	echo all > /sys/block/zramX/idle
-
-From now on, any pages on zram are idle pages. The idle mark
-will be removed until someone requests access of the block.
-IOW, unless there is access request, those pages are still idle pages.
-
-Additionally, when CONFIG_ZRAM_TRACK_ENTRY_ACTIME is enabled pages can be
-marked as idle based on how long (in seconds) it's been since they were
-last accessed::
-
-	echo 86400 > /sys/block/zramX/idle
-
-In this example all pages which haven't been accessed in more than 86400
-seconds (one day) will be marked idle.
-
-Admin can request writeback of those idle pages at right timing via::
+Admin can request writeback of idle pages at right timing via::

	echo idle > /sys/block/zramX/writeback
@@ -369,6 +372,23 @@ they could write a page index into the interface::
	echo "page_index=1251" > /sys/block/zramX/writeback

+In Linux 6.16 this interface underwent some rework. First, the interface
+now supports `key=value` format for all of its parameters (`type=huge_idle`,
+etc.) Second, the support for `page_indexes` was introduced, which specify
+`LOW-HIGH` range (or ranges) of pages to be written-back. This reduces the
+number of syscalls, but more importantly this enables optimal post-processing
+target selection strategy. Usage example::
+
+	echo "type=idle" > /sys/block/zramX/writeback
+	echo "page_indexes=1-100 page_indexes=200-300" > \
+		/sys/block/zramX/writeback
+
+We also now permit multiple page_index params per call and a mix of
+single pages and page ranges::
+
+	echo page_index=42 page_index=99 page_indexes=100-200 \
+		page_indexes=500-700 > /sys/block/zramX/writeback
+
 If there are lots of write IO with flash device, potentially, it has
 flash wearout problem so that admin needs to design write limitation
 to guarantee storage health for entire product life.
@@ -482,8 +502,6 @@ attempt to recompress:::
	echo "type=huge_idle max_pages=42" > /sys/block/zramX/recompress

-Recompression of idle pages requires memory tracking.
-
 During re-compression for every page, that matches re-compression criteria,
 ZRAM iterates the list of registered alternative compression algorithms in
 order of their priorities. ZRAM stops either when re-compression was


@@ -13,7 +13,7 @@ Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 Modified by Paul Jackson <pj@sgi.com>
-Modified by Christoph Lameter <cl@linux.com>
+Modified by Christoph Lameter <cl@gentwo.org>

 .. CONTENTS:


@@ -10,7 +10,7 @@ Written by Simon.Derr@bull.net
 - Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
 - Modified by Paul Jackson <pj@sgi.com>
-- Modified by Christoph Lameter <cl@linux.com>
+- Modified by Christoph Lameter <cl@gentwo.org>
 - Modified by Paul Menage <menage@google.com>
 - Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>


@ -1334,6 +1334,18 @@ PAGE_SIZE multiple when read back.
monitors the limited cgroup to alleviate heavy reclaim monitors the limited cgroup to alleviate heavy reclaim
pressure. pressure.
If memory.high is opened with O_NONBLOCK then the synchronous
reclaim is bypassed. This is useful for admin processes that
need to dynamically adjust the job's memory limits without
expending their own CPU resources on memory reclamation. The
job will trigger the reclaim and/or get throttled on its
next charge request.
Please note that with O_NONBLOCK, there is a chance that the
target memory cgroup may take indefinite amount of time to
reduce usage below the limit due to delayed charge request or
busy-hitting its memory to slow down reclaim.
memory.max memory.max
A read-write single value file which exists on non-root A read-write single value file which exists on non-root
cgroups. The default is "max". cgroups. The default is "max".
@ -1351,6 +1363,18 @@ PAGE_SIZE multiple when read back.
Caller could retry them differently, return into userspace Caller could retry them differently, return into userspace
as -ENOMEM or silently ignore in cases like disk readahead. as -ENOMEM or silently ignore in cases like disk readahead.
If memory.max is opened with O_NONBLOCK, then the synchronous
reclaim and oom-kill are bypassed. This is useful for admin
processes that need to dynamically adjust the job's memory limits
without expending their own CPU resources on memory reclamation.
The job will trigger the reclaim and/or oom-kill on its next
charge request.
Please note that with O_NONBLOCK, there is a chance that the
target memory cgroup may take indefinite amount of time to
reduce usage below the limit due to delayed charge request or
busy-hitting its memory to slow down reclaim.
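As a rough illustration of the O_NONBLOCK behaviour described for memory.high
and memory.max above, a management daemon might lower a limit without paying
the reclaim cost itself. This is only a sketch; the cgroup path and the limit
below are made-up examples::

/* Sketch only: lower memory.max of a hypothetical cgroup without doing
 * synchronous reclaim or oom-kill in this process.  The job pays that
 * cost on its next charge request instead. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/sys/fs/cgroup/job/memory.max";    /* hypothetical */
        const char *limit = "2147483648\n";                     /* 2 GiB */
        int fd = open(path, O_WRONLY | O_NONBLOCK);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, limit, strlen(limit)) < 0)
                perror("write");
        close(fd);
        return 0;
}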
memory.reclaim
A write-only nested-keyed file which exists for all cgroups.
@ -1383,6 +1407,9 @@ The following nested keys are defined.
same semantics as vm.swappiness applied to memcg reclaim with
all the existing limitations and potential future extensions.
The valid range for swappiness is [0-200, max], setting
swappiness=max exclusively reclaims anonymous memory.
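For example, proactive reclaim that targets only anonymous memory could be
requested through memory.reclaim as sketched below; the cgroup path and the
amount are illustrative, not taken from the document::

/* Sketch: ask for up to 512 MiB of proactive reclaim, restricted to
 * anonymous memory via swappiness=max, from a hypothetical cgroup. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *req = "536870912 swappiness=max\n";
        int fd = open("/sys/fs/cgroup/job/memory.reclaim", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, req, strlen(req)) < 0)
                perror("write");
        close(fd);
        return 0;
}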
memory.peak
A read-write single value file which exists on non-root cgroups.


@ -2749,6 +2749,31 @@
kgdbwait [KGDB,EARLY] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
kho= [KEXEC,EARLY]
Format: { "0" | "1" | "off" | "on" | "y" | "n" }
Enables or disables Kexec HandOver.
"0" | "off" | "n" - kexec handover is disabled
"1" | "on" | "y" - kexec handover is enabled
kho_scratch= [KEXEC,EARLY]
Format: ll[KMG],mm[KMG],nn[KMG] | nn%
Defines the size of the KHO scratch region. The KHO
scratch regions are physically contiguous memory
ranges that can only be used for non-kernel
allocations. That way, even when memory is heavily
fragmented with handed over memory, the kexeced
kernel will always have enough contiguous ranges to
bootstrap itself.
It is possible to specify the exact amount of
memory in the form of "ll[KMG],mm[KMG],nn[KMG]"
where the first parameter defines the size of a low
memory scratch area, the second parameter defines
the size of a global scratch area and the third
parameter defines the size of additional per-node
scratch areas. The form "nn%" defines scale factor
(in percents) of memory that was used during boot.
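As an illustration only (the sizes below are arbitrary examples, not
recommendations), the two forms could look like this on a command line::

kho=on kho_scratch=16M,512M,256M
kho=on kho_scratch=10%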
kmac= [MIPS] Korina ethernet MAC address.
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.


@ -1,12 +1,11 @@
.. SPDX-License-Identifier: GPL-2.0

-==========================
-DAMON: Data Access MONitor
-==========================
+================================================================
+DAMON: Data Access MONitoring and Access-aware System Operations
+================================================================

-:doc:`DAMON </mm/damon/index>` allows light-weight data access monitoring.
-Using DAMON, users can analyze the memory access patterns of their systems and
-optimize those.
+:doc:`DAMON </mm/damon/index>` is a Linux kernel subsystem for efficient data
+access monitoring and access-aware system operations.

.. toctree::
   :maxdepth: 2


@ -81,7 +81,7 @@ comma (",").
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
-│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value
+│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`{core_,ops_,}filters <sysfs_filters>`/nr_filters
│ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max
@ -390,11 +390,11 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each goal and current achievement.
Among the multiple feedback, the best one is used.

-Each goal directory contains three files, namely ``target_metric``,
-``target_value`` and ``current_value``. Users can set and get the three
-parameters for the quota auto-tuning goals that specified on the :ref:`design
-doc <damon_design_damos_quotas_auto_tuning>` by writing to and reading from each
-of the files. Note that users should further write
+Each goal directory contains four files, namely ``target_metric``,
+``target_value``, ``current_value`` and ``nid``. Users can set and get the
+four parameters for the quota auto-tuning goals that specified on the
+:ref:`design doc <damon_design_damos_quotas_auto_tuning>` by writing to and
+reading from each of the files. Note that users should further write
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
directory <sysfs_kdamond>` to pass the feedback to DAMON.


@ -42,3 +42,4 @@ the Linux memory management.
transhuge
userfaultfd
zswap
kho


@ -0,0 +1,115 @@
.. SPDX-License-Identifier: GPL-2.0-or-later
====================
Kexec Handover Usage
====================
Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
regions, which could contain serialized system states, across kexec.
This document expects that you are familiar with the base KHO
:ref:`concepts <kho-concepts>`. If you have not read
them yet, please do so now.
Prerequisites
=============
KHO is available when the kernel is compiled with ``CONFIG_KEXEC_HANDOVER``
set to y. Every KHO producer may have its own config option that you
need to enable if you would like to preserve its respective state across
kexec.
To use KHO, please boot the kernel with the ``kho=on`` command line
parameter. You may use ``kho_scratch`` parameter to define size of the
scratch regions. For example ``kho_scratch=16M,512M,256M`` will reserve a
16 MiB low memory scratch area, a 512 MiB global scratch region, and 256 MiB
per NUMA node scratch regions on boot.
Perform a KHO kexec
===================
First, before you perform a KHO kexec, you need to move the system into
the :ref:`KHO finalization phase <kho-finalization-phase>` ::
$ echo 1 > /sys/kernel/debug/kho/out/finalize
After this command, the KHO FDT is available in
``/sys/kernel/debug/kho/out/fdt``. Other subsystems may also register
their own preserved sub FDTs under
``/sys/kernel/debug/kho/out/sub_fdts/``.
Next, load the target payload and kexec into it. It is important that you
use the ``-s`` parameter to use the in-kernel kexec file loader, as user
space kexec tooling currently has no support for KHO with the user space
based file loader ::
# kexec -l /path/to/bzImage --initrd /path/to/initrd -s
# kexec -e
The new kernel will boot up and contain some of the previous kernel's state.
For example, if you used ``reserve_mem`` command line parameter to create
an early memory reservation, the new kernel will have that memory at the
same physical address as the old kernel.
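For instance, assuming the ``reserve_mem=size:align:label`` format documented
in kernel-parameters.txt (the label and sizes here are made up), a first boot
could carry::

reserve_mem=16M:4096:myapp-state kho=on

and the kexec'ed kernel would then find the ``myapp-state`` reservation at the
unchanged physical address.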
Abort a KHO exec
================
You can move the system out of KHO finalization phase again by calling ::
$ echo 0 > /sys/kernel/debug/kho/out/finalize
After this command, the KHO FDT is no longer available in
``/sys/kernel/debug/kho/out/fdt``.
debugfs Interfaces
==================
Currently KHO creates the following debugfs interfaces. Notice that these
interfaces may change in the future. They will be moved to sysfs once KHO is
stabilized.
``/sys/kernel/debug/kho/out/finalize``
Kexec HandOver (KHO) allows Linux to transition the state of
compatible drivers into the next kexec'ed kernel. To do so,
device drivers will instruct KHO to preserve memory regions,
which could contain serialized kernel state.
While the state is serialized, drivers are unable to perform
any modifications to the state that was serialized, such as
handed over memory allocations.
When this file contains "1", the system is in the transition
state. When it contains "0", it is not. To switch between the
two states, echo the respective number into this file.
``/sys/kernel/debug/kho/out/fdt``
When KHO state tree is finalized, the kernel exposes the
flattened device tree blob that carries its current KHO
state in this file. Kexec user space tooling can use this
as input file for the KHO payload image.
``/sys/kernel/debug/kho/out/scratch_len``
Lengths of KHO scratch regions, which are physically contiguous
memory regions that will always stay available for future kexec
allocations. Kexec user space tools can use this file to determine
where it should place its payload images.
``/sys/kernel/debug/kho/out/scratch_phys``
Physical locations of KHO scratch regions. Kexec user space tools
can use this file in conjunction with scratch_len to determine where
it should place its payload images.
``/sys/kernel/debug/kho/out/sub_fdts/``
In the KHO finalization phase, KHO producers register their own
FDT blob under this directory.
``/sys/kernel/debug/kho/in/fdt``
When the kernel was booted with Kexec HandOver (KHO),
the state tree that carries metadata about the previous
kernel's state is in this file in the format of flattened
device tree. This file may disappear when all consumers of
it have finished interpreting their metadata.
``/sys/kernel/debug/kho/in/sub_fdts/``
Similar to ``kho/out/sub_fdts/``, but contains sub FDT blobs
of KHO producers passed from the old kernel.


@ -151,8 +151,9 @@ generations less than or equal to ``min_gen_nr``.
``min_gen_nr`` should be less than ``max_gen_nr-1``, since
``max_gen_nr`` and ``max_gen_nr-1`` are not fully aged (equivalent to
the active list) and therefore cannot be evicted. ``swappiness``
-overrides the default value in ``/proc/sys/vm/swappiness``.
-``nr_to_reclaim`` limits the number of pages to evict.
+overrides the default value in ``/proc/sys/vm/swappiness`` and the valid
+range is [0-200, max], with max being exclusively used for the reclamation
+of anonymous memory. ``nr_to_reclaim`` limits the number of pages to evict.
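A hedged sketch of issuing this eviction command, assuming the
``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]`` format
described earlier in this file; the memcg and node IDs are placeholders::

/* Sketch: evict generations <= 3 of memcg 2 on node 0, anonymous memory
 * only (swappiness=max), at most 4096 pages.  All IDs are made up. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const char *cmd = "- 2 0 3 max 4096\n";
        int fd = open("/sys/kernel/debug/lru_gen", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, cmd, strlen(cmd)) < 0)
                perror("write");
        close(fd);
        return 0;
}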
A typical use case is that a job scheduler runs this command before it
tries to land a new job on a server. If it fails to materialize enough


@ -250,6 +250,7 @@ Following flags about pages are currently supported:
- ``PAGE_IS_PFNZERO`` - Page has zero PFN
- ``PAGE_IS_HUGE`` - Page is PMD-mapped THP or Hugetlb backed
- ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty
- ``PAGE_IS_GUARD`` - Page is a part of a guard region

The ``struct pm_scan_arg`` is used as the argument of the IOCTL.
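A minimal sketch of how a checkpointing tool might ask only for guard regions
with the new bit, assuming the ``pm_scan_arg``/``page_region`` layout and the
``PAGE_IS_GUARD`` definition from a uapi ``<linux/fs.h>`` that already contains
this change; error handling is trimmed::

#include <fcntl.h>
#include <linux/fs.h>           /* PAGEMAP_SCAN, struct pm_scan_arg */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* List guard regions of the current process within [start, end). */
static int dump_guard_regions(uint64_t start, uint64_t end)
{
        struct page_region vec[32];
        struct pm_scan_arg arg = {
                .size = sizeof(arg),
                .start = start,
                .end = end,
                .vec = (uintptr_t)vec,
                .vec_len = 32,
                .category_mask = PAGE_IS_GUARD, /* only guard pages ... */
                .return_mask = PAGE_IS_GUARD,   /* ... and report that bit */
        };
        int fd = open("/proc/self/pagemap", O_RDONLY);
        int n, i;

        if (fd < 0)
                return -1;
        n = ioctl(fd, PAGEMAP_SCAN, &arg);      /* number of regions filled */
        for (i = 0; i < n; i++)
                printf("guard region: %llx-%llx\n",
                       (unsigned long long)vec[i].start,
                       (unsigned long long)vec[i].end);
        close(fd);
        return n;
}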


@ -132,6 +132,12 @@ to latency spikes in unsuspecting applications. The kernel employs
various heuristics to avoid wasting CPU cycles if it detects that
proactive compaction is not being effective.
Setting the value above 80 will, in addition to lowering the acceptable level
of fragmentation, make the compaction code more sensitive to increases in
fragmentation, i.e. compaction will trigger more often, but reduce
fragmentation by a smaller amount.
This makes the fragmentation level more stable over time.
Be careful when setting it to extreme values like 100, as that may
cause excessive background compaction activity.
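For reference, the knob discussed above is an ordinary sysctl file; a short
sketch (the value 40 is an arbitrary example)::

/* Sketch: make proactive compaction somewhat more aggressive than the
 * default of 20 by writing to the sysctl file directly. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/proc/sys/vm/compaction_proactiveness", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, "40\n", 3) < 0)
                perror("write");
        close(fd);
        return 0;
}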


@ -115,6 +115,7 @@ more memory-management documentation in Documentation/mm/index.rst.
pin_user_pages
boot-time-mm
gfp_mask-from-fs-io
kho/index

Interfaces for kernel debugging
===============================


@ -0,0 +1,43 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: Kexec HandOver (KHO) root tree
maintainers:
- Mike Rapoport <rppt@kernel.org>
- Changyuan Lyu <changyuanl@google.com>
description: |
System memory preserved by KHO across kexec.
properties:
compatible:
enum:
- kho-v1
preserved-memory-map:
description: |
physical address (u64) of an in-memory structure describing all preserved
folios and memory ranges.
patternProperties:
"$[0-9a-f_]+^":
$ref: sub-fdt.yaml#
description: physical address of a KHO user's own FDT.
required:
- compatible
- preserved-memory-map
additionalProperties: false
examples:
- |
kho {
compatible = "kho-v1";
preserved-memory-map = <0xf0be16 0x1000000>;
memblock {
fdt = <0x80cc16 0x1000000>;
};
};


@ -0,0 +1,39 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: Memblock reserved memory
maintainers:
- Mike Rapoport <rppt@kernel.org>
description: |
Memblock can serialize its current memory reservations created with
reserve_mem command line option across kexec through KHO.
The post-KHO kernel can then consume these reservations and they are
guaranteed to have the same physical address.
properties:
compatible:
enum:
- memblock-v1
patternProperties:
"$[0-9a-f_]+^":
$ref: reserve-mem.yaml#
description: reserved memory regions
required:
- compatible
additionalProperties: false
examples:
- |
memblock {
compatible = "memblock-v1";
n1 {
compatible = "reserve-mem-v1";
start = <0xc06b 0x4000000>;
size = <0x04 0x00>;
};
};


@ -0,0 +1,40 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: Memblock reserved memory regions
maintainers:
- Mike Rapoport <rppt@kernel.org>
description: |
Memblock can serialize its current memory reservations created with
reserve_mem command line option across kexec through KHO.
This object describes each such region.
properties:
compatible:
enum:
- reserve-mem-v1
start:
description: |
physical address (u64) of the reserved memory region.
size:
description: |
size (u64) of the reserved memory region.
required:
- compatible
- start
- size
additionalProperties: false
examples:
- |
n1 {
compatible = "reserve-mem-v1";
start = <0xc06b 0x4000000>;
size = <0x04 0x00>;
};


@ -0,0 +1,27 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
title: KHO users' FDT address
maintainers:
- Mike Rapoport <rppt@kernel.org>
- Changyuan Lyu <changyuanl@google.com>
description: |
Physical address of an FDT blob registered by a KHO user.
properties:
fdt:
description: |
physical address (u64) of an FDT blob.
required:
- fdt
additionalProperties: false
examples:
- |
memblock {
fdt = <0x80cc16 0x1000000>;
};


@ -0,0 +1,74 @@
.. SPDX-License-Identifier: GPL-2.0-or-later
.. _kho-concepts:
=======================
Kexec Handover Concepts
=======================
Kexec HandOver (KHO) is a mechanism that allows Linux to preserve memory
regions, which could contain serialized system states, across kexec.
It introduces multiple concepts:
KHO FDT
=======
Every KHO kexec carries a KHO specific flattened device tree (FDT) blob
that describes preserved memory regions. These regions contain either
serialized subsystem states, or in-memory data that shall not be touched
across kexec. After KHO, subsystems can retrieve and restore preserved
memory regions from KHO FDT.
KHO only uses the FDT container format and libfdt library, but does not
adhere to the same property semantics that normal device trees do: Properties
are passed in native endianness and standardized properties like ``regs`` and
``ranges`` do not exist, hence there are no ``#...-cells`` properties.
KHO is still under development. The FDT schema is unstable and may change
in the future.
Scratch Regions
===============
To boot into kexec, we need to have a physically contiguous memory range that
contains no handed over memory. Kexec then places the target kernel and initrd
into that region. The new kernel exclusively uses this region for memory
allocations during boot, up to the initialization of the page allocator.
We guarantee that we always have such regions through the scratch regions: On
first boot KHO allocates several physically contiguous memory regions. Since
after kexec these regions will be used by early memory allocations, there is a
scratch region per NUMA node plus a scratch region to satisfy allocation
requests that do not require a particular NUMA node assignment.
By default, size of the scratch region is calculated based on amount of memory
allocated during boot. The ``kho_scratch`` kernel command line option may be
used to explicitly define size of the scratch regions.
The scratch regions are declared as CMA when page allocator is initialized so
that their memory can be used during system lifetime. CMA gives us the
guarantee that no handover pages land in that region, because handover pages
must be at a static physical memory location and CMA enforces that only
movable pages can be located inside.
After KHO kexec, we ignore the ``kho_scratch`` kernel command line option and
instead reuse the exact same region that was originally allocated. This allows
us to recursively execute any amount of KHO kexecs. Because we used this region
for boot memory allocations and as target memory for kexec blobs, some parts
of that memory region may be reserved. These reservations are irrelevant for
the next KHO, because kexec can overwrite even the original kernel.
.. _kho-finalization-phase:
KHO finalization phase
======================
To enable user space based kexec file loader, the kernel needs to be able to
provide the FDT that describes the current kernel's state before
performing the actual kexec. The process of generating that FDT is
called serialization. When the FDT is generated, some properties
of the system may become immutable because they are already written down
in the FDT. That state is called the KHO finalization phase.
Public API
==========
.. kernel-doc:: kernel/kexec_handover.c
:export:


@ -0,0 +1,80 @@
.. SPDX-License-Identifier: GPL-2.0-or-later
=======
KHO FDT
=======
KHO uses the flattened device tree (FDT) container format and libfdt
library to create and parse the data that is passed between the
kernels. The properties in KHO FDT are stored in native format.
It includes the physical address of an in-memory structure describing
all preserved memory regions, as well as physical addresses of KHO users'
own FDTs. Interpreting those sub FDTs is the responsibility of KHO users.
KHO nodes and properties
========================
Property ``preserved-memory-map``
---------------------------------
KHO saves a special property named ``preserved-memory-map`` under the root node.
This node contains the physical address of an in-memory structure for KHO to
preserve memory regions across kexec.
Property ``compatible``
-----------------------
The ``compatible`` property determines compatibility between the kernel
that created the KHO FDT and the kernel that attempts to load it.
If the kernel that loads the KHO FDT is not compatible with it, the entire
KHO process will be bypassed.
Property ``fdt``
----------------
Generally, a KHO user serializes its state into its own FDT and instructs
KHO to preserve the underlying memory, such that after kexec, the new kernel
can recover its state from the preserved FDT.
A KHO user thus can create a node in the KHO root tree and save the physical
address of its own FDT in that node's property ``fdt``.
Examples
========
The following example demonstrates KHO FDT that preserves two memory
regions created with ``reserve_mem`` kernel command line parameter::
/dts-v1/;
/ {
compatible = "kho-v1";
preserved-memory-map = <0x40be16 0x1000000>;
memblock {
fdt = <0x1517 0x1000000>;
};
};
where the ``memblock`` node contains an FDT that is requested by the
subsystem memblock for preservation. The FDT contains the following
serialized data::
/dts-v1/;
/ {
compatible = "memblock-v1";
n1 {
compatible = "reserve-mem-v1";
start = <0xc06b 0x4000000>;
size = <0x04 0x00>;
};
n2 {
compatible = "reserve-mem-v1";
start = <0xc067 0x4000000>;
size = <0x04 0x00>;
};
};


@ -0,0 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0-or-later
========================
Kexec Handover Subsystem
========================
.. toctree::
:maxdepth: 1
concepts
fdt
.. only:: subproject and html


@ -54,7 +54,7 @@ monitoring are address-space dependent.
DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer. The upper layer
is dedicated for DAMON's core logics including the mechanism for control of the
-monitoring accruracy and the overhead.
+monitoring accuracy and the overhead.

Hence, DAMON can easily be extended for any address space and/or available
hardware features by configuring the core logic to use the appropriate
@ -550,10 +550,10 @@ aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over achieving the goal, it decreases the quota.

-The goal can be specified with three parameters, namely ``target_metric``,
-``target_value``, and ``current_value``. The auto-tuning mechanism tries to
-make ``current_value`` of ``target_metric`` be same to ``target_value``.
-Currently, two ``target_metric`` are provided.
+The goal can be specified with four parameters, namely ``target_metric``,
+``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
+tries to make ``current_value`` of ``target_metric`` be same to
+``target_value``.

- ``user_input``: User-provided value. Users could use any metric that they
has interest in for the value. Use space main workload's latency or
@ -565,6 +565,11 @@ Currently, two ``target_metric`` are provided.
in microseconds that measured from last quota reset to next quota reset.
DAMOS does the measurement on its own, so only ``target_value`` need to be
set by users at the initial time. In other words, DAMOS does self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
``nid`` is required only for ``node_mem_used_bp`` and
``node_mem_free_bp``, to point to the specific NUMA node.
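A hedged sketch of configuring such a goal through the sysfs files; every path
component and index used here (kdamond 0, context 0, scheme 0, goal 0, nid 1)
is an assumption made for illustration::

/* Sketch: set a node_mem_used_bp auto-tuning goal and commit it.
 * All sysfs indexes and values are hypothetical examples. */
#include <stdio.h>

#define GOAL "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/quotas/goals/0/"

static void put(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (f) {
                fputs(val, f);
                fclose(f);
        }
}

int main(void)
{
        put(GOAL "target_metric", "node_mem_used_bp");
        put(GOAL "target_value", "9000");       /* aim at 90.00% node usage */
        put(GOAL "nid", "1");                   /* node the metric refers to */
        /* The feedback only takes effect after committing the goals. */
        put("/sys/kernel/mm/damon/admin/kdamonds/0/state",
            "commit_schemes_quota_goals");
        return 0;
}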
To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to


@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0

-==========================
-DAMON: Data Access MONitor
-==========================
+================================================================
+DAMON: Data Access MONitoring and Access-aware System Operations
+================================================================

DAMON is a Linux kernel subsystem that provides a framework for data access
monitoring and the monitoring results based system operations. The core


@ -3152,7 +3152,7 @@ Tiara
(model unknown)
---------------

-- from Christoph Lameter <christoph@lameter.com>
+- from Christoph Lameter <cl@gentwo.org>

Here is information about my card as far as I could figure it out::


@ -8946,6 +8946,8 @@ F: include/linux/elf.h
F: include/uapi/linux/auxvec.h
F: include/uapi/linux/binfmts.h
F: include/uapi/linux/elf.h
F: kernel/fork.c
F: mm/vma_exec.c
F: tools/testing/selftests/exec/
N: asm/elf.h
N: binfmt
@ -13271,6 +13273,17 @@ F: include/uapi/linux/kexec.h
F: include/uapi/linux/kexec.h
F: kernel/kexec*
KEXEC HANDOVER (KHO)
M: Alexander Graf <graf@amazon.com>
M: Mike Rapoport <rppt@kernel.org>
M: Changyuan Lyu <changyuanl@google.com>
L: kexec@lists.infradead.org
S: Maintained
F: Documentation/admin-guide/mm/kho.rst
F: Documentation/core-api/kho/*
F: include/linux/kexec_handover.h
F: kernel/kexec_handover.c
KEYS-ENCRYPTED
M: Mimi Zohar <zohar@linux.ibm.com>
L: linux-integrity@vger.kernel.org
@ -15585,6 +15598,7 @@ M: Mike Rapoport <rppt@kernel.org>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/core-api/boot-time-mm.rst
F: Documentation/core-api/kho/bindings/memblock/*
F: include/linux/memblock.h
F: mm/memblock.c
F: mm/mm_init.c
@ -15675,6 +15689,7 @@ F: include/linux/mm.h
F: include/linux/mm_*.h
F: include/linux/mmdebug.h
F: include/linux/pagewalk.h
F: kernel/fork.c
F: mm/Kconfig
F: mm/debug.c
F: mm/init-mm.c
@ -15835,6 +15850,19 @@ F: include/uapi/linux/userfaultfd.h
F: mm/userfaultfd.c
F: tools/testing/selftests/mm/uffd-*.[ch]
MEMORY MANAGEMENT - RUST
M: Alice Ryhl <aliceryhl@google.com>
R: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
R: Liam R. Howlett <Liam.Howlett@oracle.com>
L: linux-mm@kvack.org
L: rust-for-linux@vger.kernel.org
S: Maintained
W: http://www.linux-mm.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: rust/helpers/mm.c
F: rust/kernel/mm.rs
F: rust/kernel/mm/
MEMORY MAPPING
M: Andrew Morton <akpm@linux-foundation.org>
M: Liam R. Howlett <Liam.Howlett@oracle.com>
@ -15854,7 +15882,10 @@ F: mm/mremap.c
F: mm/mseal.c
F: mm/vma.c
F: mm/vma.h
F: mm/vma_exec.c
F: mm/vma_init.c
F: mm/vma_internal.h
F: tools/testing/selftests/mm/merge.c
F: tools/testing/vma/

MEMORY MAPPING - LOCKING
@ -19263,7 +19294,7 @@ F: drivers/net/ethernet/pensando/
PER-CPU MEMORY ALLOCATOR
M: Dennis Zhou <dennis@kernel.org>
M: Tejun Heo <tj@kernel.org>
-M: Christoph Lameter <cl@linux.com>
+M: Christoph Lameter <cl@gentwo.org>
L: linux-mm@kvack.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git
@ -22011,6 +22042,7 @@ F: include/linux/preempt.h
F: include/linux/sched.h
F: include/linux/wait.h
F: include/uapi/linux/sched.h
F: kernel/fork.c
F: kernel/sched/

SCHEDULER - SCHED_EXT
@ -22664,7 +22696,7 @@ F: Documentation/devicetree/bindings/nvmem/layouts/kontron,sl28-vpd.yaml
F: drivers/nvmem/layouts/sl28vpd.c

SLAB ALLOCATOR
-M: Christoph Lameter <cl@linux.com>
+M: Christoph Lameter <cl@gentwo.org>
M: David Rientjes <rientjes@google.com>
M: Andrew Morton <akpm@linux-foundation.org>
M: Vlastimil Babka <vbabka@suse.cz>
@ -26260,6 +26292,7 @@ W: http://www.linux-mm.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F: include/linux/vmalloc.h
F: mm/vmalloc.c
F: lib/test_vmalloc.c

VME SUBSYSTEM
L: linux-kernel@vger.kernel.org


@ -192,13 +192,6 @@ extern unsigned long __zero_page(void);
#define pte_pfn(pte) (pte_val(pte) >> PFN_PTE_SHIFT) #define pte_pfn(pte) (pte_val(pte) >> PFN_PTE_SHIFT)
#define pte_page(pte) pfn_to_page(pte_pfn(pte)) #define pte_page(pte) pfn_to_page(pte_pfn(pte))
#define mk_pte(page, pgprot) \
({ \
pte_t pte; \
\
pte_val(pte) = (page_to_pfn(page) << 32) | pgprot_val(pgprot); \
pte; \
})
extern inline pte_t pfn_pte(unsigned long physpfn, pgprot_t pgprot) extern inline pte_t pfn_pte(unsigned long physpfn, pgprot_t pgprot)
{ pte_t pte; pte_val(pte) = (PHYS_TWIDDLE(physpfn) << 32) | pgprot_val(pgprot); return pte; } { pte_t pte; pte_val(pte) = (PHYS_TWIDDLE(physpfn) << 32) | pgprot_val(pgprot); return pte; }


@ -40,8 +40,6 @@ static inline pmd_t pte_pmd(pte_t pte)
#define pmd_young(pmd) pte_young(pmd_pte(pmd)) #define pmd_young(pmd) pte_young(pmd_pte(pmd))
#define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd)) #define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
#define mk_pmd(page, prot) pte_pmd(mk_pte(page, prot))
#define pmd_trans_huge(pmd) (pmd_val(pmd) & _PAGE_HW_SZ) #define pmd_trans_huge(pmd) (pmd_val(pmd) & _PAGE_HW_SZ)
#define pfn_pmd(pfn, prot) (__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))) #define pfn_pmd(pfn, prot) (__pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot)))


@ -142,7 +142,6 @@
#define pmd_pfn(pmd) ((pmd_val(pmd) & PMD_MASK) >> PAGE_SHIFT) #define pmd_pfn(pmd) ((pmd_val(pmd) & PMD_MASK) >> PAGE_SHIFT)
#define pfn_pmd(pfn,prot) __pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define pfn_pmd(pfn,prot) __pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
#endif #endif
@ -177,7 +176,6 @@
#define set_pte(ptep, pte) ((*(ptep)) = (pte)) #define set_pte(ptep, pte) ((*(ptep)) = (pte))
#define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT) #define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
#define pfn_pte(pfn, prot) __pte(__pfn_to_phys(pfn) | pgprot_val(prot)) #define pfn_pte(pfn, prot) __pte(__pfn_to_phys(pfn) | pgprot_val(prot))
#define mk_pte(page, prot) pfn_pte(page_to_pfn(page), prot)
#ifdef CONFIG_ISA_ARCV2 #ifdef CONFIG_ISA_ARCV2
#define pmd_leaf(x) (pmd_val(x) & _PAGE_HW_SZ) #define pmd_leaf(x) (pmd_val(x) & _PAGE_HW_SZ)


@ -23,6 +23,17 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return -1; return -1;
} }
static inline void
syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
{
/*
* Unlike syscall_get_nr(), syscall_set_nr() can be called only when
* the target task is stopped for tracing on entering syscall, so
* there is no need to have the same check syscall_get_nr() has.
*/
regs->r8 = nr;
}
static inline void static inline void
syscall_rollback(struct task_struct *task, struct pt_regs *regs) syscall_rollback(struct task_struct *task, struct pt_regs *regs)
{ {
@ -67,6 +78,20 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
} }
} }
static inline void
syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
unsigned long *args)
{
unsigned long *inside_ptregs = &regs->r0;
unsigned int n = 6;
unsigned int i = 0;
while (n--) {
*inside_ptregs = args[i++];
inside_ptregs--;
}
}
static inline int static inline int
syscall_get_arch(struct task_struct *task) syscall_get_arch(struct task_struct *task)
{ {


@ -209,7 +209,6 @@ PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);
#define pmd_pfn(pmd) (((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT) #define pmd_pfn(pmd) (((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
#define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))) #define pfn_pmd(pfn,prot) (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
/* No hardware dirty/accessed bits -- generic_pmdp_establish() fits */ /* No hardware dirty/accessed bits -- generic_pmdp_establish() fits */
#define pmdp_establish generic_pmdp_establish #define pmdp_establish generic_pmdp_establish


@ -168,7 +168,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
#define pfn_pte(pfn,prot) __pte(__pfn_to_phys(pfn) | pgprot_val(prot)) #define pfn_pte(pfn,prot) __pte(__pfn_to_phys(pfn) | pgprot_val(prot))
#define pte_page(pte) pfn_to_page(pte_pfn(pte)) #define pte_page(pte) pfn_to_page(pte_pfn(pte))
#define mk_pte(page,prot) pfn_pte(page_to_pfn(page), prot)
#define pte_clear(mm,addr,ptep) set_pte_ext(ptep, __pte(0), 0) #define pte_clear(mm,addr,ptep) set_pte_ext(ptep, __pte(0), 0)


@ -68,6 +68,30 @@ static inline void syscall_set_return_value(struct task_struct *task,
regs->ARM_r0 = (long) error ? error : val; regs->ARM_r0 = (long) error ? error : val;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
if (nr == -1) {
task_thread_info(task)->abi_syscall = -1;
/*
* When the syscall number is set to -1, the syscall will be
* skipped. In this case the syscall return value has to be
* set explicitly, otherwise the first syscall argument is
* returned as the syscall return value.
*/
syscall_set_return_value(task, regs, -ENOSYS, 0);
return;
}
if ((IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))) {
task_thread_info(task)->abi_syscall = nr;
return;
}
task_thread_info(task)->abi_syscall =
(task_thread_info(task)->abi_syscall & ~__NR_SYSCALL_MASK) |
(nr & __NR_SYSCALL_MASK);
}
#define SYSCALL_MAX_ARGS 7 #define SYSCALL_MAX_ARGS 7
static inline void syscall_get_arguments(struct task_struct *task, static inline void syscall_get_arguments(struct task_struct *task,
@ -80,6 +104,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, &regs->ARM_r0 + 1, 5 * sizeof(args[0])); memcpy(args, &regs->ARM_r0 + 1, 5 * sizeof(args[0]));
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
const unsigned long *args)
{
memcpy(&regs->ARM_r0, args, 6 * sizeof(args[0]));
/*
* Also copy the first argument into ARM_ORIG_r0
* so that syscall_get_arguments() would return it
* instead of the previous value.
*/
regs->ARM_ORIG_r0 = regs->ARM_r0;
}
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
/* ARM tasks don't change audit architectures on the fly. */ /* ARM tasks don't change audit architectures on the fly. */
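The arch hooks above back the new PTRACE_SET_SYSCALL_INFO request. The fragment
below is only a sketch of the tracer side, assuming the ``ptrace_syscall_info``
layout already used by PTRACE_GET_SYSCALL_INFO and uapi headers new enough to
define the SET request; it is not the authoritative usage::

/* Sketch: at syscall entry, set the tracee's syscall number to -1 so the
 * call is skipped; as the arch comments above note, the return value is
 * then reported as -ENOSYS.  Requires <linux/ptrace.h> from a kernel that
 * defines PTRACE_SET_SYSCALL_INFO. */
#include <string.h>
#include <sys/ptrace.h>         /* ptrace(); include before linux/ptrace.h */
#include <sys/types.h>
#include <linux/ptrace.h>       /* struct ptrace_syscall_info */

static long skip_current_syscall(pid_t pid)
{
        struct ptrace_syscall_info info;

        memset(&info, 0, sizeof(info));
        if (ptrace(PTRACE_GET_SYSCALL_INFO, pid, sizeof(info), &info) < 0)
                return -1;
        if (info.op != PTRACE_SYSCALL_INFO_ENTRY)
                return -1;              /* only meaningful at syscall entry */
        info.entry.nr = -1;             /* ask the kernel to skip the syscall */
        return ptrace(PTRACE_SET_SYSCALL_INFO, pid, sizeof(info), &info);
}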


@ -735,7 +735,7 @@ static void *__init late_alloc(unsigned long sz)
void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM, void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_HIGHMEM,
get_order(sz)); get_order(sz));
if (!ptdesc || !pagetable_pte_ctor(ptdesc)) if (!ptdesc || !pagetable_pte_ctor(NULL, ptdesc))
BUG(); BUG();
return ptdesc_to_virt(ptdesc); return ptdesc_to_virt(ptdesc);
} }


@ -26,10 +26,10 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
(UPROBE_SWBP_ARM_INSN & 0x0fffffff); (UPROBE_SWBP_ARM_INSN & 0x0fffffff);
} }
int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, int set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
unsigned long vaddr) unsigned long vaddr)
{ {
return uprobe_write_opcode(auprobe, mm, vaddr, return uprobe_write_opcode(auprobe, vma, vaddr,
__opcode_to_mem_arm(auprobe->bpinsn)); __opcode_to_mem_arm(auprobe->bpinsn));
} }


@ -1616,6 +1616,9 @@ config ARCH_SUPPORTS_KEXEC_IMAGE_VERIFY_SIG
config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG config ARCH_DEFAULT_KEXEC_IMAGE_VERIFY_SIG
def_bool y def_bool y
config ARCH_SUPPORTS_KEXEC_HANDOVER
def_bool y
config ARCH_SUPPORTS_CRASH_DUMP config ARCH_SUPPORTS_CRASH_DUMP
def_bool y def_bool y


@ -11,11 +11,19 @@
#include <asm/types.h> #include <asm/types.h>
-typedef u64 pteval_t;
-typedef u64 pmdval_t;
-typedef u64 pudval_t;
-typedef u64 p4dval_t;
-typedef u64 pgdval_t;
+/*
+ * Page Table Descriptor
+ *
+ * Generic page table descriptor format from which
+ * all level specific descriptors can be derived.
*/
typedef u64 ptdesc_t;
typedef ptdesc_t pteval_t;
typedef ptdesc_t pmdval_t;
typedef ptdesc_t pudval_t;
typedef ptdesc_t p4dval_t;
typedef ptdesc_t pgdval_t;
/* /*
* These are used to make use of C type-checking.. * These are used to make use of C type-checking..
@ -46,7 +54,7 @@ typedef struct { pgdval_t pgd; } pgd_t;
#define pgd_val(x) ((x).pgd) #define pgd_val(x) ((x).pgd)
#define __pgd(x) ((pgd_t) { (x) } ) #define __pgd(x) ((pgd_t) { (x) } )
typedef struct { pteval_t pgprot; } pgprot_t; typedef struct { ptdesc_t pgprot; } pgprot_t;
#define pgprot_val(x) ((x).pgprot) #define pgprot_val(x) ((x).pgprot)
#define __pgprot(x) ((pgprot_t) { (x) } ) #define __pgprot(x) ((pgprot_t) { (x) } )


@ -673,7 +673,6 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
#define __phys_to_pmd_val(phys) __phys_to_pte_val(phys) #define __phys_to_pmd_val(phys) __phys_to_pte_val(phys)
#define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT) #define pmd_pfn(pmd) ((__pmd_to_phys(pmd) & PMD_MASK) >> PAGE_SHIFT)
#define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot)
#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_young(pud) pte_young(pud_pte(pud))
#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
@ -906,12 +905,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
/* use ONLY for statically allocated translation tables */ /* use ONLY for statically allocated translation tables */
#define pte_offset_kimg(dir,addr) ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr)))) #define pte_offset_kimg(dir,addr) ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page,prot) pfn_pte(page_to_pfn(page),prot)
#if CONFIG_PGTABLE_LEVELS > 2 #if CONFIG_PGTABLE_LEVELS > 2
#define pmd_ERROR(e) \ #define pmd_ERROR(e) \


@ -24,8 +24,8 @@ struct ptdump_info {
}; };
struct ptdump_prot_bits { struct ptdump_prot_bits {
u64 mask; ptdesc_t mask;
u64 val; ptdesc_t val;
const char *set; const char *set;
const char *clear; const char *clear;
}; };
@ -34,7 +34,7 @@ struct ptdump_pg_level {
const struct ptdump_prot_bits *bits; const struct ptdump_prot_bits *bits;
char name[4]; char name[4];
int num; int num;
u64 mask; ptdesc_t mask;
}; };
/* /*
@ -51,7 +51,7 @@ struct ptdump_pg_state {
const struct mm_struct *mm; const struct mm_struct *mm;
unsigned long start_address; unsigned long start_address;
int level; int level;
u64 current_prot; ptdesc_t current_prot;
bool check_wx; bool check_wx;
unsigned long wx_pages; unsigned long wx_pages;
unsigned long uxn_pages; unsigned long uxn_pages;
@ -59,7 +59,13 @@ struct ptdump_pg_state {
void ptdump_walk(struct seq_file *s, struct ptdump_info *info); void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
u64 val); pteval_t val);
void note_page_pte(struct ptdump_state *st, unsigned long addr, pte_t pte);
void note_page_pmd(struct ptdump_state *st, unsigned long addr, pmd_t pmd);
void note_page_pud(struct ptdump_state *st, unsigned long addr, pud_t pud);
void note_page_p4d(struct ptdump_state *st, unsigned long addr, p4d_t p4d);
void note_page_pgd(struct ptdump_state *st, unsigned long addr, pgd_t pgd);
void note_page_flush(struct ptdump_state *st);
#ifdef CONFIG_PTDUMP_DEBUGFS #ifdef CONFIG_PTDUMP_DEBUGFS
#define EFI_RUNTIME_MAP_END DEFAULT_MAP_WINDOW_64 #define EFI_RUNTIME_MAP_END DEFAULT_MAP_WINDOW_64
void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name); void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
@ -69,7 +75,13 @@ static inline void ptdump_debugfs_register(struct ptdump_info *info,
#endif /* CONFIG_PTDUMP_DEBUGFS */ #endif /* CONFIG_PTDUMP_DEBUGFS */
#else #else
static inline void note_page(struct ptdump_state *pt_st, unsigned long addr, static inline void note_page(struct ptdump_state *pt_st, unsigned long addr,
int level, u64 val) { } int level, pteval_t val) { }
static inline void note_page_pte(struct ptdump_state *st, unsigned long addr, pte_t pte) { }
static inline void note_page_pmd(struct ptdump_state *st, unsigned long addr, pmd_t pmd) { }
static inline void note_page_pud(struct ptdump_state *st, unsigned long addr, pud_t pud) { }
static inline void note_page_p4d(struct ptdump_state *st, unsigned long addr, p4d_t p4d) { }
static inline void note_page_pgd(struct ptdump_state *st, unsigned long addr, pgd_t pgd) { }
static inline void note_page_flush(struct ptdump_state *st) { }
#endif /* CONFIG_PTDUMP */ #endif /* CONFIG_PTDUMP */
#endif /* __ASM_PTDUMP_H */ #endif /* __ASM_PTDUMP_H */


@ -61,6 +61,22 @@ static inline void syscall_set_return_value(struct task_struct *task,
regs->regs[0] = val; regs->regs[0] = val;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
regs->syscallno = nr;
if (nr == -1) {
/*
* When the syscall number is set to -1, the syscall will be
* skipped. In this case the syscall return value has to be
* set explicitly, otherwise the first syscall argument is
* returned as the syscall return value.
*/
syscall_set_return_value(task, regs, -ENOSYS, 0);
}
}
#define SYSCALL_MAX_ARGS 6 #define SYSCALL_MAX_ARGS 6
static inline void syscall_get_arguments(struct task_struct *task, static inline void syscall_get_arguments(struct task_struct *task,
@ -73,6 +89,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, &regs->regs[1], 5 * sizeof(args[0])); memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
const unsigned long *args)
{
memcpy(&regs->regs[0], args, 6 * sizeof(args[0]));
/*
* Also copy the first argument into orig_x0
* so that syscall_get_arguments() would return it
* instead of the previous value.
*/
regs->orig_x0 = regs->regs[0];
}
/* /*
* We don't care about endianness (__AUDIT_ARCH_LE bit) here because * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
* AArch64 has the same system calls both on little- and big- endian. * AArch64 has the same system calls both on little- and big- endian.


@ -29,7 +29,7 @@ static bool region_is_misaligned(const efi_memory_desc_t *md)
* executable, everything else can be mapped with the XN bits * executable, everything else can be mapped with the XN bits
* set. Also take the new (optional) RO/XP bits into account. * set. Also take the new (optional) RO/XP bits into account.
*/ */
static __init pteval_t create_mapping_protection(efi_memory_desc_t *md) static __init ptdesc_t create_mapping_protection(efi_memory_desc_t *md)
{ {
u64 attr = md->attribute; u64 attr = md->attribute;
u32 type = md->type; u32 type = md->type;
@ -83,7 +83,7 @@ static __init pteval_t create_mapping_protection(efi_memory_desc_t *md)
int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md) int __init efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md)
{ {
pteval_t prot_val = create_mapping_protection(md); ptdesc_t prot_val = create_mapping_protection(md);
bool page_mappings_only = (md->type == EFI_RUNTIME_SERVICES_CODE || bool page_mappings_only = (md->type == EFI_RUNTIME_SERVICES_CODE ||
md->type == EFI_RUNTIME_SERVICES_DATA); md->type == EFI_RUNTIME_SERVICES_DATA);


@ -159,7 +159,7 @@ static void noinline __section(".idmap.text") set_ttbr0_for_lpa2(u64 ttbr)
static void __init remap_idmap_for_lpa2(void) static void __init remap_idmap_for_lpa2(void)
{ {
/* clear the bits that change meaning once LPA2 is turned on */ /* clear the bits that change meaning once LPA2 is turned on */
pteval_t mask = PTE_SHARED; ptdesc_t mask = PTE_SHARED;
/* /*
* We have to clear bits [9:8] in all block or page descriptors in the * We have to clear bits [9:8] in all block or page descriptors in the


@ -30,7 +30,7 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
int level, pte_t *tbl, bool may_use_cont, u64 va_offset) int level, pte_t *tbl, bool may_use_cont, u64 va_offset)
{ {
u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 : U64_MAX; u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 : U64_MAX;
pteval_t protval = pgprot_val(prot) & ~PTE_TYPE_MASK; ptdesc_t protval = pgprot_val(prot) & ~PTE_TYPE_MASK;
int lshift = (3 - level) * PTDESC_TABLE_SHIFT; int lshift = (3 - level) * PTDESC_TABLE_SHIFT;
u64 lmask = (PAGE_SIZE << lshift) - 1; u64 lmask = (PAGE_SIZE << lshift) - 1;
@ -87,7 +87,7 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
} }
} }
asmlinkage u64 __init create_init_idmap(pgd_t *pg_dir, pteval_t clrmask) asmlinkage u64 __init create_init_idmap(pgd_t *pg_dir, ptdesc_t clrmask)
{ {
u64 ptep = (u64)pg_dir + PAGE_SIZE; u64 ptep = (u64)pg_dir + PAGE_SIZE;
pgprot_t text_prot = PAGE_KERNEL_ROX; pgprot_t text_prot = PAGE_KERNEL_ROX;


@ -34,4 +34,4 @@ void map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
asmlinkage void early_map_kernel(u64 boot_status, void *fdt); asmlinkage void early_map_kernel(u64 boot_status, void *fdt);
asmlinkage u64 create_init_idmap(pgd_t *pgd, pteval_t clrmask); asmlinkage u64 create_init_idmap(pgd_t *pgd, ptdesc_t clrmask);


@ -83,7 +83,7 @@ arch_initcall(adjust_protection_map);
pgprot_t vm_get_page_prot(unsigned long vm_flags) pgprot_t vm_get_page_prot(unsigned long vm_flags)
{ {
pteval_t prot; ptdesc_t prot;
/* Short circuit GCS to avoid bloating the table. */ /* Short circuit GCS to avoid bloating the table. */
if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) { if (system_supports_gcs() && (vm_flags & VM_SHADOW_STACK)) {


@ -46,6 +46,13 @@
#define NO_CONT_MAPPINGS BIT(1) #define NO_CONT_MAPPINGS BIT(1)
#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */ #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
enum pgtable_type {
TABLE_PTE,
TABLE_PMD,
TABLE_PUD,
TABLE_P4D,
};
u64 kimage_voffset __ro_after_init; u64 kimage_voffset __ro_after_init;
EXPORT_SYMBOL(kimage_voffset); EXPORT_SYMBOL(kimage_voffset);
@ -107,7 +114,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
} }
EXPORT_SYMBOL(phys_mem_access_prot); EXPORT_SYMBOL(phys_mem_access_prot);
static phys_addr_t __init early_pgtable_alloc(int shift) static phys_addr_t __init early_pgtable_alloc(enum pgtable_type pgtable_type)
{ {
phys_addr_t phys; phys_addr_t phys;
@ -192,7 +199,7 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr, static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
unsigned long end, phys_addr_t phys, unsigned long end, phys_addr_t phys,
pgprot_t prot, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags) int flags)
{ {
unsigned long next; unsigned long next;
@ -207,7 +214,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
if (flags & NO_EXEC_MAPPINGS) if (flags & NO_EXEC_MAPPINGS)
pmdval |= PMD_TABLE_PXN; pmdval |= PMD_TABLE_PXN;
BUG_ON(!pgtable_alloc); BUG_ON(!pgtable_alloc);
pte_phys = pgtable_alloc(PAGE_SHIFT); pte_phys = pgtable_alloc(TABLE_PTE);
ptep = pte_set_fixmap(pte_phys); ptep = pte_set_fixmap(pte_phys);
init_clear_pgtable(ptep); init_clear_pgtable(ptep);
ptep += pte_index(addr); ptep += pte_index(addr);
@ -243,7 +250,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end, static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot, phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), int flags) phys_addr_t (*pgtable_alloc)(enum pgtable_type), int flags)
{ {
unsigned long next; unsigned long next;
@ -277,7 +284,8 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr, static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
unsigned long end, phys_addr_t phys, unsigned long end, phys_addr_t phys,
pgprot_t prot, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), int flags) phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{ {
unsigned long next; unsigned long next;
pud_t pud = READ_ONCE(*pudp); pud_t pud = READ_ONCE(*pudp);
@ -294,7 +302,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
if (flags & NO_EXEC_MAPPINGS) if (flags & NO_EXEC_MAPPINGS)
pudval |= PUD_TABLE_PXN; pudval |= PUD_TABLE_PXN;
BUG_ON(!pgtable_alloc); BUG_ON(!pgtable_alloc);
pmd_phys = pgtable_alloc(PMD_SHIFT); pmd_phys = pgtable_alloc(TABLE_PMD);
pmdp = pmd_set_fixmap(pmd_phys); pmdp = pmd_set_fixmap(pmd_phys);
init_clear_pgtable(pmdp); init_clear_pgtable(pmdp);
pmdp += pmd_index(addr); pmdp += pmd_index(addr);
@ -325,7 +333,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end, static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot, phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags) int flags)
{ {
unsigned long next; unsigned long next;
@ -339,7 +347,7 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
if (flags & NO_EXEC_MAPPINGS) if (flags & NO_EXEC_MAPPINGS)
p4dval |= P4D_TABLE_PXN; p4dval |= P4D_TABLE_PXN;
BUG_ON(!pgtable_alloc); BUG_ON(!pgtable_alloc);
pud_phys = pgtable_alloc(PUD_SHIFT); pud_phys = pgtable_alloc(TABLE_PUD);
pudp = pud_set_fixmap(pud_phys); pudp = pud_set_fixmap(pud_phys);
init_clear_pgtable(pudp); init_clear_pgtable(pudp);
pudp += pud_index(addr); pudp += pud_index(addr);
@ -383,7 +391,7 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end, static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot, phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags) int flags)
{ {
unsigned long next; unsigned long next;
@ -397,7 +405,7 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
if (flags & NO_EXEC_MAPPINGS) if (flags & NO_EXEC_MAPPINGS)
pgdval |= PGD_TABLE_PXN; pgdval |= PGD_TABLE_PXN;
BUG_ON(!pgtable_alloc); BUG_ON(!pgtable_alloc);
p4d_phys = pgtable_alloc(P4D_SHIFT); p4d_phys = pgtable_alloc(TABLE_P4D);
p4dp = p4d_set_fixmap(p4d_phys); p4dp = p4d_set_fixmap(p4d_phys);
init_clear_pgtable(p4dp); init_clear_pgtable(p4dp);
p4dp += p4d_index(addr); p4dp += p4d_index(addr);
@ -427,7 +435,7 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys, static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size, unsigned long virt, phys_addr_t size,
pgprot_t prot, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags) int flags)
{ {
unsigned long addr, end, next; unsigned long addr, end, next;
@ -455,7 +463,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys, static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size, unsigned long virt, phys_addr_t size,
pgprot_t prot, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags) int flags)
{ {
mutex_lock(&fixmap_lock); mutex_lock(&fixmap_lock);
@ -468,39 +476,50 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
extern __alias(__create_pgd_mapping_locked) extern __alias(__create_pgd_mapping_locked)
void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt, void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
phys_addr_t size, pgprot_t prot, phys_addr_t size, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(int), int flags); phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags);
#endif #endif
static phys_addr_t __pgd_pgtable_alloc(int shift) static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
enum pgtable_type pgtable_type)
{ {
/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */ /* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL & ~__GFP_ZERO); struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
phys_addr_t pa;
BUG_ON(!ptr); BUG_ON(!ptdesc);
return __pa(ptr); pa = page_to_phys(ptdesc_page(ptdesc));
switch (pgtable_type) {
case TABLE_PTE:
BUG_ON(!pagetable_pte_ctor(mm, ptdesc));
break;
case TABLE_PMD:
BUG_ON(!pagetable_pmd_ctor(mm, ptdesc));
break;
case TABLE_PUD:
pagetable_pud_ctor(ptdesc);
break;
case TABLE_P4D:
pagetable_p4d_ctor(ptdesc);
break;
} }
static phys_addr_t pgd_pgtable_alloc(int shift)
{
phys_addr_t pa = __pgd_pgtable_alloc(shift);
struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
/*
* Call proper page table ctor in case later we need to
* call core mm functions like apply_to_page_range() on
* this pre-allocated page table.
*
* We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
* folded, and if so pagetable_pte_ctor() becomes nop.
*/
if (shift == PAGE_SHIFT)
BUG_ON(!pagetable_pte_ctor(ptdesc));
else if (shift == PMD_SHIFT)
BUG_ON(!pagetable_pmd_ctor(ptdesc));
return pa; return pa;
} }
static phys_addr_t __maybe_unused
pgd_pgtable_alloc_init_mm(enum pgtable_type pgtable_type)
{
return __pgd_pgtable_alloc(&init_mm, pgtable_type);
}
static phys_addr_t
pgd_pgtable_alloc_special_mm(enum pgtable_type pgtable_type)
{
return __pgd_pgtable_alloc(NULL, pgtable_type);
}
/* /*
* This function can only be used to modify existing table entries, * This function can only be used to modify existing table entries,
* without allocating new levels of table. Note that this permits the * without allocating new levels of table. Note that this permits the
@ -530,7 +549,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
__create_pgd_mapping(mm->pgd, phys, virt, size, prot, __create_pgd_mapping(mm->pgd, phys, virt, size, prot,
pgd_pgtable_alloc, flags); pgd_pgtable_alloc_special_mm, flags);
} }
static void update_mapping_prot(phys_addr_t phys, unsigned long virt, static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@ -744,7 +763,7 @@ static int __init map_entry_trampoline(void)
memset(tramp_pg_dir, 0, PGD_SIZE); memset(tramp_pg_dir, 0, PGD_SIZE);
__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, __create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
entry_tramp_text_size(), prot, entry_tramp_text_size(), prot,
__pgd_pgtable_alloc, NO_BLOCK_MAPPINGS); pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS);
/* Map both the text and data into the kernel page table */ /* Map both the text and data into the kernel page table */
for (i = 0; i < DIV_ROUND_UP(entry_tramp_text_size(), PAGE_SIZE); i++) for (i = 0; i < DIV_ROUND_UP(entry_tramp_text_size(), PAGE_SIZE); i++)
@ -1350,7 +1369,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start), __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
size, params->pgprot, __pgd_pgtable_alloc, size, params->pgprot, pgd_pgtable_alloc_init_mm,
flags); flags);
memblock_clear_nomap(start, size); memblock_clear_nomap(start, size);
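The hunks above change the page-table allocator callback from a page-shift parameter to an enum pgtable_type, and split the ctor-aware allocator into an init_mm variant and a NULL ("special mm") variant. As a quick orientation aid, here is a minimal sketch, not taken from the patch, of how a mapping helper is expected to invoke the new callback; __pmd_populate() and PMD_TYPE_TABLE are existing arm64 definitions, while example_alloc_pmd_table() is a hypothetical stand-in for the real alloc_init_* code:

/* Illustrative stand-in for the alloc_init_* helpers: the level is now
 * conveyed by TABLE_* instead of a page shift, and the callback runs the
 * matching constructor before returning the physical address. */
static void example_alloc_pmd_table(pmd_t *pmdp,
		phys_addr_t (*pgtable_alloc)(enum pgtable_type))
{
	phys_addr_t pte_phys = pgtable_alloc(TABLE_PTE);

	__pmd_populate(pmdp, pte_phys, PMD_TYPE_TABLE);
}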


@ -189,12 +189,12 @@ static void note_prot_wx(struct ptdump_pg_state *st, unsigned long addr)
} }
void note_page(struct ptdump_state *pt_st, unsigned long addr, int level, void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
u64 val) pteval_t val)
{ {
struct ptdump_pg_state *st = container_of(pt_st, struct ptdump_pg_state, ptdump); struct ptdump_pg_state *st = container_of(pt_st, struct ptdump_pg_state, ptdump);
struct ptdump_pg_level *pg_level = st->pg_level; struct ptdump_pg_level *pg_level = st->pg_level;
static const char units[] = "KMGTPE"; static const char units[] = "KMGTPE";
u64 prot = 0; ptdesc_t prot = 0;
/* check if the current level has been folded dynamically */ /* check if the current level has been folded dynamically */
if (st->mm && ((level == 1 && mm_p4d_folded(st->mm)) || if (st->mm && ((level == 1 && mm_p4d_folded(st->mm)) ||
@ -251,6 +251,38 @@ void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
} }
void note_page_pte(struct ptdump_state *pt_st, unsigned long addr, pte_t pte)
{
note_page(pt_st, addr, 4, pte_val(pte));
}
void note_page_pmd(struct ptdump_state *pt_st, unsigned long addr, pmd_t pmd)
{
note_page(pt_st, addr, 3, pmd_val(pmd));
}
void note_page_pud(struct ptdump_state *pt_st, unsigned long addr, pud_t pud)
{
note_page(pt_st, addr, 2, pud_val(pud));
}
void note_page_p4d(struct ptdump_state *pt_st, unsigned long addr, p4d_t p4d)
{
note_page(pt_st, addr, 1, p4d_val(p4d));
}
void note_page_pgd(struct ptdump_state *pt_st, unsigned long addr, pgd_t pgd)
{
note_page(pt_st, addr, 0, pgd_val(pgd));
}
void note_page_flush(struct ptdump_state *pt_st)
{
pte_t pte_zero = {0};
note_page(pt_st, 0, -1, pte_val(pte_zero));
}
void ptdump_walk(struct seq_file *s, struct ptdump_info *info) void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
{ {
unsigned long end = ~0UL; unsigned long end = ~0UL;
@ -266,7 +298,12 @@ void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
.pg_level = &kernel_pg_levels[0], .pg_level = &kernel_pg_levels[0],
.level = -1, .level = -1,
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = (struct ptdump_range[]){ .range = (struct ptdump_range[]){
{info->base_addr, end}, {info->base_addr, end},
{0, 0} {0, 0}
@ -303,7 +340,12 @@ bool ptdump_check_wx(void)
.level = -1, .level = -1,
.check_wx = true, .check_wx = true,
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = (struct ptdump_range[]) { .range = (struct ptdump_range[]) {
{_PAGE_OFFSET(vabits_actual), ~0UL}, {_PAGE_OFFSET(vabits_actual), ~0UL},
{0, 0} {0, 0}
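The same mechanical conversion — one typed note_page_* trampoline per level plus a flush callback, replacing the single note_page(level, u64) hook — reappears in the powerpc, riscv and s390 ptdump hunks later in this diff. A sketch of what the generic walker side is assumed to do with these callbacks (the handler name and exact body are illustrative, not quoted from mm/ptdump.c):

static int example_pte_entry(pte_t *ptep, unsigned long addr,
			     unsigned long next, struct mm_walk *walk)
{
	struct ptdump_state *st = walk->private;

	/* Forward a typed pte_t instead of a raw value plus a level number. */
	st->note_page_pte(st, addr, ptep_get_lockless(ptep));
	return 0;
}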


@ -29,7 +29,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
pte_t *pte; pte_t *pte;
unsigned long i; unsigned long i;
pte = (pte_t *) __get_free_page(GFP_KERNEL); pte = __pte_alloc_one_kernel(mm);
if (!pte) if (!pte)
return NULL; return NULL;


@ -249,11 +249,6 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
return __pgprot(prot); return __pgprot(prot);
} }
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | return __pte((pte_val(pte) & _PAGE_CHG_MASK) |


@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
memcpy(args, &regs->a1, 5 * sizeof(args[0])); memcpy(args, &regs->a1, 5 * sizeof(args[0]));
} }
static inline void
syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
const unsigned long *args)
{
memcpy(&regs->a0, args, 6 * sizeof(regs->a0));
/*
* Also copy the first argument into orig_a0
* so that syscall_get_arguments() would return it
* instead of the previous value.
*/
regs->orig_a0 = regs->a0;
}
static inline int static inline int
syscall_get_arch(struct task_struct *task) syscall_get_arch(struct task_struct *task)
{ {


@ -238,9 +238,6 @@ static inline int pte_present(pte_t pte)
return pte_val(pte) & _PAGE_PRESENT; return pte_val(pte) & _PAGE_PRESENT;
} }
/* mk_pte - make a PTE out of a page pointer and protection bits */
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
/* pte_page - returns a page (frame pointer/descriptor?) based on a PTE */ /* pte_page - returns a page (frame pointer/descriptor?) based on a PTE */
#define pte_page(x) pfn_to_page(pte_pfn(x)) #define pte_page(x) pfn_to_page(pte_pfn(x))
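This hunk, like the analogous csky one above and the m68k, microblaze, mips, nios2, openrisc, parisc, powerpc, riscv, s390 and sh ones below, deletes a per-architecture mk_pte() definition (most of them literally pfn_pte(page_to_pfn(page), pgprot)). The removals only make sense if a common definition takes over; a sketch of what that shared helper presumably looks like (name and location are an assumption, not shown in this diff):

/* Assumed common replacement for the removed per-arch macros: most of the
 * deleted definitions had exactly this body, so it can live in one place. */
static inline pte_t generic_mk_pte(struct page *page, pgprot_t pgprot)
{
	return pfn_pte(page_to_pfn(page), pgprot);
}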


@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->r06; return regs->r06;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
regs->r06 = nr;
}
static inline void syscall_get_arguments(struct task_struct *task, static inline void syscall_get_arguments(struct task_struct *task,
struct pt_regs *regs, struct pt_regs *regs,
unsigned long *args) unsigned long *args)
@ -33,6 +40,13 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, &(&regs->r00)[0], 6 * sizeof(args[0])); memcpy(args, &(&regs->r00)[0], 6 * sizeof(args[0]));
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
unsigned long *args)
{
memcpy(&(&regs->r00)[0], args, 6 * sizeof(args[0]));
}
static inline long syscall_get_error(struct task_struct *task, static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -45,6 +59,13 @@ static inline long syscall_get_return_value(struct task_struct *task,
return regs->r00; return regs->r00;
} }
static inline void syscall_set_return_value(struct task_struct *task,
struct pt_regs *regs,
int error, long val)
{
regs->r00 = (long) error ?: val;
}
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
return AUDIT_ARCH_HEXAGON; return AUDIT_ARCH_HEXAGON;
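For readers skimming the hexagon hunk above: syscall_set_return_value() folds the error and the value into one register with a GNU ?: expression, so a non-zero error always wins. A tiny illustrative snippet (values are arbitrary):

void example_set_return(struct pt_regs *regs)
{
	regs->r00 = (long)-13 ?: 42;	/* error path: r00 == -13 (-EACCES) */
	regs->r00 = (long)0 ?: 42;	/* success:    r00 == 42 */
}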


@ -69,7 +69,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pmd_ctor(ptdesc)) { if (!pagetable_pmd_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }


@ -255,7 +255,6 @@ static inline void pmd_clear(pmd_t *pmdp)
#define pmd_page_vaddr(pmd) pmd_val(pmd) #define pmd_page_vaddr(pmd) pmd_val(pmd)
extern pmd_t mk_pmd(struct page *page, pgprot_t prot);
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd); extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd);
#define pte_page(x) pfn_to_page(pte_pfn(x)) #define pte_page(x) pfn_to_page(pte_pfn(x))
@ -426,12 +425,6 @@ static inline unsigned long pte_accessible(struct mm_struct *mm, pte_t a)
return false; return false;
} }
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | return __pte((pte_val(pte) & _PAGE_CHG_MASK) |


@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->regs[11]; return regs->regs[11];
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
regs->regs[11] = nr;
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -61,6 +68,14 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(&args[1], &regs->regs[5], 5 * sizeof(long)); memcpy(&args[1], &regs->regs[5], 5 * sizeof(long));
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
unsigned long *args)
{
regs->orig_a0 = args[0];
memcpy(&regs->regs[5], &args[1], 5 * sizeof(long));
}
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
return AUDIT_ARCH_LOONGARCH64; return AUDIT_ARCH_LOONGARCH64;


@ -135,15 +135,6 @@ void kernel_pte_init(void *addr)
} while (p != end); } while (p != end);
} }
pmd_t mk_pmd(struct page *page, pgprot_t prot)
{
pmd_t pmd;
pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
return pmd;
}
void set_pmd_at(struct mm_struct *mm, unsigned long addr, void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd) pmd_t *pmdp, pmd_t pmd)
{ {


@ -7,7 +7,7 @@
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte) static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{ {
pagetable_free(virt_to_ptdesc(pte)); pagetable_dtor_free(virt_to_ptdesc(pte));
} }
extern const char bad_pmd_string[]; extern const char bad_pmd_string[];
@ -19,6 +19,10 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pte_ctor(mm, ptdesc)) {
pagetable_free(ptdesc);
return NULL;
}
return ptdesc_address(ptdesc); return ptdesc_address(ptdesc);
} }
@ -48,7 +52,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pte_ctor(ptdesc)) { if (!pagetable_pte_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }
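The two halves of this hunk belong together: once pte_alloc_one_kernel() starts running pagetable_pte_ctor() on kernel PTE pages, pte_free_kernel() has to switch from pagetable_free() to pagetable_dtor_free() so the constructor state is torn down again. A minimal sketch of the resulting pairing (illustrative, mirroring the lines above):

/* Alloc and free must stay symmetric once the ctor is involved. */
static pte_t *example_alloc(struct mm_struct *mm, struct ptdesc *ptdesc)
{
	if (!pagetable_pte_ctor(mm, ptdesc)) {
		pagetable_free(ptdesc);	/* ctor failed: no dtor needed */
		return NULL;
	}
	return ptdesc_address(ptdesc);
}

static void example_free(pte_t *pte)
{
	pagetable_dtor_free(virt_to_ptdesc(pte));	/* undoes the ctor */
}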


@ -96,12 +96,6 @@
#define pmd_pgtable(pmd) pfn_to_virt(pmd_val(pmd) >> PAGE_SHIFT) #define pmd_pgtable(pmd) pfn_to_virt(pmd_val(pmd) >> PAGE_SHIFT)
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
pte_val(pte) = (pte_val(pte) & CF_PAGE_CHG_MASK) | pgprot_val(newprot); pte_val(pte) = (pte_val(pte) & CF_PAGE_CHG_MASK) | pgprot_val(newprot);


@ -15,7 +15,7 @@ enum m68k_table_types {
}; };
extern void init_pointer_table(void *table, int type); extern void init_pointer_table(void *table, int type);
extern void *get_pointer_table(int type); extern void *get_pointer_table(struct mm_struct *mm, int type);
extern int free_pointer_table(void *table, int type); extern int free_pointer_table(void *table, int type);
/* /*
@ -26,7 +26,7 @@ extern int free_pointer_table(void *table, int type);
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm) static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
{ {
return get_pointer_table(TABLE_PTE); return get_pointer_table(mm, TABLE_PTE);
} }
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte) static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
@ -36,7 +36,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
static inline pgtable_t pte_alloc_one(struct mm_struct *mm) static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
{ {
return get_pointer_table(TABLE_PTE); return get_pointer_table(mm, TABLE_PTE);
} }
static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable) static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
@ -53,7 +53,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address) static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
{ {
return get_pointer_table(TABLE_PMD); return get_pointer_table(mm, TABLE_PMD);
} }
static inline int pmd_free(struct mm_struct *mm, pmd_t *pmd) static inline int pmd_free(struct mm_struct *mm, pmd_t *pmd)
@ -75,7 +75,7 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
static inline pgd_t *pgd_alloc(struct mm_struct *mm) static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{ {
return get_pointer_table(TABLE_PGD); return get_pointer_table(mm, TABLE_PGD);
} }


@ -81,12 +81,6 @@ extern unsigned long mm_cachebits;
#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd)) #define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot); pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot);


@ -76,12 +76,6 @@
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
pte_val(pte) = (pte_val(pte) & SUN3_PAGE_CHG_MASK) | pgprot_val(newprot); pte_val(pte) = (pte_val(pte) & SUN3_PAGE_CHG_MASK) | pgprot_val(newprot);


@ -14,6 +14,13 @@ static inline int syscall_get_nr(struct task_struct *task,
return regs->orig_d0; return regs->orig_d0;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
regs->orig_d0 = nr;
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {


@ -139,7 +139,7 @@ void __init init_pointer_table(void *table, int type)
return; return;
} }
void *get_pointer_table(int type) void *get_pointer_table(struct mm_struct *mm, int type)
{ {
ptable_desc *dp = ptable_list[type].next; ptable_desc *dp = ptable_list[type].next;
unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp); unsigned int mask = list_empty(&ptable_list[type]) ? 0 : PD_MARKBITS(dp);
@ -164,10 +164,10 @@ void *get_pointer_table(int type)
* m68k doesn't have SPLIT_PTE_PTLOCKS for not having * m68k doesn't have SPLIT_PTE_PTLOCKS for not having
* SMP. * SMP.
*/ */
pagetable_pte_ctor(virt_to_ptdesc(page)); pagetable_pte_ctor(mm, virt_to_ptdesc(page));
break; break;
case TABLE_PMD: case TABLE_PMD:
pagetable_pmd_ctor(virt_to_ptdesc(page)); pagetable_pmd_ctor(mm, virt_to_ptdesc(page));
break; break;
case TABLE_PGD: case TABLE_PGD:
pagetable_pgd_ctor(virt_to_ptdesc(page)); pagetable_pgd_ctor(virt_to_ptdesc(page));


@ -285,14 +285,6 @@ static inline pte_t mk_pte_phys(phys_addr_t physpage, pgprot_t pgprot)
return pte; return pte;
} }
#define mk_pte(page, pgprot) \
({ \
pte_t pte; \
pte_val(pte) = (((page - mem_map) << PAGE_SHIFT) + memory_start) | \
pgprot_val(pgprot); \
pte; \
})
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot); pte_val(pte) = (pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot);


@ -14,6 +14,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->r12; return regs->r12;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
regs->r12 = nr;
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {


@ -245,7 +245,7 @@ unsigned long iopa(unsigned long addr)
__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm) __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
{ {
if (mem_init_done) if (mem_init_done)
return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO); return __pte_alloc_one_kernel(mm);
else else
return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE, return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
MEMBLOCK_LOW_LIMIT, MEMBLOCK_LOW_LIMIT,


@ -62,7 +62,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pmd_ctor(ptdesc)) { if (!pagetable_pmd_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }


@ -504,12 +504,6 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
return true; return true;
} }
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
#if defined(CONFIG_XPA) #if defined(CONFIG_XPA)
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
@ -719,9 +713,6 @@ static inline pmd_t pmd_clear_soft_dirty(pmd_t pmd)
#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */ #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
/* Extern to avoid header file madness */
extern pmd_t mk_pmd(struct page *page, pgprot_t prot);
static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{ {
pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) | pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) |


@ -41,6 +41,21 @@ static inline long syscall_get_nr(struct task_struct *task,
return task_thread_info(task)->syscall; return task_thread_info(task)->syscall;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
/*
* New syscall number has to be assigned to regs[2] because
* it is loaded from there unconditionally after return from
* syscall_trace_enter() invocation.
*
* Consequently, if the syscall was indirect and nr != __NR_syscall,
* then after this assignment the syscall will cease to be indirect.
*/
task_thread_info(task)->syscall = regs->regs[2] = nr;
}
static inline void mips_syscall_update_nr(struct task_struct *task, static inline void mips_syscall_update_nr(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -74,6 +89,23 @@ static inline void mips_get_syscall_arg(unsigned long *arg,
#endif #endif
} }
static inline void mips_set_syscall_arg(unsigned long *arg,
struct task_struct *task, struct pt_regs *regs, unsigned int n)
{
#ifdef CONFIG_32BIT
switch (n) {
case 0: case 1: case 2: case 3:
regs->regs[4 + n] = *arg;
return;
case 4: case 5: case 6: case 7:
regs->args[n] = *arg;
return;
}
#else
regs->regs[4 + n] = *arg;
#endif
}
static inline long syscall_get_error(struct task_struct *task, static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -120,6 +152,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
mips_get_syscall_arg(args++, task, regs, i++); mips_get_syscall_arg(args++, task, regs, i++);
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
unsigned long *args)
{
unsigned int i = 0;
unsigned int n = 6;
while (n--)
mips_set_syscall_arg(args++, task, regs, i++);
}
extern const unsigned long sys_call_table[]; extern const unsigned long sys_call_table[];
extern const unsigned long sys32_call_table[]; extern const unsigned long sys32_call_table[];
extern const unsigned long sysn32_call_table[]; extern const unsigned long sysn32_call_table[];
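These syscall_set_nr()/syscall_set_arguments() helpers are the architecture backend for letting a tracer rewrite syscall information at the syscall-entry stop. A hedged userspace sketch of how that is expected to be driven (request names follow the PTRACE_GET_SYSCALL_INFO convention; field usage and the exact calling sequence are assumptions, and sufficiently new kernel/libc headers are required):

#include <string.h>
#include <sys/types.h>
#include <sys/ptrace.h>

static int rewrite_syscall_nr(pid_t pid, unsigned long new_nr)
{
	struct ptrace_syscall_info info;

	memset(&info, 0, sizeof(info));
	if (ptrace(PTRACE_GET_SYSCALL_INFO, pid, (void *)sizeof(info), &info) < 0)
		return -1;
	if (info.op != PTRACE_SYSCALL_INFO_ENTRY)
		return -1;			/* only meaningful at entry */

	info.entry.nr = new_nr;			/* e.g. redirect to a stub */
	return ptrace(PTRACE_SET_SYSCALL_INFO, pid, (void *)sizeof(info), &info);
}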


@ -31,16 +31,6 @@ void pgd_init(void *addr)
} }
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) #if defined(CONFIG_TRANSPARENT_HUGEPAGE)
pmd_t mk_pmd(struct page *page, pgprot_t prot)
{
pmd_t pmd;
pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
return pmd;
}
void set_pmd_at(struct mm_struct *mm, unsigned long addr, void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd) pmd_t *pmdp, pmd_t pmd)
{ {


@ -90,15 +90,6 @@ void pud_init(void *addr)
#endif #endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE #ifdef CONFIG_TRANSPARENT_HUGEPAGE
pmd_t mk_pmd(struct page *page, pgprot_t prot)
{
pmd_t pmd;
pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
return pmd;
}
void set_pmd_at(struct mm_struct *mm, unsigned long addr, void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd) pmd_t *pmdp, pmd_t pmd)
{ {


@ -217,12 +217,6 @@ static inline void pte_clear(struct mm_struct *mm,
set_pte(ptep, null); set_pte(ptep, null);
} }
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define mk_pte(page, prot) (pfn_pte(page_to_pfn(page), prot))
/* /*
* Conversion functions: convert a page and protection to a page entry, * Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to. * and a page entry and page directory to the page they refer to.


@ -15,6 +15,11 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return regs->r2; return regs->r2;
} }
static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
{
regs->r2 = nr;
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -58,6 +63,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
*args = regs->r9; *args = regs->r9;
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs, const unsigned long *args)
{
regs->r4 = *args++;
regs->r5 = *args++;
regs->r6 = *args++;
regs->r7 = *args++;
regs->r8 = *args++;
regs->r9 = *args;
}
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
return AUDIT_ARCH_NIOS2; return AUDIT_ARCH_NIOS2;


@ -299,8 +299,6 @@ static inline pte_t __mk_pte(void *page, pgprot_t pgprot)
return pte; return pte;
} }
#define mk_pte(page, pgprot) __mk_pte(page_address(page), (pgprot))
#define mk_pte_phys(physpage, pgprot) \ #define mk_pte_phys(physpage, pgprot) \
({ \ ({ \
pte_t __pte; \ pte_t __pte; \


@ -25,6 +25,12 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return regs->orig_gpr11; return regs->orig_gpr11;
} }
static inline void
syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
{
regs->orig_gpr11 = nr;
}
static inline void static inline void
syscall_rollback(struct task_struct *task, struct pt_regs *regs) syscall_rollback(struct task_struct *task, struct pt_regs *regs)
{ {
@ -57,6 +63,13 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
memcpy(args, &regs->gpr[3], 6 * sizeof(args[0])); memcpy(args, &regs->gpr[3], 6 * sizeof(args[0]));
} }
static inline void
syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
const unsigned long *args)
{
memcpy(&regs->gpr[3], args, 6 * sizeof(args[0]));
}
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
return AUDIT_ARCH_OPENRISC; return AUDIT_ARCH_OPENRISC;


@ -36,7 +36,7 @@ pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
pte_t *pte; pte_t *pte;
if (likely(mem_init_done)) { if (likely(mem_init_done)) {
pte = (pte_t *)get_zeroed_page(GFP_KERNEL); pte = __pte_alloc_one_kernel(mm);
} else { } else {
pte = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE); pte = memblock_alloc_or_panic(PAGE_SIZE, PAGE_SIZE);
} }
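Both this openrisc hunk and the microblaze one above replace an open-coded page allocation with the generic __pte_alloc_one_kernel(). Roughly, that helper allocates the page as a ptdesc and runs the PTE constructor for the given mm; the sketch below is inferred from the ctor pattern visible elsewhere in this diff, not quoted from asm-generic/pgalloc.h:

/* Simplified sketch of what __pte_alloc_one_kernel(mm) is assumed to do. */
static pte_t *example_pte_alloc_one_kernel(struct mm_struct *mm)
{
	struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, 0);

	if (!ptdesc)
		return NULL;
	if (!pagetable_pte_ctor(mm, ptdesc)) {
		pagetable_free(ptdesc);
		return NULL;
	}
	return ptdesc_address(ptdesc);
}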


@ -39,7 +39,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long address)
ptdesc = pagetable_alloc(gfp, PMD_TABLE_ORDER); ptdesc = pagetable_alloc(gfp, PMD_TABLE_ORDER);
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pmd_ctor(ptdesc)) { if (!pagetable_pmd_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }


@ -338,10 +338,6 @@ static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; re
#endif #endif
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*/
#define __mk_pte(addr,pgprot) \ #define __mk_pte(addr,pgprot) \
({ \ ({ \
pte_t __pte; \ pte_t __pte; \
@ -351,8 +347,6 @@ static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; re
__pte; \ __pte; \
}) })
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot) static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
{ {
pte_t pte; pte_t pte;


@ -17,6 +17,13 @@ static inline long syscall_get_nr(struct task_struct *tsk,
return regs->gr[20]; return regs->gr[20];
} }
static inline void syscall_set_nr(struct task_struct *tsk,
struct pt_regs *regs,
int nr)
{
regs->gr[20] = nr;
}
static inline void syscall_get_arguments(struct task_struct *tsk, static inline void syscall_get_arguments(struct task_struct *tsk,
struct pt_regs *regs, struct pt_regs *regs,
unsigned long *args) unsigned long *args)
@ -29,6 +36,18 @@ static inline void syscall_get_arguments(struct task_struct *tsk,
args[0] = regs->gr[26]; args[0] = regs->gr[26];
} }
static inline void syscall_set_arguments(struct task_struct *tsk,
struct pt_regs *regs,
unsigned long *args)
{
regs->gr[21] = args[5];
regs->gr[22] = args[4];
regs->gr[23] = args[3];
regs->gr[24] = args[2];
regs->gr[25] = args[1];
regs->gr[26] = args[0];
}
static inline long syscall_get_error(struct task_struct *task, static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {


@ -1096,7 +1096,6 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE #ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot); extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot); extern pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot);
extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot); extern pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot);
extern pud_t pud_modify(pud_t pud, pgprot_t newprot); extern pud_t pud_modify(pud_t pud, pgprot_t newprot);
extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, extern void set_pmd_at(struct mm_struct *mm, unsigned long addr,


@ -53,9 +53,8 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
#define MAX_PTRS_PER_PGD PTRS_PER_PGD #define MAX_PTRS_PER_PGD PTRS_PER_PGD
#endif #endif
/* Keep these as a macros to avoid include dependency mess */ /* Keep this as a macro to avoid include dependency mess */
#define pte_page(x) pfn_to_page(pte_pfn(x)) #define pte_page(x) pfn_to_page(pte_pfn(x))
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline unsigned long pte_pfn(pte_t pte) static inline unsigned long pte_pfn(pte_t pte)
{ {


@ -39,6 +39,16 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return -1; return -1;
} }
static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
{
/*
* Unlike syscall_get_nr(), syscall_set_nr() can be called only when
* the target task is stopped for tracing on entering syscall, so
* there is no need to have the same check syscall_get_nr() has.
*/
regs->gpr[0] = nr;
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -110,6 +120,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
} }
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
const unsigned long *args)
{
memcpy(&regs->gpr[3], args, 6 * sizeof(args[0]));
/* Also copy the first argument into orig_gpr3 */
regs->orig_gpr3 = args[0];
}
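The orig_gpr3 update above mirrors the orig_a0 handling in the csky hunk earlier: the architecture reads the first argument back from the orig_* register, so a setter that skipped it would break a subsequent get. An illustrative round-trip check (not from the patch):

static void example_args_roundtrip(struct task_struct *task,
				   struct pt_regs *regs)
{
	const unsigned long in[6] = { 1, 2, 3, 4, 5, 6 };
	unsigned long out[6];

	syscall_set_arguments(task, regs, in);
	syscall_get_arguments(task, regs, out);
	WARN_ON(memcmp(in, out, sizeof(in)));	/* must see the new args[0] */
}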
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
if (is_tsk_32bit_task(task)) if (is_tsk_32bit_task(task))


@ -269,11 +269,6 @@ pud_t pfn_pud(unsigned long pfn, pgprot_t pgprot)
return __pud_mkhuge(pud_set_protbits(__pud(pudv), pgprot)); return __pud_mkhuge(pud_set_protbits(__pud(pudv), pgprot));
} }
pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
{
return pfn_pmd(page_to_pfn(page), pgprot);
}
pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{ {
unsigned long pmdv; unsigned long pmdv;
@ -422,7 +417,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
ptdesc = pagetable_alloc(gfp, 0); ptdesc = pagetable_alloc(gfp, 0);
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pmd_ctor(ptdesc)) { if (!pagetable_pmd_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }


@ -56,20 +56,18 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
{ {
void *ret = NULL; void *ret = NULL;
struct ptdesc *ptdesc; struct ptdesc *ptdesc;
gfp_t gfp = PGALLOC_GFP;
if (!kernel) { if (!kernel)
ptdesc = pagetable_alloc(PGALLOC_GFP | __GFP_ACCOUNT, 0); gfp |= __GFP_ACCOUNT;
ptdesc = pagetable_alloc(gfp, 0);
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pte_ctor(ptdesc)) { if (!pagetable_pte_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }
} else {
ptdesc = pagetable_alloc(PGALLOC_GFP, 0);
if (!ptdesc)
return NULL;
}
atomic_set(&ptdesc->pt_frag_refcount, 1); atomic_set(&ptdesc->pt_frag_refcount, 1);
@ -124,12 +122,10 @@ void pte_fragment_free(unsigned long *table, int kernel)
BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0); BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) { if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
if (kernel) if (kernel || !folio_test_clear_active(ptdesc_folio(ptdesc)))
pagetable_free(ptdesc);
else if (folio_test_clear_active(ptdesc_folio(ptdesc)))
call_rcu(&ptdesc->pt_rcu_head, pte_free_now);
else
pte_free_now(&ptdesc->pt_rcu_head); pte_free_now(&ptdesc->pt_rcu_head);
else
call_rcu(&ptdesc->pt_rcu_head, pte_free_now);
} }
} }


@ -298,6 +298,38 @@ static void populate_markers(void)
#endif #endif
} }
static void note_page_pte(struct ptdump_state *pt_st, unsigned long addr, pte_t pte)
{
note_page(pt_st, addr, 4, pte_val(pte));
}
static void note_page_pmd(struct ptdump_state *pt_st, unsigned long addr, pmd_t pmd)
{
note_page(pt_st, addr, 3, pmd_val(pmd));
}
static void note_page_pud(struct ptdump_state *pt_st, unsigned long addr, pud_t pud)
{
note_page(pt_st, addr, 2, pud_val(pud));
}
static void note_page_p4d(struct ptdump_state *pt_st, unsigned long addr, p4d_t p4d)
{
note_page(pt_st, addr, 1, p4d_val(p4d));
}
static void note_page_pgd(struct ptdump_state *pt_st, unsigned long addr, pgd_t pgd)
{
note_page(pt_st, addr, 0, pgd_val(pgd));
}
static void note_page_flush(struct ptdump_state *pt_st)
{
pte_t pte_zero = {0};
note_page(pt_st, 0, -1, pte_val(pte_zero));
}
static int ptdump_show(struct seq_file *m, void *v) static int ptdump_show(struct seq_file *m, void *v)
{ {
struct pg_state st = { struct pg_state st = {
@ -305,7 +337,12 @@ static int ptdump_show(struct seq_file *m, void *v)
.marker = address_markers, .marker = address_markers,
.level = -1, .level = -1,
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = ptdump_range, .range = ptdump_range,
} }
}; };
@ -338,7 +375,12 @@ bool ptdump_check_wx(void)
.level = -1, .level = -1,
.check_wx = true, .check_wx = true,
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = ptdump_range, .range = ptdump_range,
} }
}; };


@ -262,8 +262,6 @@ static inline unsigned long _pmd_pfn(pmd_t pmd)
return __page_val_to_pfn(pmd_val(pmd)); return __page_val_to_pfn(pmd_val(pmd));
} }
#define mk_pmd(page, prot) pfn_pmd(page_to_pfn(page), prot)
#define pmd_ERROR(e) \ #define pmd_ERROR(e) \
pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e)) pr_err("%s:%d: bad pmd %016lx.\n", __FILE__, __LINE__, pmd_val(e))


@ -343,8 +343,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
return __pte((pfn << _PAGE_PFN_SHIFT) | prot_val); return __pte((pfn << _PAGE_PFN_SHIFT) | prot_val);
} }
#define mk_pte(page, prot) pfn_pte(page_to_pfn(page), prot)
#define pte_pgprot pte_pgprot #define pte_pgprot pte_pgprot
static inline pgprot_t pte_pgprot(pte_t pte) static inline pgprot_t pte_pgprot(pte_t pte)
{ {


@ -30,6 +30,13 @@ static inline int syscall_get_nr(struct task_struct *task,
return regs->a7; return regs->a7;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
regs->a7 = nr;
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -69,6 +76,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
args[5] = regs->a5; args[5] = regs->a5;
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
const unsigned long *args)
{
regs->orig_a0 = args[0];
regs->a1 = args[1];
regs->a2 = args[2];
regs->a3 = args[3];
regs->a4 = args[4];
regs->a5 = args[5];
}
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
#ifdef CONFIG_64BIT #ifdef CONFIG_64BIT


@ -442,7 +442,12 @@ static phys_addr_t __meminit alloc_pte_late(uintptr_t va)
{ {
struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0); struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc)); /*
* We do not know which mm the PTE page is associated to at this point.
* Passing NULL to the ctor is the safe option, though it may result
* in unnecessary work (e.g. initialising the ptlock for init_mm).
*/
BUG_ON(!ptdesc || !pagetable_pte_ctor(NULL, ptdesc));
return __pa((pte_t *)ptdesc_address(ptdesc)); return __pa((pte_t *)ptdesc_address(ptdesc));
} }
@ -522,7 +527,8 @@ static phys_addr_t __meminit alloc_pmd_late(uintptr_t va)
{ {
struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0); struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL & ~__GFP_HIGHMEM, 0);
BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc)); /* See comment in alloc_pte_late() regarding NULL being passed to the ctor */
BUG_ON(!ptdesc || !pagetable_pmd_ctor(NULL, ptdesc));
return __pa((pmd_t *)ptdesc_address(ptdesc)); return __pa((pmd_t *)ptdesc_address(ptdesc));
} }
@ -584,11 +590,11 @@ static phys_addr_t __init alloc_pud_fixmap(uintptr_t va)
static phys_addr_t __meminit alloc_pud_late(uintptr_t va) static phys_addr_t __meminit alloc_pud_late(uintptr_t va)
{ {
unsigned long vaddr; struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
vaddr = __get_free_page(GFP_KERNEL); BUG_ON(!ptdesc);
BUG_ON(!vaddr); pagetable_pud_ctor(ptdesc);
return __pa(vaddr); return __pa((pud_t *)ptdesc_address(ptdesc));
} }
static p4d_t *__init get_p4d_virt_early(phys_addr_t pa) static p4d_t *__init get_p4d_virt_early(phys_addr_t pa)
@ -622,11 +628,11 @@ static phys_addr_t __init alloc_p4d_fixmap(uintptr_t va)
static phys_addr_t __meminit alloc_p4d_late(uintptr_t va) static phys_addr_t __meminit alloc_p4d_late(uintptr_t va)
{ {
unsigned long vaddr; struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
vaddr = __get_free_page(GFP_KERNEL); BUG_ON(!ptdesc);
BUG_ON(!vaddr); pagetable_p4d_ctor(ptdesc);
return __pa(vaddr); return __pa((p4d_t *)ptdesc_address(ptdesc));
} }
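The comment in alloc_pte_late() above is the clearest statement in this diff of why the constructors now take an mm: when the mm is known, per-mm setup such as the split ptlock can be skipped for kernel tables, while passing NULL stays safe but possibly wasteful. A sketch of the kind of check this enables inside the ctor (illustrative; the real mm/ implementation is not shown here):

/* Hypothetical helper: only non-kernel page tables need a split ptlock. */
static inline bool example_needs_ptlock(struct mm_struct *mm)
{
	/* NULL means "unknown mm", so err on the side of doing the work. */
	return !mm || mm != &init_mm;
}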
static void __meminit create_pud_mapping(pud_t *pudp, uintptr_t va, phys_addr_t pa, phys_addr_t sz, static void __meminit create_pud_mapping(pud_t *pudp, uintptr_t va, phys_addr_t pa, phys_addr_t sz,


@ -318,6 +318,38 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr,
} }
} }
static void note_page_pte(struct ptdump_state *pt_st, unsigned long addr, pte_t pte)
{
note_page(pt_st, addr, 4, pte_val(pte));
}
static void note_page_pmd(struct ptdump_state *pt_st, unsigned long addr, pmd_t pmd)
{
note_page(pt_st, addr, 3, pmd_val(pmd));
}
static void note_page_pud(struct ptdump_state *pt_st, unsigned long addr, pud_t pud)
{
note_page(pt_st, addr, 2, pud_val(pud));
}
static void note_page_p4d(struct ptdump_state *pt_st, unsigned long addr, p4d_t p4d)
{
note_page(pt_st, addr, 1, p4d_val(p4d));
}
static void note_page_pgd(struct ptdump_state *pt_st, unsigned long addr, pgd_t pgd)
{
note_page(pt_st, addr, 0, pgd_val(pgd));
}
static void note_page_flush(struct ptdump_state *pt_st)
{
pte_t pte_zero = {0};
note_page(pt_st, 0, -1, pte_val(pte_zero));
}
static void ptdump_walk(struct seq_file *s, struct ptd_mm_info *pinfo) static void ptdump_walk(struct seq_file *s, struct ptd_mm_info *pinfo)
{ {
struct pg_state st = { struct pg_state st = {
@ -325,7 +357,12 @@ static void ptdump_walk(struct seq_file *s, struct ptd_mm_info *pinfo)
.marker = pinfo->markers, .marker = pinfo->markers,
.level = -1, .level = -1,
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = (struct ptdump_range[]) { .range = (struct ptdump_range[]) {
{pinfo->base_addr, pinfo->end}, {pinfo->base_addr, pinfo->end},
{0, 0} {0, 0}
@ -347,7 +384,12 @@ bool ptdump_check_wx(void)
.level = -1, .level = -1,
.check_wx = true, .check_wx = true,
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = (struct ptdump_range[]) { .range = (struct ptdump_range[]) {
{KERN_VIRT_START, ULONG_MAX}, {KERN_VIRT_START, ULONG_MAX},
{0, 0} {0, 0}


@ -97,7 +97,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
if (!table) if (!table)
return NULL; return NULL;
crst_table_init(table, _SEGMENT_ENTRY_EMPTY); crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) { if (!pagetable_pmd_ctor(mm, virt_to_ptdesc(table))) {
crst_table_free(mm, table); crst_table_free(mm, table);
return NULL; return NULL;
} }


@ -1448,16 +1448,6 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
return pte_mkyoung(__pte); return pte_mkyoung(__pte);
} }
static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
{
unsigned long physpage = page_to_phys(page);
pte_t __pte = mk_pte_phys(physpage, pgprot);
if (pte_write(__pte) && PageDirty(page))
__pte = pte_mkdirty(__pte);
return __pte;
}
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1)) #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
#define p4d_index(address) (((address) >> P4D_SHIFT) & (PTRS_PER_P4D-1)) #define p4d_index(address) (((address) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
#define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) #define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
@ -1879,7 +1869,6 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
#define pmdp_collapse_flush pmdp_collapse_flush #define pmdp_collapse_flush pmdp_collapse_flush
#define pfn_pmd(pfn, pgprot) mk_pmd_phys(((pfn) << PAGE_SHIFT), (pgprot)) #define pfn_pmd(pfn, pgprot) mk_pmd_phys(((pfn) << PAGE_SHIFT), (pgprot))
#define mk_pmd(page, pgprot) pfn_pmd(page_to_pfn(page), (pgprot))
static inline int pmd_trans_huge(pmd_t pmd) static inline int pmd_trans_huge(pmd_t pmd)
{ {


@ -24,6 +24,18 @@ static inline long syscall_get_nr(struct task_struct *task,
(regs->int_code & 0xffff) : -1; (regs->int_code & 0xffff) : -1;
} }
static inline void syscall_set_nr(struct task_struct *task,
struct pt_regs *regs,
int nr)
{
/*
* Unlike syscall_get_nr(), syscall_set_nr() can be called only when
* the target task is stopped for tracing on entering syscall, so
* there is no need to have the same check syscall_get_nr() has.
*/
regs->int_code = (regs->int_code & ~0xffff) | (nr & 0xffff);
}
static inline void syscall_rollback(struct task_struct *task, static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs) struct pt_regs *regs)
{ {
@ -76,6 +88,15 @@ static inline void syscall_get_arguments(struct task_struct *task,
args[0] = regs->orig_gpr2 & mask; args[0] = regs->orig_gpr2 & mask;
} }
static inline void syscall_set_arguments(struct task_struct *task,
struct pt_regs *regs,
const unsigned long *args)
{
regs->orig_gpr2 = args[0];
for (int n = 1; n < 6; n++)
regs->gprs[2 + n] = args[n];
}
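In the s390 hunk above, syscall_set_nr() splices the new number into only the low halfword of int_code, preserving whatever the upper bits carry. A tiny worked example with arbitrary values:

void example_int_code_splice(void)
{
	unsigned int int_code = 0x00440016;		/* old nr = 0x16 */
	int nr = 0x27;					/* new nr        */

	int_code = (int_code & ~0xffff) | (nr & 0xffff);
	/* int_code is now 0x00440027: upper bits unchanged, nr replaced */
}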
static inline int syscall_get_arch(struct task_struct *task) static inline int syscall_get_arch(struct task_struct *task)
{ {
#ifdef CONFIG_COMPAT #ifdef CONFIG_COMPAT


@ -40,7 +40,7 @@ static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
/* /*
* Release the page cache reference for a pte removed by * Release the page cache reference for a pte removed by
* tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page * tlb_ptep_clear_flush. In both flush modes the tlb for a page cache page
* has already been freed, so just do free_page_and_swap_cache. * has already been freed, so just do free_folio_and_swap_cache.
* *
* s390 doesn't delay rmap removal. * s390 doesn't delay rmap removal.
*/ */
@ -49,7 +49,7 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
{ {
VM_WARN_ON_ONCE(delay_rmap); VM_WARN_ON_ONCE(delay_rmap);
free_page_and_swap_cache(page); free_folio_and_swap_cache(page_folio(page));
return false; return false;
} }


@ -147,11 +147,48 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
} }
} }
static void note_page_pte(struct ptdump_state *pt_st, unsigned long addr, pte_t pte)
{
note_page(pt_st, addr, 4, pte_val(pte));
}
static void note_page_pmd(struct ptdump_state *pt_st, unsigned long addr, pmd_t pmd)
{
note_page(pt_st, addr, 3, pmd_val(pmd));
}
static void note_page_pud(struct ptdump_state *pt_st, unsigned long addr, pud_t pud)
{
note_page(pt_st, addr, 2, pud_val(pud));
}
static void note_page_p4d(struct ptdump_state *pt_st, unsigned long addr, p4d_t p4d)
{
note_page(pt_st, addr, 1, p4d_val(p4d));
}
static void note_page_pgd(struct ptdump_state *pt_st, unsigned long addr, pgd_t pgd)
{
note_page(pt_st, addr, 0, pgd_val(pgd));
}
static void note_page_flush(struct ptdump_state *pt_st)
{
pte_t pte_zero = {0};
note_page(pt_st, 0, -1, pte_val(pte_zero));
}
bool ptdump_check_wx(void) bool ptdump_check_wx(void)
{ {
struct pg_state st = { struct pg_state st = {
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = (struct ptdump_range[]) { .range = (struct ptdump_range[]) {
{.start = 0, .end = max_addr}, {.start = 0, .end = max_addr},
{.start = 0, .end = 0}, {.start = 0, .end = 0},
@ -190,7 +227,12 @@ static int ptdump_show(struct seq_file *m, void *v)
{ {
struct pg_state st = { struct pg_state st = {
.ptdump = { .ptdump = {
.note_page = note_page, .note_page_pte = note_page_pte,
.note_page_pmd = note_page_pmd,
.note_page_pud = note_page_pud,
.note_page_p4d = note_page_p4d,
.note_page_pgd = note_page_pgd,
.note_page_flush = note_page_flush,
.range = (struct ptdump_range[]) { .range = (struct ptdump_range[]) {
{.start = 0, .end = max_addr}, {.start = 0, .end = max_addr},
{.start = 0, .end = 0}, {.start = 0, .end = 0},


@ -144,7 +144,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
ptdesc = pagetable_alloc(GFP_KERNEL, 0); ptdesc = pagetable_alloc(GFP_KERNEL, 0);
if (!ptdesc) if (!ptdesc)
return NULL; return NULL;
if (!pagetable_pte_ctor(ptdesc)) { if (!pagetable_pte_ctor(mm, ptdesc)) {
pagetable_free(ptdesc); pagetable_free(ptdesc);
return NULL; return NULL;
} }


@ -380,14 +380,6 @@ PTE_BIT_FUNC(low, mkspecial, |= _PAGE_SPECIAL);
#define pgprot_noncached pgprot_writecombine #define pgprot_noncached pgprot_writecombine
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
*
* extern pte_t mk_pte(struct page *page, pgprot_t pgprot)
*/
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{ {
pte.pte_low &= _PAGE_CHG_MASK; pte.pte_low &= _PAGE_CHG_MASK;

Some files were not shown because too many files have changed in this diff.