linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-22 07:27:12 +08:00

Youling Tang 9fd53c8122 mm/filemap: align last_index to folio size

On XFS systems with pagesize=4K, blocksize=16K, and
CONFIG_TRANSPARENT_HUGEPAGE enabled, we observed the following readahead
behaviors:

 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=64k count=1
 # ./tools/mm/page-types -r -L -f  /mnt/xfs/test
 foffset	offset	flags
 0	136d4c	__RU_l_________H______t_________________F_1
 1	136d4d	__RU_l__________T_____t_________________F_1
 2	136d4e	__RU_l__________T_____t_________________F_1
 3	136d4f	__RU_l__________T_____t_________________F_1
 ...
 c	136bb8	__RU_l_________H______t_________________F_1
 d	136bb9	__RU_l__________T_____t_________________F_1
 e	136bba	__RU_l__________T_____t_________________F_1
 f	136bbb	__RU_l__________T_____t_________________F_1   <-- first read
 10	13c2cc	___U_l_________H______t______________I__F_1   <-- readahead flag
 11	13c2cd	___U_l__________T_____t______________I__F_1
 12	13c2ce	___U_l__________T_____t______________I__F_1
 13	13c2cf	___U_l__________T_____t______________I__F_1
 ...
 1c	1405d4	___U_l_________H______t_________________F_1
 1d	1405d5	___U_l__________T_____t_________________F_1
 1e	1405d6	___U_l__________T_____t_________________F_1
 1f	1405d7	___U_l__________T_____t_________________F_1
 [ra_size = 32, req_count = 16, async_size = 16]

 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=60k count=1
 # ./page-types -r -L -f  /mnt/xfs/test
 foffset	offset	flags
 0	136048	__RU_l_________H______t_________________F_1
 ...
 c	110a40	__RU_l_________H______t_________________F_1
 d	110a41	__RU_l__________T_____t_________________F_1
 e	110a42	__RU_l__________T_____t_________________F_1   <-- first read
 f	110a43	__RU_l__________T_____t_________________F_1   <-- first readahead flag
 10	13e7a8	___U_l_________H______t_________________F_1
 ...
 20	137a00	___U_l_________H______t_______P______I__F_1   <-- second readahead flag (20 - 2f)
 21	137a01	___U_l__________T_____t_______P______I__F_1
 ...
 3f	10d4af	___U_l__________T_____t_______P_________F_1
 [first readahead: ra_size = 32, req_count = 15, async_size = 17]

When reading 64k of data (the same holds for the 61-63k range, where
last_index is page-aligned in filemap_get_pages()), 128k of readahead is
triggered via page_cache_sync_ra() and the PG_readahead flag is set on the
next folio (the one containing page 0x10).

When reading 60k of data, 128k of readahead is also triggered via
page_cache_sync_ra().  However, in this case the readahead flag is set on
page 0xf.  Although the requested read size (req_count) is 60k, the
actual read is aligned to the folio size (64k), so it hits the readahead
flag and initiates asynchronous readahead via page_cache_async_ra().
This results in two readahead operations totaling 256k.
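
The position of the readahead flag in the two traces can be reproduced
with a small user-space sketch.  It assumes, based only on the bracketed
figures above, that async_size = ra_size - req_count and that the flag
lands at start + ra_size - async_size.  The struct and function names are
hypothetical and are not the kernel's:

```c
#include <assert.h>

/* Hypothetical model of a readahead window, counted in 4K pages. */
struct ra_window {
	unsigned long start;      /* first page index of the window */
	unsigned long size;       /* ra_size: pages in the window */
	unsigned long async_size; /* pages remaining when async readahead fires */
};

/* Assumed rule: async_size = ra_size - req_count (whole window otherwise). */
static struct ra_window make_window(unsigned long start, unsigned long ra_size,
				    unsigned long req_count)
{
	struct ra_window w = { start, ra_size, ra_size };

	if (ra_size > req_count)
		w.async_size = ra_size - req_count;
	return w;
}

/* Page index expected to carry the readahead flag under the assumed rule. */
static unsigned long ra_mark(const struct ra_window *w)
{
	assert(w->size >= w->async_size);
	return w->start + w->size - w->async_size;
}
```

With a window of 32 pages starting at 0, req_count = 16 puts the mark at
page 0x10 and req_count = 15 puts it at page 0xf, which matches the two
traces.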

The root cause is that when the requested size is smaller than the actual
read size (due to folio alignment), it triggers asynchronous readahead. 
By changing last_index alignment from page size to folio size, we ensure
the requested size matches the actual read size, preventing the case where
a single read operation triggers two readahead operations.
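
The effect of the alignment change can be illustrated with a user-space
sketch.  It is not the kernel code: the helper names are hypothetical,
the page size is fixed at 4K, and 64-bit arithmetic is used throughout so
that the rounding cannot overflow:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* 4K pages, as in the report above */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Index of the first page beyond the read, after rounding the end of
 * the read up to align bytes.  Passing the page size models the old
 * behavior; passing the minimum folio size models the new one.
 */
static uint64_t sketch_last_index(uint64_t pos, uint64_t count, uint64_t align)
{
	return sketch_round_up(pos + count, align) >> SKETCH_PAGE_SHIFT;
}
```

For a 60k read at offset 0, 4K alignment gives a last_index of 15 while
16K alignment gives 16, the same value a 64k read produces.  Every read
size from 49k to 64k yields 16 under 16K alignment.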

After applying the patch:
 # echo 3 > /proc/sys/vm/drop_caches
 # dd if=test of=/dev/null bs=60k count=1
 # ./page-types -r -L -f  /mnt/xfs/test
 foffset	offset	flags
 0	136d4c	__RU_l_________H______t_________________F_1
 1	136d4d	__RU_l__________T_____t_________________F_1
 2	136d4e	__RU_l__________T_____t_________________F_1
 3	136d4f	__RU_l__________T_____t_________________F_1
 ...
 c	136bb8	__RU_l_________H______t_________________F_1
 d	136bb9	__RU_l__________T_____t_________________F_1
 e	136bba	__RU_l__________T_____t_________________F_1   <-- first read
 f	136bbb	__RU_l__________T_____t_________________F_1
 10	13c2cc	___U_l_________H______t______________I__F_1   <-- readahead flag
 11	13c2cd	___U_l__________T_____t______________I__F_1
 12	13c2ce	___U_l__________T_____t______________I__F_1
 13	13c2cf	___U_l__________T_____t______________I__F_1
 ...
 1c	1405d4	___U_l_________H______t_________________F_1
 1d	1405d5	___U_l__________T_____t_________________F_1
 1e	1405d6	___U_l__________T_____t_________________F_1
 1f	1405d7	___U_l__________T_____t_________________F_1
 [ra_size = 32, req_count = 16, async_size = 16]

With the patch applied, the same result is seen for every read size from
49k to 64k: the readahead flag is set on the next folio.

Because the minimum folio order of an address_space corresponds to the
block size (at least in xfs and bcachefs, which already support bs > ps),
having request_count aligned to the block size will not cause overread.

[klarasmodin@gmail.com: fix overflow on 32-bit]
  Link: https://lkml.kernel.org/r/yru7qf5gvyzccq5ohhpylvxug5lr5tf54omspbjh4sm6pcdb2r@fpjgj2pxw7va
[akpm@linux-foundation.org: update it for Max's constification efforts]
Link: https://lkml.kernel.org/r/20250711055509.91587-1-youling.tang@linux.dev
Co-developed-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Klara Modin <klarasmodin@gmail.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Youling Tang <youling.tang@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2025-09-21 14:22:15 -07:00

damon

Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up

2025-09-21 14:19:36 -07:00

kasan

kasan: apply write-only mode in kasan kunit testcases

2025-09-21 14:22:10 -07:00

kfence

kfence: drop nth_page() usage

2025-09-21 14:22:09 -07:00

kmsan

kmsan: test: add module description

2025-06-05 22:02:25 -07:00

backing-dev.c

mm: replace (20 - PAGE_SHIFT) with common macros for pages<->MB conversion

2025-09-13 16:54:42 -07:00

balloon_compaction.c

mm/migrate: fix NULL movable_ops if CONFIG_ZSMALLOC=m

2025-08-19 16:35:57 -07:00

bootmem_info.c

mm/sparse: allow for alternate vmemmap section init at boot

2025-03-16 22:06:27 -07:00

cma_debug.c

mm: cma: simplify cma_maxchunk_get()

2025-07-24 19:12:36 -07:00

cma_sysfs.c

mm/cma: export total and free number of pages for CMA areas

2025-03-16 22:06:24 -07:00

cma.c

mm/cma: refuse handing out non-contiguous page ranges

2025-09-21 14:22:06 -07:00

cma.h

mm: cma: set early_pfn and bitmap as a union in cma_memrange

2025-05-22 14:55:36 -07:00

compaction.c

mm: rename PG_isolated to PG_movable_ops_isolated

2025-07-13 16:38:30 -07:00

debug_page_alloc.c

mm/debug_page_alloc: improve error message for invalid guardpage minorder

2025-05-12 23:50:38 -07:00

debug_page_ref.c

…

debug_vm_pgtable.c

mm/debug_vm_pgtable: clear page table entries at destroy_args()

2025-08-19 16:35:54 -07:00

debug.c

mm: convert core mm to mm_flags_*() accessors

2025-09-13 16:54:56 -07:00

dmapool_test.c

…

dmapool.c

docs: dma-api: replace consistent with coherent

2025-07-01 13:25:36 -06:00

early_ioremap.c

mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereference

2025-01-13 22:40:59 -08:00

execmem.c

mm: correct type for vmalloc vm_flags fields

2025-08-02 12:06:13 -07:00

fadvise.c

…

fail_page_alloc.c

…

failslab.c

…

filemap.c

mm/filemap: align last_index to folio size

2025-09-21 14:22:15 -07:00

folio-compat.c

mm: Remove grab_cache_page_write_begin()

2025-03-04 17:02:25 +00:00

gup_test.c

…

gup_test.h

…

gup.c

mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()

2025-09-21 14:22:09 -07:00

highmem.c

mm: constify highmem related functions for improved const-correctness

2025-09-21 14:22:15 -07:00

hmm.c

mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdeffery

2025-07-19 18:59:53 -07:00

huge_memory.c

mm/huge_memory: remove enforce_sysfs from __thp_vma_allowable_orders

2025-09-13 16:55:16 -07:00

hugetlb_cgroup.c

page_counter: track failcnt only for legacy cgroups

2025-03-17 00:05:35 -07:00

hugetlb_cma.c

mm: hugetlb: directly pass order when allocate a hugetlb folio

2025-09-21 14:22:11 -07:00

hugetlb_cma.h

mm: hugetlb: directly pass order when allocate a hugetlb folio

2025-09-21 14:22:11 -07:00

hugetlb_vmemmap.c

mm/pagewalk: split walk_page_range_novma() into kernel/user parts

2025-07-09 22:42:05 -07:00

hugetlb_vmemmap.h

mm/hugetlb: do pre-HVO for bootmem allocated pages

2025-03-16 22:06:29 -07:00

hugetlb.c

mm: hugeltb: check NUMA_NO_NODE in only_alloc_fresh_hugetlb_folio()

2025-09-21 14:22:12 -07:00

hwpoison-inject.c

…

init-mm.c

mm: replace vm_lock and detached flag with a reference count

2025-03-16 22:06:20 -07:00

internal.h

mm: sanity-check maximum folio size in folio_set_order()

2025-09-21 14:22:03 -07:00

interval_tree.c

…

ioremap.c

mm/ioremap: pass pgprot_t to ioremap_prot() instead of unsigned long

2025-03-16 22:06:23 -07:00

Kconfig

mm: stop making SPARSEMEM_VMEMMAP user-selectable

2025-09-21 14:22:00 -07:00

Kconfig.debug

mm: rename GENERIC_PTDUMP and PTDUMP_CORE

2025-03-17 00:05:32 -07:00

khugepaged.c

Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up

2025-09-21 14:19:36 -07:00

kmemleak.c

mm: fix possible deadlock in kmemleak

2025-09-01 17:11:37 -07:00

ksm.c

mm: convert core mm to mm_flags_*() accessors

2025-09-13 16:54:56 -07:00

list_lru.c

mm, list_lru: refactor the locking code

2025-07-09 22:41:56 -07:00

maccess.c

mm: unexport globally copy_to_kernel_nofault

2025-07-09 22:42:22 -07:00

madvise.c

mm/mseal: small cleanups

2025-08-02 12:06:09 -07:00

Makefile

mm: remove unused zpool layer

2025-09-21 14:21:59 -07:00

mapping_dirty_helpers.c

mm: remove redundant pXd_devmap calls

2025-07-09 22:42:17 -07:00

memblock.c

memblock: fix kernel-doc for MEMBLOCK_RSRV_NOINIT

2025-08-26 10:47:03 +03:00

memcontrol-v1.c

memcg: make count_memcg_events re-entrant safe against irqs

2025-05-22 14:55:38 -07:00

memcontrol-v1.h

memcg: move do_memsw_account() to CONFIG_MEMCG_V1

2025-03-21 22:03:11 -07:00

memcontrol.c

memcg: optimize exit to user space

2025-09-13 16:55:01 -07:00

memfd.c

mm/memfd: remove redundant casts

2025-09-21 14:22:00 -07:00

memory_hotplug.c

mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()

2025-09-03 17:10:35 -07:00

memory-failure.c

Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up

2025-09-21 14:19:36 -07:00

memory-tiers.c

mm,memory-tiers: use node-notifier instead of memory-notifier

2025-07-13 16:38:15 -07:00

memory.c

mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED

2025-09-13 16:55:05 -07:00

mempolicy.c

mm: split folio_pte_batch() into folio_pte_batch() and folio_pte_batch_flags()

2025-07-19 18:59:45 -07:00

mempool.c

mm: mempool: fix crash in mempool_free() for zero-minimum pools

2025-08-02 12:06:13 -07:00

memremap.c

mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()

2025-09-21 14:22:02 -07:00

memtest.c

…

migrate_device.c

treewide: remove MIGRATEPAGE_SUCCESS

2025-09-13 16:54:50 -07:00

migrate.c

treewide: remove MIGRATEPAGE_SUCCESS

2025-09-13 16:54:50 -07:00

mincore.c

mm/mincore: use a helper for checking the swap cache

2025-09-13 16:54:49 -07:00

mlock.c

mm: folio_may_be_lru_cached() unless folio_test_large()

2025-09-13 13:05:36 -07:00

mm_init.c

mm/mm_init: make memmap_init_compound() look more like prep_compound_page()

2025-09-21 14:22:03 -07:00

mm_slot.h

…

mmap_lock.c

mm: change vma_start_read() to drop RCU lock on failure

2025-09-13 16:54:43 -07:00

mmap.c

mm: convert core mm to mm_flags_*() accessors

2025-09-13 16:54:56 -07:00

mmu_gather.c

mm: remove redundant __GFP_NOWARN

2025-09-13 16:54:58 -07:00

mmu_notifier.c

Update Christoph's Email address and make it consistent

2025-05-12 23:50:31 -07:00

mmzone.c

mm: introduce memdesc_flags_t

2025-09-13 16:55:07 -07:00

mprotect.c

mm: pass page directly instead of using folio_page

2025-08-11 23:00:59 -07:00

mremap.c

mm/mremap: fix regression in vrm->new_addr check

2025-09-08 23:45:11 -07:00

mseal.c

mm/mseal: rework mseal apply logic

2025-08-02 12:06:09 -07:00

msync.c

…

nommu.c

mm/nommu: convert kobjsize() to folios

2025-09-13 16:54:46 -07:00

numa_emulation.c

mm: numa,memblock: Use SZ_1M macro to denote bytes to MB conversion

2025-08-20 16:31:23 +03:00

numa_memblks.c

mm: numa,memblock: Use SZ_1M macro to denote bytes to MB conversion

2025-08-20 16:31:23 +03:00

numa.c

mm/numa: remove unnecessary local variable in alloc_node_data()

2025-05-12 23:50:38 -07:00

oom_kill.c

mm: constify process_shares_mm() for improved const-correctness

2025-09-21 14:22:13 -07:00

page_alloc.c

mm/page_alloc: reject unreasonable folio/compound page sizes in alloc_contig_range_noprof()

2025-09-21 14:22:02 -07:00

page_counter.c

page_counter: track failcnt only for legacy cgroups

2025-03-17 00:05:35 -07:00

page_ext.c

mm,page_ext: derive the node from the pfn

2025-07-13 16:38:16 -07:00

page_frag_cache.c

mm/page_alloc: export free_frozen_pages() instead of free_unref_page()

2025-01-13 22:40:31 -08:00

page_idle.c

sysfs: treewide: switch back to attribute_group::bin_attrs

2025-06-17 10:44:15 +02:00

page_io.c

mm: stop passing a writeback_control structure to swap_writeout

2025-07-09 22:41:58 -07:00

page_isolation.c

mm/page_isolation: drop __folio_test_movable() check for large folios

2025-07-13 16:38:29 -07:00

page_owner.c

mm/page_owner: convert set_page_owner_migrate_reason() to folios

2025-07-19 18:59:57 -07:00

page_poison.c

…

page_reporting.c

…

page_reporting.h

…

page_table_check.c

mm/page_table_check: Batch-check pmds/puds just like ptes

2025-05-09 13:43:07 +01:00

page_vma_mapped.c

mm: remove redundant pXd_devmap calls

2025-07-09 22:42:17 -07:00

page-writeback.c

mm/page-writeback: drop usage of folio_index

2025-09-13 16:55:17 -07:00

pagewalk.c

mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()

2025-09-21 14:22:05 -07:00

percpu-internal.h

…

percpu-km.c

mm/mm/percpu-km: drop nth_page() usage within single allocation

2025-09-21 14:22:04 -07:00

percpu-stats.c

mm: remove outdated filename comment in percpu-stats.c

2025-07-13 16:38:23 -07:00

percpu-vm.c

…

percpu.c

percpu: fix race on alloc failed warning limit

2025-09-08 23:45:10 -07:00

pgalloc-track.h

…

pgtable-generic.c

mm: remove redundant pXd_devmap calls

2025-07-09 22:42:17 -07:00

process_vm_access.c

…

pt_reclaim.c

mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)

2025-01-13 22:40:48 -08:00

ptdump.c

mm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd()

2025-07-09 22:42:20 -07:00

readahead.c

readahead: use folio_nr_pages() instead of shift operation

2025-07-19 18:59:53 -07:00

rmap.c

mm/rmap: use folio_large_nr_pages() when we are sure it is a large folio

2025-09-13 16:55:15 -07:00

rodata_test.c

mm/rodata_test: verify test data is unchanged, rather than non-zero

2025-01-13 22:40:38 -08:00

secretmem.c

Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2025-07-31 14:57:54 -07:00

shmem_quota.c

…

shmem.c

mm: constify shmem related test functions for improved const-correctness

2025-09-21 14:22:12 -07:00

show_mem.c

mm: show_mem: show number of zspages in show_free_areas

2025-09-21 14:22:11 -07:00

shrinker_debug.c

mm/shrinker: fix name consistency issue in shrinker_debugfs_rename()

2025-03-17 00:05:40 -07:00

shrinker.c

…

shuffle.c

…

shuffle.h

…

slab_common.c

Update Christoph's Email address and make it consistent

2025-05-12 23:50:31 -07:00

slab.h

slab: use memdesc_nid()

2025-09-13 16:55:08 -07:00

slub.c

slab: use memdesc_flags_t

2025-09-13 16:55:08 -07:00

sparse-vmemmap.c

mm: introduce and use {pgd,p4d}_populate_kernel()

2025-08-27 22:45:44 -07:00

sparse.c

mm: introduce memdesc_nid()

2025-09-13 16:55:07 -07:00

swap_cgroup.c

mm: swap_cgroup: remove double initialization of locals

2025-03-17 22:06:58 -07:00

swap_state.c

mm/mincore, swap: consolidate swap cache checking for mincore

2025-09-13 16:54:49 -07:00

swap.c

Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up

2025-09-21 14:19:36 -07:00

swap.h

mm/mincore, swap: consolidate swap cache checking for mincore

2025-09-13 16:54:49 -07:00

swapfile.c

mm/swapfile.c: introduce function alloc_swap_scan_list()

2025-09-13 16:55:00 -07:00

truncate.c

Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2025-06-02 16:00:26 -07:00

usercopy.c

mm: security: Check early if HARDENED_USERCOPY is enabled

2025-02-28 11:51:31 -08:00

userfaultfd.c

userfaultfd: opportunistic TLB-flush batching for present pages in MOVE

2025-09-13 16:55:00 -07:00

util.c

mm: constify arch_pick_mmap_layout() for improved const-correctness

2025-09-21 14:22:14 -07:00

vma_exec.c

mm/vma: use vmg->target to specify target VMA for new VMA merge

2025-07-09 22:42:11 -07:00

vma_init.c

mm: fix typos in VMA comments

2025-09-13 16:55:02 -07:00

vma_internal.h

mm/vma: move brk() internals to mm/vma.c

2025-01-13 22:40:42 -08:00

vma.c

Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2025-08-05 16:02:07 +03:00

vma.h

mm: fix typos in VMA comments

2025-09-13 16:55:02 -07:00

vmalloc.c

Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up

2025-09-21 14:19:36 -07:00

vmpressure.c

memcg: convert memcg->socket_pressure to u64

2025-07-24 19:12:32 -07:00

vmscan.c

Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up

2025-09-21 14:19:36 -07:00

vmstat.c

mm: add vmstat for kernel_file pages

2025-09-13 16:55:20 -07:00

workingset.c

mm: introduce memdesc_flags_t

2025-09-13 16:55:07 -07:00

zpdesc.h

mm: zpdesc: minor naming and comment corrections

2025-09-21 14:21:59 -07:00

zsmalloc.c

mm: remove unused zpool layer

2025-09-21 14:21:59 -07:00

zswap.c

mm: zswap: interact directly with zsmalloc

2025-09-21 14:21:58 -07:00