original_kernel/lib
Kairui Song 6758c1128c mm/filemap: optimize filemap folio adding
Instead of doing multiple tree walks, do one optimism range check with
lock hold, and exit if raced with another insertion.  If a shadow exists,
check it with a new xas_get_order helper before releasing the lock to
avoid redundant tree walks for getting its order.

Drop the lock and do the allocation only if a split is needed.

In the best case, it only need to walk the tree once.  If it needs to
alloc and split, 3 walks are issued (One for first ranged conflict check
and order retrieving, one for the second check after allocation, one for
the insert after split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
bw (  MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops        : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691

After (+7.3%):
bw (  MiB/s): min=  493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops        : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651

Test result with THP (do a THP randread then switch to 4K page in hope it
issues a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
      --buffered=1 --ioengine=mmap -thp=1 --readonly \
      --rw=randread --time_based --ramp_time=30s --runtime=10m \
      --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
      --buffered=1 --ioengine=mmap \
      --rw=randread --time_based --runtime=5s --group_reporting

Before:
bw (  KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops        : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976·

READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
bw (  KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops        : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146

READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.5%) and THP (+12.5%) cached read.

Link: https://lkml.kernel.org/r/20240415171857.19244-5-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:09 -07:00
..
842
crypto
dim
fonts fbdev fixes and cleanups for 6.9-rc1: 2024-03-22 10:09:08 -07:00
kunit
lz4
lzo
math
pldmfw
raid6
reed_solomon
test_fortify
vdso
xz
zlib_deflate
zlib_dfltcc
zlib_inflate
zstd
.gitignore
Kconfig
Kconfig.debug lib: add codetag reference into slabobj_ext 2024-04-25 20:55:55 -07:00
Kconfig.kasan
Kconfig.kcsan
Kconfig.kfence
Kconfig.kgdb
Kconfig.kmsan
Kconfig.ubsan
Makefile lib: add allocation tagging support for memory allocation profiling 2024-04-25 20:55:52 -07:00
alloc_tag.c alloc_tag: Tighten file permissions on /proc/allocinfo 2024-04-25 20:55:59 -07:00
argv_split.c
ashldi3.c
ashrdi3.c
asn1_decoder.c
asn1_encoder.c
assoc_array.c
atomic64.c
atomic64_test.c
audit.c
base64.c
bcd.c
bch.c
bitfield_kunit.c
bitmap-str.c
bitmap.c
bitrev.c
bootconfig-data.S
bootconfig.c
bsearch.c
btree.c
bucket_locks.c
bug.c
build_OID_registry
buildid.c
bust_spinlocks.c
check_signature.c
checksum.c
checksum_kunit.c lib: checksum: hide unused expected_csum_ipv6_magic[] 2024-04-08 11:03:05 +01:00
closure.c
clz_ctz.c
clz_tab.c
cmdline.c
cmdline_kunit.c
cmpdi2.c
codetag.c lib: add memory allocations report in show_mem() 2024-04-25 20:55:57 -07:00
compat_audit.c
cpu_rmap.c
cpumask.c
cpumask_kunit.c
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
crc64-rocksoft.c
crc64.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c
dec_and_lock.c
decompress.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
decompress_unzstd.c
devmem_is_allowed.c
devres.c
dhry.h
dhry_1.c
dhry_2.c
dhry_run.c
digsig.c
dump_stack.c
dynamic_debug.c
dynamic_queue_limits.c
earlycpio.c
errname.c
error-inject.c
errseq.c
extable.c
fault-inject-usercopy.c
fault-inject.c
fdt.c
fdt_addresses.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
find_bit.c
find_bit_benchmark.c
flex_proportions.c
fortify_kunit.c
fw_table.c
gen_crc32table.c
gen_crc64table.c
genalloc.c
generic-radix-tree.c
glob.c
globtest.c
group_cpus.c
hashtable_test.c
hexdump.c
hweight.c
idr.c
inflate.c
interval_tree.c
interval_tree_test.c
iomap.c
iomap_copy.c
iommu-helper.c
iov_iter.c
irq_poll.c
irq_regs.c
is_signed_type_kunit.c
is_single_threaded.c
kasprintf.c
kfifo.c
klist.c
kobject.c
kobject_uevent.c
kstrtox.c
kstrtox.h
kunit_iov_iter.c
libcrc32c.c
linear_ranges.c
list-test.c
list_debug.c
list_sort.c
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lockref.c
logic_iomem.c
logic_pio.c
lru_cache.c
lshrdi3.c
lwq.c
maple_tree.c
memcat_p.c
memcpy_kunit.c
memory-notifier-error-inject.c
memregion.c
memweight.c
muldi3.c
net_utils.c
netdev-notifier-error-inject.c
nlattr.c
nmi_backtrace.c
notifier-error-inject.c
notifier-error-inject.h
objagg.c
objpool.c
of-reconfig-notifier-error-inject.c
oid_registry.c
once.c
overflow_kunit.c overflow: Change DEFINE_FLEX to take __counted_by member 2024-03-22 16:25:31 -07:00
packing.c
parman.c
parser.c
percpu-refcount.c
percpu_counter.c
percpu_test.c
plist.c
pm-notifier-error-inject.c
polynomial.c
radix-tree.c
radix-tree.h
random32.c
ratelimit.c
rbtree.c
rbtree_test.c
rcuref.c
ref_tracker.c
refcount.c
rhashtable.c rhashtable: plumb through alloc tag 2024-04-25 20:55:57 -07:00
sbitmap.c
scatterlist.c
seq_buf.c
sg_pool.c
sg_split.c
siphash.c
siphash_kunit.c
slub_kunit.c
smp_processor_id.c
sort.c
stackdepot.c stackdepot: respect __GFP_NOLOCKDEP allocation flag 2024-04-24 19:34:26 -07:00
stackinit_kunit.c
stmp_device.c
strcat_kunit.c
string.c
string_helpers.c
string_helpers_kunit.c
string_kunit.c
strncpy_from_user.c
strnlen_user.c
strscpy_kunit.c
syscall.c
test-kstrtox.c
test_bitmap.c
test_bitops.c
test_bits.c
test_blackhole_dev.c
test_bpf.c
test_debug_virtual.c
test_dynamic_debug.c
test_firmware.c
test_fprobe.c
test_fpu.c
test_free_pages.c
test_hash.c
test_hexdump.c
test_hmm.c lib/test_hmm.c: handle src_pfns and dst_pfns allocation failure 2024-04-25 20:55:48 -07:00
test_hmm_uapi.h
test_ida.c
test_kmod.c
test_kprobes.c
test_linear_ranges.c
test_list_sort.c
test_lockup.c
test_maple_tree.c
test_memcat_p.c
test_meminit.c
test_min_heap.c
test_module.c
test_objagg.c
test_objpool.c
test_parman.c
test_printf.c
test_ref_tracker.c
test_rhashtable.c
test_scanf.c
test_sort.c
test_static_key_base.c
test_static_keys.c
test_sysctl.c
test_ubsan.c ubsan: fix unused variable warning in test module 2024-04-03 14:35:57 -07:00
test_user_copy.c
test_uuid.c
test_vmalloc.c
test_xarray.c mm/filemap: optimize filemap folio adding 2024-04-25 20:56:09 -07:00
textsearch.c
timerqueue.c
trace_readwrite.c
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c
ubsan.h
ucmpdi2.c
ucs2_string.c
usercopy.c
uuid.c
vsprintf.c
win_minmax.c
xarray.c lib/xarray: introduce a new helper xas_get_order 2024-04-25 20:56:09 -07:00
xxhash.c