original_kernel

History

Kairui Song 6758c1128c mm/filemap: optimize filemap folio adding

Instead of doing multiple tree walks, do one optimism range check with lock hold, and exit if raced with another insertion. If a shadow exists, check it with a new xas_get_order helper before releasing the lock to avoid redundant tree walks for getting its order.

Drop the lock and do the allocation only if a split is needed.

In the best case, it only need to walk the tree once. If it needs to alloc and split, 3 walks are issued (One for first ranged conflict check and order retrieving, one for the second check after allocation, one for the insert after split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691

After (+7.3%):
bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651

Test result with THP (do a THP randread then switch to 4K page in hope it issues a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap -thp=1 --readonly \
    --rw=randread --time_based --ramp_time=30s --runtime=10m \
    --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap \
    --rw=randread --time_based --runtime=5s --group_reporting

Before:
bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.5%) and THP (+12.5%) cached read.

Link: https://lkml.kernel.org/r/20240415171857.19244-5-ryncsn@gmail.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2024-04-25 20:56:09 -07:00
..
842	…
crypto	…
dim	…
fonts	fbdev fixes and cleanups for 6.9-rc1:	2024-03-22 10:09:08 -07:00
kunit	…
lz4	…
lzo	…
math	…
pldmfw	…
raid6	…
reed_solomon	…
test_fortify	…
vdso	…
xz	…
zlib_deflate	…
zlib_dfltcc	…
zlib_inflate	…
zstd	…
.gitignore	…
Kconfig	…
Kconfig.debug	lib: add codetag reference into slabobj_ext	2024-04-25 20:55:55 -07:00
Kconfig.kasan	…
Kconfig.kcsan	…
Kconfig.kfence	…
Kconfig.kgdb	…
Kconfig.kmsan	…
Kconfig.ubsan	…
Makefile	lib: add allocation tagging support for memory allocation profiling	2024-04-25 20:55:52 -07:00
alloc_tag.c	alloc_tag: Tighten file permissions on /proc/allocinfo	2024-04-25 20:55:59 -07:00
argv_split.c	…
ashldi3.c	…
ashrdi3.c	…
asn1_decoder.c	…
asn1_encoder.c	…
assoc_array.c	…
atomic64.c	…
atomic64_test.c	…
audit.c	…
base64.c	…
bcd.c	…
bch.c	…
bitfield_kunit.c	…
bitmap-str.c	…
bitmap.c	…
bitrev.c	…
bootconfig-data.S	…
bootconfig.c	…
bsearch.c	…
btree.c	…
bucket_locks.c	…
bug.c	…
build_OID_registry	…
buildid.c	…
bust_spinlocks.c	…
check_signature.c	…
checksum.c	…
checksum_kunit.c	lib: checksum: hide unused expected_csum_ipv6_magic[]	2024-04-08 11:03:05 +01:00
closure.c	…
clz_ctz.c	…
clz_tab.c	…
cmdline.c	…
cmdline_kunit.c	…
cmpdi2.c	…
codetag.c	lib: add memory allocations report in show_mem()	2024-04-25 20:55:57 -07:00
compat_audit.c	…
cpu_rmap.c	…
cpumask.c	…
cpumask_kunit.c	…
crc-ccitt.c	…
crc-itu-t.c	…
crc-t10dif.c	…
crc4.c	…
crc7.c	…
crc8.c	…
crc16.c	…
crc32.c	…
crc32defs.h	…
crc32test.c	…
crc64-rocksoft.c	…
crc64.c	…
ctype.c	…
debug_info.c	…
debug_locks.c	…
debugobjects.c	…
dec_and_lock.c	…
decompress.c	…
decompress_bunzip2.c	…
decompress_inflate.c	…
decompress_unlz4.c	…
decompress_unlzma.c	…
decompress_unlzo.c	…
decompress_unxz.c	…
decompress_unzstd.c	…
devmem_is_allowed.c	…
devres.c	…
dhry.h	…
dhry_1.c	…
dhry_2.c	…
dhry_run.c	…
digsig.c	…
dump_stack.c	…
dynamic_debug.c	…
dynamic_queue_limits.c	…
earlycpio.c	…
errname.c	…
error-inject.c	…
errseq.c	…
extable.c	…
fault-inject-usercopy.c	…
fault-inject.c	…
fdt.c	…
fdt_addresses.c	…
fdt_empty_tree.c	…
fdt_ro.c	…
fdt_rw.c	…
fdt_strerror.c	…
fdt_sw.c	…
fdt_wip.c	…
find_bit.c	…
find_bit_benchmark.c	…
flex_proportions.c	…
fortify_kunit.c	…
fw_table.c	…
gen_crc32table.c	…
gen_crc64table.c	…
genalloc.c	…
generic-radix-tree.c	…
glob.c	…
globtest.c	…
group_cpus.c	…
hashtable_test.c	…
hexdump.c	…
hweight.c	…
idr.c	…
inflate.c	…
interval_tree.c	…
interval_tree_test.c	…
iomap.c	…
iomap_copy.c	…
iommu-helper.c	…
iov_iter.c	…
irq_poll.c	…
irq_regs.c	…
is_signed_type_kunit.c	…
is_single_threaded.c	…
kasprintf.c	…
kfifo.c	…
klist.c	…
kobject.c	…
kobject_uevent.c	…
kstrtox.c	…
kstrtox.h	…
kunit_iov_iter.c	…
libcrc32c.c	…
linear_ranges.c	…
list-test.c	…
list_debug.c	…
list_sort.c	…
llist.c	…
locking-selftest-hardirq.h	…
locking-selftest-mutex.h	…
locking-selftest-rlock-hardirq.h	…
locking-selftest-rlock-softirq.h	…
locking-selftest-rlock.h	…
locking-selftest-rsem.h	…
locking-selftest-rtmutex.h	…
locking-selftest-softirq.h	…
locking-selftest-spin-hardirq.h	…
locking-selftest-spin-softirq.h	…
locking-selftest-spin.h	…
locking-selftest-wlock-hardirq.h	…
locking-selftest-wlock-softirq.h	…
locking-selftest-wlock.h	…
locking-selftest-wsem.h	…
locking-selftest.c	…
lockref.c	…
logic_iomem.c	…
logic_pio.c	…
lru_cache.c	…
lshrdi3.c	…
lwq.c	…
maple_tree.c	…
memcat_p.c	…
memcpy_kunit.c	…
memory-notifier-error-inject.c	…
memregion.c	…
memweight.c	…
muldi3.c	…
net_utils.c	…
netdev-notifier-error-inject.c	…
nlattr.c	…
nmi_backtrace.c	…
notifier-error-inject.c	…
notifier-error-inject.h	…
objagg.c	…
objpool.c	…
of-reconfig-notifier-error-inject.c	…
oid_registry.c	…
once.c	…
overflow_kunit.c	overflow: Change DEFINE_FLEX to take __counted_by member	2024-03-22 16:25:31 -07:00
packing.c	…
parman.c	…
parser.c	…
percpu-refcount.c	…
percpu_counter.c	…
percpu_test.c	…
plist.c	…
pm-notifier-error-inject.c	…
polynomial.c	…
radix-tree.c	…
radix-tree.h	…
random32.c	…
ratelimit.c	…
rbtree.c	…
rbtree_test.c	…
rcuref.c	…
ref_tracker.c	…
refcount.c	…
rhashtable.c	rhashtable: plumb through alloc tag	2024-04-25 20:55:57 -07:00
sbitmap.c	…
scatterlist.c	…
seq_buf.c	…
sg_pool.c	…
sg_split.c	…
siphash.c	…
siphash_kunit.c	…
slub_kunit.c	…
smp_processor_id.c	…
sort.c	…
stackdepot.c	stackdepot: respect __GFP_NOLOCKDEP allocation flag	2024-04-24 19:34:26 -07:00
stackinit_kunit.c	…
stmp_device.c	…
strcat_kunit.c	…
string.c	…
string_helpers.c	…
string_helpers_kunit.c	…
string_kunit.c	…
strncpy_from_user.c	…
strnlen_user.c	…
strscpy_kunit.c	…
syscall.c	…
test-kstrtox.c	…
test_bitmap.c	…
test_bitops.c	…
test_bits.c	…
test_blackhole_dev.c	…
test_bpf.c	…
test_debug_virtual.c	…
test_dynamic_debug.c	…
test_firmware.c	…
test_fprobe.c	…
test_fpu.c	…
test_free_pages.c	…
test_hash.c	…
test_hexdump.c	…
test_hmm.c	lib/test_hmm.c: handle src_pfns and dst_pfns allocation failure	2024-04-25 20:55:48 -07:00
test_hmm_uapi.h	…
test_ida.c	…
test_kmod.c	…
test_kprobes.c	…
test_linear_ranges.c	…
test_list_sort.c	…
test_lockup.c	…
test_maple_tree.c	…
test_memcat_p.c	…
test_meminit.c	…
test_min_heap.c	…
test_module.c	…
test_objagg.c	…
test_objpool.c	…
test_parman.c	…
test_printf.c	…
test_ref_tracker.c	…
test_rhashtable.c	…
test_scanf.c	…
test_sort.c	…
test_static_key_base.c	…
test_static_keys.c	…
test_sysctl.c	…
test_ubsan.c	ubsan: fix unused variable warning in test module	2024-04-03 14:35:57 -07:00
test_user_copy.c	…
test_uuid.c	…
test_vmalloc.c	…
test_xarray.c	mm/filemap: optimize filemap folio adding	2024-04-25 20:56:09 -07:00
textsearch.c	…
timerqueue.c	…
trace_readwrite.c	…
ts_bm.c	…
ts_fsm.c	…
ts_kmp.c	…
ubsan.c	…
ubsan.h	…
ucmpdi2.c	…
ucs2_string.c	…
usercopy.c	…
uuid.c	…
vsprintf.c	…
win_minmax.c	…
xarray.c	lib/xarray: introduce a new helper xas_get_order	2024-04-25 20:56:09 -07:00
xxhash.c	…