linux-stable-rt/kernel
Andrea Arcangeli ba76149f47 thp: khugepaged
Add khugepaged to relocate fragmented pages into hugepages if new
hugepages become available.  (this is indipendent of the defrag logic that
will have to make new hugepages available)

The fundamental reason why khugepaged is unavoidable, is that some memory
can be fragmented and not everything can be relocated.  So when a virtual
machine quits and releases gigabytes of hugepages, we want to use those
freely available hugepages to create huge-pmd in the other virtual
machines that may be running on fragmented memory, to maximize the CPU
efficiency at all times.  The scan is slow, it takes nearly zero cpu time,
except when it copies data (in which case it means we definitely want to
pay for that cpu time) so it seems a good tradeoff.

In addition to the hugepages being released by other process releasing
memory, we have the strong suspicion that the performance impact of
potentially defragmenting hugepages during or before each page fault could
lead to more performance inconsistency than allocating small pages at
first and having them collapsed into large pages later...  if they prove
themselfs to be long lived mappings (khugepaged scan is slow so short
lived mappings have low probability to run into khugepaged if compared to
long lived mappings).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:43 -08:00
..
debug
gcov
irq
power
time
trace
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile
acct.c
async.c
audit.c
audit.h
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
cgroup.c
cgroup_freezer.c
compat.c
configs.c
cpu.c
cpuset.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c
extable.c
fork.c
freezer.c
futex.c
futex_compat.c
groups.c
hrtimer.c
hung_task.c
hw_breakpoint.c
irq_work.c
itimer.c
jump_label.c
kallsyms.c
kexec.c
kfifo.c
kmod.c
kprobes.c
ksysfs.c
kthread.c
latencytop.c
lockdep.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
module.c
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
padata.c
panic.c
params.c
perf_event.c
pid.c
pid_namespace.c
pm_qos_params.c
posix-cpu-timers.c
posix-timers.c
printk.c
profile.c
ptrace.c
range.c
rcupdate.c
rcutiny.c
rcutiny_plugin.h
rcutorture.c
rcutree.c
rcutree.h
rcutree_plugin.h
rcutree_trace.c
relay.c
res_counter.c
resource.c
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c
sched_autogroup.c
sched_autogroup.h
sched_clock.c
sched_cpupri.c
sched_cpupri.h
sched_debug.c
sched_fair.c
sched_features.h
sched_idletask.c
sched_rt.c
sched_stats.h
sched_stoptask.c
seccomp.c
semaphore.c
signal.c
smp.c
softirq.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys.c
sys_ni.c
sysctl.c
sysctl_binary.c
sysctl_check.c
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c
tracepoint.c
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
wait.c
watchdog.c
workqueue.c
workqueue_sched.h