linux-stable-rt/kernel
Al Viro 8f7b0ba1c8 Fix inotify watch removal/umount races
Inotify watch removals suck violently.

To kick the watch out we need (in this order) inode->inotify_mutex and
ih->mutex.  That's fine if we have a hold on inode; however, for all
other cases we need to make damn sure we don't race with umount.  We can
*NOT* just grab a reference to a watch - inotify_unmount_inodes() will
happily sail past it and we'll end with reference to inode potentially
outliving its superblock.

Ideally we just want to grab an active reference to superblock if we
can; that will make sure we won't go into inotify_umount_inodes() until
we are done.  Cleanup is just deactivate_super().

However, that leaves a messy case - what if we *are* racing with
umount() and active references to superblock can't be acquired anymore?
We can bump ->s_count, grab ->s_umount, which will almost certainly wait
until the superblock is shut down and the watch in question is pining
for fjords.  That's fine, but there is a problem - we might have hit the
window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
the moment when superblock is past the point of no return and is heading
for shutdown) and the moment when deactivate_super() acquires
->s_umount.

We could just do drop_super() yield() and retry, but that's rather
antisocial and this stuff is luser-triggerable.  OTOH, having grabbed
->s_umount and having found that we'd got there first (i.e.  that
->s_root is non-NULL) we know that we won't race with
inotify_umount_inodes().

So we could grab a reference to watch and do the rest as above, just
with drop_super() instead of deactivate_super(), right? Wrong.  We had
to drop ih->mutex before we could grab ->s_umount.  So the watch
could've been gone already.

That still can be dealt with - we need to save watch->wd, do idr_find()
and compare its result with our pointer.  If they match, we either have
the damn thing still alive or we'd lost not one but two races at once,
the watch had been killed and a new one got created with the same ->wd
at the same address.  That couldn't have happened in inotify_destroy(),
but inotify_rm_wd() could run into that.  Still, "new one got created"
is not a problem - we have every right to kill it or leave it alone,
whatever's more convenient.

So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
"grab it and kill it" check.  If it's been our original watch, we are
fine, if it's a newcomer - nevermind, just pretend that we'd won the
race and kill the fscker anyway; we are safe since we know that its
superblock won't be going away.

And yes, this is far beyond mere "not very pretty"; so's the entire
concept of inotify to start with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 12:26:44 -08:00
..
irq irq: make variable static 2008-10-22 07:37:17 +02:00
power PM_TEST_SUSPEND should depend on RTC_CLASS, not RTC_LIB 2008-11-01 12:40:38 -07:00
time nohz: disable tick_nohz_kick_tick() for now 2008-11-10 22:39:27 +01:00
trace Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent 2008-11-11 09:16:20 +01:00
.gitignore
Kconfig.freezer container freezer: implement freezer cgroup subsystem 2008-10-20 08:52:34 -07:00
Kconfig.hz
Kconfig.preempt
Makefile Merge branch 'tracing/ftrace' into tracing/urgent 2008-10-22 09:08:14 +02:00
acct.c
audit.c
audit.h
audit_tree.c Fix inotify watch removal/umount races 2008-11-15 12:26:44 -08:00
auditfilter.c Fix inotify watch removal/umount races 2008-11-15 12:26:44 -08:00
auditsc.c
backtracetest.c
bounds.c
capability.c
cgroup.c cgroups: fix invalid cgrp->dentry before cgroup has been completely removed 2008-11-06 15:41:19 -08:00
cgroup_debug.c cgroups: fix probable race with put_css_set[_taskexit] and find_css_set 2008-10-20 08:52:38 -07:00
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
compat.c Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus 2008-10-20 13:14:06 +02:00
configs.c kernel/configs.c: remove useless comments 2008-10-20 08:52:34 -07:00
cpu.c cpumask: introduce new API, without changing anything 2008-11-06 09:05:33 +01:00
cpuset.c cpuset: use seq_*mask_* to print masks 2008-10-20 08:52:39 -07:00
delayacct.c
dma-coherent.c
dma.c
exec_domain.c proc: move /proc/execdomains to kernel/exec_domain.c 2008-10-23 14:30:41 +04:00
exit.c Move "exit_robust_list" into mm_release() 2008-11-15 10:20:36 -08:00
extable.c
fork.c Move "exit_robust_list" into mm_release() 2008-11-15 10:20:36 -08:00
freezer.c freezer_cg: use thaw_process() in unfreeze_cgroup() 2008-10-30 11:38:45 -07:00
futex.c
futex_compat.c
hrtimer.c hrtimer: clean up unused callback modes 2008-11-12 09:54:40 +01:00
itimer.c
kallsyms.c
kexec.c kexec: fix crash_save_vmcoreinfo_init build problem 2008-10-20 15:28:50 -07:00
kfifo.c
kgdb.c
kmod.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus 2008-10-16 12:38:34 -07:00
kprobes.c kernel/kprobes.c: don't pad kretprobe_table_locks[] on uniprocessor builds 2008-11-12 17:17:17 -08:00
ksysfs.c
kthread.c Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-10-20 13:35:07 -07:00
latencytop.c
lockdep.c lockdep: fix irqs on/off ip tracing 2008-10-28 11:19:07 +01:00
lockdep_internals.h
lockdep_proc.c
marker.c
module.c Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc 2008-10-23 12:04:37 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
ns_cgroup.c
nsproxy.c
panic.c Make panic= and panic_on_oops into core_params 2008-10-22 10:00:25 +11:00
params.c Fix compile warning in kernel/params.c 2008-10-23 12:09:00 -07:00
pid.c
pid_namespace.c
pm_qos_params.c
posix-cpu-timers.c
posix-timers.c Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 2008-10-22 09:48:06 +02:00
printk.c printk: remove unused code from kernel/printk.c 2008-10-23 21:54:29 +02:00
profile.c kernel/profile: fix profile_init() section mismatch 2008-10-30 11:38:46 -07:00
ptrace.c make ptrace_untrace() static 2008-10-20 08:52:39 -07:00
rcuclassic.c
rcupdate.c rcupdate: fix bug of rcu_barrier*() 2008-10-21 15:59:53 +02:00
rcupreempt.c byteorder: remove direct includes of linux/byteorder/swab[b].h 2008-10-20 08:52:40 -07:00
rcupreempt_trace.c
rcutorture.c byteorder: remove direct includes of linux/byteorder/swab[b].h 2008-10-20 12:51:53 -07:00
relay.c
res_counter.c
resource.c reserve_region_with_split: Fix GFP_KERNEL usage under spinlock 2008-11-01 09:53:58 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c sched: fix init_idle()'s use of sched_clock() 2008-11-12 20:05:50 +01:00
sched_clock.c
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: clean up debug info 2008-11-10 10:51:51 +01:00
sched_fair.c sched: release buddies on yield 2008-11-11 11:57:22 +01:00
sched_features.h sched: backward looking buddy 2008-11-05 10:30:14 +01:00
sched_idletask.c sched: add CONFIG_SMP consistency 2008-10-22 10:01:52 +02:00
sched_rt.c Merge commit 'v2.6.28-rc1' into sched/urgent 2008-10-24 12:48:46 +02:00
sched_stats.h Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc 2008-10-23 12:04:37 -07:00
seccomp.c
semaphore.c
signal.c 'kill sig -1' must only apply to caller's namespace 2008-10-30 11:38:46 -07:00
smp.c generic-ipi: fix the smp_mb() placement 2008-11-06 08:41:56 +01:00
softirq.c irq: call __irq_enter() before calling the tick_idle_check 2008-11-10 22:36:39 +01:00
softlockup.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c Revert "Call init_workqueues before pre smp initcalls." 2008-10-25 19:53:38 -07:00
sys.c Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 2008-10-22 09:48:06 +02:00
sys_ni.c
sysctl.c Merge commit 'v2.6.28-rc2' into tracing/urgent 2008-10-27 10:50:54 +01:00
sysctl_check.c
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c Add round_jiffies_up and related routines 2008-11-06 08:42:48 +01:00
tracepoint.c tracepoint: check if the probe has been registered 2008-10-27 16:45:46 +01:00
tsacct.c
uid16.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
wait.c
workqueue.c cpumask: introduce new API, without changing anything 2008-11-06 09:05:33 +01:00