124 lines
4.6 KiB
ReStructuredText
124 lines
4.6 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
|
|
|
===================================
|
|
File management in the Linux kernel
|
|
===================================
|
|
|
|
This document describes how locking for files (struct file)
|
|
and file descriptor table (struct files) works.
|
|
|
|
Up until 2.6.12, the file descriptor table has been protected
|
|
with a lock (files->file_lock) and reference count (files->count).
|
|
->file_lock protected accesses to all the file related fields
|
|
of the table. ->count was used for sharing the file descriptor
|
|
table between tasks cloned with CLONE_FILES flag. Typically
|
|
this would be the case for posix threads. As with the common
|
|
refcounting model in the kernel, the last task doing
|
|
a put_files_struct() frees the file descriptor (fd) table.
|
|
The files (struct file) themselves are protected using
|
|
reference count (->f_count).
|
|
|
|
In the new lock-free model of file descriptor management,
|
|
the reference counting is similar, but the locking is
|
|
based on RCU. The file descriptor table contains multiple
|
|
elements - the fd sets (open_fds and close_on_exec, the
|
|
array of file pointers, the sizes of the sets and the array
|
|
etc.). In order for the updates to appear atomic to
|
|
a lock-free reader, all the elements of the file descriptor
|
|
table are in a separate structure - struct fdtable.
|
|
files_struct contains a pointer to struct fdtable through
|
|
which the actual fd table is accessed. Initially the
|
|
fdtable is embedded in files_struct itself. On a subsequent
|
|
expansion of fdtable, a new fdtable structure is allocated
|
|
and files->fdtab points to the new structure. The fdtable
|
|
structure is freed with RCU and lock-free readers either
|
|
see the old fdtable or the new fdtable making the update
|
|
appear atomic. Here are the locking rules for
|
|
the fdtable structure -
|
|
|
|
1. All references to the fdtable must be done through
|
|
the files_fdtable() macro::
|
|
|
|
struct fdtable *fdt;
|
|
|
|
rcu_read_lock();
|
|
|
|
fdt = files_fdtable(files);
|
|
....
|
|
if (n <= fdt->max_fds)
|
|
....
|
|
...
|
|
rcu_read_unlock();
|
|
|
|
files_fdtable() uses rcu_dereference() macro which takes care of
|
|
the memory barrier requirements for lock-free dereference.
|
|
The fdtable pointer must be read within the read-side
|
|
critical section.
|
|
|
|
2. Reading of the fdtable as described above must be protected
|
|
by rcu_read_lock()/rcu_read_unlock().
|
|
|
|
3. For any update to the fd table, files->file_lock must
|
|
be held.
|
|
|
|
4. To look up the file structure given an fd, a reader
|
|
must use either lookup_fdget_rcu() or files_lookup_fdget_rcu() APIs. These
|
|
take care of barrier requirements due to lock-free lookup.
|
|
|
|
An example::
|
|
|
|
struct file *file;
|
|
|
|
rcu_read_lock();
|
|
file = lookup_fdget_rcu(fd);
|
|
rcu_read_unlock();
|
|
if (file) {
|
|
...
|
|
fput(file);
|
|
}
|
|
....
|
|
|
|
5. Since both fdtable and file structures can be looked up
|
|
lock-free, they must be installed using rcu_assign_pointer()
|
|
API. If they are looked up lock-free, rcu_dereference()
|
|
must be used. However it is advisable to use files_fdtable()
|
|
and lookup_fdget_rcu()/files_lookup_fdget_rcu() which take care of these
|
|
issues.
|
|
|
|
6. While updating, the fdtable pointer must be looked up while
|
|
holding files->file_lock. If ->file_lock is dropped, then
|
|
another thread expand the files thereby creating a new
|
|
fdtable and making the earlier fdtable pointer stale.
|
|
|
|
For example::
|
|
|
|
spin_lock(&files->file_lock);
|
|
fd = locate_fd(files, file, start);
|
|
if (fd >= 0) {
|
|
/* locate_fd() may have expanded fdtable, load the ptr */
|
|
fdt = files_fdtable(files);
|
|
__set_open_fd(fd, fdt);
|
|
__clear_close_on_exec(fd, fdt);
|
|
spin_unlock(&files->file_lock);
|
|
.....
|
|
|
|
Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
|
|
the fdtable pointer (fdt) must be loaded after locate_fd().
|
|
|
|
On newer kernels rcu based file lookup has been switched to rely on
|
|
SLAB_TYPESAFE_BY_RCU instead of call_rcu(). It isn't sufficient anymore
|
|
to just acquire a reference to the file in question under rcu using
|
|
atomic_long_inc_not_zero() since the file might have already been
|
|
recycled and someone else might have bumped the reference. In other
|
|
words, callers might see reference count bumps from newer users. For
|
|
this is reason it is necessary to verify that the pointer is the same
|
|
before and after the reference count increment. This pattern can be seen
|
|
in get_file_rcu() and __files_get_rcu().
|
|
|
|
In addition, it isn't possible to access or check fields in struct file
|
|
without first acquiring a reference on it under rcu lookup. Not doing
|
|
that was always very dodgy and it was only usable for non-pointer data
|
|
in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
|
|
either first acquire a reference or they must hold the files_lock of the
|
|
fdtable.
|