Something has changed in the core kernel such that we now get concurrent
inode write outs, one e.g via pdflush and one via sys_sync or whatever.
This causes a nasty deadlock in ntfs. The only clean solution
unfortunately requires a minor vfs api extension.
First the deadlock analysis:
Prerequisive knowledge: NTFS has a file $MFT (inode 0) loaded at mount
time. The NTFS driver uses the page cache for storing the file contents as
usual. More interestingly this file contains the table of on-disk inodes
as a sequence of MFT_RECORDs. Thus NTFS driver accesses the on-disk inodes
by accessing the MFT_RECORDs in the page cache pages of the loaded inode
$MFT.
The situation: VFS inode X on a mounted ntfs volume is dirty. For same
inode X, the ntfs_inode is dirty and thus corresponding on-disk inode,
which is as explained above in a dirty PAGE_CACHE_PAGE belonging to the
table of inodes ($MFT, inode 0).
What happens:
Process 1: sys_sync()/umount()/whatever... calls __sync_single_inode() for
$MFT -> do_writepages() -> write_page for the dirty page containing the
on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
clears PageUptodate() on the page to prevent anyone else getting hold of it
whilst it does the write out (this is necessary as the on-disk inode needs
"fixups" applied before the write to disk which are removed again after the
write and PageUptodate is then set again). It then analyses the page
looking for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this on-disk
inode. This then calls ilookup5() to check if the corresponding VFS inode
is in icache(). This in turn calls ifind() which waits on the inode lock
via wait_on_inode whilst holding the global inode_lock.
Process 2: pdflush results in a call to __sync_single_inode for the same
VFS inode X on the ntfs volume. This locks the inode (I_LOCK) then calls
write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
the page (in page cache of table of inodes $MFT, inode 0) containing the
on-disk inode. This page has PageUptodate() clear because of Process 1
(see above) so read_cache_page() blocks when tries to take the page lock
for the page so it can call ntfs_read_page().
Thus Process 1 is holding the page lock on the page containing the on-disk
inode X and it is waiting on the inode X to be unlocked in ifind() so it
can write the page out and then unlock the page.
And Process 2 is holding the inode lock on inode X and is waiting for the
page to be unlocked so it can call ntfs_readpage() or discover that
Process 1 set PageUptodate() again and use the page.
Thus we have a deadlock due to ifind() waiting on the inode lock.
The only sensible solution: NTFS does not care whether the VFS inode is
locked or not when it calls ilookup5() (it doesn't use the VFS inode at
all, it just uses it to find the corresponding ntfs_inode which is of
course attached to the VFS inode (both are one single struct); and it uses
the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
hence we want a modified ilookup5_nowait() which is the same as ilookup5()
but it does not wait on the inode lock.
Without such functionality I would have to keep my own ntfs_inode cache in
the NTFS driver just so I can find ntfs_inodes independent of their VFS
inodes which would be slow, memory and cpu cycle wasting, and incredibly
stupid given the icache already exists in the VFS.
Below is a patch that does the ilookup5_nowait() implementation in
fs/inode.c and exports it.
ilookup5_nowait.diff:
Introduce ilookup5_nowait() which is basically the same as ilookup5() but
it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
done in ifind()).
This is needed to avoid a nasty deadlock in NTFS.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rearranges the event ordering for "open" to be consistent with the
ordering of the other events.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs". Also some related cleanup.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was a pure indentation change, using:
scripts/Lindent fs/reiserfs/*.c include/linux/reiserfs_*.h
to make reiserfs match the regular Linux indentation style. As Jeff
Mahoney <jeffm@suse.com> writes:
The ReiserFS code is a mix of a number of different coding styles, sometimes
different even from line-to-line. Since the code has been relatively stable
for quite some time and there are few outstanding patches to be applied, it
is time to reformat the code to conform to the Linux style standard outlined
in Documentation/CodingStyle.
This patch contains the result of running scripts/Lindent against
fs/reiserfs/*.c and include/linux/reiserfs_*.h. There are places where the
code can be made to look better, but I'd rather keep those patches separate
so that there isn't a subtle by-hand hand accident in the middle of a huge
patch. To be clear: This patch is reformatting *only*.
A number of patches may follow that continue to make the code more consistent
with the Linux coding style.
Hans wasn't particularly enthusiastic about these patches, but said he
wouldn't really oppose them either.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Fixed a trouble on tuner-core that generates erros on computers with more
than one TV card.
- Rename tuner structures fields.
- Tail spaces removed.
- I2C cleanups and converged to a basic reference structure.
- Removed unused structures.
- Fix setting frequency on tda8290.
- Added code for TEA5767 autodetection.
- Standby mode support implemented. It is used to disable
a non used tuner. Currenlty implemented on tea5767.
- New macro: set_type disables other tuner when changing mode.
- Some cleanups.
- Use 50 kHz step when tunning radio for most tuners to improve precision.
Signed-off-by: Fabien Perrot <perrot1983@yahoo.fr>
Signed-off-by: Michael Krufky <mkrufky@m1k.net>
Signed-off-By: Nickolay V. Shmyrev <nshmyrev@yandex.ru>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Add new Typhoon DVB-T Cardbus.
- DVB-T support for MD7134 cardbus and the PCI variants
- initial DVB-T support for Lifeview Flydvb-t duo
- DVB-T support for Philips TOUGH reference design
- Don't turn off the xtal output of tda8274/75 in sleep mode
- Let Kconfig decide whether to include frontend-specific code in saa7134-dvb.
- Removed unused structures.
Signed-off-by: Juergen Orschiedt <jorschiedt@web.de>
Signed-off-by: Michael Krufky <mkrufky@m1k.net>
Signed-off-by: Hartmut Hackmann <hartmut.hackmann@t-online.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch prevents the crash dump helper code found within kexec
from breaking ppc which still lacks crash dump functionality.
ksysfs crash_notes attribute handling was left under CONFIG_KEXEC for
simplicity although it is not strictly kexec related.
We provide here a dummy definition for crash_notes on ppc.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_pages_and_swap_cache() and free_page_and_swap_cache() use release_pages()
and page_cache_release() respectively, so make sure that we have the
declarations in scope.
Cc: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a problem with ext3 mount option parsing. When remount of a filesystem
fails, old options are now restored.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a fairly serious tlb flushing bug that makes aio use under
uml very unreliable -- SEGVs, Oops and panic()s occur as a result of stale
tlb entires being used by uml when aio switches mms due to the fact that
uml does not implement the activate_mm() hook. This patch introduces a
simple but correct approach (read: hammer) for implementing activate_mm()
in uml by doing a force_flush_all() if the new mm is different from old.
With this patch in place, uml is able to succeed at the aio test case that
was randomly faulting for me before.
Cc: Jeff Dike <jdike@addtoit.com>
Cc: <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope. mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
tr_type_trans(), hippi_type_trans() left as-is.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds another CDC descriptor type to <linux/usb_cdc.h>; the main claim
to fame for this is that some Motorola phones include it. It's not currently
needed by any driver code; included for completeness.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg,
This patch fixes the kmalloc() flags argument type in USB
subsystem; hopefully all of its occurences. The patch was
made against patch-2.6.12-git2 from Jun 20.
Cleanup of flags for kmalloc() in USB subsystem.
Signed-off-by: Olav Kongas <ok@artecdesign.ee>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fixed three cases in the interpreter where an "index"
argument to an ASL function was still (internally) 32
bits instead of the required 64 bits. This was the Index
argument to the Index, Mid, and Match operators.
The "strupr" function is now permanently local
(acpi_ut_strupr), since this is not a POSIX-defined
function and not present in most kernel-level C
libraries. References to the C library strupr function
have been removed from the headers.
Completed the deployment of static
functions/prototypes. All prototypes with the static
attribute have been moved from the headers to the owning
C file.
ACPICA 20050329 from Bob Moore
An error is now generated if an attempt is made to create
a Buffer Field of length zero (A CreateField with a length
operand of zero.)
The interpreter now issues a warning whenever executable
code at the module level is detected during ACPI table
load. This will give some idea of the prevalence of this
type of code.
Implemented support for references to named objects (other
than control methods) within package objects.
Enhanced package object output for the debug
object. Package objects are now completely dumped, showing
all elements.
Enhanced miscellaneous object output for the debug
object. Any object can now be written to the debug object
(for example, a device object can be written, and the type
of the object will be displayed.)
The "static" qualifier has been added to all local
functions across the core subsystem.
The number of "long" lines (> 80 chars) within the source
has been significantly reduced, by about 1/3.
Cleaned up all header files to ensure that all CA/iASL
functions are prototyped (even static functions) and the
formatting is consistent.
Two new header files have been added, acopcode.h and
acnames.h.
Removed several obsolete functions that were no longer
used.
Signed-off-by: Len Brown <len.brown@intel.com>
http://bugme.osdl.org/show_bug.cgi?id=4016
Written-by: David Shaohua Li <shaohua.li@intel.com>
Acked-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI 3.0 added a Correctable Platform Error Interrupt (CPEI)
Processor Overide flag to MADT.Platform_Interrupt_Source.
Record the processor that was provided as hint from ACPI.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Implement the framework for binding physical devices
with ACPI devices. A physical bus like PCI bus
should create a 'acpi_bus_type', with:
.find_device:
For device which has parent such as normal PCI devices.
.find_bridge:
It's for special devices, such as PCI root bridge
or IDE controller. Such devices generally haven't a
parent or ->bus. We use the special method
to get an ACPI handle.
Uses new field in struct device: firmware_data
http://bugzilla.kernel.org/show_bug.cgi?id=4277
Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
See Documentation/acpi-hotkey.txt
Use cmdline "acpi_specific_hotkey" to enable
legacy platform specific drivers.
http://bugzilla.kernel.org/show_bug.cgi?id=3887
Signed-off-by: Luming Yu <luming.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Register an "acpi" system device to be notified of shutdown preparation.
This depends on CONFIG_PM
http://bugzilla.kernel.org/show_bug.cgi?id=4041
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Current assign_irq_vector() will panic if interrupt vectors is running
out. But I think how to handle the case of lack of interrupt vectors
should be handled by the caller of this function. For example, some
PCI devices can raise the interrupt signal via both MSI and I/O
APIC. So even if the driver for these device fails to allocate a
vector for MSI, the driver still has a chance to use I/O APIC based
interrupt. But currently there is no chance for these driver to use
I/O APIC based interrupt because kernel will panic when
assign_irq_vector() fails to allocate interrupt vector.
The following patch changes assign_irq_vector() for ia64 to return
-ENOSPC on error instead of panic (as i386 and x86_64 versions do).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Now that sys_ipc has been removed from xtensa, asm/ipc.h is no longer
needed for that architecture. Not tested, but obviously correct. This
file is included only from arch code and this patch also removes the only
inclusion.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Tony Lindgren
This patch by various OMAP developers syncs the OMAP
specific arch files with the linux-omap tree.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>