This patch addresses a number of minor comment improvements and
other minor issues from Thomas' review of the alarmtimers code.
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
This patch exposes alarm-timers to userland via the posix clock
and timers interface, using two new clockids: CLOCK_REALTIME_ALARM
and CLOCK_BOOTTIME_ALARM. Both clockids behave identically to
CLOCK_REALTIME and CLOCK_BOOTTIME, respectively, but timers
set against the _ALARM suffixed clockids will wake the system if
it is suspended.
Some background can be found here:
https://lwn.net/Articles/429925/
The concept for Alarm-timers was inspired by the Android Alarm
driver (by Arve Hjønnevåg) found in the Android kernel tree.
See: http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/rtc/alarm.c;h=1250edfbdf3302f5e4ea6194847c6ef4bb7beb1c;hb=android-2.6.36
While the in-kernel interface is pretty similar between
alarm-timers and Android alarm driver, the user-space interface
for the Android alarm driver is via ioctls to a new char device.
As mentioned above, I've instead chosen to export this functionality
via the posix interface, as it seemed a little simpler and avoids
creating duplicate interfaces to things like CLOCK_REALTIME and
CLOCK_MONOTONIC under alternate names (ie:ANDROID_ALARM_RTC and
ANDROID_ALARM_SYSTEMTIME).
The semantics of the Android alarm driver are different from what
this posix interface provides. For instance, threads other then
the thread waiting on the Android alarm driver are able to modify
the alarm being waited on. Also this interface does not allow
the same wakelock semantics that the Android driver provides
(ie: kernel takes a wakelock on RTC alarm-interupt, and holds it
through process wakeup, and while the process runs, until the
process either closes the char device or calls back in to wait
on a new alarm).
One potential way to implement similar semantics may be via
the timerfd infrastructure, but this needs more research.
There may also need to be some sort of sysfs system level policy
hooks that allow alarm timers to be disabled to keep them
from firing at inappropriate times (ie: laptop in a well insulated
bag, mid-flight).
CC: Arve Hjønnevåg <arve@android.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
This provides the in kernel interface and infrastructure for
alarm-timers.
Alarm-timers are a hybrid style timer, similar to hrtimers,
but when the system is suspended, the RTC device is set to
fire and wake the system for when the soonest alarm-timer
expires.
The concept for Alarm-timers was inspired by the Android Alarm
driver (by Arve Hjønnevåg) found in the Android kernel tree.
See: http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/rtc/alarm.c;h=1250edfbdf3302f5e4ea6194847c6ef4bb7beb1c;hb=android-2.6.36
This in-kernel interface should be fairly compatible with the
Android alarm driver in-kernel interface, but has the advantage
of utilizing the new RTC timerqueue code instead of doing direct
RTC manipulation.
CC: Arve Hjønnevåg <arve@android.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
In cases where a timerqueue_node or some structure that utilizes
a timerqueue_node is allocated on the stack, gcc would give warnings
caused by the timerqueue_init()'s calling RB_CLEAR_NODE, which
self-references the nodes uninitialized data.
The solution is to create an rb_init_node() function that zeros
the rb_node structure out and then calls RB_CLEAR_NODE(), and
then call the new init function from timerqueue_init().
CC: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Some platforms cannot implement read_persistent_clock, as
their RTC devices are only accessible when interrupts are enabled.
This keeps them from being used by the timekeeping code on resume
to measure the time in suspend.
The RTC layer tries to work around this, by calling do_settimeofday
on resume after irqs are reenabled to set the time properly. However,
this only corrects CLOCK_REALTIME, and does not properly adjust
the sleep time value. This causes btime in /proc/stat to be incorrect
as well as making the new CLOCK_BOTTTIME inaccurate.
This patch resolves the issue by introducing a new timekeeping hook
to allow the RTC layer to inject the sleep time on resume.
The code also checks to make sure that read_persistent_clock is
nonfunctional before setting the sleep time, so that should the RTC's
HCTOSYS option be configured in on a system that does support
read_persistent_clock we will not increase the total_sleep_time twice.
CC: Arve Hjønnevåg <arve@android.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] kvm-390: Let kernel exit SIE instruction on work
[S390] dasd: check sense type in device change handler
[S390] pfault: fix token handling
[S390] qdio: reset error states immediately
[S390] fix page table walk for changing page attributes
[S390] prng: prevent access beyond end of stack
[S390] dasd: fix race between open and offline
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: cleanup error handling in inode.c
Btrfs: put the right bio if we have an error
Btrfs: free bitmaps properly when evicting the cache
Btrfs: Free free_space item properly in btrfs_trim_block_group()
btrfs: add missing spin_unlock to a rare exit path
Btrfs: check return value of kmalloc()
btrfs: fix wrong allocating flag when reading page
Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()
F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This patch adds the TCO Watchdog DeviceIDs for the Intel Panther Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Flush dirty pages in setattr
eCryptfs: Handle failed metadata read in lookup
eCryptfs: Add reference counting to lower files
eCryptfs: dput dentries returned from dget_parent
eCryptfs: Remove extra d_delete in ecryptfs_rmdir
Now that the security modules can decide whether they support the
dcache RCU walk or not it's possible to make selinux a bit more
RCU friendly. The SELinux AVC and security server access decision
code is RCU safe. A specific piece of the LSM audit code may not
be RCU safe.
This patch makes the VFS RCU walk retry if it would hit the non RCU
safe chunk of code. It will normally just work under RCU. This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning ECHILD there if it would have an issue.
Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we are waiting for the bit-lock to be released, and are looping
over the 'cpu_relax()' should not be doing anything else - otherwise we
miss the point of trying to do the whole 'cpu_relax()'.
Do the preemption enable/disable around the loop, rather than inside of
it.
Noticed when I was looking at the code generation for the dcache
__d_drop usage, and the code just looked very odd.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After 57db4e8d73 changed eCryptfs to
write-back caching, eCryptfs page writeback updates the lower inode
times due to the use of vfs_write() on the lower file.
To preserve inode metadata changes, such as 'cp -p' does with
utimensat(), we need to flush all dirty pages early in
ecryptfs_setattr() so that the user-updated lower inode metadata isn't
clobbered later in writeback.
https://bugzilla.kernel.org/show_bug.cgi?id=33372
Reported-by: Rocko <rockorequin@hotmail.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
When failing to read the lower file's crypto metadata during a lookup,
eCryptfs must continue on without throwing an error. For example, there
may be a plaintext file in the lower mount point that the user wants to
delete through the eCryptfs mount.
If an error is encountered while reading the metadata in lookup(), the
eCryptfs inode's size could be incorrect. We must be sure to reread the
plaintext inode size from the metadata when performing an open() or
setattr(). The metadata is already being read in those paths, so this
adds minimal performance overhead.
This patch introduces a flag which will track whether or not the
plaintext inode size has been read so that an incorrect i_size can be
fixed in the open() or setattr() paths.
https://bugs.launchpad.net/bugs/509180
Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
The error processing of several places is changed like setting the
error number only at the error.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put
the orig_bio, not bio.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If our space cache is wrong, we do the right thing and free up everything that
we loaded, however we don't reset the total_bitmaps counter or the thresholds or
anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap()
if it's a bitmap, this will keep us from panicing when we check to make sure we
don't have too many bitmaps. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Since commit dc89e98244, we've changed
to use a specific slab for alocation of free_space items.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The check on the return value of kmalloc() is added to some places.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
the space cache use extent_readpages() to read free space information,
so we can not use GFP_KERNEL flag to allocate memory, or it may lead
to deadlock.
Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It is necessary to unlock mutex_lock before it return an error when
btrfs_alloc_path() fails.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
For any given lower inode, eCryptfs keeps only one lower file open and
multiplexes all eCryptfs file operations through that lower file. The
lower file was considered "persistent" and stayed open from the first
lookup through the lifetime of the inode.
This patch keeps the notion of a single, per-inode lower file, but adds
reference counting around the lower file so that it is closed when not
currently in use. If the reference count is at 0 when an operation (such
as open, create, etc.) needs to use the lower file, a new lower file is
opened. Since the file is no longer persistent, all references to the
term persistent file are changed to lower file.
Locking is added around the sections of code that opens the lower file
and assign the pointer in the inode info, as well as the code the fputs
the lower file when all eCryptfs users are done with it.
This patch is needed to fix issues, when mounted on top of the NFSv3
client, where the lower file is left silly renamed until the eCryptfs
inode is destroyed.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Call dput on the dentries previously returned by dget_parent() in
ecryptfs_rename(). This is needed for supported eCryptfs mounts on top
of the NFSv3 client.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
vfs_rmdir() already calls d_delete() on the lower dentry. That was being
duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
when NFSv3 was the lower filesystem.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: ahci_start_engine compliant to AHCI spec
ata: pata_at91.c bugfix for initial_timing initialisation
ata: pata_at91.c bugfix for high master clock
ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
libata: Pioneer DVR-216D can't do SETXFER
ahci: don't enable port irq before handler is registered
libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
libata: Kill unused ATA_DFLAG_{H|D}IPM flags
ahci: EM supported message type sysfs attribute
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix master node recovery
UBIFS: fix false assertion warning in case of I/O failures
UBIFS: fix false space checking failure
At the end of section 10.1 of AHCI spec (rev 1.3), it states
Software shall not set PxCMD.ST to 1 until it is determined that
a functoinal device is present on the port as determined by
PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
Even though most AHCI host controller works without this check,
specific controller will fail under this condition.
Signed-off-by: Jian Peng <jipeng2005@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The "struct ata_timing" must contain 10 members, but ".dmack_hold" member was
forgotten for "initial_timing" initialisation. This patch fixes such a problem.
Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The AT91SAM9 microcontrollers with master clock higher then 105 MHz
and PIO0, have overflow of the NCS_RD_PULSE value in the MSB. This
lead to "NCS_RD_PULSE" pulse longer then "NRD_CYCLE" pulse and driver
does not detect ATA device.
Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The previously submitted patch was word-wrapped.
This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The previously submitted patch was word-wrapped.
This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther
Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Commit 4a5610a04d fixed an issue with
the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user
reported a similar issue with his DVR-216D that the NOSETXFER horkage
worked around for him as well.
This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER.
The issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=679143
Reported-by: Volodymyr Kyrychenko <vladimir.kirichenko@gmail.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The ahci_pmp_attach() & ahci_pmp_detach() unmask port irqs, but they
are also called during port initialization, before ahci host irq
handler is registered. On ce4100 platform, this sometimes triggers
"irq 4: nobody cared" message when loading driver.
Fixed this by not touching the register if the port is in frozen
state, and mark all uninitialized port as frozen.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used. Implement ATA_FLAG_NO_DIPM and apply it.
This problem was reported by Stefan Bader in the following thread.
http://thread.gmane.org/gmane.linux.ide/48841
stable: applicable to 2.6.37 and 38.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch adds an sysfs attribute 'em_message_supported' to the
ahci host device which prints out the supported enclosure management
message types.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Commit 40aee729b3 ('kconfig: fix default value for choice input')
fixed some cases where kconfig would select the wrong option from a
choice with a single valid option and thus enter an infinite loop.
However, this broke the test for user input of the form 'N?', because
when kconfig selects the single valid option the input is zero-length
and the test will read the byte before the input buffer. If this
happens to contain '?' (as it will in a mips build on Debian unstable
today) then kconfig again enters an infinite loop.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org [2.6.17+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dentry hashing rules have been really quite complicated for a long
while, in odd ways. That made functions like __d_drop() very fragile
and non-obvious.
In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit. That's despite the fact that the hash
abstraction that the dentries use actually have a 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).
The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415fbc2: "dcache: avoid RCU for never-hashed dentries").
That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be free'd correctly.
That's just insane. It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "list_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.
So turn all of it around, and make it logical.
Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that. Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
and everything makes sense.
And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
needs the RCU lifetime rules. Now suddently that test at dentry free
time makes sense too.
And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details doesn't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.
Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse. All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.
And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.
As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.
Make it all look and work the same.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>