As we can determine exactly when a gigantic page is in use we can optimise
the common regular page cases by pulling out gigantic page initialisation
into its own function. As gigantic pages are never released to buddy we
do not need a destructor. This effectivly reverts the previous change to
the main buddy allocator. It also adds a paranoid check to ensure we
never release gigantic pages from hugetlbfs to the main buddy.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When working with hugepages, hugetlbfs assumes that those hugepages are
smaller than MAX_ORDER. Specifically it assumes that the mem_map is
contigious and uses that to optimise access to the elements of the mem_map
that represent the hugepage. Gigantic pages (such as 16GB pages on
powerpc) by definition are of greater order than MAX_ORDER (larger than
MAX_ORDER_NR_PAGES in size). This means that we can no longer make use of
the buddy alloctor guarentees for the contiguity of the mem_map, which
ensures that the mem_map is at least contigious for maximmally aligned
areas of MAX_ORDER_NR_PAGES pages.
This patch adds new mem_map accessors and iterator helpers which handle
any discontiguity at MAX_ORDER_NR_PAGES boundaries. It then uses these to
implement gigantic page versions of copy_huge_page and clear_huge_page,
and to allow follow_hugetlb_page handle gigantic pages.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This regression was introduced by commit
6ae5ce8e8d ("cciss: remove redundant code").
This patch fixes a regression where the controller firmware version is not
displayed in procfs. The previous patch would be called anytime something
changed. This will get called only once for each controller.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Regression introduced by commit 6ae5ce8e8d
("cciss: remove redundant code").
This patch fixes a broken symlink in sysfs that was introduced by the
above commit. We broke it in 2.6.27-rc on or about 20080804. Some
installers are broken if this symlink does not exist and they may not
detect the logical drives configured on the controller. It does not
require being backported into 2.6.26.x or earlier kernels.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function check_dev_ioctl_version() returns an error code upon fail but
it isn't captured and returned in validate_dev_ioctl() as it should be.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When checking a directory tree in autofs_tree_busy() we can incorrectly
decide that the tree isn't busy. This happens for the case of an active
offset mount as autofs4_follow_mount() follows past the active offset
mount, which has an open file handle used for expires, causing the file
handle not to count toward the busyness check.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add temperature sensor support for iMac 8.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Klaus Doblmann <klaus.doblmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add accelerometer, backlight and temperature sensor support for the new
unibody Macbook Pro 5.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add accelerometer, backlight and temperature sensor support for the new
unibody Macbook 5.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: David M. Lary <dmlary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add temperature sensor support for iMac 5.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Ricky Campbell <johnrcampbell@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When fixing up invalid years rtc_read_alarm() was calling rtc_valid_tm()
as a boolean but rtc_valid_tm() returns zero on success or a negative
number if the time is not valid so the test was inverted.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Problem 1 (see patch below):
vc_tab_stop is declared as an array of 8 unsigned ints in struct
vc_data in include/linux/console_struct.h .
In drivers/char/vt.c only 5 of these 8 unsigned ints get initialized
leading to unintended tabulator placement on displays with more than
160 columns text.
Problem 2 (open):
Upcoming displays will have more than 256 columns of text leading to
invalid memory access in drivers/char/vt.c during tabulator
calculations:
if (vc->vc_tab_stop[vc->vc_x >> 5] & (1 << (vc->vc_x & 31)))
break;
Signed-off-by: Wolfgang Kroworsch <wolfgang@kroworsch.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As reported by Dick Gevers on Compaq ProLiant:
Oct 13 18:06:51 dvgcpl kernel: Compaq SMART2 Driver (v 2.6.0)
Oct 13 18:06:51 dvgcpl kernel: sys_init_module: 'cpqarray'->init
suspiciously returned 1, it should follow 0/-E convention
Oct 13 18:06:51 dvgcpl kernel: sys_init_module: loading module anyway...
Oct 13 18:06:51 dvgcpl kernel: Pid: 315, comm: modprobe Not tainted
2.6.27-desktop-0.rc8.2mnb #1
Oct 13 18:06:51 dvgcpl kernel: [<c0380612>] ? printk+0x18/0x1e
Oct 13 18:06:51 dvgcpl kernel: [<c0158f85>] sys_init_module+0x155/0x1c0
Oct 13 18:06:51 dvgcpl kernel: [<c0103f06>] syscall_call+0x7/0xb
Oct 13 18:06:51 dvgcpl kernel: =======================
Make it return 0 on success and -ENODEV if no array was found.
Reported-by: Dick Gevers <dvgevers@xs4all.nl>
Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for 2 new SAS/SATA controllers.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: fix 32-bit Xen guest boot crash
On 32-bit PAE, pud_page, for no good reason, didn't really return a
struct page *. Since Jan Beulich's fix "i386/PAE: fix pud_page()",
pud_page does return a struct page *.
Because PAE has 3 pagetable levels, the pud level is folded into the
pgd level, so pgd_page() is the same as pud_page(), and now returns
a struct page *. Update the xen/mmu.c code which uses pgd_page()
accordingly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput(). Inside of the initial __scm_destroy() we
keep running the list until it is empty.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a 2.6.27 regression which was introduced in commit a02908f1.
We weren't passing the chunk parameter down to the two subections,
ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
the result that massively overestimate the amount of credits needed by
ext4_da_writepages, especially in the non-extents case. This causes
failures especially on /boot partitions, which tend to be small and
non-extent using since GRUB doesn't handle extents.
This patch fixes the bug reported by Joseph Fannin at:
http://bugzilla.kernel.org/show_bug.cgi?id=11964
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch fixes iwl3945 deadlock during suspend by moving notify_mac out
of iwl3945 mutex. This is a portion of the same fix for iwlwifi by Tomas.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch ensures we clear any scan status bit when
an error occurs while sending the scan command. It is
the implementation of patch:
"iwlwifi: clear scanning bits upon failure"
for iwl3945.
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ath5k_rx_status fields rs_antenna and rs_more are u8s, but we
were setting them with bitwise ANDs of 32-bit values.
As a consequence, jumbo frames would not be discarded as intended.
Then, because the hw rate value of such frames is zero, and, since
"ath5k: rates cleanup", we do not fall back to the basic rate, such
packets would trigger the following WARN_ON:
------------[ cut here ]------------
WARNING: at net/mac80211/rx.c:2192 __ieee80211_rx+0x4d/0x57e [mac80211]()
Modules linked in: ath5k af_packet sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc acpi_cpufreq fan container nls_utf8 hfsplus dm_crypt dm_mod kvm_intel kvm fuse sbp2 snd_hda_intel snd_pcm_oss snd_pcm snd_mixer_oss snd_seq_dummy snd_seq_oss arc4 joydev hid_apple ecb snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device usbhid appletouch mac80211 sky2 snd ehci_hcd ohci1394 bitrev crc32 sr_mod cdrom rtc sg uhci_hcd snd_page_alloc cfg80211 ieee1394 thermal ac battery processor button evdev unix [last unloaded: ath5k]
Pid: 0, comm: swapper Tainted: G W 2.6.28-rc2-wl #14
Call Trace:
[<c0123d1e>] warn_on_slowpath+0x41/0x5b
[<c012005d>] ? sched_debug_show+0x31e/0x9c6
[<c012489f>] ? vprintk+0x369/0x389
[<c0309539>] ? _spin_unlock_irqrestore+0x54/0x58
[<c011cd8f>] ? try_to_wake_up+0x14f/0x15a
[<f81918cb>] __ieee80211_rx+0x4d/0x57e [mac80211]
[<f828872a>] ath5k_tasklet_rx+0x5a1/0x5e4 [ath5k]
[<c013b9cd>] ? clockevents_program_event+0xd4/0xe3
[<c01283a9>] tasklet_action+0x94/0xfd
[<c0127d19>] __do_softirq+0x8c/0x13e
[<c0127e04>] do_softirq+0x39/0x55
[<c0128082>] irq_exit+0x46/0x85
[<c010576c>] do_IRQ+0x9a/0xb2
[<c010461c>] common_interrupt+0x28/0x30
[<f80e934a>] ? acpi_idle_enter_bm+0x2ad/0x31b [processor]
[<c02976bf>] cpuidle_idle_call+0x65/0x9a
[<c010262c>] cpu_idle+0x76/0xa6
[<c02fb402>] rest_init+0x62/0x64
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
07fa/1196
Bewan BWIFI-USB54AR: Tested by night1308, this device is a ZD1211B with
an AL2230S radio.
0ace/b215
HP 802.11abg: Tested by Robert Philippe
Signed-off-by: John W. Linville <linville@tuxdriver.com>
> I'll have a prod at why the [hso] rfkill stuff isn't working next
Ok, I believe this is due to the addition of rfkill_check_duplicity in
rfkill and the fact that test_bit actually returns a negative value
rather than the postive one expected (which is of course equally true).
So when the second WLAN device (the hso device, with the EEE PC WLAN
being the first) comes along rfkill_check_duplicity returns a negative
value and so rfkill_register returns an error. Patch below fixes this
for me.
Signed-Off-By: Jonathan McDowell <noodles@earth.li>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
__ieee80211_tasklet_handler -> __ieee80211_rx ->
__ieee80211_rx_handle_packet -> ieee80211_invoke_rx_handlers ->
ieee80211_rx_h_decrypt -> ieee80211_crypto_tkip_decrypt ->
ieee80211_tkip_decrypt_data -> iwl4965_mac_update_tkip_key ->
iwl_scan_cancel_timeout -> msleep
Ooops!
Avoid the sleep by changing iwl_scan_cancel_timeout with
iwl_scan_cancel and simply returning on failure if the scan persists.
This will cause hardware decryption to fail and we'll handle a few more
frames with software decryption.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In iwl_bg_request_scan function, if we could not send a
scan command it will go to done.
In done it does the right thing to call mac80211 with
scan complete, but the problem is STATUS_SCAN_HW is still
set causing any future scan to fail. Fix by clearing the scanning status
bits if scan fails.
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Unfortunately, the result was that mac80211 didn't see all the beacons
it actually wanted to see. This caused lost associations.
Hopefully we can revisit this when mac80211 is less greedy about seeing
beacons directly...
This reverts commit 063279062a.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When 'start' and 'end' are less than a cacheline apart and 'start' is
unaligned we are done after cleaning and invalidating the first
cacheline. So check for (start < end) which will not walk off into
invalid address ranges when (start > end).
This issue was caught by drivers/dma/dmatest.
2.6.27 is susceptible.
Cc: <stable@kernel.org>
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: Lothar WaÃ<9f>mann <LW@KARO-electronics.de>
Cc: Lennert Buytenhek <buytenh@marvell.com>
Cc: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
As a result of the ptebits changes, we ended up marking device mappings
as normal memory on ARMv7 CPUs, resulting in undesirable behaviour with
serial ports and the like. While reviewing the section mapping table
entries, other errors in the memory type settings for devices were
detected and confirmed to prevent Xscale3 platforms booting.
Tested on:
OMAP34xx (ARMv7),
OMAP24xx (ARMv6),
OMAP16xx (ARM926T, ARMv5),
PXA311 (Xscale3),
PXA272 (Xscale),
PXA255 (Xscale),
IXP42x (Xscale),
S3C2410 (ARM920T, ARMv4T),
ARM720T (ARMv4T)
StrongARM-110 (ARMv4)
Acked-by: Tony Lindgren <tony@atomide.com>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Mike Rapoport <mike@compulab.co.il>
Tested-by: Ben Dooks <ben-linux@fluff.org>
Tested-by: Anders Grafström <grfstrm@users.sourceforge.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This fixes a regression introduced by 2c6e6db41f
"Minimize per_cpu reservations." That patch incorrectly used information about
what CPUs are possible that was not yet initialized by ACPI. The end result
was that per_cpu structures for offline CPUs were not initialized causing a
NULL pointer reference.
Since we cannot do the full acpi_boot_init() call any earlier, the simplest
fix is to just parse the MADT for SAPIC entries early to find the CPU
info. This should also allow for some cleanup of the code added by the
"Minimize per_cpu reservations". This patch just fixes the regressions, the
cleanup will come in a later patch.
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
CC: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
No functional change, just reorder some config options and update
the "Power management and ACPI" label to match the defacto x86
standard.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As of 73bdf0a60e, the kernel needs
to know where modules are located in the virtual address space.
On ARM, we located this region between MODULE_START and MODULE_END.
Unfortunately, everyone else calls it MODULES_VADDR and MODULES_END.
Update ARM to use the same naming, so is_vmalloc_or_module_addr()
can work properly. Also update the comment on mm/vmalloc.c to
reflect that ARM also places modules in a separate region from the
vmalloc space.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This reverts commit c7ffa6c262.
the assumptio of this change was that this would not break
any existing machine. Andrey Borzenkov reported troubles with
the ACPI reboot method: the system would hang on reboot, necessiating
a power cycle. Probably more systems are affected as well.
Also, there are patches queued up for v2.6.29 to disable virtualization
on emergency_restart() - which was the original motivation of
this change.
Reported-by: Andrey Borzenkov <arvidjaar@mail.ru>
Bisected-by: Andrey Borzenkov <arvidjaar@mail.ru>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: right-align /proc/meminfo consistent with other fields
When the split-LRU patches added Inactive(anon) and Inactive(file) lines
to /proc/meminfo, all counts were moved two columns rightwards to fit in.
Now move x86's DirectMap lines two columns rightwards to line up.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lazy flushing needs to take care of the unmap path too which is not yet
implemented and leads to stale IO/TLB entries. This is fixed by this
patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The SAM9 watchdog driver is usable on the whole family of AT91SAM9 and
CAP9 processors.
Update the configuration to indicate this and allow the driver to be selected.
Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The architecture header files were recently moved from
include/asm-arm/mach-at91/ to arch/arm/mach-at91/include/mach/.
The SAM9 watchdog driver still includes a header from the old location.
Signed-off-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Impact: fix rare x2apic hang
On x86, x2apic mode accesses for sending IPI's don't have serializing
semantics. If the IPI receivner refers(in lock-free fashion) to some
memory setup by the sender, the need for smp_mb() before sending the
IPI becomes critical in x2apic mode.
Add the smp_mb() in native_flush_tlb_others() before sending the IPI.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We currently oops with a divide error on starting a linear software
raid array consisting of at least two very small (< 500K) devices.
The bug is caused by the calculation of the hash table size which
tries to compute sector_div(sz, base) with "base" being zero due to
the small size of the component devices of the array.
Fix this by requiring the hash spacing to be at least one which
implies that also "base" is non-zero.
This bug has existed since about 2.6.14.
Cc: stable@kernel.org
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Impact: fix warning message when PARAVIRT is set in config
Remove stale #ifdef components from our IRQ sizing logic.
x86/Voyager is the only holdout.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: introduce new APIs
We want to deprecate cpumasks on the stack, as we are headed for
gynormous numbers of CPUs. Eventually, we want to head towards an
undefined 'struct cpumask' so they can never be declared on stack.
1) New cpumask functions which take pointers instead of copies.
(cpus_* -> cpumask_*)
2) Several new helpers to reduce requirements for temporary cpumasks
(cpumask_first_and, cpumask_next_and, cpumask_any_and)
3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
(cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
4) 'struct cpumask' for explicitness and to mark new-style code.
5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
not NR_CPUS for time efficiency and for smaller dynamic allocations
in future.
6) cpumask_copy() so we can allocate less than a full cpumask eventually
(for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
definition eventually.
7) work_on_cpu() helper for doing task on a CPU, rather than saving old
cpumask for current thread and manipulating it.
8) smp_call_function_many() which is smp_call_function_mask() except
taking a cpumask pointer.
Note that this patch simply introduces the new functions and leaves
the obsolescent ones in place. This is to simplify the transition
patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch (as1159b) changes the timeout routines in the block core to
use round_jiffies_up(). There's no point in rounding the timer
deadline down, since if it expires too early we will have to restart
it.
The patch also removes some unnecessary tests when a request is
removed from the queue's timer list.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch (as1158b) adds round_jiffies_up() and friends. These
routines work like the analogous round_jiffies() functions, except
that they will never round down.
The new routines will be useful for timeouts where we don't care
exactly when the timer expires, provided it doesn't expire too soon.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Commit 0762b8bde9 moved disk_get_part()
in front of recursive get on the whole disk, which caused removable
devices to try disk_get_part() before rescanning after a new media is
inserted, which might fail legit open attempts or give the old
partition.
This patch fixes the problem by moving disk_get_part() after
__blkdev_get() on the whole disk.
This problem was spotted by Borislav Petkov.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
smp_mb() is needed (to make the memory operations visible globally) before
sending the ipi on the sender and the receiver (on Alpha atleast) needs
smp_read_barrier_depends() in the handler before reading the call_single_queue
list in a lock-free fashion.
On x86, x2apic mode register accesses for sending IPI's don't have serializing
semantics. So the need for smp_mb() before sending the IPI becomes more
critical in x2apic mode.
Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
smp_mb() doesn't mean anything on the sender, when the ipi receiver is not
doing any thing special (like memory fence) after clearing the CSD_FLAG_WAIT.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move the calling blk_delete_timer to later in end_that_request_last to
address an issue where blkdev_dequeue_request may have add a timer for the
request.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Block queue supports two usage models - one where block driver peeks
at the front of queue using elv_next_request(), processes it and
finishes it and the other where block driver peeks at the front of
queue, dequeue the request using blkdev_dequeue_request() and finishes
it. The latter is more flexible as it allows the driver to process
multiple commands concurrently.
These two inconsistent usage models affect the block layer
implementation confusing. For some, elv_next_request() is considered
the issue point while others consider blkdev_dequeue_request() the
issue point.
Till now the inconsistency mostly affect only accounting, so it didn't
really break anything seriously; however, with block layer timeout,
this inconsistency hits hard. Block layer considers
elv_next_request() the issue point and adds timer but SCSI layer
thinks it was just peeking and when the request can't process the
command right away, it's just left there without further processing.
This makes the request dangling on the timer list and, when the timer
goes off, the request which the SCSI layer and below think is still on
the block queue ends up in the EH queue, causing various problems - EH
hang (failed count goes over busy count and EH never wakes up),
WARN_ON() and oopses as low level driver trying to handle the unknown
command, etc. depending on the timing.
As SCSI midlayer is the only user of block layer timer at the moment,
moving blk_add_timer() to elv_dequeue_request() fixes the problem;
however, this two usage models definitely need to be cleaned up in the
future.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>