Avoid an atomic operation in kref_put() when the last reference is
dropped. On most platforms, atomic_read() is a plan read of the counter
and involves no atomic at all.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Moving uevent_seqnum and uevent_helper to kobject_uevent.c
because they are used even if CONFIG_SYSFS=n
while kernel/ksysfs.c is built only if CONFIG_SYSFS=y,
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change reverts the 033b96fd30 commit
from Kay Sievers that removed the mount/umount uevents from the kernel.
Some older versions of HAL still depend on these events to detect when a
new device has been mounted. These events are not correctly emitted,
and are broken by design, and so, should not be relied upon by any
future program. Instead, the /proc/mounts file should be polled to
properly detect this kind of event.
A feature-removal-schedule.txt entry has been added, noting when this
interface will be removed from the kernel.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a tag is set for a node being deleted from a radix_tree, then that
tag gets cleared from the parent of the node, even if it is set for some
siblings of the node begin deleted.
This patch changes the logic to include a test for any_tag_set similar
to the logic a little futher down. Care is taken to ensure that
'nr_cleared_tags' remains equals to the number of entries in the 'tags'
array which are set to '0' (which means that this tag is not set in the
tree below pathp->node, and should be cleared at pathp->node and
possibly above.
[ Nick says: "Linus FYI, I was able to modify the radix tree test
harness to catch the bug and can no longer trigger it after the fix.
Resulting code passes all other harness tests as well of course." ]
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The spinlock-debug wait-loop was using loops_per_jiffy to detect too long
spinlock waits - but on fast CPUs this led to a way too fast timeout and false
messages.
The fix is to include a __delay(1) call in the loop, to correctly approximate
the intended delay timeout of 1 second. The code assumes that every
architecture implements __delay(1) to last around 1/(loops_per_jiffy*HZ)
seconds.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The buffer used for kobject uevent is too small for some of the events generated
by the input layer. Bump it to 2k.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
kobject_get_path() will oops if one of the component names is
NULL. Fix that by returning NULL instead of oopsing.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
So we might as well check to verify this, and let the user know that
something is wrong if they didn't do it correctly, instead of oopsing
later on in kobject_get_name() or somewhere else.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The implementation of int_sqrt() assumes that longs have 32 bits. On
systems that have 64 bit longs this will result in gross errors when the
argument to the function is greater than 2^32 - 1 on such systems. I doubt
whether any such use is currently made of int_sqrt() but the attached patch
fixes the problem anyway.
Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current logic does not calculate correctly the good shift array:
Let x be the pattern that is being searched. Let y be the block of data.
The good shift array aligns the segment:
x[i+1 ... m-1] = y[i+j+1 ... j+m-1]
with its rightmost occurrence in x that fulfils x[i] neq y[i+j].
In previous version, the good shift array for the pattern ANPANMAN is:
[1, 8, 3, 8, 8, 8, 8, 8]
and should be:
[1, 8, 3, 6, 6, 6, 6, 6]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This arch-independent routine copies data to a memory-mapped I/O region,
using 32-bit accesses. The naming is double-underscored to make it clear
that it does not guarantee write ordering, nor does it perform a memory
barrier afterwards; the kernel doc also explicitly states this. This style
of access is required by some devices.
This change also introduces include/linux/io.h, at Andrew's suggestion. It
only has one occupant at the moment, but is a logical destination for
oft-replicated contents of include/asm-*/{io,iomap}.h to migrate to.
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If optimizing for size (CONFIG_CC_OPTIMIZE_FOR_SIZE), allow gcc4 compilers
to decide what to inline and what not - instead of the kernel forcing gcc
to inline all the time. This requires several places that require to be
inlined to be marked as such, previous patches in this series do that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AK: I hacked Muli's original patch a lot and there were a lot
of changes - all bugs are probably to blame on me now.
There were also some changes in the fall back behaviour
for swiotlb - in particular it doesn't try to use GFP_DMA
now anymore. Also all DMA mapping operations use the
same core dma_alloc_coherent code with proper fallbacks now.
And various other changes and cleanups.
Known problems: iommu=force swiotlb=force together breaks
needs more testing.
This patch cleans up x86_64's DMA mapping dispatching code. Right now
we have three possible IOMMU types: AGP GART, swiotlb and nommu, and
in the future we will also have Xen's x86_64 swiotlb and other HW
IOMMUs for x86_64. In order to support all of them cleanly, this
patch:
- introduces a struct dma_mapping_ops with function pointers for each
of the DMA mapping operations of gart (AMD HW IOMMU), swiotlb
(software IOMMU) and nommu (no IOMMU).
- gets rid of:
if (swiotlb)
return swiotlb_xxx();
- PCI_DMA_BUS_IS_PHYS is now checked against the dma_ops being set
This makes swiotlb faster by avoiding double copying in some cases.
Signed-Off-By: Muli Ben-Yehuda <mulix@mulix.org>
Signed-Off-By: Jon D. Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I know several people using MAGIC_SYSRQ not for kernel debugging but for
trying to do a halfway normal shutdown in case of problems.
Since there's no technical reason why MAGIC_SYSRQ would have to depend on
DEBUG_KERNEL, I'm therefore suggesting to drop this dependency.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert atomic_dec_and_lock to use new atomic primitives.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the default behaviour for the remap operators in bitmap, cpumask and
nodemask.
As previously submitted, the pair of masks <A, B> defined a map of the
positions of the set bits in A to the corresponding bits in B. This is still
true.
The issue is how to map the other positions, corresponding to the unset (0)
bits in A. As previously submitted, they were all mapped to the first set bit
position in B, a constant map.
When I tried to code per-vma mempolicy rebinding using these remap operators,
I realized this was wrong.
This patch changes the default to map all the unset bit positions in A to the
same positions in B, the identity map.
For example, if A has bits 4-7 set, and B has bits 9-12 set, then the map
defined by the pair <A, B> maps each bit position in the first 32 bits as
follows:
0 ==> 0
...
3 ==> 3
4 ==> 9
...
7 ==> 12
8 ==> 8
9 ==> 9
...
31 ==> 31
This now corresponds to the typical behaviour desired when migrating pages and
policies from one cpuset to another.
The pages on nodes within the original cpuset, and the references in memory
policies to nodes within the original cpuset, are migrated to the
corresponding cpuset-relative nodes in the destination cpuset. Other pages
and node references are left untouched.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Shrink the height of a radix tree when it is partially truncated - we only do
shrinkage of full truncation at present.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Correctly determine the tags to be cleared in radix_tree_delete() so we
don't keep moving up the tree clearing tags that we don't need to. For
example, if a tag is simply not set in the deleted item, nor anywhere up
the tree, radix_tree_delete() would attempt to clear it up the entire
height of the tree.
Also, tag_set() was made conditional so as not to dirty too many cachelines
high up in the radix tree. Instead, put this logic into
radix_tree_tag_set().
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce helper any_tag_set() rather than repeat the same code sequence 4
times.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Export a number of features required to build all the modules. It also
implements the following simple features:
(*) csum_partial_copy_from_user() for MMU as well as no-MMU.
(*) __ucmpdi2().
so that they can be exported too.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by
S390, 64BIT and COMPAT.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch cleans up the alloc_bootmem fix for swiotlb. Patch removes
alloc_bootmem_*_limit api and fixes alloc_boot_*low api to do the right
thing -- allocate from low32 memory.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bad_range is supposed to be a temporary check. It would be a pity to throw it
out. Make it depend on CONFIG_DEBUG_VM instead.
CONFIG_HOLES_IN_ZONE systems were relying on this to check pfn_valid in the
page allocator. Add that to page_is_buddy instead.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lib/lib.a(kobject_uevent.o)(.text+0x25f): In function `kobject_uevent':
: undefined reference to `__alloc_skb'
lib/lib.a(kobject_uevent.o)(.text+0x2a1): In function `kobject_uevent':
: undefined reference to `skb_over_panic'
lib/lib.a(kobject_uevent.o)(.text+0x31d): In function `kobject_uevent':
: undefined reference to `skb_over_panic'
lib/lib.a(kobject_uevent.o)(.text+0x356): In function `kobject_uevent':
: undefined reference to `netlink_broadcast'
lib/lib.a(kobject_uevent.o)(.init.text+0x9): In function `kobject_uevent_init':
: undefined reference to `netlink_kernel_create'
make: *** [.tmp_vmlinux1] Error 1
Netlink is unconditionally enabled if CONFIG_NET, so that's OK.
kobject_uevent.o is compiled even if !CONFIG_HOTPLUG, which is lazy.
Let's compound the sin.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The klist reference counting in the find functions that use
klist_iter_init_node is broken. If the function (for example
driver_find_device) is called with a NULL start object then everything is
fine, the first call to next_device()/klist_next increases the ref-count of
the first node on the list and does nothing for the start object which is
NULL.
If they are called with a valid start object then klist_next will decrement
the ref-count for the start object but nobody has incremented it. Logical
place to fix this would be klist_iter_init_node because the function puts a
reference of the object into the klist_iter struct.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Frank Pavlic <pavlic@de.ibm.com>
Cc: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The distinction between hotplug and uevent does not make sense these
days, netlink events are the default.
udev depends entirely on netlink uevents. Only during early boot and
in initramfs, /sbin/hotplug is needed. So merge the two functions and
provide only one interface without all the options.
The netlink layer got a nice generic interface with named slots
recently, which is probably a better facility to plug events for
subsystem specific events.
Also the new poll() interface to /proc/mounts is a nicer way to
notify about changes than sending events through the core.
The uevents should only be used for driver core related requests to
userspace now.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The names of these events have been confusing from the beginning
on, as they have been more like claim/release events. We needed these
events for noticing HAL if storage devices have been mounted.
Thanks to Al, we have the proper solution now and can poll()
/proc/mounts instead to get notfied about mount tree changes.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It makes zero sense to have hotplug, but not the netlink
events enabled today. Remove this option and merge the
kobject_uevent.h header into the kobject.h header file.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a spinlock debugging check hits, we print the CPU number as an
informational thing - but there is no guarantee that preemption is off
at that point - hence we should use raw_smp_processor_id(). Otherwise
DEBUG_PREEMPT will print a warning.
With this fix the warning goes away and only the spinlock-debugging info
is printed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The overflow checking condition in lib/swiotlb.c was wrong.
It would first run a NULL pointer through virt_to_phys before
testing it. Since pci_map_sg overflow is not that uncommon
and causes data corruption (including broken file systems) when not
properly detected I think it's better to fix it in 2.6.15.
This affects x86-64 and IA64.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
genalloc improperly stores the sizes of freed chunks, allocates overlapping
memory regions, and oopses after its in-band data is overwritten.
Signed-off-by: Chris Humbert <mahadri-kernel@drigon.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reiser4 uses radix trees to solve a trouble reiser4_readdir has serving nfs
requests.
Unfortunately, radix tree api lacks an operation suitable for modifying
existing entry. This patch adds radix_tree_lookup_slot which returns pointer
to found item within the tree. That location can be then updated.
Both Nick and Christoph Lameter have patches which need this as well.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A couple of (char *) casts removed in a previous cleanup patch in
lib/string.c:memmove() were actually useful, as they suppressed a couple of
warnings:
assignment discards qualifiers from pointer target type
Fix by declaring the local variable const in the first place, so casts
aren't needed to strip the const qualifier.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is a rewrite of the one submitted on October 1st, using modules
(http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).
This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
intense torture test of the RCU infratructure. This is needed due to the
continued changes to the RCU infrastructure to accommodate dynamic ticks,
CPU hotplug, realtime, and so on. Most of the code is in a separate file
that is compiled only if the CONFIG variable is set. Documentation on how
to run the test and interpret the output is also included.
This code has been tested on i386 and ppc64, and an earlier version of the
code has received extensive testing on a number of architectures as part of
the PREEMPT_RT patchset.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
They aren't used anywhere in that file.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix-up the CONFIG_FRAME_POINTER help text language a bit.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the forthcoming task migration support, a key calculation will be
mapping cpu and node numbers from the old set to the new set while
preserving cpuset-relative offset.
For example, if a task and its pages on nodes 8-11 are being migrated to
nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be
moved to node 25 (the 2nd node in the new set.)
As with other bitmap operations, the proper way to code this is to provide
the underlying calculation in lib/bitmap.c, and then to provide the usual
cpumask and nodemask wrappers.
This patch provides that. These operations are termed 'remap' operations.
Both remapping a single bit and a set of bits is supported.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>