There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.
the compiler can skip truly unused functions just fine:
text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after
[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we are unregistering a kprobe-booster, we can't release its
instruction buffer immediately on the preemptive kernel, because some
processes might be preempted on the buffer. The freeze_processes() and
thaw_processes() functions can clean most of processes up from the buffer.
There are still some non-frozen threads who have the PF_NOFREEZE flag. If
those threads are sleeping (not preempted) at the known place outside the
buffer, we can ensure safety of freeing.
However, the processing of this check routine takes a long time. So, this
patch introduces the garbage collection mechanism of insn_slot. It also
introduces the "dirty" flag to free_insn_slot because of efficiency.
The "clean" instruction slots (dirty flag is cleared) are released
immediately. But the "dirty" slots which are used by boosted kprobes, are
marked as garbages. collect_garbage_slots() will be invoked to release
"dirty" slots if there are more than INSNS_PER_PAGE garbage slots or if
there are no unused slots.
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "bibo,mao" <bibo.mao@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Yumiko Sugita <yumiko.sugita.yf@hitachi.com>
Cc: Satoshi Oshima <soshima@redhat.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define elf_addr_t in linux/elf.h. The size of the type is determined using
ELF_CLASS. This allows us to remove the defines that today are spread all
over .c and .h files.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Cc: Daniel Jacobowitz <drow@false.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce pagefault_{disable,enable}() and use these where previously we did
manual preempt increments/decrements to make the pagefault handler do the
atomic thing.
Currently they still rely on the increased preempt count, but do not rely on
the disabled preemption, this might go away in the future.
(NOTE: the extra barrier() in pagefault_disable might fix some holes on
machines which have too many registers for their own good)
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
The lock dependency validator adds a bunch of extra stack frames to
the stack, which can cause stack overflows. Especially seen on 31 bit
where the small stack is only 4k.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
VMALLOC_END on 31bit should be 0x8000000UL instead of 0x7fffffffL.
The page mask which is used to make sure memory_end is on 4MB/2MB
boundary is wrong and not needed. Therefore remove it.
Make sure a vmalloc area does also exist and work on (future)
machines with 4TB and more memory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There's no need to have a spin_lock here, but need sleepable context
for vmem_map. Therefore convert the spin_lock into a mutex.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Set KBUILD_IMAGE to a sane value. This enables "make rpm"
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Follow i386/x86_64:
lockdep can be used to print held locks when printing a
backtrace. This can be useful when debugging things like
'scheduling while atomic' asserts.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use a wrapper for copy_to/from_user to chose the best usercopy method.
The mvcos instruction is better for sizes greater than 256 bytes, if
mvcos is not available a page table walk is better for sizes greater
than 1024 bytes. Also removed the redundant copy_to/from_user_std_small
functions.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid the tprot loop if diag260 works and reports that there are no
holes in memory. The tprot instruction can lead to a significant delay
in the ipl process if the virtual guest has a lot of memory and the
host is under memory pressure.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Need this at yet another file and don't want to add yet another
extern...
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the memory detection code would ever reach the point where it would
load the wait psw, it would generate a specification exception and the
system would crash at ipl time. This is because of a misaligned wait
psw. It needs to be on a double word boundary instead of a word
boundary.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let one master cpu kill all other cpus instead of sending an external
interrupt to all other cpus so they can kill themselves.
Simplifies reipl/shutdown functions a lot.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of reipl cpcmd gets called when all other cpus are not running
anymore. To prevent deadlocks change __cpcmd so that it doesn't take
any locks and call cpcmd or __cpcmd, whatever is correct in the current
context.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case of re-IPL and diag308 doesn't work we have to reset all devices
manually and wait synchronously that each reset finished.
This patch adds the necessary infrastucture and the first exploiter of it.
Subsystems that need to add a function that needs to be called at re-IPL
may register/unregister this function via
struct reset_call {
struct reset_call *next;
void (*fn)(void);
};
void register_reset_call(struct reset_call *reset);
void unregister_reset_call(struct reset_call *reset);
When the registered function get called the context is:
- all cpus beside the current one are stopped
- all machine checks and interrupts are disabled
- prefixing is disabled
- a default machine check handler is available for use
The registered functions may not take any locks are sleep.
For the common I/O layer part of this patch:
Introduce a reset_call css_reset that does the following:
- clear all subchannels
- perform a rchp on all channel paths and wait for the resulting
machine checks
This replaces the calls to clear_all_subchannels() and
cio_reset_channel_paths() for kexec and ccw reipl. reipl_ccw_dev() now
uses reipl_find_schid() to determine the subchannel id for a given
device id.
Also remove cio_reset_channel_paths() and friends since they are not
needed anymore.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
segment save will exit with a lock held if the passed segment doesn't
exist. Any subsequent call to segment_save will lead to a deadlock.
Fix this and give up the lock before returning.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since the diag 308 reipl method is superior to the ccw method, we should
use it whenever it is possible. We can do that, if the user has not
specified a new reipl ccw device and the system has been ipled from
a ccw device.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If reboot fails (e.g. because wrong devno has been specified by the user),
we should just stop all cpus, but should not trigger a kernel panic.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If multiple kernel images are installed on one DASD, the loadparm can be used
to select the boot configuration. This patch introduces the following two new
sysfs attributes:
/sys/firmware/ipl/loadparm: shows loadparm of current system (ro)
/sys/firmware/reipl/ccw/loadparm: loadparm used for next reboot (rw)
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The SALIPL entry point has an needless memory detection routine as we
later check the memory size again. The SALIPL code also uses diagnose
0x060 if we are running under VM, but this diagnose is not compatible
with the 64 bit addressing mode. The solution is to get rid of this
code and rely on the memory detection in the startup code.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
setup_lowcore() calls ctl_set_bit() which returns withs interrupts
enabled. The setup arch code is not supposed to enable interrupts that
early. Therefore use the __ctl_set_bit() variant.
This fixes the not working lock dependency validator on non 64 bit
systems.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 7676bef9c1 breaks DCSS support on
s390. DCSS needs initialized struct pages to work. With the usage of
add_active_range() only the struct pages for physically present pages
are initialized.
This could be fixed if the DCSS driver would initiliaze the struct pages
itself, but this doesn't work too. This is because the mem_map array
does not include holes after the last present memory area and therefore
there is nothing that could be initialized.
To fix this and to avoid some dirty hacks revert this patch for now.
Will be added later when we move to a virtual mem_map.
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] cio: Make ccw_device_register() static.
[S390] Improve AP bus device removal.
[S390] uaccess error handling.
[S390] cio: css_probe_device() must be called enabled.
[S390] Initialize interval value to 0.
[S390] sys_getcpu compat wrapper.
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consider return values for all user space access function and
return -EFAULT on error.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sscanf() could leave the interval value unchanged in which case it
would be used uninitialized.
Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Looking at the new syscall additions, I noticed that
sys_getcpu_wrapper wraps in to sys_tee, in what appears to be
a copy and paste error. Switch it to point to sys_getcpu..
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix the following compile error:
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
arch/s390/kernel/built-in.o(.text+0xdba4): In function `sys32_ipc':
: undefined reference to `compat_sys_semtimedop'
arch/s390/kernel/built-in.o(.text+0xdbee): In function `sys32_ipc':
: undefined reference to `compat_sys_semctl'
arch/s390/kernel/built-in.o(.text+0xdc08): In function `sys32_ipc':
: undefined reference to `compat_sys_msgsnd'
arch/s390/kernel/built-in.o(.text+0xdc30): In function `sys32_ipc':
: undefined reference to `compat_sys_msgrcv'
arch/s390/kernel/built-in.o(.text+0xdc58): In function `sys32_ipc':
: undefined reference to `compat_sys_msgctl'
arch/s390/kernel/built-in.o(.text+0xdc76): In function `sys32_ipc':
: undefined reference to `compat_sys_shmat'
arch/s390/kernel/built-in.o(.text+0xdcb0): In function `sys32_ipc':
: undefined reference to `compat_sys_shmctl'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The latest kernel 2.6.19-rc1 triggers a bug in the s390 specific stack
trace code when compiled with gcc 3.4.
This patch fixes the latest lock dependency validator code (2.6.19-rc1)
on s390 gcc 3.4. The variable sp was fixed to r15 (which is the stack
pointer in the s390 abi) and assigned new values to r15. Therefore,
gcc 3.4 assigns a new value to r15 and does not restore it on exit (r15
is supposed to be call save) - the kernel stack is broken. Avoid trouble
by not assigning any new value to sp (r15).
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the last few places where a pointer to pt_regs gets passed.
Also make sure we call set_irq_regs() before irq_enter() and after
irq_exit(). This doesn't fix anything but makes sure s390 looks the
same like all other architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix too slow clock by using CONFIG_GENERIC_TIME and adding a
clock source for the s390 time-of-day clock. As added benefit
we get rid of the s390 specific definition of do_gettimeofday
and do_settimeofday.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix new restore_sigregs function. It copies the user space copy of the
old psw without correcting the psw.mask and the psw.addr high order bit.
While we are at it, simplify save_sigregs a bit.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system. They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example. The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.
Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.
This patch:
Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.
The stat functions then returns the full 64-bit inode number where
available and where possible. If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.
Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.
Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.
Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.
It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.
[akpm: alpha build fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>