Commit Graph

864 Commits

Author SHA1 Message Date
Roland McGrath 85ba2d862e tracehook: wait_task_inactive
This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off.  There is no change to existing callers.  This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
Linus Torvalds 29ca069cc6 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Wire up new system calls
2008-07-25 17:29:03 -07:00
Srinivasa D S ef53d9c5e4 kprobes: improve kretprobe scalability with hashed locking
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table.  We have one
global kretprobe lock to serialise the access to these lists.  This causes
only one kretprobe handler to execute at a time.  Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).

Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.

Solution:

 1) Instead of having one global lock to protect kretprobe instances
    present in kretprobe object and kretprobe hash table.  We will have
    two locks, one lock for protecting kretprobe hash table and another
    lock for kretporbe object.

 2) We hold lock present in kretprobe object while we modify kretprobe
    instance in kretprobe object and we hold per-hash-list lock while
    modifying kretprobe instances present in that hash list.  To prevent
    deadlock, we never grab a per-hash-list lock while holding a kretprobe
    lock.

 3) We can remove used_instances from struct kretprobe, as we can
    track used instances of kretprobe instances using kretprobe hash
    table.

Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.

cacheline              non-cacheline             Un-patched kernel
aligned patch 	       aligned patch
===============================================================================
real    9m46.784s       9m54.412s                  10m2.450s
user    40m5.715s       40m7.142s                  40m4.273s
sys     2m57.754s       2m58.583s                  3m17.430s
===========================================================

Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real    9m26.389s
user    40m8.775s
sys     2m7.283s
=========================

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Tony Luck 3e4d0cab61 [IA64] Wire up new system calls
Six new system calls: signalfd4, eventfd2, epoll_create1,
dup3, pipe2 and inotify_init1.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-07-25 10:10:28 -07:00
Ulrich Drepper ed8cae8ba0 flag parameters: pipe
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value.  This patch implements
the handling of O_CLOEXEC for the flag.  I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation.  I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.

The implementation introduces do_pipe_flags.  I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code.  To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_pipe2
# ifdef __x86_64__
#  define __NR_pipe2 293
# elif defined __i386__
#  define __NR_pipe2 331
# else
#  error "need __NR_pipe2"
# endif
#endif

int
main (void)
{
  int fd[2];
  if (syscall (__NR_pipe2, fd, 0) != 0)
    {
      puts ("pipe2(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (coe & FD_CLOEXEC)
        {
          printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
    {
      puts ("pipe2(O_CLOEXEC) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((coe & FD_CLOEXEC) == 0)
        {
          printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Andi Kleen 4a0b2b4dbe sysdev: Pass the attribute to the low level sysdev show/store function
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.

I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.

I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.

Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Alex Chiang efc7508c9e [IA64] Avoid overflowing ia64_cpu_to_sapicid in acpi_map_lsapic()
acpi_map_lsapic tries to stuff a long into ia64_cpu_to_sapicid[],
which can only hold ints, so let's fix that.

We need to update the signature of acpi_map_cpu2node() too.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-07-17 11:24:42 -07:00
Akiyama, Nobuyuki 740a8de079 [IA64] adding parameter check to module_free()
module_free() refers the first parameter before checking.
    But it is called like below(in kernel/kprobes). The first parameter is always NULL.
This happens when many probe points(>1024) are set by kprobes.
I encountered this with using SystemTap. It can set many probes easily.

static int __kprobes collect_one_slot(struct kprobe_insn_page *kip, int idx)
{
...
    if (kip->nused == 0) {
	    hlist_del(&kip->hlist);
	    if (hlist_empty(&kprobe_insn_pages)) {
		...
	    } else {
		    module_free(NULL, kip->insns); //<<< 1st param always NULL
		    kfree(kip);
	    }
	    return 1;
    }
    return 0;
}

Signed-off-by: Akiyama, Nobuyuki <akiyama.nobuyuk@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-07-17 11:22:01 -07:00
Denis V. Lunev 60192db829 [IA64] improper printk format in acpi-cpufreq
When dprintk is enabled the following warnings are generated:
arch/ia64/kernel/cpufreq/acpi-cpufreq.c: In function 'processor_set_pstate':
arch/ia64/kernel/cpufreq/acpi-cpufreq.c:54: warning: format '%x' expects type 'unsigned int', but argumen
t 3 has type 's64'
arch/ia64/kernel/cpufreq/acpi-cpufreq.c: In function 'processor_get_pstate':
arch/ia64/kernel/cpufreq/acpi-cpufreq.c:76: warning: format '%x' expects type 'unsigned int', but argumen
t 2 has type 's64'

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-07-17 11:11:17 -07:00
Tony Luck fca515fbfa Pull pvops into release branch 2008-07-17 10:53:37 -07:00
Zhao Yakui da5e09a1b3 ACPI : Create "idle=nomwait" bootparam
"idle=nomwait" disables the use of the MWAIT
instruction from both C1 (C1_FFH) and deeper (C2C3_FFH)
C-states.

When MWAIT is unavailable, the BIOS and OS generally
negotiate to use the HALT instruction for C1,
and use IO accesses for deeper C-states.

This option is useful for power and performance
comparisons, and also to work around BIOS bugs
where broken MWAIT support is advertised.

http://bugzilla.kernel.org/show_bug.cgi?id=10807
http://bugzilla.kernel.org/show_bug.cgi?id=10914

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:05 +02:00
Zhao Yakui c1e3b377ad ACPI: Create "idle=halt" bootparam
"idle=halt" limits the idle loop to using
the halt instruction.  No MWAIT, no IO accesses,
no C-states deeper than C1.

If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2008-07-16 23:27:05 +02:00
Ingo Molnar 1a781a777b Merge branch 'generic-ipi' into generic-ipi-for-linus
Conflicts:

	arch/powerpc/Kconfig
	arch/s390/kernel/time.c
	arch/x86/kernel/apic_32.c
	arch/x86/kernel/cpu/perfctr-watchdog.c
	arch/x86/kernel/i8259_64.c
	arch/x86/kernel/ldt.c
	arch/x86/kernel/nmi_64.c
	arch/x86/kernel/smpboot.c
	arch/x86/xen/smp.c
	include/asm-x86/hw_irq_32.h
	include/asm-x86/hw_irq_64.h
	include/asm-x86/mach-default/irq_vectors.h
	include/asm-x86/mach-voyager/irq_vectors.h
	include/asm-x86/smp.h
	kernel/Makefile

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15 21:55:59 +02:00
Doug Chapman 3a677d2164 [IA64] export account_system_vtime
The symbol account_system_vtime is used by the kvm module but
not exported.  This breaks building with CONFIG_VIRT_CPU_ACCOUNTING
and CONFIG_KVM=m.

Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Acked-by: Hidetosho Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-30 15:06:48 -07:00
Tony Luck dd4f0888f8 [IA64] Bugfix for system with 32 cpus
On a system where there are no hot pluggable cpus "additional_cpus"
is still set to -1 at the point where we call per_cpu_scan_finalize().
If we didn't find an SRAT table and so pick the default "32" for the
number of cpus, when we get to:
high_cpu = min(high_cpu + reserve_cpus, NR_CPUS);
we will end up initializing for just 31 cpus ... and so we will
die horribly when bringing up cpu#32.

Problem introduced by: 2c6e6db41f
"Minimize per_cpu reservations."

Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-30 15:03:14 -07:00
Jens Axboe 15c8b6c1aa on_each_cpu(): kill unused 'retry' parameter
It's not even passed on to smp_call_function() anymore, since that
was removed. So kill it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:38 +02:00
Jens Axboe 8691e5a8f6 smp_call_function: get rid of the unused nonatomic/retry argument
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:24:35 +02:00
Jens Axboe f27b433ef3 ia64: convert to generic helpers for IPI function calls
This converts ia64 to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().

Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-26 11:22:30 +02:00
Julia Lawall e2569b7e57 [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
As noted by Akinobu Mita alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory.  Thus a NULL test or
memset after calls to these functions is unnecessary.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-24 10:28:55 -07:00
Jes Sorensen 2826f8c0f4 [IA64] Fix boot failure on ia64/sn2
Call check_sal_cache_flush() after platform_setup() as
check_sal_cache_flush() now relies on being able to call platform
vector code.

Problem was introduced by: 3463a93def
"Update check_sal_cache_flush to use platform_send_ipi()"

Signed-off-by: Jes Sorensen <jes@sgi.com>
Tested-by: Alex Chiang: <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-24 10:16:27 -07:00
Linus Torvalds c8988f9682 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Fix CONFIG_IA64_SGI_UV build error
  [IA64] Update check_sal_cache_flush to use platform_send_ipi()
  [IA64] perfmon: fix async exit bug
2008-06-16 11:52:43 -07:00
Alex Chiang 3463a93def [IA64] Update check_sal_cache_flush to use platform_send_ipi()
check_sal_cache_flush is used to detect broken firmware that drops
pending interrupts.

The old implementation schedules a timer interrupt for itself in
the future by getting the current value of the Interval Timer
Counter + 1000 cycles, waits for the interrupt to be pended, calls
SAL_CACHE_FLUSH, and finally checks to see if the interrupt is
still pending.

This implementation can cause problems for virtual machine code if
the process of scheduling the timer interrupt takes more than 1000
cycles; the virtual machine can end up sleeping for several hundred
years while waiting for the ITC to wrap around.

The fix is to use platform_send_ipi. The processor will still send
an interrupt to itself, using the IA64_IPI_DM_INT delivery mode,
which causes the IPI to look like an external interrupt. The rest
of the SAL_CACHE_FLUSH + checking to see if the interrupt is still
pending remains unchanged.

This fix has been boot tested successfully on:

	- intel tiger2
	- hp rx6600
	- hp rx5670

The rx5670 has known buggy firmware, where SAL_CACHE_FLUSH drops
pending interrupts. A boot test on this machine showed this message
on the console:

SAL: SAL_CACHE_FLUSH drops interrupts; PAL_CACHE_FLUSH will be used instead

Which proves that the self-inflicted IPI approach is viable. And
as expected, the other tested platforms correctly did not display
the warning.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-11 16:40:33 -07:00
Fenghua Yu 39b8931b5c ACPI: handle invalid ACPI SLIT table
This is a SLIT sanity checking patch.  It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64.  It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64.  It also cleans up unused variable localities in
acpi_parse_slit() on x86.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-11 19:13:46 -04:00
stephane eranian 83014699b0 [IA64] perfmon: fix async exit bug
Move the cleanup of the async queue to the close callback from the flush
callback. This avoids losing asynchronous overflow notifications when
the file descriptor is shared by multiple processes and one terminates.

Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-11 15:24:13 -07:00
Isaku Yamahata 4d58bbcc89 [IA64] pv_ops: move some functions in ivt.S to avoid lack of space.
move interrupt, page_fault, non_syscall, dispatch_unaligned_handler and
dispatch_to_fault_handler to avoid lack of instructin space.
The change set 4dcc29e157 bloated
SAVE_MIN_WITH_COVER, SAVE_MIN_WITH_COVER_R19 so that it bloated the
functions which uses those macros.
In the native case, only dispatch_illegal_op_fault had to be moved.
When paravirtualized case the all functions which use the macros need
to be moved to avoid the lack of space.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-28 09:41:58 -07:00
Isaku Yamahata 00d21d82b8 [IA64] pvops: add to hooks, pv_time_ops, for steal time accounting.
Introduce pv_time_ops which adds hook to steal time accounting.
On virtualized environment, cpus are shared by many guests and
steal time is the time which is used for other guests.
On virtualized environtment, streal time should be accounted.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:11:42 -07:00
Isaku Yamahata 85cbc50378 [IA64] pvops: add hooks, pv_irq_ops, to paravirtualized irq related operations.
introduce pv_irq_ops which adds hooks to paravirtualize irq related
operations.
On virtualized environment, interruption may be replaced by something
virtualization friendly. So the irq related operation also may need
paravirtualization.
This patch adds necessary hooks to paravirtualize irq related operations.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:11:10 -07:00
Isaku Yamahata 33b39e8420 [IA64] pvops: add hooks, pv_iosapic_ops, to paravirtualize iosapic.
add hooks to paravirtualize iosapic which is a real hardware resource.
On virtualized environment it may be replaced something virtualized
friendly.
Define pv_iosapic_ops and add the hooks.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:10:41 -07:00
Isaku Yamahata e51835d58a [IA64] pvops: define initialization hooks, pv_init_ops, for paravirtualized environment.
define pv_init_ops hooks which represents various initialization
hooks for paravirtualized environment. and add hooks.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:10:06 -07:00
Isaku Yamahata 213060a4d6 [IA64] pvops: paravirtualize NR_IRQS
Make NR_IRQ overridable by each pv instances.
Pv instance may need each own number of irqs so that
NR_IRQS should be the maximum number of nr_irqs each
pv instances need.

Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:09:30 -07:00
Isaku Yamahata 4df8d22bbb [IA64] pvops: paravirtualize entry.S
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel.
They include sensitive or performance critical privileged instructions
so that they need paravirtualization.
To paravirtualize them by single source and multi compile
they are converted into indirect jump. And define each pv instances.

Cc: Keith Owens <kaos@ocs.com.au>
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:08:01 -07:00
Isaku Yamahata 498c517047 [IA64] pvops: paravirtualize ivt.S
paravirtualize ivt.S which implements fault handler in hand written
assembly code.
They includes sensitive or performance critical privileged instructions.
So they need paravirtualization.

Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:03:29 -07:00
Isaku Yamahata 02e32e36f4 [IA64] pvops: paravirtualize minstate.h.
paravirtualize minstate.h which are hand written assembly code.
They include sensitive or performance critical privileged
instructions. So that they are appropriate for paravirtualization.

Cc: Keith Owens <kaos@ocs.com.au>
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 15:02:17 -07:00
Isaku Yamahata 1e39d80a59 [IA64] pvops: preparation for paravirtulization of hand written assembly code.
Preparation for paravirtualization of hand written assembly code.
They are paravirtualized by single source code and compiled multi times.
To tell those files for target (including native), add one defines.

Cc: "Dong, Eddie" <eddie.dong@intel.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 14:46:28 -07:00
Isaku Yamahata 1ff730b52f [IA64] pvops: introduce pv_cpu_ops to paravirtualize privileged instructions.
introduce pv_cpu_ops to paravirtualize privleged instructions
which are defined by ia64 intrinsics.
make them indirect C function calls by introducing function
tables, pv_cpu_ops.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 14:40:18 -07:00
Isaku Yamahata 3e0879deb7 [IA64] pvops: add an early setup hook for pv_ops.
This patch adds a setup hook in the very early boot sequence
before start_kernel() to initialize paravirtualization stuff.
The hook will be set by each pv loader code or by using multi entry point.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 14:39:54 -07:00
Isaku Yamahata 90aeb169c0 [IA64] pvops: introduce pv_info which describes some random info.
introduce pv_info which describes some randome info about
underlying execution environment.

Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 14:39:30 -07:00
Isaku Yamahata 8311d21c35 [IA64] pvops: preparation: move the constants, LOAD_OFFSET, to a header file.
Move the LOAD_OFFSET definition from vmlinux.lds.S into system.h.
On paravirtualized environments, it is necessary to detect the
execution environment. One of the solutions is the multi entry point.
The multi entry point allows a boot loader to start the kernel execution
from the entry point which is different from the ELF entry point.
The non standard entry point will defined as the specialized elf note
which contains the LMA of the entry point symbol.
The constant, LOAD_OFFSET, is necessary to calculate the symbol's LMA.
Move the definition into the public header file to make it available
to the multi entry point support.

Cc: "He, Qing" <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 14:38:18 -07:00
Isaku Yamahata 444933c6c6 [IA64] pvops: preparation: remove extern in irq_ia64.c
remove extern declaration of handle_IPI() in irq_ia64.c.
Instead, declare it in asm-ia64/smp.h.
Later handle_IPI() will be referenced from another file.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 14:37:53 -07:00
Tony Luck 4dcc29e157 [IA64] Workaround for RSE issue
Problem: An application violating the architectural rules regarding
operation dependencies and having specific Register Stack Engine (RSE)
state at the time of the violation, may result in an illegal operation
fault and invalid RSE state.  Such faults may initiate a cascade of
repeated illegal operation faults within OS interruption handlers.
The specific behavior is OS dependent.

Implication: An application causing an illegal operation fault with
specific RSE state may result in a series of illegal operation faults
and an eventual OS stack overflow condition.

Workaround: OS interruption handlers that switch to kernel backing
store implement a check for invalid RSE state to avoid the series
of illegal operation faults.

The core of the workaround is the RSE_WORKAROUND code sequence
inserted into each invocation of the SAVE_MIN_WITH_COVER and
SAVE_MIN_WITH_COVER_R19 macros.  This sequence includes hard-coded
constants that depend on the number of stacked physical registers
being 96.  The rest of this patch consists of code to disable this
workaround should this not be the case (with the presumption that
if a future Itanium processor increases the number of registers, it
would also remove the need for this patch).

Move the start of the RBS up to a mod32 boundary to avoid some
corner cases.

The dispatch_illegal_op_fault code outgrew the spot it was
squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y
Move it out to the end of the ivt.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 13:24:39 -07:00
Al Viro f52111b154 [PATCH] take init_files to fs/file.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-16 17:22:20 -04:00
Prarit Bhargava 3fb2c74ee2 [IA64] Properly unregister legacy interrupts
acpi_unregister_gsi() should "undo" what acpi_register_gsi() does.

On systems that have legacy interrupts, acpi_unregister_gsi erroneously calls
iosapci_unregister_intr() which is wrong to do and causes a loud warning.

acpi_unregister_gsi() should just return in these cases.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 16:00:14 -07:00
Simon Holm Thøgersen 7af1d7532b [IA64] Remove NULL pointer check for argument never passed as NULL.
There is only palinfo_handle_smp as (indirect) user of palinfo_smp_call (by
way of smp_call_function_single) and surely palinfo_handle_smp never pass
NULL as parameter for info.

Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 15:58:27 -07:00
Hidetoshi Seto 0fb232fdb2 [IA64] trivial cleanup for perfmon.c
Fix a typo, and coding style cleanups for pfm_handle_work().

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 15:56:34 -07:00
Hidetoshi Seto 2e513fe490 [IA64] trivial cleanup for entry.S
This patch does:
 - make comment at next to resched check more robust
 - move "re-check" comments to next to where change predicate regs

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 15:56:09 -07:00
Hidetoshi Seto 3633c73080 [IA64] fix interrupt masking for pending works on kernel leave
[Bug-fix for "[BUG?][2.6.25-mm1] sleeping during IRQ disabled"]

This patch does:
 - enable interrupts before calling schedule() as same as others, ex. x86
 - enable interrupts during ia64_do_signal() and ia64_sync_krbs()
 - do_notify_resume_user() is still called with interrupts disabled, since
   we can take short path of fsys_mode if-statement quickly.
 - pfm_handle_work() is also called with interrupts disabled, since
   it can deal interrupt mask within itself.
 - fix/add some comments/notes

Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 15:55:35 -07:00
Alex Chiang f13ae30e13 [IA64] allow user to force_pal_cache_flush
The sequence executed in check_sal_cache_flush:

	- pend a timer interrupt
	- call SAL_CACHE_FLUSH
	- see if interrupt is still pending

can hang HP machines with buggy SAL_CACHE_FLUSH implementations.

Provide a kernel command-line argument to allow users skip this
check if desired. Using this parameter will force ia64_sal_cache_flush
to call ia64_pal_cache_flush() instead of SAL_CACHE_FLUSH.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 15:42:07 -07:00
Bernhard Walle 8a3360f06c [IA64] Don't reserve crashkernel memory > 4 GB
Some IA64 machines map all cell-local memory above 4 GB (32 bit limit).
However, in most cases, the kernel needs some memory below that limit that is
DMA-capable. So in this machine configuration, the crashkernel will be reserved
above 4 GB.

For machines that use SWIOTLB implementation because they lack an I/O MMU
the low memory is required by the SWIOTLB implementation. In that case,
it doesn't make sense to reserve the crashkernel at all because it's unusable
for kdump.

A special case is the "hpzx1" machine vector. In theory, it has a I/O MMU, so
it can be booted above 4 GB. However, in the kdump case that is not possible
because of changeset 51b58e3e26ebfb8cd56825c4b396ed251f51dec9:

    On HP zx1 machines, the 'machvec=dig' parameter is needed for the kdump
    kernel to avoid problems with the HP sba iommu.  The problem is that during
    the boot of the kdump kernel, the iommu is re-initialized, so in-flight DMA
    from improperly shutdown drivers causes an IOTLB miss which leads to an
    MCA.  With kdump, the idea is to get into the kdump kernel with as little
    code as we can, so shutting down drivers properly is not an option.

    The workaround is to add 'machvec=dig' to the kdump kernel boot parameters.
    This makes the kdump kernel avoid using the sba iommu altogether, leaving
    the IOTLB intact.  Any ongoing DMA falls harmlessly outside the kdump
    kernel.  After the kdump kernel reboots, all devices will have been
    shutdown properly and DMA stopped.

This patch pushes that functionality into the sba iommu initialization
code, so that users won't have to find the obscure documentation telling
them about 'machvec=dig'.

This means that also for hpzx1 it's not possible to boot when all
memory is above the 4 GB limit. So the only machine vectors that can handle
this case are "sn2" and "uv".

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 15:40:40 -07:00
Jack Steiner 2224661494 [IA64] machvec support for SGI UV platform
This patch adds the basic IA64 machvec infrastructure to support
the SGI "UV" platform.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-14 14:22:04 -07:00
Al Viro f8e811b989 [IA64] fix file and descriptor handling in perfmon
Races galore...  General rule: as soon as it's in descriptor table,
it's over; another thread might have started IO on it/dup2() it
elsewhere/dup2() something *over* it/etc.  fd_install() is the very
last step one should take - it's a point of no return.

Besides, the damn thing leaked on failure exits...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-01 14:36:36 -07:00