Commit Graph

480 Commits

Author SHA1 Message Date
Adrian Cox bbbe1212bd [PATCH] ppc: Fix platform_notify functions marked __init
While adding USB support to an MV64360 based board this week, I
discovered that all MV64x60 boards in the kernel have platform_notify
functions marked with __init. This causes an oops if a device is added
after boot.

The patch below removes the __init markers. I do not have all these
boards to test on, but the change seems very unlikely to break anything
else.

Signed-off-by: Adrian Cox <adrian@humboldt.co.uk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-17 13:23:21 +11:00
Eric Sesterhenn d116fe5aea [PATCH] kzalloc() conversion in arch/ppc
This converts arch/ppc to kzalloc usage.
Crosscompile tested with allyesconfig.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-17 13:20:57 +11:00
Paul Mackerras 5164501794 Merge ../linux-2.6 2006-03-09 14:32:05 +11:00
Paul Mackerras 1bd79336a4 powerpc: Fix various syscall/signal/swapcontext bugs
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:

* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
  path, so it was only doing anything with it once it saw some other
  bit being set.  In other words, the noerror behaviour would apply to
  the next system call where we had to reschedule or deliver a signal,
  which is not necessarily the current system call.

* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
  path when the _TIF_SINGLESTEP bit was set.

* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
  _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
  by system calls.  I took it out of _TIF_USER_WORK_MASK.

* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
  to be restored (unless perhaps a signal was delivered or the syscall
  was traced or single-stepped).  Thus the non-volatile registers
  weren't restored on exit from a signal handler.  We probably got
  away with it mostly because signal handlers written in C wouldn't
  alter the non-volatile registers.

* On 32-bit I simplified the code and made it more like 64-bit by
  making the syscall exit path jump to ret_from_except to handle
  preemption and signal delivery.

* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
  set - but I think because of that 32-bit was actually restoring the
  non-volatile registers on exit from a signal handler.

* I changed the order of enabling interrupts and saving the
  non-volatile registers before calling do_syscall_trace_leave; now we
  enable interrupts first.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-08 13:24:22 +11:00
Paul Mackerras a00428f5b1 Merge ../powerpc-merge 2006-02-24 14:05:47 +11:00
Alan Curry f1434a4854 [PATCH] powerpc: fix altivec_unavailable_exception Oopses
altivec_unavailable_exception is called without setting r3... it looks like
the r3 that actually gets passed in as struct pt_regs *regs is the
undisturbed value of r3 at the time the altivec instruction was encountered.
The user actually gets to choose the pt_regs printed in the Oops!

This fixes the oops by passing the correct pt_regs pointer to
altivec_unavailable_exception.

Signed-off-by: Alan Curry <pacman@TheWorld.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-24 11:36:23 +11:00
Olaf Hering c57914a4f2 [PATCH] ppc: fix adb breakage in xmon
Fix up xmon compilation after the last change.
Remove lots of dead code, all the pmac and chrp support is in arch/powerpc

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-24 11:36:20 +11:00
Olaf Hering 0728a2f99e [PATCH] powerpc: remove duplicate exports
A few symbols are exported twice, remove them from ppc_ksyms.c
Remove users of sys_ctrler in arch/ppc/

WARNING: vmlinux: duplicate symbol '__delay' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__up' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol '__down_interruptible' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'sys_ctrler' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncat' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strncmp' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strrchr' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strnlen' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strpbrk' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'memscan' previous definition was in vmlinux
WARNING: vmlinux: duplicate symbol 'strstr' previous definition was in vmlinux

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-20 10:44:31 +11:00
Jon Mason 2ef9481e66 [PATCH] powerpc: trivial: modify comments to refer to new location of files
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree.  I think this accomplises
everything wanted, though there might be a few references I missed.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-10 16:53:51 +11:00
Vitaly Bordug 75288c78c6 [PATCH] ppc32: Make platform devices being able to assign functions
Implemented by  modification of the .name field of the platform device,
when PDs with the
same names are to be used within different drivers, as
<device_name> -> <device_name>:<function>
Corresponding drivers should change the .name in struct device_driver to
reflect upper of course.

Added ppc_sys_device_disable/enable function set, making it easier to
disable all the inexistent/not utilized platform device way pdevs. By the
check of the "disabled" bit in the config field of ppc_sys_specs, disabled
platform devices will be either added/removed from the bus, or simply not
registered on it, depending on the time when disable/enable call asserted.

The default behaviour when nothing is disabled/enabled will be "all devices
are enabled", which is the same as before.

Also helper platform_notify_map function added, making assignment of
board-specific platform_info more consistent and generic.

Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-10 16:52:46 +11:00
Becky Bruce a7cb03375d [PATCH] powerpc/ppc: Add missing isyncs in head_fsl_booke.S
The e500 core reference manual indicates that isync is required
after mtmsr(DE bit) and mtspr DBCR0.  Add isyncs to make the code
conform to the spec.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-10 16:52:00 +11:00
Paul Mackerras d6d93856cb Merge ../powerpc-merge 2006-02-10 16:51:29 +11:00
Paul Mackerras 8568daa490 ppc: Use the system call table from arch/powerpc/kernel/systbl.S
With this, new system calls only have to be wired up in one place
for ARCH=ppc and ARCH=powerpc, rather than 2.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-10 16:02:20 +11:00
Linus Torvalds 17be03f0a1 Merge master.kernel.org:/home/rmk/linux-2.6-serial 2006-02-08 15:21:22 -08:00
Linus Torvalds fe69102188 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2006-02-07 20:32:13 -08:00
Al Viro 3ba9d91208 [PATCH] ppc: last_task_.... is defined only on non-SMP
... so it should be exported only on non-SMP.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 20:57:08 -05:00
Al Viro 83ec98be05 [PATCH] fix breakage in ocp.c
it's ocp_device_...., not ocp_driver_....

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 20:56:57 -05:00
Paul Mackerras 8f75015f33 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2006-02-08 09:43:08 +11:00
Domen Puncer 155da5ff57 [PATCH] powerpc: Remove arch/ppc/syslib/ppc4xx_pm.c
Remove nowhere referenced file ("grep ppc4xx_pm -r ." didn't find anything).

Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:38:38 +11:00
Vitaly Bordug 42dc75c4b9 [PATCH] ppc32: MPC885ADS, MPC866ADS and MPC8272ADS-specific platform stuff for fs_enet
Added proper ppc_sys identification and fs_platform_info's for MPC 885ADS,
866ADS and 8272ADS, utilizing function assignment to remove/do not use
platform devices which conflict with PD-incompatible drivers.

Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:36:33 +11:00
Grant C. Likely b928917516 [PATCH] powerpc: Add ML403 defconfig
Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:36:02 +11:00
Grant C. Likely 909aeca664 [PATCH] powerpc: Add support for Xilinx ML403 reference design
Includes fix for Xilinx silicon errata 213

Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:36:01 +11:00
Grant C. Likely b58b5aa51c [PATCH] powerpc: Add xparameters file for Xilinx ML403 reference design
Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:36:01 +11:00
Grant C. Likely 5eb446cb72 [PATCH] powerpc: Add ML300 defconfig
Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:35:59 +11:00
Grant C. Likely e27db622b8 [PATCH] powerpc: Migrate ML300 reference design to the platform bus
Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:35:59 +11:00
Grant C. Likely 1a42e53d17 [PATCH] powerpc: Migrate Xilinx Vertex support from the OCP bus to the platfom bus.
This patch only deals with the serial port definitions as there is no
support for any other xilinx IP cores in the kernel tree at the moment.

Board specific configuration moved out of virtex.[ch] and into the
xparameters.h wrapper.

This also prepares for the transition to the flattened device tree model.
When the bootloader provides a device tree generated from an xparameters.h
files, the kernel will no longer need xparameters/*.  The platform bus will
get populated with data from the device tree, and the device drivers will
be automatically connected to the devices.  Only the bootloader (or
ppcboot) will need xparameters directly.

Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:35:58 +11:00
Grant C. Likely 562e7370a4 [PATCH] powerpc: Make Virtex-II Pro support generic for all Virtex devices
The PPC405 hard core is used in both the Virtex-II Pro and Virtex 4 FX
FPGAs.  This patch cleans up the Virtex naming convention to reflect more
than just the Virtex-II Pro.

Rename files virtex-ii_pro.[ch] to virtex.[ch]
Rename config value VIRTEX_II_PRO to XILINX_VIRTEX

Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:35:57 +11:00
Grant C. Likely b4367e7451 [PATCH] powerpc: Move xparameters.h into xilinx virtex device specific path
xparameters should not be needed by anything but virtex platform code.
Move it from include/asm-ppc/ to platforms/4xx/xparameters/

This is preparing for work to remove xparameters from the dependancy tree
for most c files.  xparam changes should not cause a recompile of the world.
Instead, drivers should get device info from the platform bus (populated
by the boot code)

Signed-off-by: Grant C. Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 22:35:35 +11:00
Marcelo Tosatti 3ea4807de7 [PATCH] powerpc/8xx: last two 8MB D-TLB entries are incorrectly set
The last two 8MB TLB entries are being incorrectly set by initial_mmu on 8xx.

The first entry is written with the same virtual/physical address, which
renders it invalid:

BDI>rms 792 0x00001e00
BDI>rms 824 1
BDI>rds 824
SPR  824 : 0xc08000c0  -1065353024
BDI>rds 825
SPR  825 : 0xc0800de0  -1065349664
BDI>rds 826
SPR  826 : 0x00000000            0

And the second entry, in addition, does not have its TLB index set
correctly.

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-07 21:28:37 +11:00
Russell King 59a675b220 [SERIAL] uart_port flags member should use UPF_*
Convert usage of ASYNC_* to UPF_*.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-05 10:52:29 +00:00
Russell King 9b4a161777 [SERIAL] uart_port iotype member should use UPIO_*
Convert usage of SERIAL_IO_* to UPIO_*.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-05 10:48:10 +00:00
Dale Farnsworth 6651a5c383 [PATCH] mv643xx_eth: Fix for building as a module
Enable mv643xx_eth driver to work when built as a module on
mv64x60-based embedded systems.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-27 11:09:24 -05:00
Vitaly Bordug 076d022c56 [PATCH] PPC32 8xx: support for the physmapped flash on m8xx
Implemented more correct way to support physmapped flash on m8xx
than map in mtd.

The areas intended to contain bootloader are protected readonly.
Note that CFI and JEDEC stuff should be configured properly in order
this to work, e.g. for 885/86x CFI should support 4-chip flash interleave.
Also fixed compilation warning.

Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-20 16:13:55 +11:00
Marcelo Tosatti 0ec57e53c9 [PATCH] powerpc: generalize PPC44x_PIN_SIZE
The following patch generalizes PPC44x_PIN_SIZE by changing it to
PPC_PIN_SIZE, which can be defined by any sub-arch to automatically adjust
VMALLOC_START.

Define PPC_PIN_SIZE on 8xx, avoiding potential conflicts with the
pinned space.

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-20 16:13:50 +11:00
Vitaly Bordug 0ce928e1b2 [PATCH] ppc32 8xx: Added setbitsXX/clrbitsXX macro for read-modify-write operations
This adds setbitsXX/clrbitsXX macro for read-modify-write operations
and converts the 8xx core and drivers to use them.

Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-20 16:13:29 +11:00
Jesse Brandeburg 35ec56bb78 [PATCH] e1000: Added disable packet split capability
Adds the ability to disability packet split at compile time and use the legacy receive path on PCI express hardware.  Made this a CONFIG option and modified the Kconfig, to reflect the new option.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18 16:17:57 -05:00
Paul Mackerras e05b3b4adb powerpc/32: Restore previous version of 32-bit PCI code
When I removed the powermac support from arch/ppc/kernel/pci.c,
I overlooked the fact that that file is used in 32-bit ARCH=powerpc
builds.  To prevent problems in future, restore the original version
of that file as arch/powerpc/kernel/pci_32.c, and use that.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-15 22:05:47 +11:00
Paul Mackerras a7fdd90bc4 [PATCH] ppc: Remove powermac support from ARCH=ppc
This makes it possible to build kernels for PReP and/or CHRP
with ARCH=ppc by removing the (non-building) powermac support.
It's now also possible to select PReP and CHRP independently.
Powermac users should now build with ARCH=powerpc instead of
ARCH=ppc.  (This does mean that it is no longer possible to
build a 32-bit kernel for a G5.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-15 17:30:44 +11:00
Linus Torvalds 3e2b32b693 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2006-01-14 10:42:40 -08:00
Adrian Bunk 3824ba7df9 [PATCH] remove unused tmp_buf_sem's
tmp_buf_sem sems to be a common name for something completely unused...

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de> ("usb portion")
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 10:41:42 -08:00
Paul Mackerras 91f62a2491 ppc: Remove duplicate export of get_wchan
The arch/powerpc version of process.c exports get_wchan itself.  When
I moved ARCH=ppc over to using arch/powerpc/kernel/process.c the
get_wchan export in arch/ppc/kernel/ppc_ksyms.c became redundant, so
remove it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 9871166ad692121d6b944159ef3f053570158ea8 commit)
2006-01-14 11:48:29 +11:00
Marcelo Tosatti 8f069b1a90 [PATCH] powerpc/8xx: Use 8MB D-TLB's for kernel static mapping faults
The following implements support for instantiation of 8MB D-TLB
entries for the kernel direct virtual mapping on 8xx, thus reducing TLB
space consumed for the kernel.

Test used: writing 40MB from /dev/zero to file in ext2fs over 
RAMDISK.

$ time dd if=/dev/zero of=file bs=4k count=10000 

VANILLA			8MB kernel data pages

real    0m11.485s	real    0m11.267s
user    0m0.218s        user    0m0.250s
sys     0m8.939s	sys     0m9.108s

real    0m11.518s	real    0m10.978s
user    0m0.203s 	user    0m0.222s
sys     0m9.585s	sys     0m9.138s

real    0m11.554s	real    0m10.967s
user    0m0.228s    	user    0m0.222s
sys     0m9.497s	sys     0m9.127s

real    0m11.633s	real	0m11.286s
user    0m0.214s	user    0m0.196s
sys     0m9.529s	sys     0m9.134s

and averages for both:

real	11.54750	real 11.12450

Which is a 3.6% improvement in execution time. More improvement is
expected for loads with larger kernel data footprint (real workloads).

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-14 11:14:27 +11:00
Russell King 91fb53866d [PATCH] Add ocp_bus_type probe and remove methods
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-13 11:26:06 -08:00
Kumar Gala 7e78e5e502 [PATCH] powerpc: Updated platforms that use gianfar to match driver
The gianfar driver changed how it required MDIO bus and phy id's
to be passed to it.  Also, it no longer passes the physical address
of the MDIO bus.  Instead we have a proper platform device.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-13 21:16:18 +11:00
Linus Torvalds 45bfe98bd7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge
Fix up delete/modify conflict of arch/ppc/kernel/process.c by hand (it's
gone, gone, gone).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 10:21:22 -08:00
Al Viro 639074354b [PATCH] m68k: kill mach_floppy_setup, convert to proper __setup() in drivers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:09:05 -08:00
Al Viro b4290a23cf [PATCH] m68k: namespace pollution fix (custom->amiga_custom)
in amigahw.h custom renamed to amiga_custom, in drivers with few instances the
same replacement, in the rest - #define custom amiga_custom in driver itself

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:09:00 -08:00
Al Viro 0cec6fd137 [PATCH] powerpc: task_stack_page()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:57 -08:00
Al Viro b5e2fc1c62 [PATCH] powerpc: task_thread_info()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:57 -08:00
akpm@osdl.org 198e2f1811 [PATCH] scheduler cache-hot-autodetect
)

From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache
  measurement results. Each distance is only measured once. This means
  that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
  distances 0 and 1, and on SMP distance 0 is measured. The code walks
  the domain tree to determine the distance, so it automatically follows
  whatever hierarchy an architecture sets up. This cuts down on the boot
  time significantly and removes the O(N^2) limit. The only assumption
  is that migration costs can be expressed as a function of domain
  distance - this covers the overwhelming majority of existing systems,
  and is a good guess even for more assymetric systems.

  [ People hacking systems that have assymetries that break this
    assumption (e.g. different CPU speeds) should experiment a bit with
    the cpu_distance() function. Adding a ->migration_distance factor to
    the domain structure would be one possible solution - but lets first
    see the problem systems, if they exist at all. Lets not overdesign. ]

Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and
  sticking to it, the code now auto-detects the 'effective migration
  cost' between two measured CPUs, via iterating through a wide range of
  cachesizes. The code searches for the maximum migration cost, which
  occurs when the working set of the test-workload falls just below the
  'effective cache size'. I.e. real-life optimized search is done for
  the maximum migration cost, between two real CPUs.

  This, amongst other things, has the positive effect hat if e.g. two
  CPUs share a L2/L3 cache, a different (and accurate) migration cost
  will be found than between two CPUs on the same system that dont share
  any caches.

(The reliable measurement of migration costs is tricky - see the source
for details.)

Furthermore i've added various boot-time options to override/tune
migration behavior.

Firstly, there's a blanket override for autodetection:

	migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:

	migration_factor=120

will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.

I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
 ---------------------
           [00]    [01]
 [00]:     -     1.7(1)
 [01]:   1.7(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 1.7 (1784008)
 ---------------------

Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

 ---------------------
 migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
 ---------------------
           [00]    [01]    [02]    [03]
 [00]:     -     0.4(1)  0.0(0)  0.4(1)
 [01]:   0.4(1)    -     0.4(1)  0.0(0)
 [02]:   0.0(0)  0.4(1)    -     0.4(1)
 [03]:   0.4(1)  0.0(0)  0.4(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (33900) 0.4 (448514)
 ---------------------

Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

 ---------------------
 migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
 ---------------------
           [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
 [00]:     -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [01]:  19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [02]:  19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [03]:  19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1)
 [04]:  19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1)
 [05]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1)
 [06]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1)
 [07]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -
 ---------------------
 cacheflush times [2]: 0.0 (0) 19.2 (19281756)
 ---------------------

This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:50 -08:00