Commit Graph

95 Commits

Author SHA1 Message Date
David S. Miller b99c6ebe8f sparc64: Fix UP bootup regression.
Commit b696fdc259 ("sparc64: Defer
cpu_data() setup until end of per-cpu data initialization.") broke
bootup for UP builds because the cpu_data() initialization only
occurs in setup_per_cpu_areas() which is never compiled in nor
called in UP builds.

Fix this up by calling the setups directly from init_64.c when
non-SMP.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-18 23:50:41 -07:00
Stephen Rothwell 6ac5c61082 sparc: replace uses of CPU_MASK_ALL_PTR
CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

	#define CPU_MASK_ALL (cpumask_t) { { ... } }

Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:

	#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

Which formalizes this practice.  One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).

[Description by Rusty Russell]
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16 04:56:55 -07:00
Robert P. J. Day 949e82744b sparc: Simplify code using is_power_of_2() routine.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16 04:56:51 -07:00
David S. Miller 73fffc037e sparc64: Get rid of real_setup_per_cpu_areas().
Now that we defer the cpu_data() initializations to the end of per-cpu
setup, we can get rid of this local hack we had to setup the per-cpu
areas eary.

This is a necessary step in order to support HAVE_DYNAMIC_PER_CPU_AREA
since the per-cpu setup must run when page structs are available.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16 04:56:23 -07:00
David S. Miller b696fdc259 sparc64: Defer cpu_data() setup until end of per-cpu data initialization.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16 04:56:22 -07:00
David S. Miller a2094502dc sparc64: Make mdesc_fill_in_cpu_data take a cpumask_t pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16 04:56:21 -07:00
David S. Miller 890db403d5 sparc: Call OF and MD cpu scanning explicitly from paging_init()
We need to split up the cpu present mask setup from the cpu_data
initialization, and this is a first step towards that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-16 04:56:19 -07:00
Rusty Russell ad6561dffa module: trim exception table on init free.
It's theoretically possible that there are exception table entries
which point into the (freed) init text of modules.  These could cause
future problems if other modules get loaded into that memory and cause
an exception as we'd see the wrong fixup.  The only case I know of is
kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).

Amerigo fixed this long-standing FIXME in the x86 version, but this
patch is more general.

This implements trim_init_extable(); most archs are simple since they
use the standard lib/extable.c sort code.  Alpha and IA64 use relative
addresses in their fixups, so thier trimming is a slight variation.

Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
yet it defines its own sort_extable() which overrides the one in lib.
It doesn't sort, so we have to mark deleted entries instead of
actually trimming them.

Inspired-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-alpha@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
2009-06-12 21:47:04 +09:30
David S. Miller 01c4538158 sparc64: add_node_ranges() must be __init
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-08 03:37:35 -07:00
David S. Miller 9a2ed5cc9e sparc64: Fix section mismatch warnings in PCI controller drivers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-08 03:28:15 -07:00
Akinobu Mita 7ca43e7564 mm: use debug_kmap_atomic
Use debug_kmap_atomic in kmap_atomic, kmap_atomic_pfn, and
iomap_atomic_prot_pfn.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01 08:59:14 -07:00
David S. Miller ed223129a3 Merge branch 'master' of ssh://master.kernel.org/home/ftp/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask-for-sparc
Conflicts:
	arch/sparc/kernel/smp_64.c
2009-03-29 15:44:22 -07:00
David S. Miller 42cc77c861 sparc64: Reschedule KGDB capture to a software interrupt.
Otherwise it might interrupt switch_to() midstream and use
half-cooked register window state.

Reported-by: Chris Torek <chris.torek@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 23:51:57 -07:00
Rusty Russell ec7c14bde8 cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: sparc
Impact: cleanup, futureproof

In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in various
places.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-16 14:40:24 +10:30
Rusty Russell e305cb8f09 cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: sparc64
Impact: cleanup, futureproof

In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in various
places.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-03-16 14:40:23 +10:30
David S. Miller 9b02605826 sparc64: Kill bogus TPC/address truncation during 32-bit faults.
This builds upon eeabac7386
("sparc64: Validate kernel generated fault addresses on sparc64.")

Upon further consideration, we actually should never see any
fault addresses for 32-bit tasks with the upper 32-bits set.

If it does every happen, by definition it's a bug.  Whatever
context created that fault would only have that fault satisfied
if we used the full 64-bit address.  If we truncate it, we'll
always fault the wrong address and we'll always loop faulting
forever.

So catch such conditions and mark them as errors always.  Log
the error and fail the fault.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-03 16:28:23 -08:00
David S. Miller eeabac7386 sparc64: Validate kernel generated fault addresses on sparc64.
In order to handle all of the cases of address calculation overflow
properly, we run sparc 32-bit processes in "address masking" mode
when running on a 64-bit kernel.

Address masking mode zeros out the top 32-bits of the address
calculated for every load and store instruction.

However, when we're in privileged mode we have to run with that
address masking mode disabled even when accessing userspace from
the kernel.

To "simulate" the address masking mode we clear the top-bits by
hand for 32-bit processes in the fault handler.

It is the responsibility of code in the compat layer to properly
zero extend addresses used to access userspace.  If this isn't
followed properly we can get into a fault loop.

Say that the user address is 0xf0000000 but for whatever reason
the kernel code sign extends this to 64-bit, and then the kernel
tries to access the result.

In such a case we'll fault on address 0xfffffffff0000000 but the fault
handler will process that fault as if it were to address 0xf0000000.
We'll loop faulting forever because the fault never gets satisfied.

So add a check specifically for this case, when the kernel is faulting
on a user address access and the addresses don't match up.

This code path is sufficiently slow path, and this bug is sufficiently
painful to diagnose, that this kind of bug check is warranted.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 22:08:15 -08:00
Sam Ravnborg 917c3660d6 sparc64: move EXPORT_SYMBOL to the symbols definition
Move all applicable EXPORT_SYMBOL()s to the file where the respective
symbol is defined.

Removed all the includes that are no longer needed in sparc_ksyms_64.c

Comment all remaining EXPORT_SYMBOL()s in sparc_ksyms_64.c

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

Additions by Julian Calaby:
* Moved EXPORT_SYMBOL()s for prom functions to their rightful places.
* Made some minor cleanups to the includes and comments of sparc_ksyms_64.c
* Updated and tidied commit message.
* Rebased patch over sparc-2.6.git HEAD.
* Ensured that all modified files have the correct includes.

Signed-off-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 16:58:20 -08:00
Sam Ravnborg 6943f3da3e sparc: move EXPORT_SYMBOL to the symbols definition
Move all applicable EXPORT_SYMBOL()s to the file where the respective
symbol is defined.

Removed all the includes that are no longer needed in sparc_ksyms_32.c

Comment all remaining EXPORT_SYMBOL()s in sparc_ksyms_32.c

Two symbols are shared with sparc64 thus the exports were removed from
the sparc_ksyms_64.c too, along with the include their ommission made
redundant.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

Additions by Julian Calaby:
* Moved EXPORT_SYMBOL()s for prom functions to their rightful places.
* Made some minor cleanups to the includes and comments of sparc_ksyms_32.c
* Made another subtraction from sparc_ksyms_64.c
* Updated and tidied commit message.
* Rebased patch over sparc-2.6.git HEAD.
* Ensured that all modified files have the correct includes.

Signed-off-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 16:58:05 -08:00
Sam Ravnborg 9018113649 sparc64: Use unsigned long long for u64.
Andrew Morton wrote:

    People keep on doing

            printk("%llu", some_u64);

    testing it only on x86_64 and this generates a warning storm on
    powerpc, sparc64, etc.  Because they use `long', not `long long'.

    Quite a few 64-bit architectures are using `long' for their
    s64/u64 types.  We should convert them all to `long long'.

Update types.h so we use unsigned long long for u64 and
fix all warnings in sparc64 code.
Tested with an allnoconfig, defconfig and allmodconfig builds.

This patch introduces additional warnings in several drivers.
These will be dealt with in separate patches.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 13:19:28 -08:00
Sam Ravnborg 0157141ae2 sparc: refactor code in fault_32.c
The sparc allmodconfig build broke due to enabling of the
branch_tracer that does some very clever things with
all if conditions. This caused my gcc 3.4.5 to be so confused that
it emitted a warning:

arch/sparc/mm/fault_32.c: In function `do_sparc_fault':
arch/sparc/mm/fault_32.c:176: warning: 'fixup' might be used uninitialized in this function

And with -Werror this broke the build.

Refactor code so it:
1) becomes more readable
2) no longer emit a warning with the branch_tracer enabled

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 12:52:41 -08:00
Sam Ravnborg ff9aefbf4d sparc64: refactor code in init_64.c
The sparc64 allmodconfig build broke due to enabling of the
branch_tracer that does some very clever things with
all if conditions. This caused my gcc 3.4.5 to be so confused that
it emitted two warnings:

arch/sparc/mm/init_64.c: In function `update_mmu_cache':
arch/sparc/mm/init_64.c:271: warning: 'pg_flags' might be used uninitialized in this function
arch/sparc/mm/init_64.c:272: warning: 'page' might be used uninitialized in this function

And with -Werror this broke the build.

Refactor code so it:
1) becomes more readable
2) no longer emit a warning with the branch_tracer enabled

The refactoring uses a small helper function (flush_dcache()).

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-06 12:51:26 -08:00
Sam Ravnborg c4a4a21977 sparc: drop SUN_IO
SUN_IO is always 'y' so drop it and thus killing an ifdef/endif pair

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-27 00:55:45 -08:00
Sam Ravnborg 86ed40bd6f sparc: unify sections.h
While doing this use standard names for start/end
so we could use definitions straight from asm-generic
for all the typical symbols.

This also allowed us to drop the use of PROVIDE in the linker
script so sprc is less non-standard on this area.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-27 00:35:12 -08:00
Robert Reif aa83a26a19 sparc: use sparc64 version of scatterlist.h
Use sparc64 version of scatterlist.h.

There are three main differences:
    dma_addr_t replaces __u32
    dma_address replaces dvma_address
    dma_length replaces dvma_length

dma_addr_t is a u32 on sparc32.

Boot tested on sparc32.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-11 20:24:58 -08:00
Sam Ravnborg 757498c63e sparc: drop CONFIG_SUN_AUXIO
It is always equals y so no need to test for it

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 13:26:31 -08:00
David S. Miller 64273d08df sparc32: Don't btfixup cache flush ops for viking multiple times.
Just do it once.

Pointed out by Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 09:17:07 -08:00
David S. Miller b4f4372f96 sparc64: Make %pil level 15 a pseudo-NMI.
So that we can profile code even in a local_irq_disable() section,
only write 14 (instead of 15) into the %pil register to disable IRQs.

This allows PIL level 15 to serve as a pseudo NMI.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 09:17:02 -08:00
David S. Miller 0871420fad sparc64: Add tsb-ratio sysctl.
Add a sysctl to tweak the RSS limit used to decide when to grow
the TSB for an address space.

In order to avoid expensive divides and multiplies only simply
positive and negative powers of two are supported.

The function computed takes the number of TSB translations that will
fit at one time in the TSB of a given size, and either adds or
subtracts a percentage of entries.  This final value is the
RSS limit.

See tsb_size_to_rss_limit().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 09:16:59 -08:00
Sam Ravnborg 27137e5285 sparc,sparc64: unify mm/
- move all sparc64/mm/ files to arch/sparc/mm/
- commonly named files are named _64.c
- add files to sparc/mm/Makefile preserving link order
- delete now unused sparc64/mm/Makefile
- sparc64 now finds mm/ in sparc

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 09:16:59 -08:00
Sam Ravnborg c37ddd936d sparc: prepare mm/ for unification
- rename files where sparc64 has similar files to _32.c
- Restructure Makefile
- Sneak in -Werror as we have for sparc64

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-04 09:16:58 -08:00
Al Viro 409832f548 sparc32 cpuinit flase positives
All noise since we don't have CPU hotplug there.  However, they
did expose something very odd-looking in there - poke_viking()
does a bunch of identical btfixup each time it's called (i.e.
for each CPU).  That one is left alone for now; just the trivial
misannotation fixes.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:35 -08:00
Robert Reif 1aa0365f27 sparc32: add init memory poisoning
This patch adds init memory poisoning.  It looks like
totalram_pages was not updated properly in free_initrd_mem
so I fixed that as well.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-03 16:29:42 -07:00
David S. Miller 9723f38eb5 sparc32: Fix sun4c build warnings.
Reported by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-02 03:15:44 -07:00
Adrian Bunk 5110bd21b8 sparc: remove CONFIG_SUN4
While doing some easy cleanups on the sparc code I noticed that the
CONFIG_SUN4 code seems to be worse than the rest - there were some
"I don't know how it should work, but the current code definitely cannot
work." places.

And while I have seen people running Linux on machines like a
SPARCstation 5 a few years ago I don't recall having seen sun4
machines, even less ones running Linux.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-31 20:59:37 -07:00
David S. Miller 9dc69230a9 sparc: Kill now spurious includes of sbus.h
In order to make this week I also had to add an include
of linux/dma-mapping.h to asm/pci_32.h because drivers/pci/pci.c
really depends upon getting this header somehow.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:15:23 -07:00
David S. Miller 0ad626a2a4 sparc32: Kill iounit_map_dma_*().
Unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:15:23 -07:00
David S. Miller 046e26a8ba sparc: Remove generic SBUS probing layer.
The individual SBUS IOMMU arch code now sets the IOMMU information
directly into the OF device objects.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:15:21 -07:00
David S. Miller 4b1c5df2af sparc32: Make mmu_map_dma_area and mmu_unmap_dma_area take a device pointer.
This lets us kill this "map it in every IOMMU" crazy code, and also
some of the final references to sbus_root.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:15:07 -07:00
David S. Miller b1387c35be sparc32: Kill mmu_translate_dvma and implementations.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:15:03 -07:00
David S. Miller 260489fa8a sparc32: Make mmu_{get,release}_*() take a struct device pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:13:13 -07:00
David S. Miller e003934876 sparc32: Make IOMMU and IO-UNIT init work with device nodes.
And stick the iommu archdata pointer into the generic OF device tree
of_device struct as well.

We still have to pass the sbus_bus object down into the routines so
that the SBUS bus objects get the iommu cookies set properly.  After
drivers get converted to being pure OF drivers, that can go away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-29 02:13:11 -07:00
Johannes Weiner 9109fb7b35 mm: drop unneeded pgdat argument from free_area_init_node()
free_area_init_node() gets passed in the node id as well as the node
descriptor.  This is redundant as the function can trivially get the node
descriptor itself by means of NODE_DATA() and the node's id.

I checked all the users and NODE_DATA() seems to be usable everywhere
from where this function is called.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
Robert Reif f538f3df4f sparc32: fix init.c allnoconfig build error
Fix allnoconfig build error.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 21:56:26 -07:00
Adrian Bunk 50215d6511 sparc/mm/: possible cleanups
This patch contains the following possible cleanups:
- make the following needlessly global code static:
  - fault.c: force_user_fault()
  - init.c: calc_max_low_pfn()
  - init.c: pgt_cache_water[]
  - init.c: map_high_region()
  - srmmu.c: hwbug_bitmask
  - srmmu.c: srmmu_swapper_pg_dir
  - srmmu.c: srmmu_context_table
  - srmmu.c: is_hypersparc
  - srmmu.c: srmmu_cache_pagetables
  - srmmu.c: srmmu_nocache_size
  - srmmu.c: srmmu_nocache_end
  - srmmu.c: srmmu_get_nocache()
  - srmmu.c: srmmu_free_nocache()
  - srmmu.c: srmmu_early_allocate_ptable_skeleton()
  - srmmu.c: srmmu_nocache_calcsize()
  - srmmu.c: srmmu_nocache_init()
  - srmmu.c: srmmu_alloc_thread_info()
  - srmmu.c: early_pgtable_allocfail()
  - srmmu.c: srmmu_early_allocate_ptable_skeleton()
  - srmmu.c: srmmu_allocate_ptable_skeleton()
  - srmmu.c: srmmu_inherit_prom_mappings()
  - sunami.S: tsunami_copy_1page
- remove the following unused code:
  - init.c: struct sparc_aliases

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17 21:38:01 -07:00
Adrian Bunk 88278ca27a sparc: remove CVS keywords
This patch removes the CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-20 00:33:44 -07:00
David S. Miller 9f2b2a5f68 sparc32: More memory probing consolidation.
The PROM library function prom_meminit() builds a table,
prom_phys_avail[], just so that probe_memory() in
arch/sparc/mm/fault.c can copy it into sp_banks[].

Just have prom_meminit() fill in the sp_banks[] array directly, and
remove duplicated sort() function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 05:22:53 -07:00
David S. Miller ccc34028d4 sparc32: Kill totally unused memory information tables.
The code in arch/sparc/prom/memory.c computes three tables, the list
of total memory, the list of available memory (total minus what
firmware is using), and the list of firmware taken memory.

Only the available memory list is even used.

Therefore, kill those unused tables and make prom_meminfo() return
just the available memory list.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 05:22:53 -07:00
Robert P. J. Day b3dd5b8256 [SPARC]: Use shorter form of "get_zeroed_page".
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:08:55 -08:00
Martin Schwidefsky 2f569afd9c CONFIG_HIGHPTE vs. sub-page page tables.
Background: I've implemented 1K/2K page tables for s390.  These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM.  The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste).  The pgstes are only required if the process is using the SIE
instruction.  The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.

Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).

Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch.  For everybody else it will be a (struct page *).  The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor.  The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
 To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:42 -08:00