Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support Cooperative Memory Overcommitment (CMO), we need to check
for failure from some of the tce hcalls.
These changes for the pseries platform affect the powerpc architecture;
patches for the other affected platforms are included in this patch.
pSeries platform IOMMU code changes:
* platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
return an error.
Architecture IOMMU code changes:
* Calls to ppc_md.tce_build need to check return values and return
DMA_MAPPING_ERROR for transient errors.
Architecture changes:
* struct machdep_calls for tce_build*_pSeriesLP functions need to change
to indicate failure.
* all other platforms will need updates to iommu functions to match the new
calling semantics; they will return 0 on success. The other platforms
default configs have been built, but no further testing was performed.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
Update iommu_alloc() to take the struct dma_attrs and pass them on to
tce_build(). This change propagates down to the tce_build functions of
all the platforms.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:
scheduler switch to idle task
enable interrupts
Window starts here
----> interrupt happens (does not set NEED_RESCHED)
irq_exit() stops the tick
----> interrupt happens (does set NEED_RESCHED)
return from schedule()
cpu_idle(): preempt_disable();
Window ends here
The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.
The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.
Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.
cpu_idle()
{
preempt_disable();
while(1) {
tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
are in the idle loop
while (!need_resched())
halt();
tick_nohz_restart_sched_tick(); <- disables NOHZ mode
preempt_enable_no_resched();
schedule();
preempt_disable();
}
}
In hindsight we should have done this forever, but ...
/me grabs a large brown paperbag.
Debugged-by: Jack Ren <jack.ren@marvell.com>,
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Choosing PCI or not at config time is allowed on some
platforms via an if expression in arch/powerpc/Kconfig.
To add a new platform with PCI support selectable at
config time, you must change the if expression. This
patch makes this easier by changing:
bool "PCI support" if <long expression>
to
bool "PCI support" if PPC_PCI_CHOICE
and adding select PPC_PCI_CHOICE to all the config nodes that
were previously in the PCI if expression.
Platforms with unconditional PCI support continue to
just select PCI in their config nodes.
Signed-off-by: John Rigby <jrigby@freescale.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Update powerpc to use the new dma_*map*_attrs() interfaces. In doing so
update struct dma_mapping_ops to accept a struct dma_attrs and propagate
these changes through to all users of the code (generic IOMMU and the
64bit DMA code, and the iseries and ps3 platform code).
The old dma_*map_*() interfaces are reimplemented as calls to the
corresponding new interfaces.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently all iSeries secondary CPUs spin directly on the cpu_start
field in their paca. Make them spin on the global
__secondary_hold_spinloop until after the pacas have been initialised.
As Stephen Rothwell points out, this works at the moment because
__secondary_hold_spinloop is being set already, but iSeries isn't
looking at it :)
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we have the alpaca, the reg_save_ptr is no longer needed in the
paca. Eradicate all global uses of it and make it static in the iSeries
lpardata.c
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The iSeries HV only needs the first two fields of the paca statically
initialised, so create an alternate paca that contains only those and
switch to our real paca immediately after boot.
This is in order to make the 1024 cpu patches easier since they will no
longer have to statically initialise the pacas for iSeries.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The functions time_before, time_before_eq, time_after, and
time_after_eq are more robust for comparing jiffies against other
values.
This implements usage of the time_after() macro, defined at
linux/jiffies.h, which deals with wrapping correctly.
Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the following section mismatch:
<-- snip -->
...
WARNING: vmlinux.o(.text+0x55648): Section mismatch in reference from the function .free_node() to the function .init.text:.free_property()
...
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch converts PPC's IOMMU to use the IOMMU helper functions. The IOMMU
doesn't allocate a memory area spanning LLD's segment boundary anymore.
iseries_hv_alloc and iseries_hv_map don't have proper device
struct. 4GB boundary is used for them.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xlate_iomm_address() really wants the ds_addr to pass to the HV, so store
that value (instead of the BAR number) when we allocate the device bars.
This is not a fast path, so we can look up the device_node property
there instead of using the bussubno field of the pci_dn.
The other user of iseries_ds_addr() was already scanning the device tree,
so looking up a property will not slow it down any more.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There should be an of_node_put when breaking out of a loop that iterates
over calls to of_find_all_nodes, as this function does an of_node_get on
the value it returns.
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier d;
expression e;
@@
T *d;
...
for (d = NULL; (d = of_find_all_nodes(d)) != NULL; )
{... when != of_node_put(d)
when != e = d
(
return d;
|
+ of_node_put(d);
? return ...;
)
...}
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The way iSeries manages PCI IO and Memory resources is a bit strange
and is based on overriding the content of those resources with home
cooked ones afterward.
This changes it a bit to better integrate with the new resource handling
so that the "virtual" tokens that iSeries replaces resources with are
done from the proper per-device fixup hook, and bridge resources are
set to enclose that token space. This fixes various things such as
the output of /proc/iomem & ioports, among others. This also fixes up
various boot messages as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit fbd568a3e6 ("Change
synchronize_kernel to _rcu and _sched") changed the deprecated
synchronize_kernel() in HvLpEvent_unregisterHandler() to
synchronize_rcu(). It turns out that it should have been
synchronize_sched().
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Printk was observed to hang during module unload due to a limited
window of characters that may be sent to the hypervisor. The window
only reexpands when we receive an ack from the HV and the spinlock here
prevents us from ever processing that ack. This fixes it by dropping
the lock before doing the printk, then looping back to the top to
reacquire the lock.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There was only one global function in vpdinfo.c and it was only called
from pci.c, so merge them and make the function static.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also remove another unnecessary forward declaration.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch/powerpc/platforms/iseries/setup.c:111:27: warning: constant 0x100000000 is so big it is long
arch/powerpc/platforms/iseries/setup.c:113:23: warning: constant 0x100000000 is so big it is long
arch/powerpc/platforms/iseries/setup.c:117:27: warning: constant 0x000fffffffffffff is so big it is long
arch/powerpc/platforms/iseries/setup.c:127:28: warning: constant 0x100000000 is so big it is long
arch/powerpc/platforms/iseries/setup.c:129:24: warning: constant 0x100000000 is so big it is long
arch/powerpc/platforms/iseries/setup.c:233:5: warning: constant 0x000fffffffffffff is so big it is long
arch/powerpc/platforms/iseries/setup.c:235:5: warning: constant 0x000fffffffffffff is so big it is long
arch/powerpc/platforms/iseries/setup.c:319:6: warning: symbol 'mschunks_alloc' was not declared. Should it be static?
arch/powerpc/platforms/iseries/setup.c:661:6: warning: symbol 'iSeries_early_setup' was not declared. Should it be static?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fixes sparse warning:
arch/powerpc/platforms/iseries/pci.c:169:13: warning: symbol 'iSeries_pci_final_fixup' was not declared. Should it be static?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
ppc_md.init_IRQ is not called if it is NULL, so we don't need an empty
routine in the non PCI case.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 1189be6508 ([POWERPC] Use 1TB
segments) added an argument to hpte_insert.
Also make iSeries_hpte_insert static.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This way we only have entries in the device tree for disks that actually
exist. A slight complication is that disks may be attached to LPARs
at runtime.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now we will only have entries in the device tree for the actual existing
devices (including their OS/400 properties). This way viotape.c gets
all the information about the devices from the device tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>