Commit Graph

459 Commits

Author SHA1 Message Date
Michael S. Tsirkin 23f3bc0f2c IB/mthca: Fix posting lists of 256 receive requests for Tavor
If we post a list of length 256 exactly, nreq in doorbell gets set to
256 which is wrong: it should be encoded by 0.  This is because we
only zero it out on the next WR, which may not be there.  The solution
is to ring the doorbell after posting a WQE, not before posting the
next one.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-18 11:37:03 -07:00
Roland Dreier 0cb4fe8d26 IB/uverbs: Don't leak ref to mm on error path
In ib_umem_release_on_close(), if the kmalloc() fails, then a
reference to current->mm will be leaked.  Fix this by adding a mmput()
instead of just returning on kmalloc() failure.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 22:20:50 -07:00
Ishai Rabinovitz 093beac189 IB/srp: Complete correct SCSI commands on device reset
When flushing out queued commands after a successful device reset,
make sure that SRP completes the right commands, instead of calling
scsi_done on the command passed into the device reset handler over and
over.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:20:48 -07:00
Roland Dreier ec2d720849 IB/srp: Get rid of extra scsi_host_put()s if reconnection fails
If a reconnection attempt fails, then SRP does two scsi_host_put()s.
This is a historical relic from an earlier version of the driver that
took a reference on the scsi_host before trying to reconnect, so get
rid of the extra scsi_host_put().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:16:03 -07:00
Roland Dreier e65810566f IB/srp: Don't wait for disconnection if sending DREQ fails
Sending a DREQ may fail, for example because the remote target has
already broken the connection.  If so, then SRP should not wait for
the disconnection to complete, because it never will.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:13:21 -07:00
Roland Dreier 1db76c14d2 IB/mthca: Make fw_cmd_doorbell default to 0
Setting fw_cmd_doorbell allows FW command to be queued using posted
writes instead of requiring polling on a "go" bit, so it should be a
performance boost.  However, the option causes problems with at least
some device/firmware combinations, so set the default to 0 until we
understand what's going on better.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 07:48:07 -07:00
Sean Hefty 1b52fa98ed IB: refcount race fixes
Fix race condition during destruction calls to avoid possibility of
accessing object after it has been freed.  Instead of waking up a wait
queue directly, which is susceptible to a race where the object is
freed between the reference count going to 0 and the wake_up(), use a
completion to wait in the function doing the freeing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-12 14:57:52 -07:00
Roland Dreier 6f4bb3d820 IB/ipath: Properly terminate PCI ID table
The ipath driver's table of PCI IDs needs a { 0, } entry at the end.
This makes all of the device aliases visible to userspace so hotplug
loads the module for all supported devices.  Without the patch,
modinfo ipath_core only shows:

    alias:          pci:v00001FC1d0000000Dsv*sd*bc*sc*i*

instead of the correct:

    alias:          pci:v00001FC1d00000010sv*sd*bc*sc*i*
    alias:          pci:v00001FC1d0000000Dsv*sd*bc*sc*i*

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
2006-05-12 14:57:52 -07:00
Michael S. Tsirkin ce477ae4f8 IB/mthca: FMR ioremap fix
Addresses for ioremap must be calculated off of pci_resource_start;
we can't directly use the bus address as seen by the HCA.  Fix the
code that remaps device memory for FMR access.

Based on patch by Klaus Smolin.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-10 15:16:57 -07:00
Roland Dreier 5941d079f2 IPoIB: Free child interfaces properly
When deleting a child interface with a non-default P_Key via
/sys/class/net/ibX/delete_child, the interface must be freed with
free_netdev() (rather than kfree() on the private data).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 22:54:59 -07:00
Roland Dreier a3285aa4ee IB/mthca: Fix race in reference counting
Fix races in in destroying various objects.  If a destroy routine
waits for an object to become free by doing

	wait_event(&obj->wait, !atomic_read(&obj->refcount));
	/* now clean up and destroy the object */

and another place drops a reference to the object by doing

	if (atomic_dec_and_test(&obj->refcount))
		wake_up(&obj->wait);

then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up().  And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.

Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock.  This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.

While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it.  So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:29 -07:00
Roland Dreier d945e1df28 IB/srp: Fix tracking of pending requests during error handling
If a SCSI abort completes, or the command completes successfully, then
the driver must remove the command from its queue of pending
commands.  Similarly, if a device reset succeeds, then all commands
queued for the given device must be removed from the queue.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:28 -07:00
Ralph Campbell d8b9f23b23 IB: Fix display of 4-bit port counters in sysfs
The code to display local_link_integrity_errors and
excessive_buffer_overrun_errors in
/sys/class/infiniband/<hca>/ports/<n>/counters/
uses the wrong shift to extract the 4 bit values.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:28 -07:00
Bryan O'Sullivan f6f0413e10 IB/ipath: tidy up white space in a few files
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:23 -07:00
Bryan O'Sullivan d562a5ae69 IB/ipath: fix label name in interrupt handler
Names that are the opposite of their intended meanings are not so helpful.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:22 -07:00
Bryan O'Sullivan ae15f540cc IB/ipath: improve sparse annotation
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:21 -07:00
Bryan O'Sullivan 9b2017f1e1 IB/ipath: simplify IB timer usage
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:21 -07:00
Bryan O'Sullivan 76f0dd141b IB/ipath: simplify RC send posting
Remove some unnecessarily complicated tests.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:20 -07:00
Bryan O'Sullivan c71c30dcba IB/ipath: prevent hardware from being accessed during reset
The reset code now turns off the PRESENT flag during a reset, so that
other code won't attempt to access a device that's in mid-reset.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:16 -07:00
Bryan O'Sullivan fccea66364 IB/ipath: fix verbs registration
Remember when the verbs layer unregisters from the lower-level code.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:15 -07:00
Bryan O'Sullivan 52e7fad825 IB/ipath: change handling of PIO buffers
Different ipath hardware types have different numbers of buffers
available, so we decide on the counts ourselves unless we are specifically
overridden with a module parameter.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:14 -07:00
Bryan O'Sullivan 23e86a4584 IB/ipath: iterate over correct number of ports during reset
Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:13 -07:00
Bryan O'Sullivan 68dd43a162 IB/ipath: set up 32-bit DMA mask if 64-bit setup fails
Some systems do not set up 64-bit maps on systems with 2GB or less of
memory installed, so we have to fall back to trying a 32-bit setup.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:12 -07:00
Bryan O'Sullivan 755e4ca4a9 IB/ipath: fix race with exposing reset file
We were accidentally exposing the "reset" sysfs file more than once
per device.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 12:14:11 -07:00
Roland Dreier 254abfd33a IB/mthca: Fix offset in query_gid method
GuidInfo records have 8 byte GUIDs in them, so an index should be
multiplied by 8 to get an offset.  mthca_query_gid() was incorrectly
multiplying by 16.

Noticed by Leonid Keller <leonid@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-01 10:40:23 -07:00
Adrian Bunk 415dcd95b2 IB/mthca: make a function static
This patch makes the needlessly global mthca_update_rate() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00
Roland Dreier 5494c22ba2 IB/ipath: Fix whitespace
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00
Roland Dreier ac2ae4c977 IB/ipath: Make more names static
Make symbols that are only used in a single source file static.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:12 -07:00
Hal Rosenstock 64cb9c6aff IB/mad: Fix RMPP version check during agent registration
Only check that RMPP version is not specified when MAD class does not
support RMPP.  Just because a class is allowed to use RMPP doesn't
mean that rmpp_version needs to be set for the MAD agent to
register. Checking this was a recent change which was too pedantic.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:11 -07:00
Roland Dreier f80887d0b9 IB/srp: Remove request from list when SCSI abort succeeds
If a SCSI abort succeeds, then the aborted request should to be
removed from the list of pending requests.  This fixes list corruption
after an abort occurs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:10 -07:00
Jack Morgenstein 59fef3b1e9 IB/mthca: Fix max_srq_sge returned by ib_query_device for Tavor devices
The driver allocates SRQ WQEs size with a power of 2 size both for
Tavor and for memfree. For Tavor, however, the hardware only requires
the WQE size to be a multiple of 16, not a power of 2, and the max
number of scatter-gather allowed is reported accordingly by the
firmware (and this is the value currently returned by
ib_query_device() and ibv_query_device()).

If the max number of scatter/gather entries reported by the FW is used
when creating an SRQ, the creation will fail for Tavor, since the
required WQE size will be increased to the next power of 2, which
turns out to be larger than the device permitted max WQE size (which
is not a power of 2).

This patch reduces the reported SRQ max wqe size so that it can be used
successfully in creating an SRQ on Tavor HCAs.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-12 11:42:30 -07:00
Michael S. Tsirkin ce684df05a IB/cache: Use correct pointer to calculate size
When allocating gid_cache, use kmalloc(sizeof *gid_cache, ...) rather
than kmalloc(sizeof *pkey_cache, ...).  It doesn't really matter which
one is used, since the size ends up the same either way, but it's much
better to say what we mean.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 13:17:43 -07:00
Roland Dreier f697f74a6b IPoIB: Use spin_lock_irq() instead of spin_lock_irqsave()
We know ipoib_flush_paths() is called from plain process context with
interrupts enabled, since it does wait_for_completion().  So there's
no need to use spin_lock_irqsave() -- spin_lock_irq() is fine.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:59 -07:00
Eli Cohen a30bb96c6f IPoIB: Close race in ipoib_flush_paths()
ib_sa_cancel_query() must be called with priv->lock held since
a completion might arrive and set path->query to NULL.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:59 -07:00
Michael S. Tsirkin abf45dbb5b IB/mthca: Disable tuning PCI read burst size
The PCI spec recommends against drivers playing with a device's PCI
read burst size, and says that systems software should configure it.
And we actually have users that report that changing it from the
default set by BIOS hurts performance and/or stability for them.  On
the other hand, the Mellanox Programmer's Reference Manual recommends
turning it up all the way to the maximum value.  Some tests conducted
here in the lab do not show performance improvement from this tuning,
but this might be just me.

As a work-around, make this tuning an option, off by default (safe
value), with an eye towards removing it completely one day if no one
complains.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Shirley Ma 0f4852513f IPoIB: Make send and receive queue sizes tunable
Make IPoIB's send and receive queue sizes tunable via module
parameters ("send_queue_size" and "recv_queue_size").  This allows the
queue sizes to be enlarged to fix disastrously bad performance on some
platforms and workloads, without bloating memory usage when large
queues aren't needed.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Eli Cohen f2de3b0612 IPoIB: Wait for join to finish before freeing mcast struct
ipoib_mcast_restart_task() might free an mcast object while a join
request is still outstanding, leading to an oops when the query
completes.  Fix this by waiting for query to complete, similar to what
ipoib_stop_thread() is doing.  The wait for mcast completion code is
consolidated in wait_for_mcast_join().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Jack Morgenstein bf6a9e31cf IB: simplify static rate encoding
Push translation of static rate to HCA format into low-level drivers,
where it belongs.  For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage.  The changes are:

 - Add enum ib_rate to midlayer includes.
 - Get rid of static rate translation in IPoIB; just use static rate
   directly from Path and MulticastGroup records.
 - Update mthca driver to translate absolute static rate into the
   format used by hardware.  This also fixes mthca's static rate
   handling for HCAs that are capable of 4X DDR.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:47 -07:00
Michael S. Tsirkin d2e0655ede IPoIB: Consolidate private neighbour data handling
Consolidate IPoIB's private neighbour data handling into
ipoib_neigh_alloc() and ipoib_neigh_free().  This will make it easier
to keep track of the neighbour structures that IPoIB is handling, and
is a nice cleanup of the code:

add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78)
function                                     old     new   delta
ipoib_neigh_alloc                              -      61     +61
ipoib_neigh_free                               -      36     +36
ipoib_mcast_join_finish                     1288    1291      +3
path_rec_completion                          575     573      -2
ipoib_mcast_join_task                        664     660      -4
ipoib_neigh_destructor                       101      92      -9
ipoib_neigh_setup_dev                         14       3     -11
ipoib_neigh_setup                             17       -     -17
path_free                                    238     215     -23
ipoib_mcast_free                             329     306     -23
ipoib_mcast_send                             718     684     -34
neigh_add_path                               705     650     -55

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-04 14:46:48 -07:00
Roland Dreier ce1823f032 IB/srp: Fix memory leak in options parsing
Fix memory leak if parsing destination GID fails.

Coverity bug 1042

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-03 09:31:04 -07:00
Roland Dreier 227c939b00 IB/mthca: Always build debugging code unless CONFIG_EMBEDDED=y
Change the mthca debugging trace output code so that it can enabled
and disabled at runtime with the debug_level module parameter in
sysfs.  Also, don't allow CONFIG_INFINIBAND_MTHCA_DEBUG to be disabled
unless CONFIG_EMBEDDED is selected.  We want users (and especially
distros) to have this turned on unless they really need to save space,
because by the time we want debugging output, it's usually too late to
rebuild a kernel.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:20 -07:00
Roland Dreier f5545d24b8 IPoIB: Always build debugging code unless CONFIG_EMBEDDED=y
Don't allow CONFIG_INFINIBAND_IPOIB_DEBUG to be disabled unless
CONFIG_EMBEDDED is selected.  We want users (and especially distros)
to have this turned on unless they really need to save space, because
by the time we want debugging output, it's usually too late to rebuild
a kernel.  The debugging output can be controlled at runtime via the
debug_level module parameter in sysfs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:19 -07:00
Michael S. Tsirkin 37289efe3e IB/mad: fix oops in cancel_mads
We have seen the following OOPs in cancel_mads, when restarting opensm
multiple times:

    Call Trace:
      [<c010549b>] show_stack+0x9b/0xb0
      [<c01055ec>] show_registers+0x11c/0x190
      [<c01057cd>] die+0xed/0x160
      [<c031b966>] do_page_fault+0x3f6/0x5d0
      [<c010511f>] error_code+0x4f/0x60
      [<f8ac4e38>] cancel_mads+0x128/0x150 [ib_mad]
      [<f8ac2811>] unregister_mad_agent+0x11/0x130 [ib_mad]
      [<f8ac2a12>] ib_unregister_mad_agent+0x12/0x20 [ib_mad]
      [<f8b10f23>] ib_umad_close+0xf3/0x130 [ib_umad]
      [<c0162937>] __fput+0x187/0x1c0
      [<c01627a9>] fput+0x19/0x20
      [<c0160f7a>] filp_close+0x3a/0x60
      [<c0121ca8>] put_files_struct+0x68/0xa0
      [<c0103cf7>] do_signal+0x47/0x100
      [<c0103ded>] do_notify_resume+0x3d/0x40
      [<c0103f9e>] work_notifysig+0x13/0x25

We traced this back to local_completions unlocking mad_agent_priv->lock
while still keeping a pointer into local_list. A later call to
list_del(&local->completion_list) would then corrupt the list.

To fix this, remove the entry from local_list after looking it up but
before releasing mad_agent_priv->lock, to prevent cancel_mads from
finding and freeing it.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:19 -07:00
Bryan O'Sullivan 77d8798b55 IB/ipath: kbuild infrastructure
Integrate the ipath core and OpenIB drivers into the kernel build
infrastructure.  Add entry to MAINTAINERS.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:21 -08:00
Bryan O'Sullivan 6522108f19 IB/ipath: infiniband verbs support
The ipath_verbs.c file implements the driver-specific components of the
kernel's Infiniband verbs layer.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:21 -08:00
Bryan O'Sullivan e28c00ad67 IB/ipath: misc infiniband code, part 2
Management datagram support, queue pairs, and reliable and unreliable
connections.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:21 -08:00
Bryan O'Sullivan cef1cce5c8 IB/ipath: misc infiniband code, part 1
Completion queues, local and remote memory keys, and memory region
support.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan 97f9efbc47 IB/ipath: infiniband RC protocol support
This is an implementation of the Infiniband RC ("reliable connection")
protocol.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan 74ed6b5eb1 IB/ipath: infiniband UC and UD protocol support
These files implement the Infiniband UC ("unreliable connection") and UD
("unreliable datagram") protocols.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00
Bryan O'Sullivan aa735edf5d IB/ipath: infiniband header files
These header files are used by the layered Infiniband driver.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00